I am quite new to routing algorithms, so I have a few questions.

As I explore routing algorithms, I understand that the most common ones are Dijkstra, A*, etc.
I want to know: if the various routing engines, be it GH or Uber's or OSRM, all implement these algorithms, how do the routing engines differ from each other?

Well, as with everything, there are many different kinds of problems and many different solutions to them. For example, with the Dijkstra algorithm it is very easy to adjust the metric the shortest-path calculation is based on (it could be a combination of travel time, distance, number of turns, etc.), but it is rather slow. Then there are speed-up techniques like Contraction Hierarchies (implemented in both OSRM and GH) that allow faster routing, but the metric has to be known in advance and heavy preprocessing has to be done up front. There are also hybrid approaches like MLD in OSRM or LM (landmarks) in GH that do pre-processing but still allow changing the metric later. Some algorithms use more memory, others less, which makes the latter more useful on mobile devices. Also, Dijkstra is a one-to-many (or even one-to-all) algorithm, but maybe we only need to calculate one-to-one shortest paths, or maybe many-to-many or even all-to-all, and of course there are different routing algorithms for that.
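To make the "easy to adjust the metric" point concrete, here is a minimal one-to-one Dijkstra sketch where the cost function is passed in as a parameter. The graph layout and attribute names (`time`, `dist`) are illustrative assumptions, not any engine's actual format:

```python
import heapq

def dijkstra(graph, source, target, metric):
    """One-to-one shortest path. `metric` maps an edge's attributes to a cost.

    `graph` is a toy adjacency dict: node -> list of (neighbor, attrs) pairs.
    """
    dist = {source: 0.0}
    prev = {}
    pq = [(0.0, source)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, attrs in graph.get(u, []):
            nd = d + metric(attrs)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor chain back from target to source.
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[target]

# Toy graph; each edge carries both a travel time and a distance.
graph = {
    "A": [("B", {"time": 5, "dist": 2}), ("C", {"time": 1, "dist": 10})],
    "B": [("D", {"time": 1, "dist": 1})],
    "C": [("D", {"time": 1, "dist": 1})],
}

# Swapping the metric changes the answer without touching the algorithm:
by_time = dijkstra(graph, "A", "D", lambda e: e["time"])  # fastest route
by_dist = dijkstra(graph, "A", "D", lambda e: e["dist"])  # shortest route
```

With a preprocessing-based technique like Contraction Hierarchies, this flexibility is exactly what you give up: the metric is baked in during preprocessing.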

The routing engines do a lot more than just implement the routing algorithms, though. For example, there are differences in how the graph is stored in memory, how it is persisted on disk (or in a database), and what kind of data is stored with it. They also usually solve more than just the routing problem: there are solutions for map matching, (reverse) geocoding, etc.
Finally, they are of course written in different programming languages, have different licensing, and so on.
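To illustrate one of those storage differences, here is a toy sketch contrasting two common in-memory graph layouts. These are hypothetical layouts for illustration, not any engine's actual on-disk or in-memory format:

```python
# Layout 1: adjacency dict -- flexible and easy to mutate, but pointer-heavy.
adj = {0: [(1, 3.0)], 1: [(2, 4.0)], 2: []}

# Layout 2: CSR-style flat arrays -- compact and cache-friendly, the kind of
# layout a preprocessing-heavy engine might favor.
first_edge = [0, 1, 2, 2]   # edges of node u live at indices first_edge[u]..first_edge[u+1]
edge_head  = [1, 2]         # target node of each edge
edge_cost  = [3.0, 4.0]     # cost of each edge

def neighbors_csr(u):
    """Yield (neighbor, cost) pairs of node u from the flat arrays."""
    for i in range(first_edge[u], first_edge[u + 1]):
        yield edge_head[i], edge_cost[i]
```

Both encode the same graph; the trade-off is mutability and simplicity versus memory footprint and cache behavior.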

Recent advances in Reinforcement Learning seem to address the TSP and VRP in a way that challenges the traditional Combinatorial Optimization approach.

My questions are:

Do you find that RL is mature enough for real-world problems?

Is it possible to train a network once and use it for multiple sets of tasks on top of the same area (e.g. using transfer learning)?

N.B. References:
To start with, you can look at this tutorial.

A recent paper shows a method for the VRP. The authors also shared their code on GitHub.