GraphHopper external Directory

Hi,
We want to use GraphHopper with a microservices architecture where services can scale up and down as the load changes.
We want to use a large OSM map (entire world).
We want to use the map-matching algorithm and to search for the closest nodes based on geo location.
We want GraphHopper to load the map quickly; the map will be shared across multiple services.
The way we want to approach this is to create a custom Directory class with a custom DataAccess class that has three tiers of storage:

  1. The 1st tier will be in memory. It will hold a limited number of “segments” based on some caching strategy, like LRU or something similar.
  2. The 2nd tier will be a Redis cluster that will hold a larger, but still limited, number of “segments”.
  3. The 3rd tier will be S3 storage, which will hold all of the “segments”.

When the graph needs some data, we will fall back through the storage tiers and manage which tier holds what by adding/deleting segment data.
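A minimal sketch of what we have in mind (the class name and the Redis/S3 fetch stubs are hypothetical, not GraphHopper API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the three tiers described above.
public class TieredSegmentStore {
    private final Map<Long, byte[]> heapTier;

    public TieredSegmentStore(int maxHeapSegments) {
        // Tier 1: an access-ordered LinkedHashMap gives us a simple LRU cache.
        this.heapTier = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxHeapSegments;
            }
        };
    }

    public synchronized byte[] getSegment(long segmentId) {
        byte[] segment = heapTier.get(segmentId);
        if (segment == null) {
            segment = fetchFromRedis(segmentId);  // Tier 2: Redis cluster
            if (segment == null)
                segment = fetchFromS3(segmentId); // Tier 3: S3 holds all segments
            heapTier.put(segmentId, segment);     // promote into memory, evicting LRU
        }
        return segment;
    }

    private byte[] fetchFromRedis(long segmentId) {
        return null; // e.g. a Jedis GET on "segment:" + segmentId
    }

    private byte[] fetchFromS3(long segmentId) {
        return new byte[0]; // e.g. AWS SDK getObject(bucket, "segment/" + segmentId)
    }
}
```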

From what we understand, this should work, and we can tune the cache sizes to our performance needs.

I have a few questions:

  1. In the Directory interface, there are methods that reference the DAType, and the DAType needs the MemRef enum. This enum has only HEAP and MMAP options. Which would be the better option for us, and what do these options affect in the code? (Our current reading is sketched after this list.)
  2. If none of the MemRef values is really appropriate, could a CUSTOM enum value be added?
  3. Are we safe to assume that the map isn’t stored outside the DataAccess class?
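For question 1, this is our current reading of the two options, sketched against GHDirectory (the exact behavior may differ by GraphHopper version):

```java
import com.graphhopper.storage.DAType;
import com.graphhopper.storage.GHDirectory;

// MemRef.HEAP: DataAccess segments live as byte[] arrays on the Java heap,
// so the whole graph must fit in memory (RAM_STORE also persists on flush).
GHDirectory heapDir = new GHDirectory("graph-cache", DAType.RAM_STORE);

// MemRef.MMAP: DataAccess segments are memory-mapped files, so the OS pages
// them in and out on demand; less RAM is needed, at some speed cost.
GHDirectory mmapDir = new GHDirectory("graph-cache", DAType.MMAP);
```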

Why would you want multiple GraphHopper instances per server using the same graph cache? The GraphHopper server is not single-threaded.

When the graph needs some data, we will fall back through the storage tiers and manage which tier holds what by adding/deleting segment data.

You can’t do that; routing requests would take an arbitrarily long time. Instead, always copy to local storage, and if you have low memory use MMAP instead of RAM_STORE. But the preferred way is to have enough memory and load everything into it.
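For example, assuming the standard `graph.dataaccess` option in the configuration file, switching to memory-mapped storage is a one-line change:

```
graph.dataaccess=MMAP
```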

That’s not what we want; we want to have multiple servers, each running a single GraphHopper instance.

Holding the whole world in memory would require very expensive servers (tens of GB of RAM).
Also, copying all the data to the server's disk would take a very long time, considering we want to be able to scale up a server within 1 minute.
We aren’t planning on serving routing requests; we want to do map matching (GPS trace → path).

The main issue, as I see it now, is that edges aren’t localized within a segment. If segments stored spatially localized data (tiles), then loading new tiles wouldn’t happen often along a given route, and routing would probably be faster too.
Making the graph store nodes and edges in localized tiles would require some preprocessing of the PBF file, and it could be tricky because it’s quite hard to know how many bytes each edge and node will take (which is needed in order to choose a segment size that corresponds to a tile).
It would help a lot if the storage layer received some localization information when storing or reading data, like the tile of the data or the GPS position of the node/edge start. That way we could break up the files/Redis keys based on tile keys.
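A minimal sketch of the kind of tile key we mean (the grid size and the key packing are arbitrary choices of ours, not anything GraphHopper provides):

```java
// Hypothetical tile key: quantize a coordinate to a fixed-size grid so that
// file names / Redis keys can be derived from location.
static long tileKey(double lat, double lon, double tileSizeDeg) {
    long row = (long) Math.floor((lat + 90.0) / tileSizeDeg);
    long col = (long) Math.floor((lon + 180.0) / tileSizeDeg);
    return row * 1_000_000L + col; // assumes fewer than 1e6 columns per row
}
```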