Loading from Cassandra

spletpeer · September 27, 2016, 6:59pm

Hi, newbie here.

Rather than using files, I was wondering if I could export the pbf data into Cassandra (or any other database). If so, where do I need to start? Is there any documentation about the data structures?

karussell · September 27, 2016, 7:51pm

You would need to implement a new DataAccess type. Knowledge of the graph data structure is not necessary for that, but it is described briefly here: https://github.com/graphhopper/graphhopper/blob/master/docs/core/technical.md
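As a rough illustration of what a database-backed storage type could look like, here is a minimal sketch. It is not GraphHopper's actual DataAccess interface: `KeyValueStore`, `InMemoryKeyValueStore` and `KeyValueIntStore` are hypothetical names, and the in-memory map only stands in for a remote store such as Redis. The idea shown is splitting a flat byte range into fixed-size segments that are stored under separate keys.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote key-value store such as Redis. */
interface KeyValueStore {
    byte[] get(String key);
    void put(String key, byte[] value);
}

/** In-memory implementation so the sketch runs without a server. */
class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, byte[]> map = new HashMap<>();
    public byte[] get(String key) { return map.get(key); }
    public void put(String key, byte[] value) { map.put(key, value.clone()); }
}

/**
 * Hypothetical int storage split into fixed-size segments, one key per segment.
 * Assumes byte positions are multiples of 4 so an int never spans two segments.
 */
class KeyValueIntStore {
    private final KeyValueStore store;
    private final String name;
    private final int segmentSize; // bytes per segment
    private final Map<Integer, ByteBuffer> segments = new HashMap<>();

    KeyValueIntStore(KeyValueStore store, String name, int segmentSize) {
        if (segmentSize <= 0 || segmentSize % 4 != 0)
            throw new IllegalArgumentException("segmentSize must be a positive multiple of 4");
        this.store = store;
        this.name = name;
        this.segmentSize = segmentSize;
    }

    /** Loads a segment from the store on first access, or creates an empty one. */
    private ByteBuffer segment(int index) {
        ByteBuffer buf = segments.get(index);
        if (buf == null) {
            byte[] stored = store.get(name + ":" + index);
            byte[] bytes = stored != null ? stored.clone() : new byte[segmentSize];
            buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
            segments.put(index, buf);
        }
        return buf;
    }

    void setInt(long bytePos, int value) {
        segment((int) (bytePos / segmentSize)).putInt((int) (bytePos % segmentSize), value);
    }

    int getInt(long bytePos) {
        return segment((int) (bytePos / segmentSize)).getInt((int) (bytePos % segmentSize));
    }

    /** Writes every locally loaded segment back to the store. */
    void flush() {
        for (Map.Entry<Integer, ByteBuffer> e : segments.entrySet())
            store.put(name + ":" + e.getKey(), e.getValue().array());
    }
}
```

A second `KeyValueIntStore` created with the same store, name and segment size sees the values after `flush()` has been called on the first one, which is roughly the behaviour a stateless request handler would rely on.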

See also this related topic: Readonly DataAccess for InputStreams instead of Files

Is your use case ‘making graph updates available in a distributed setup’?

spletpeer · September 27, 2016, 9:11pm

This is for an academic research project. I am more interested in processing data in a serverless setting, where the data is stored in Cassandra and Redis and we don’t keep any soft state in RAM. When HTTP queries come in, servlets compute the shortest paths by fetching graph data from Redis (or Cassandra). Do you think that is possible?

spletpeer · September 27, 2016, 9:44pm

The ultimate goal is to first transfer the data from the pbf files to Cassandra, and then import it into Redis (for example, for a single city) for faster processing. Is creating a DataAccess type that reads/writes data from/to Redis all that I need to do in terms of data management?

karussell · September 28, 2016, 7:06am

That should do it. But I’m now unsure what you want to achieve. What problem prevents you from running GraphHopper with the default storage? Also, Redis and Cassandra use RAM too. Maybe you need the MMAP_STORE setting to reduce the RAM requirement?
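For readers unfamiliar with memory mapping, the general technique behind a setting like MMAP_STORE can be sketched with plain Java NIO. This is only an illustration of the standard `FileChannel.map` call, not GraphHopper's implementation; the class `MmapSketch` and its methods are made up for this example. The file content is paged in by the operating system on access instead of being copied onto the Java heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MmapSketch {
    /** Writes the ints to a temporary file (big-endian, 4 bytes each) and returns its path. */
    static Path writeInts(int[] values) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
            for (int v : values) buf.putInt(v);
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the int at the given index through a read-only memory mapping of the file. */
    static int readInt(Path file, int index) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return mapped.getInt(index * Integer.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this still requires a file system, so it addresses the RAM concern only, not the constraint of having no file access at all.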

spletpeer · September 28, 2016, 5:48pm

The problem with using the default storage is that in my setting, applications don’t have access to the file system or RAM (similar to the Google App Engine). Stateless servlets run for a limited amount of time and could do some processing using the database. My goal is to do routing (with live traffic updates) in such an environment. For secondary storage I’ll have Cassandra and instead of RAM, I’ll have something similar to Memcache or Redis.

spletpeer · September 28, 2016, 6:04pm

Also, in addition to the DataAccess type, I think I should also implement my own “Directory”, shouldn’t I?

karussell · September 29, 2016, 7:12am

The problem with using the default storage is that in my setting, applications don’t have access to the file system or RAM (similar to the Google App Engine)

I have no experience with GAE. What does it mean to have no access to RAM? You cannot call new?

Also, in addition to the DataAccess type, I think I should also implement my own “Directory”, shouldn’t I?

Yes

spletpeer · September 29, 2016, 7:17pm

You can call new and use the heap, but you cannot have static objects.
Every time a new HTTP request comes in, a new instance of your application is created. Within that instance you can use the heap as much as you want, but you can’t use the file system and you cannot have static objects. That’s why they introduced Cassandra and Memcache.

karussell · September 29, 2016, 7:52pm

Ah, okay. I doubt it will be efficient to grab all the graph data from Cassandra just to calculate one route. The in-memory storage we use is part of what makes route calculation fast. As Cassandra is column based, it could be possible to create a more or less efficient graph-based structure, but you would need to re-create a much bigger part of the system: the BaseGraph. This is similar to what we did in the early days to get GraphHopper running with Neo4j as storage.
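To make the efficiency concern concrete, here is a small sketch of Dijkstra's algorithm that fetches each node's outgoing edges on demand and counts the fetches. Everything in it is hypothetical: `AdjacencyTable` is an in-memory stand-in for a table with one row per node, and none of this is GraphHopper's BaseGraph. With a real remote store, every `fetch` call would be a network round trip.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

class RemoteGraphSketch {
    /** One outgoing edge: target node and non-negative weight. */
    static final class Edge {
        final int target;
        final double weight;
        Edge(int target, double weight) { this.target = target; this.weight = weight; }
    }

    /** Hypothetical adjacency table keyed by node id; counts how often it is queried. */
    static final class AdjacencyTable {
        private final Map<Integer, List<Edge>> rows = new HashMap<>();
        int lookups = 0;

        void addEdge(int from, int to, double weight) {
            rows.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, weight));
        }

        List<Edge> fetch(int node) {
            lookups++; // with a remote store this would be one round trip
            return rows.getOrDefault(node, Collections.emptyList());
        }
    }

    /** Returns the shortest distance from source to target, or +infinity if unreachable. */
    static double shortestDistance(AdjacencyTable table, int source, int target) {
        Map<Integer, Double> dist = new HashMap<>();
        Set<Integer> settled = new HashSet<>();
        // Queue entries are {distance, node}, ordered by distance.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        dist.put(source, 0.0);
        queue.add(new double[]{0.0, source});
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int node = (int) top[1];
            if (!settled.add(node)) continue; // stale queue entry
            if (node == target) return top[0];
            for (Edge e : table.fetch(node)) {
                double candidate = top[0] + e.weight;
                if (candidate < dist.getOrDefault(e.target, Double.POSITIVE_INFINITY)) {
                    dist.put(e.target, candidate);
                    queue.add(new double[]{candidate, e.target});
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

The number of lookups grows with the number of settled nodes, so a long route over a large road network would issue a very large number of store requests unless rows are batched or cached.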

Why is such an environment your requirement? Can’t you use just a ‘normal’ & ‘external’ server to calculate the routes?

spletpeer · September 29, 2016, 8:39pm

It’s for a showcase of a serverless framework that we have.
What I was thinking of was that, rather than loading the post-processed files from the “-gh” directory, I would read and write this data from and to Redis or Memcache, and every time a request comes in, load the entire graph from scratch from Redis. It would definitely be slower, but I hope not by too much.
After I’ve done this, I’ll then think about putting the .pbf files with some structure into Cassandra, and then load them, process them and put the resulting files (the ones in the “-gh” directory) into Redis.
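The "load the whole graph per request" plan described above might look roughly like the following sketch. The names are hypothetical (`GraphBlobSketch`, the key `"city-graph"`), the edge format is invented for the example and is not the layout of the files in the “-gh” directory, and the `Map` only stands in for Redis or Memcache.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class GraphBlobSketch {
    /** Packs edges given as {from, to, weight} triples into one byte array. */
    static byte[] encode(int[][] edges) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (1 + 3 * edges.length));
        buf.putInt(edges.length);
        for (int[] e : edges) {
            buf.putInt(e[0]);
            buf.putInt(e[1]);
            buf.putInt(e[2]);
        }
        return buf.array();
    }

    /** Rebuilds the edge triples from a byte array produced by encode. */
    static int[][] decode(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int count = buf.getInt();
        int[][] edges = new int[count][3];
        for (int i = 0; i < count; i++) {
            edges[i][0] = buf.getInt();
            edges[i][1] = buf.getInt();
            edges[i][2] = buf.getInt();
        }
        return edges;
    }

    /** Simulates one stateless request: fetch the blob by key and decode it from scratch. */
    static int[][] loadForRequest(Map<String, byte[]> cache, String key) {
        byte[] blob = cache.get(key);
        if (blob == null) throw new IllegalStateException("no graph stored under " + key);
        return decode(blob);
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = new HashMap<>(); // stand-in for Redis or Memcache
        cache.put("city-graph", encode(new int[][]{{0, 1, 10}, {1, 2, 20}}));
        int[][] edges = loadForRequest(cache, "city-graph");
        System.out.println("edges loaded: " + edges.length);
    }
}
```

The per-request cost here is the transfer plus the decode of the full blob, so it scales with graph size rather than with route length.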