I’m trying to use GraphHopper in Spark to find routes over a very large data set. Does anyone know of any documentation or examples for doing this?
Alternatively, here are some specific questions I’m stuck on:
- calling importOrLoad is unlikely to work, as it refers to a folder on a single filesystem (which clearly won’t work distributed, and also means there might be problems with the locks it uses). It would work best in Spark if I could have my *.pbf file in HDFS … but I’m not sure how I’d get that to work in GraphHopper.
- alternatively, it seems I could serialize a GraphHopper instance (with the OSM info already loaded), but I’m not sure if that’s possible …
PS - I’ve managed to get it running in Spark as a non-distributed task, but that’s not very useful.
I’ve no experience with Spark, but HDFS is just a distributed file system - why not put the GraphHopper folder there? E.g. run importOrLoad once; that creates a graphhopper folder, and you can then rely on the distributed nature of HDFS. I.e. other GraphHopper instances do not need to import the OSM data again and will just load the data from the folder into RAM. You can even tell GH to avoid writing to this folder, just to be safe or if the file system is read-only (hopper.setAllowWrites(false)).
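The import-once / load-read-only split described above might look roughly like this. This is only a sketch against the GraphHopper API (the file and folder paths are placeholders); importOrLoad and setAllowWrites are the methods mentioned above:

```java
import com.graphhopper.GraphHopper;

public class ImportThenLoad {
    public static void main(String[] args) {
        // One-time import on a single machine: parses the OSM pbf and
        // writes the graph files into the graphhopper folder.
        GraphHopper importer = new GraphHopper();
        importer.setOSMFile("planet.osm.pbf");           // placeholder path
        importer.setGraphHopperLocation("graph-folder"); // placeholder path
        importer.importOrLoad();

        // On every other instance: point at the same folder, read-only.
        // importOrLoad finds the existing graph and just loads it into RAM
        // instead of re-importing the OSM data.
        GraphHopper reader = new GraphHopper();
        reader.setAllowWrites(false);
        reader.setGraphHopperLocation("graph-folder");
        reader.importOrLoad();
    }
}
```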
Admittedly I’m new to HDFS/Spark too, but the “problem” is that HDFS isn’t exactly like a normal file system - e.g. the data is split into chunks (which may live on different machines). GraphHopper/Java’s loading (which assumes a normal filesystem) probably won’t work in this case. However, I suspect it’s tweakable, so maybe I’ll try to write a new DataReader which handles this. Alternatively, I may be able to just “broadcast” the graph - I’ll have to see.
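One workaround pattern along these lines: copy the graph/pbf bytes from the distributed store onto each node’s local disk, then point GraphHopper at the local copy, so its normal file loading works unchanged. A minimal stdlib-only sketch - the HDFS read (or a Spark broadcast value) is stood in here by an arbitrary InputStream, and the method name is made up:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalStage {
    // Copy a stream (e.g. FileSystem.open(path) on HDFS, or bytes from a
    // Spark broadcast variable) into a node-local temp directory, so code
    // that assumes a normal filesystem can read it from local disk.
    static Path stageLocally(InputStream in, String fileName) throws IOException {
        Path dir = Files.createTempDirectory("gh-staging");
        Path target = dir.resolve(fileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        InputStream fake = new ByteArrayInputStream("pbf-bytes".getBytes("UTF-8"));
        Path local = stageLocally(fake, "map.osm.pbf");
        System.out.println(Files.size(local)); // prints 9
    }
}
```

The obvious cost is an extra copy of the data per node, but it avoids touching GraphHopper internals at all.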
I managed to get it working, but it was pretty painful and ugly. I’m sure it shouldn’t be too hard to tweak GraphHopper to make this a little easier … but I’d need to be more familiar with the internals.
I understand that Java storage libraries often come with custom URL handlers (we wouldn’t have to include them - the user registers them with the JVM), so this may be a good interface to also allow loading a graph from Amazon S3 or the like.
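For illustration, here is a minimal stdlib-only sketch of such a custom URL handler. The “mem:” protocol is made up and serves bytes from memory; a real S3 or HDFS handler would open a remote stream in openConnection instead, and would typically be registered JVM-wide via URL.setURLStreamHandlerFactory rather than passed to the URL constructor as done here:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Scanner;

// Handler for a hypothetical "mem:" protocol: openStream() yields bytes
// held in memory instead of reading from disk or the network.
class MemHandler extends URLStreamHandler {
    private final byte[] data;

    MemHandler(byte[] data) { this.data = data; }

    @Override
    protected URLConnection openConnection(URL u) {
        return new URLConnection(u) {
            @Override public void connect() {}
            @Override public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }
}

public class UrlHandlerDemo {
    // Read everything openStream() yields, as a UTF-8 string.
    static String slurp(URL url) throws IOException {
        try (Scanner s = new Scanner(url.openStream(), "UTF-8")) {
            return s.useDelimiter("\\A").next();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "fake-osm-bytes".getBytes("UTF-8");
        // Passing the handler to the URL constructor keeps the demo local;
        // a library would register it with the JVM instead.
        URL url = new URL(null, "mem://bucket/map.osm.pbf", new MemHandler(bytes));
        System.out.println(slurp(url)); // prints fake-osm-bytes
    }
}
```

If GraphHopper accepted a URL (or just an InputStream) for its input, any such handler the user registers would work without GraphHopper having to know about S3 or HDFS at all.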
I’m not sure @michaz - though that is how I interface with it from Hadoop/Spark. I was thinking (naively) of an overridable method for getting the bytes of the OSM file, leaving it up to the user how to provide them. A quick look at the link @karussell provided shows it kind of does this (with openStream).