LocationIndex performance tuning

Hi, I monitored the request times of our graphhopper instance over a couple of weeks and notices that our routing (CHs) is factor 20 faster than the location lookup. Which is great for routing but bad for the location index tree:

Average sum of call times:

  locationIndex#findClosest 113.00 ms
  routingAlgorithm#calcPath   5.28 ms

I use the low level api and the default settings:
Prepare:

LocationIndexTree locationIndexTree = new LocationIndexTree(storage.getBaseGraph(), directory);
locationIndexTree.prepareIndex();
locationIndexTree.flush();
locationIndexTree.close();

Runtime:

LocationIndexTree locationIndexTree = new LocationIndexTree(graph.getBaseGraph(), directory);
locationIndexTree.loadExisting();

I would expect that routing is a lot more time-consuming than a location lookup - any change to gain factor 50x in location lookup speed?

Jan

This is strange, on average our benchmarks show <0.05ms with maximum under 1ms

Standard settings should be fine. What GH, Java and OS version are you using? How do you measure ‘average’? Can you try running ./graphhopper.sh measurement osm.pbf and show all the location2id.* numbers?

Hi,

I’m running Graphhopper 0.5 (this is the last commit) on ubuntu 12.04 (previous LTS version) . AWS EC2 r3.4xlarge instance. I have 3d enabled and running with full planet data.

Since CPU/Memory/concurrent requests influence the actual performance a lot, I gave the the calcPath value as reference.

The math behind both values is: For each http request: Sum of duration of method calls. Average over all requests.
One request can have 2…n waypoints and therefore n calls of findCloses and n-1 calls of calcPath. So depending on how many waypoints there are in routes, best case of the ratio findClosest / calcPath is x10, in worst case x20.
(Since the factor is so high it doesn’t make sense to break it down to single method calls, now)

What is your ratio?

What test setup did you think of? Should I download the current 0.5 branch and execute ./graphhopper.sh measurement osm.pbf? Would most likely be more similar to your setup than mine… What would be the conclusion if the values are low and if the values are high?

Jan

Did you tried a customized GH 0.5 or a fresh 0.5 install?

Should I download the current 0.5 branch and execute ./graphhopper.sh measurement osm.pbf?

Please can you try 0.6 (and optionally 0.5) on your local laptop and on the AWS machine? If there is a difference between AWS and local setup I would say it is more likely a hardware problem? If both results shows us same bad results I would need your requests so that I can try&debug this deeper.

What is your ratio?

lower than 0.1

Hi Peter,

So your location lookup vs CHs routing is on average 10 times faster? That is impressive, hope I can reach that, too!

A hardware-specific issue wouldn’t be my first guess, especially since we talk about ratio between two CPU/RAM intensive operations here but I’ll give it a try:

Running on my fork I linked above:

Tested locally with a current bayern extract:

# ./graphhopper.sh measurement ~/Downloads/bayern-latest.osm.pbf | grep location2id
2016-05-09 16:20:08,296 [main] INFO  com.graphhopper.tools.Measurement - location2id: sum:1s, time/call:0s

Tested remote:

# ./graphhopper.sh measurement /tmp/bayern-latest.osm.pbf | tee /tmp/gh-remote.log
# grep location2id /tmp/gh-remote.log
2016-05-09 14:49:38,806 [main] INFO  com.graphhopper.tools.Measurement - location2id: sum:0.9s, time/call:0s

Does it help already? Looks pretty equal.

What do you mean by “I would need your requests”? Keep in mind that I don’t use the graphhopper-web interface but the low level api internally.

Best,

Jan

especially since we talk about ratio between two CPU/RAM intensive operations

Oh, true!

So your location lookup vs CHs routing is on average 10 times faster?

Yes, as only a portion of nodes is visited this should be the case :slight_smile:

Would you print the location index results that are inside the created measurement.properties instead of the std-output?

What do you mean by “I would need your requests”?

I would need the points and parameters (e.g. for the internal api) you use to debug this.

Output local:

location2id.max=15.705131
location2id.mean=0.10862222449999999
location2id.min=0.005441
location2id.sum=1086.222245

Output remote (twice):

# grep location2id /tmp/measurement2016-05-09_16_20_04.properties
location2id.max=45.754461
location2id.mean=0.1015864186
location2id.min=0.005262
location2id.sum=1015.864186

# grep location2id /tmp/measurement2016-05-09_14_49_36.properties
location2id.max=1.17515
location2id.mean=0.0901121341
location2id.min=0.006464
location2id.sum=901.121341

This means 0.1ms per location index call, which is what I would expect. Maybe you just measured differently? I’m still unsure how you did it. This can be really tricky. Maybe you try with our Measurement or MiniPerfTest class. Or see http://nitschinger.at/Using-JMH-for-Java-Microbenchmarking/

Edit: Of course, it can be also the case that our benchmarking is defect. Maybe you try your code with jmh?

The ratio hint was already great. I created a list of differences between your test case and my setup. We can rule our the hardware now. I don’t think it is request-specific because I measure this for a long time already. That said, I think I found a difference in the storage backend (DataAccess implementation), see earlier post and started a profiling session. There is probably spend too much time in a logging line. I’ll update you here when I have more evidence!

About how I measured it: We use NewRelic (commercial APM solution) to monitor our instances.

Jan

Maybe try async logging to see if this helps?

Ok, I figured out what was mainly responsible for the extremely long durations for the location lookup. Basically the method used for reading the location index data from offheap storage had a logging line too much, while the method used for retrieving routing data was efficient.

Having fresh measurement data shows that is is now a lot better (but still not at the same performance as your setup, Peter):

All times are average values per request and this is live data over a couple of hours, not a micro benchmark:

method                       % Time   Avg calls  Avg total duration
RoutingSession/lookup	       15.8      12.4      8.96
RoutingQuerySession/route	6.3      11.7      3.58

So on average we have per request the same number of point lookups (findClosest) and routing requests(calcPath), meaning we have a lot of A-B-C … D tours. The total duration of all method calls per request is 8.96 vs 3.58 (So the ratio is now 2.5 where you measured 0.1).
Note, 100% would be all time spend in the request, including http stuff etc.

Not sure if I can optimize that further, what are absolute times for findClosest for you in production? 9ms for 12 method calls seems ok to me. Maybe you have on average much longer route length, while we have short route segments with many waypoints, so your overall ratio is a lot better.

I’ll run a more detailed profiling next week.

Other opinions or ideas?

Thanks for sharing. Yes, this could be the case that for very short distances CH routing is more efficient than the lookup, as the lookup needs to explore a few nodes. What is your average route length?

Very tough to guess a route length. I’d say on average the total length is less than 100km, if we’ve on average 12 waypoints that makes less than 10km per individual segment. I’d say 3-7km on average maybe?

Do you have absolute numbers from your production instances?

Jan

No, sorry. But they are in the range of the measurement results.

There is a minor mistake I stumbled over recently. Can you check if this improves the speed if you change

put("locationIndex", DAType.RAM_INT_STORE); to
put("location_index", DAType.RAM_INT_STORE);

in GHDirectory?