Optimization for Route API

thpoiani · March 12, 2019, 6:43pm

Hello guys. I’m working with the open-source GraphHopper and I mainly use the Route API.
Do you have any suggestions for optimizing the dataset for this API?

For example:
I’m using a cropped osm.pbf file. Can I keep only the tags for the road network to reduce the dataset size? If so, which tags does the Route API use?

I don’t use the Isochrone API, Map Matching API, or Geocoding API. Actually, I don’t even use the front-end. Can I remove these features to save computational resources?

Thanks in advance

leevilux · March 15, 2019, 5:50pm

I am also interested in this question. Am curious to know what you have tried / found out. I am also a user like you, not a developer.

I have looked at the map of a city and removed “tracks” and “footpaths” etc and reduced the size considerably. I only left primary, secondary, tertiary and a few other highways in. I found that it is indeed faster, by about 10%, and affects fastest more than shortest. It also depends on whether you use CH or not.

karussell · March 16, 2019, 1:16pm

What are your requirements? Do you need to reduce RAM usage at query time or at import time? Or do you mean something else?

thpoiani · March 19, 2019, 3:31pm

For now, I have only trimmed my dataset to improve the import time.
Some time ago I did something similar for a research project: I filtered some tags to reduce an algorithm’s analysis time.

The import time is not so important for me because I can handle it with my load balancer health checks. Still, it would be great to reduce it too.

RAM usage is the most important point in my opinion.
Today I’m using high-memory AWS EC2 instances.
I can see the freeable memory constantly decreasing, even when the service is not in use.

Another point: my usage is the Route API exclusively. How can I remove the ‘front-end’?

karussell · March 19, 2019, 3:51pm

If query speed is sufficient, you can try the MMAP option for dataaccess, but without an SSD it will be much slower. It is always a balance between query speed, resources used during the import, and RAM usage.

> Another point: my usage is the Route API exclusively. How can I remove the ‘front-end’?

I suggest not forking GH. Instead, create a custom MyApplication class in a project that depends on GH and do not add the assets: https://github.com/graphhopper/graphhopper/blob/master/web/src/main/java/com/graphhopper/http/GraphHopperApplication.java#L39

thpoiani · March 25, 2019, 7:49pm

@karussell, about the tags… Do you recommend removing unused tags from the dataset to improve the import time or the routing algorithm’s run time?

karussell · March 25, 2019, 7:51pm

They shouldn’t matter. What matters is a fast disk such as an SSD and lots of RAM.

thpoiani · April 2, 2019, 10:53pm

Thanks for all the support karussell.

One more question related to the dataset:

Do you see performance improvements when using smaller datasets?

Consider this scenario: I always use the Route API to calculate ETAs for paths within a specific polygon (Sao Paulo). However, my dataset is the whole planet-osm.
Could the routing algorithm have a better response time on a trimmed dataset (Brazil-only instead of planet-osm)?

It makes sense to me that a trimmed dataset requires less memory. But I would like to know about the performance (can I use the measurement action to calculate it?)

Thanks again

karussell · April 5, 2019, 4:03pm

Yes, unfortunately. We are still investigating why, because this should not be the case. The differences are probably not so big if you sort the graph (graph.sort=true).

> can I use the measurement action to calculate it?

Yes. Measurement should give you insights into this.

thpoiani · April 15, 2019, 5:23pm

Hello karussell.
Just to let you know, I ran load tests with smaller datasets.
I got some performance improvements with them.

However…
I ran the performance tests on graphhopper 0.9 (my current production version) and graphhopper 0.12 (the version we will update to).
In my load tests, using the same configuration and the same data, version 0.9 was faster than 0.12.

graphhopper 0.9 avg throughput = 121.4 req/sec
graphhopper 0.12 avg throughput = 96.32 req/sec

Have you seen similar results? Any idea why there is this difference?
Which version has the best performance in your tests?

Thanks

karussell · April 15, 2019, 5:47pm

Usually we improve performance with each release. But performance tests are really tedious to get right, and it is complex to improve all scenarios. So in order to know whether this is really a problem, you need to send us a reproducible measurement.

Preferably you run

./graphhopper.sh measurement area.pbf

and then you can see in the resulting properties files what is going on and whether the difference between the two versions is reproducible.

thpoiani · April 16, 2019, 4:54pm

I ran the measurement process and the results are basically the same for graphhopper 0.9 and 0.12.

Could you take a look?

So, in theory, the response time should be the same?
I will double check my infrastructure to try to find some difference between the servers.

karussell · April 16, 2019, 5:52pm

Yes, the interesting variables are routingCH.mean (default speed mode), routingLM8.mean (hybrid mode) and routing.mean (flex mode).

But it could be that there is a regression with regard to the volume of parallel requests, as we currently do not test against this.

Can you try switching the servers? If the difference is still due to the versions, then it would be interesting to know how you run the tests and how we can reproduce this.
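For comparing the variables named above across two measurement runs, a minimal sketch follows. It assumes the resulting files are plain Java-style `key=value` properties files; the script name and the two file paths passed on the command line are hypothetical and only for illustration.

```python
import sys
from pathlib import Path

# Variables named in this thread as the interesting ones.
KEYS = ("routingCH.mean", "routingLM8.mean", "routing.mean")


def load_properties(text):
    """Parse Java-style 'key=value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def compare(old_text, new_text, keys=KEYS):
    """Return (key, old, new, percent change) rows; None where a key is missing."""
    old, new = load_properties(old_text), load_properties(new_text)
    rows = []
    for key in keys:
        if key not in old or key not in new:
            rows.append((key, None, None, None))
            continue
        a, b = float(old[key]), float(new[key])
        rows.append((key, a, b, (b - a) / a * 100.0))
    return rows


if __name__ == "__main__":
    if len(sys.argv) == 3:
        old_text = Path(sys.argv[1]).read_text()
        new_text = Path(sys.argv[2]).read_text()
        for key, a, b, pct in compare(old_text, new_text):
            if pct is None:
                print(f"{key}: missing in at least one file")
            else:
                print(f"{key}: {a:.4f} -> {b:.4f} ({pct:+.1f}%)")
    else:
        print("usage: python compare_measurements.py old.properties new.properties")
```

A key that is absent from either file (for example when CH preparation is disabled) is reported as missing instead of raising an error.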

thpoiani · April 17, 2019, 7:51pm

Hey karussell. New results:

I ran another measurement test,
this time with the same config (new profile and other settings) that I used in the load test.

graphhopper 0.9:

ARGS="config=$CONFIG graph.location=$GRAPH datareader.file=$OSM_FILE \
graph.flag_encoders=newcar prepare.ch.weightings=no prepare.lm.weightings=fastest prepare.min_network_size=200 prepare.min_one_way_network_size=200 routing.lm.disabling_allowed=true routing.non_ch.max_waypoint_distance=1000000 graph.dataaccess=RAM_STORE datareader.preferred_language=en"

graphhopper 0.12:

ARGS="$GH_WEB_OPTS graph.location=$GRAPH datareader.file=$OSM_FILE \
graph.flag_encoders=newcar prepare.ch.edge_based=off prepare.ch.weightings=no prepare.lm.weightings=fastest prepare.min_network_size=200 prepare.min_one_way_network_size=200 routing.lm.disabling_allowed=true routing.non_ch.max_waypoint_distance=1000000 graph.dataaccess=RAM_STORE datareader.preferred_language=en"

You can see the results on the same Google Sheet, on the tab encoder newcar.

|                 | GraphHopper 0.9 | GraphHopper 0.12   |
|-----------------|-----------------|--------------------|
| routingCH.mean  | -               | -                  |
| routingLM8.mean | 2.3982183808    | 2.0212344464000003 |
| routing.mean    | 8.167217608     | 6.772436764        |

Greater values = Better performance?

easbar · April 17, 2019, 8:13pm

No, the values shown are average routing times in ms.
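Since these are times, lower is better. A small sketch of the arithmetic, using only the two rows of the table above that have values (the helper name `relative_change` is just for illustration):

```python
# Mean routing times in ms, copied from the table above (lower is better).
means_ms = {
    "routingLM8.mean": (2.3982183808, 2.0212344464000003),
    "routing.mean": (8.167217608, 6.772436764),
}


def relative_change(old, new):
    """Percent change from old to new; negative means new is faster."""
    return (new - old) / old * 100.0


for key, (v09, v012) in means_ms.items():
    change = relative_change(v09, v012)
    print(f"{key}: {change:+.1f}% in 0.12 relative to 0.9")
```

By this measure 0.12 is roughly 16-17% faster than 0.9 in the Measurement run, which is the opposite direction of the load test result.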

thpoiani · April 17, 2019, 8:40pm

OK, that makes sense.
But it is still weird… because graphhopper 0.12 was slower than 0.9 in my load test.

Additional info about the load testing:
JMeter --> 5 thread groups with the following configuration were started to get average values for response time:

Thread count: 50
Startup Time: 30 seconds
Hold Load: 10 minutes
Shutdown Time: 30 seconds

Same input data in every test.
Same networking and infrastructure (hardware) for both versions.

graphhopper 0.12 results:

| Throughput | Response time (min) | Response time (avg) | Response time (max) |
|------------|---------------------|---------------------|---------------------|
| 95.2       | 270                 | 501                 | 64501               |
| 97.6       | 266                 | 496                 | 4352                |
| 96.1       | 266                 | 496                 | 6155                |
| 96.2       | 270                 | 496                 | 4376                |
| 96.5       | 268                 | 494                 | 64060               |
| =96.32     | =268                | =496.6              | =28688.8            |

graphhopper 0.9 results:

| Throughput | Response time (min) | Response time (avg) | Response time (max) |
|------------|---------------------|---------------------|---------------------|
| 116.6      | 230                 | 409                 | 15227               |
| 121.8      | 266                 | 391                 | 3242                |
| 123.1      | 266                 | 387                 | 8615                |
| 122.5      | 267                 | 389                 | 4384                |
| 123        | 267                 | 388                 | 4027                |
| =121.4     | =259.2              | =392.8              | =7099               |
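As a sanity check on the two tables, throughput and average response time in a closed-loop test with a fixed number of threads are tied together by Little's law (requests in flight ≈ throughput × response time). The sketch below assumes that the response times are in milliseconds and that each table row is one run with 50 threads; neither is stated explicitly above.

```python
# Per-run values copied from the two tables above.
# Assumption: throughput is in requests/sec and response times are in ms.
runs = {
    "0.12": {
        "throughput": [95.2, 97.6, 96.1, 96.2, 96.5],
        "avg_ms": [501, 496, 496, 496, 494],
    },
    "0.9": {
        "throughput": [116.6, 121.8, 123.1, 122.5, 123.0],
        "avg_ms": [409, 391, 387, 389, 388],
    },
}


def mean(values):
    return sum(values) / len(values)


def implied_concurrency(throughput_rps, avg_response_ms):
    """Little's law: requests in flight = throughput * response time."""
    return throughput_rps * avg_response_ms / 1000.0


for version, data in runs.items():
    x = mean(data["throughput"])
    r = mean(data["avg_ms"])
    n = implied_concurrency(x, r)
    print(f"graphhopper {version}: {x:.2f} req/sec, {r:.1f} ms avg, "
          f"about {n:.1f} requests in flight")
```

Both versions come out at about 48 requests in flight, which would be consistent with the 50 configured threads. Under that reading, the lower throughput of 0.12 is the same observation as its higher average response time, not a separate effect.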

easbar · April 17, 2019, 8:44pm

OK, no idea really, better ask @karussell. But the Measurement class only tests the Java API (no server setup is involved, which could explain the difference).

thpoiani · April 22, 2019, 5:40pm

Hello @karussell. Can you let me know what could be happening? I would really like to use the new version, but this performance issue could be a problem for me.

Reshmi_Mukherjee · January 27, 2020, 2:08pm

Hey @thpoiani, did you get a reply from @karussell? I am also facing this performance issue with v0.12.

karussell · January 27, 2020, 7:10pm

It could be anything. Between those versions we changed not only the algorithms but also which roads are accepted, and this is very likely the reason. (For example, you can try excluding tracks and you will see that routing becomes much faster, but tracks are allowed in some countries, so we have to include them until country rules are ready.)