GraphHopper Performance Issues

I have set up GraphHopper on an AWS m3.medium instance, along with a Ruby on Rails app whose Sidekiq queue makes 5 requests at a time. Response time starts at <1s per job, then grows to 10s, then ~30s, before finally hitting 600+s timeouts. At the beginning, java on my server shows 99% CPU utilization as expected. As time goes on and these longer requests linger (causing timeouts on the client end), server CPU usage drops to approximately 15% to 20%, which seems odd since the client is still waiting to hear back. Eventually I end up killing the GraphHopper instance and restarting it, and it works again until the performance drops off once more.

A cursory review of other threads suggests that GraphHopper should be OK in this server configuration, but maybe I’m wrong? Does using the public transit instance make it less performant? Is there some configuration I need to tune? Is 10 requests at a time too much? Any advice would be appreciated.

You could try the same setup without PT to see if the same results occur, but I would expect that this is not an issue of PT.

How much RAM did you give to GraphHopper? The m3.medium seems to have 3.75 GB of RAM. I would assume that your server ends up swapping; have a look at swap usage when the requests become slow.
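
One way to check this on Linux is to compare SwapTotal and SwapFree in /proc/meminfo while the requests are slow. A minimal sketch in Python, assuming a Linux host; the function name swap_used_kb is only for illustration:

```python
def swap_used_kb(meminfo_text):
    """Return the amount of swap in use (kB), given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        # Only the two swap counters are needed; the kernel reports them in kB.
        if sep and key in ("SwapTotal", "SwapFree"):
            fields[key] = int(rest.split()[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
```

On the server, calling swap_used_kb(open("/proc/meminfo").read()) a few times during a slow period should be enough: a value that keeps growing while response times climb would point to swapping.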

If it’s not swapping I would have a look at the garbage collection.
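
For the garbage collection, the JDK ships a jstat tool, and `jstat -gcutil <pid>` prints heap utilization and GC counters for a running JVM. A rough sketch of reading it from Python, assuming a JDK is installed on the server and pid is the process id of the GraphHopper java process; parse_gcutil and gc_stats are illustrative names:

```python
import subprocess


def parse_gcutil(output):
    """Map the column names of `jstat -gcutil` output to the values of its last row."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, last = rows[0], rows[-1]
    stats = {}
    for name, value in zip(header, last):
        try:
            stats[name] = float(value)
        except ValueError:
            stats[name] = None  # non-numeric field, e.g. "-" for an unavailable value
    return stats


def gc_stats(pid):
    """Run jstat once against the JVM with the given process id and parse the result."""
    result = subprocess.run(["jstat", "-gcutil", str(pid)],
                            capture_output=True, text=True, check=True)
    return parse_gcutil(result.stdout)
```

If the "O" column (old generation utilization in percent) stays near 100 while "FGC" (the full GC count) keeps rising between calls, the heap is probably too small for the load.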

Cheers,
Robin

We are currently optimizing the public transit code, as it was not yet suited for such scenarios. I think @michaz already committed major improvements … @mzagaja did you use 0.9 or a version from master? What exactly is your config.properties or, if you use Java, what is the config code?

Yes. The current PT implementation uses a mostly different execution path from everything else. GraphHopper will behave vastly differently with PT than without.

I used 0.9 but just tried pulling down the latest 0.10 snapshot, and it is definitely different. Worse in some ways: all responses take 60+ seconds. CPU doesn’t appear to be maxing out, but I think your hunch on RAM might be correct: it is definitely swapping. In the 0.10 snapshot I’m not seeing the response time drop off; it stays steady in the range of 60-240s responses. Not great, but it doesn’t fall off the 300s cliff (at least not yet). I am using the default/example config.properties.

Yep, an empirical analysis indicates that using walking routing is much more performant than the public transit mode. I think the next step will be to get the public transit instance on a beefier server.

There seems to be something wrong. It shouldn’t take 60+ seconds, and yes, maybe you can try a server with >4GB RAM. Which GTFS file did you try?

I downloaded the MBTA_GTFS.zip file from http://www.mbta.com/rider_tools/developers/default.asp?id=21895. I am working on getting access to a virtual server with more RAM so I can test on that. Currently limiting requests to 5 concurrent transit direction requests at once. I also tried upping the RAM allocation of java to 7GB but it was just grabbing more from swap, so that could be playing a role in that issue. I will report back and let you know.
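
For anyone who wants to reproduce the 5-at-a-time load outside of Sidekiq, a bounded thread pool gives the same kind of cap and makes it easy to record per-request timings. This is only a sketch: fetch is a hypothetical callable that performs one routing request, and run_limited is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(jobs, fetch, max_concurrent=5):
    """Call fetch(job) for every job with at most max_concurrent calls in flight.

    Returns a list of (result, elapsed_seconds) pairs in the order of jobs.
    An exception raised by fetch is re-raised when the results are collected.
    """
    def timed(job):
        start = time.monotonic()
        result = fetch(job)
        return result, time.monotonic() - start

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(timed, jobs))
```

Plotting the elapsed times in request order should show whether latency degrades gradually or falls off all at once.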

For GraphHopper you should disable swap and try to avoid low RAM situations.
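
In practice this means keeping the Java heap (the -Xmx setting) below the physical RAM of the machine, so a 7GB heap cannot work on a 3.75 GB instance. A small sketch of the arithmetic; the 1024 MB reserve for the OS and off-heap JVM memory is an assumed figure, not a GraphHopper recommendation, and max_heap_flag is an illustrative name:

```python
def max_heap_flag(physical_ram_mb, reserve_mb=1024):
    """Return a JVM -Xmx flag that leaves reserve_mb of RAM outside the heap."""
    heap_mb = physical_ram_mb - reserve_mb
    if heap_mb <= 0:
        raise ValueError("reserve is larger than physical RAM")
    return f"-Xmx{heap_mb}m"


# The m3.medium discussed above has 3.75 GB of RAM.
print(max_heap_flag(int(3.75 * 1024)))  # -Xmx2816m
```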

OK, so I now have a server with 8GB of RAM that I loaded GraphHopper onto, and I am getting lots of calculations that take a long time and end in “No route found” errors:

2017-09-19 14:43:26,374 [qtp1146825051-18] ERROR com.graphhopper.http.GHBaseServlet - elevation=false&locale=en-US&point=42.313017%2C-71.066111&point=42.355492%2C-71.048611&pt.earliest_departure_time=2017-09-01T09%3A00%3A00.000Z&type=json&vehicle=pt&weighting=fastest 10.10.30.32 en_US Faraday v0.11.0 [42.313017,-71.066111, 42.355492,-71.048611], took:33.873276, , fastest, pt, errors:[java.lang.RuntimeException: No route found]

A quick sanity check shows that the two points will route via Google.
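
To find all such requests in the server log, the took: and errors:[...] fields of lines like the one above can be pulled out with two regular expressions. A sketch based only on the format of that single line; it assumes took: is a duration in seconds, and summarize is an illustrative name:

```python
import re

TOOK = re.compile(r"took:([0-9.]+)")
ERRORS = re.compile(r"errors:\[(.*)\]")


def summarize(line):
    """Extract the took value and the error text (if any) from one log line."""
    took = TOOK.search(line)
    errors = ERRORS.search(line)
    return {
        "took": float(took.group(1)) if took else None,
        "errors": errors.group(1) if errors else None,
    }
```

Filtering the log for entries with a large took value or a non-empty errors field gives a list of coordinates to retest against a newer build.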

Some more information: the failing routing input works fine on the latest version of GraphHopper, so it seems to be an error in stable. I have stood up GraphHopper on a 4-core machine and have been getting response times of approximately 15s per calculation. I then stood up GraphHopper on my local Mac Pro (6 cores) and am seeing 5-10s per calculation when sending approximately 10 requests at a time. Some requests take as little as a second or less. So not super performant, but at least it will be done this week.

This is great to hear. The public transit is under heavy development so this is also kind of expected :wink:

“So not super performant”

The public transit routing is not yet tuned for high speed, yes. That was also not the goal of the initial release, where we more or less developed this from the ground up within a few months.

I have a similar issue, but I use CH in my test case: on a test cloud VPS (8GB, 4 cores), a fairly complex test route takes <200ms to calculate, which is good. There is no activity on the server other than my own.

However, after one day the timing suddenly goes up to about 1.5-2s and more. I disabled swap, but that does not seem to help. RAM does not seem to be an issue:

~$ free
              total        used        free      shared  buff/cache   available
Mem:        8167876     3202564     4825148         564      140164     4758020
Swap:             0           0           0

Top command shows almost 0 % CPU usage when idle and it goes up as expected during calculation.

I know that on a VPS there can be other reasons why CPU performance drops. However, if I kill and restart GraphHopper, the timing immediately goes back to <200ms, so the issue does not seem to be related to the CPU available on the VPS.

Any clue where I could look?

Thanks!

Please create a new topic, as public transit routing and plain road routing are different.