Process stuck at 100% - Java heap space OutOfMemory


I just experienced the following exception on my GH setup. The server has some RAM left for requests, and I handle both CH and non-CH requests. This seems to be related to

But I did not need to use pkill -9; a regular kill was enough.

Max visited nodes is currently set quite high:
routing.max_visited_nodes = 15000000

I haven’t tuned GC.

ERROR: Java heap space, via:http://localhost:8989/route
java.lang.OutOfMemoryError: Java heap space
    at ...
    at gnu.trove.impl.hash.THash.postInsertHook(...)
    at ...
    at com.graphhopper.routing.DijkstraBidirectionRef.fillEdges(...)
    at com.graphhopper.routing.DijkstraBidirectionRef.fillEdgesFrom(...)
    at com.graphhopper.routing.AbstractBidirAlgo.runAlgo(...)
    at com.graphhopper.routing.AbstractBidirAlgo.calcPath(...)
    at com.graphhopper.routing.AbstractRoutingAlgorithm.calcPaths(...)
    at com.graphhopper.routing.template.ViaRoutingTemplate.calcPaths(...)
    at com.graphhopper.GraphHopper.calcPaths(...)
    at com.graphhopper.GraphHopper.route(...)
    at com.graphhopper.http.GraphHopperServlet.doGet(...)
    at javax.servlet.http.HttpServlet.service(...)
    at javax.servlet.http.HttpServlet.service(...)
    at ...
    at com.graphhopper.http.IPFilter.doFilter(...)
    at com.graphhopper.http.CORSFilter.doFilter(...)
    at ...
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(...)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(...)
    at ...

15 million nodes is indeed not small, so multiple non-CH queries could currently lead to this behaviour. If the last queries were non-CH queries, can you find out their rough distance?
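
To illustrate why the limit matters: the trace above runs out of memory while a hash map grows inside fillEdges, and the number of entries such a map holds is bounded by how many nodes a search may touch. The following is a minimal sketch of a Dijkstra search with a visited-node cap. It is not GraphHopper code; the class name, the method names, and the choice to throw an exception when the cap is hit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: hypothetical names, not the GraphHopper implementation.
class CappedDijkstra {
    // graph.get(u) holds {neighbor, weight} pairs for node u.
    static double shortestPath(List<List<int[]>> graph, int from, int to, int maxVisitedNodes) {
        // This map grows with every node the search reaches, so an
        // uncapped search on a large graph can use a lot of heap.
        Map<Integer, Double> best = new HashMap<>();
        PriorityQueue<double[]> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        best.put(from, 0.0);
        queue.add(new double[] {from, 0.0});
        int visited = 0;
        while (!queue.isEmpty()) {
            double[] entry = queue.poll();
            int node = (int) entry[0];
            double dist = entry[1];
            if (dist > best.get(node)) continue; // stale queue entry
            if (node == to) return dist;
            // Assumption: abort with an exception once the cap is exceeded.
            if (++visited > maxVisitedNodes) {
                throw new IllegalStateException(
                        "max visited nodes exceeded: " + maxVisitedNodes);
            }
            for (int[] edge : graph.get(node)) {
                double candidate = dist + edge[1];
                Double known = best.get(edge[0]);
                if (known == null || candidate < known) {
                    best.put(edge[0], candidate);
                    queue.add(new double[] {edge[0], candidate});
                }
            }
        }
        return Double.POSITIVE_INFINITY; // target not reachable
    }

    // Builds a simple chain 0 - 1 - ... - (n-1) with edge weight 1.
    static List<List<int[]>> lineGraph(int n) {
        List<List<int[]>> graph = new ArrayList<>();
        for (int i = 0; i < n; i++) graph.add(new ArrayList<>());
        for (int i = 0; i + 1 < n; i++) {
            graph.get(i).add(new int[] {i + 1, 1});
            graph.get(i + 1).add(new int[] {i, 1});
        }
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(shortestPath(lineGraph(5), 0, 4, 10));
        try {
            shortestPath(lineGraph(5), 0, 4, 2);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Since each running query holds its own map, the worst-case memory is roughly the per-query bound times the number of queries running at once, which is why several large non-CH queries together can exhaust the heap even if each one alone would fit.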

To investigate this further when it happens again (to see whether it is a GC issue or something else), can you add the following JVM options?

-XX:+HeapDumpOnOutOfMemoryError -Xloggc:logs/your.log -XX:+PrintGCDateStamps

Be aware that HeapDumpOnOutOfMemoryError produces very big files, so enough disk space is required (at least the size of your RAM, and more if multiple OOMs are thrown).
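
One way to confirm that the options actually reached the running JVM is to read them back through the standard management API. The sketch below is an assumption about how you might check this, not part of GraphHopper; the class and method names are hypothetical. Note also that -Xloggc and -XX:+PrintGCDateStamps are Java 8 era options, so on newer JVMs the GC logging flags will likely need to be different.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical helper for checking JVM start-up options.
class JvmFlagCheck {
    // Returns every wanted prefix that no JVM argument starts with.
    static List<String> missing(List<String> jvmArgs, List<String> wantedPrefixes) {
        List<String> result = new ArrayList<>();
        for (String wanted : wantedPrefixes) {
            boolean found = false;
            for (String arg : jvmArgs) {
                if (arg.startsWith(wanted)) {
                    found = true;
                    break;
                }
            }
            if (!found) result.add(wanted);
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (e.g. -Xmx..., -XX:...).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> wanted = Arrays.asList(
                "-XX:+HeapDumpOnOutOfMemoryError",
                "-Xloggc:",
                "-XX:+PrintGCDateStamps");
        System.out.println("Missing options: " + missing(jvmArgs, wanted));
        // Maximum heap the JVM will attempt to use, in megabytes.
        System.out.println("Max heap (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

This only reports on the JVM it runs in, so it would have to be run with the same start-up options as the server, or the same calls placed in code that runs inside the server process.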

To improve GC behaviour you can try a different GC, e.g. -XX:+UseG1GC; see the deployment guide.

Thanks, I will reduce max_visited_nodes, and I have added the GC options. I will report back if this happens again.

Yes, but there was nothing really special, I’d guess. I just checked the last 10 non-CH requests. All were below 600 km (but with waypoints in between) and should have finished by the time the exception occurred.

Please verify (by executing them again) that all of them work. Maybe some query got stuck in a subnetwork?
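
Re-running the logged queries can be scripted. The sketch below replays a list of requests against the /route endpoint from the log above and prints the HTTP status and the elapsed time for each. The class and method names are hypothetical, the coordinates are placeholders rather than the real requests, and the point and ch.disable URL parameters are assumed to match your GraphHopper version.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical replay helper, not part of GraphHopper.
class ReplayRoutes {
    // Each point is a "lat,lon" string; disableCh requests a non-CH query
    // (assumption: the server accepts a ch.disable parameter).
    static String buildRouteUrl(String base, List<String> points, boolean disableCh) {
        StringBuilder sb = new StringBuilder(base).append("?");
        for (String p : points) {
            sb.append("point=").append(p).append("&");
        }
        sb.append("ch.disable=").append(disableCh);
        return sb.toString();
    }

    // Sends a GET request and returns the HTTP status code.
    static int fetchStatus(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Placeholder queries; replace with the points from your request log.
        List<List<String>> queries = Arrays.asList(
                Arrays.asList("52.52,13.40", "50.11,8.68", "48.14,11.58"));
        for (List<String> points : queries) {
            String url = buildRouteUrl("http://localhost:8989/route", points, true);
            long start = System.nanoTime();
            try {
                int status = fetchStatus(url, 60_000);
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println(status + " in " + millis + " ms: " + url);
            } catch (IOException e) {
                // A timeout here would hint at a query that does not finish.
                System.out.println("FAILED (" + e.getMessage() + "): " + url);
            }
        }
    }
}
```

The read timeout is there so that a query that hangs shows up as a failure instead of blocking the whole replay.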

I have now done that for the last 15 non-CH requests, and all executed fine, but the system is not under high load right now.