OutOfMemoryError in France with public transports

Denis_Huvelle · October 25, 2022, 7:59pm

Hello!

I have a weird issue with GH, public transport and memory usage

I followed graphhopper/deploy.md at master · graphhopper/graphhopper · GitHub with vm.swappiness=0
I tried the default garbage collector, ZGC and Shenandoah
I tried to use an -XmX with a value close to my graph folder size + 2 GB and I also tried clearly bigger value
I checked all similar discussions here and in Github…

Each time, same result after a moment (depends on number of calls):

2022-10-25 10:00:02.140 [dw-285 - GET /route?point=49.0404143,2.031273&point=49.036322,2.078656&type=json&locale=en&elevation=false&profile=pt&pt.earliest_departure_time=2020-01-01T07%3A00%3A00.000Z&points_encoded=false&instructions=false&calc_points=false&alternative_route.max_paths=1] ERROR i.d.j.errors.LoggingExceptionMapper - Error handling a request: 87ff99237564283c
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Spliterators.iterator(Spliterators.java:696)
at com.graphhopper.gtfs.PtGraph.lambda$backEdgesAround$1(PtGraph.java:549)
at com.graphhopper.gtfs.PtGraph$$Lambda$503/0x00000008404d5c40.iterator(Unknown Source)
at com.graphhopper.gtfs.GraphExplorer$1.(GraphExplorer.java:91)
at com.graphhopper.gtfs.GraphExplorer.lambda$ptEdgeStream$6(GraphExplorer.java:89)
at com.graphhopper.gtfs.GraphExplorer$$Lambda$502/0x00000008404d6c40.iterator(Unknown Source)
at com.graphhopper.gtfs.GraphExplorer.lambda$exploreEdgesAround$0(GraphExplorer.java:70)
at com.graphhopper.gtfs.GraphExplorer$$Lambda$490/0x00000008404cf040.iterator(Unknown Source)
at com.graphhopper.gtfs.MultiCriteriaLabelSetting$MultiCriteriaLabelSettingSpliterator.tryAdvance(MultiCriteriaLabelSetting.java:103)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at com.graphhopper.gtfs.PtRouterImpl$RequestHandler.findPaths(PtRouterImpl.java:213)
at com.graphhopper.gtfs.PtRouterImpl$RequestHandler.route(PtRouterImpl.java:185)
at com.graphhopper.gtfs.PtRouterImpl.route(PtRouterImpl.java:77)
at com.graphhopper.resources.PtRouteResource.route(PtRouteResource.java:85)
at jdk.internal.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$398/0x000000084044fc40.invoke(Unknown Source)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)

I don’t think I have the same behaviour for other countries than France (I have one GH instance per country) and I can’t find the reason it always crashes with an OutOfMemoryError after a while…

It also seems to be related to PT routes requests because I tried to host twice the same config on 2 different machines and one of them was dedicated to car/bikes/foot routes and no memory exception after a while but the machine dedicated to PT got the exception after a few hours.

In case it’s useful, I put the graph folder, the config.yml and also the files used to create the graphs in a big zip file here: https://gtfsfixer.blob.core.windows.net/debug/Debug.zip

FYI, I launch the process with this cmd: nohup /usr/bin/java -Xmx20g -jar /home/graphhoppermatrixsplit/graphhopper-web-6.0-matrix.jar server /home/graphhoppermatrixsplit/config_fr.yml &> nohupfr.out &
(XmX20g for the moment as an example because I tried different sizes…)

I got around 1000 calls per hour on /route endpoint for PT profile.

Very interested to know if I failed somewhere in the config or even the way to launch it or if there is a memory leak somewhere (if it’s the case, it’s a very specific case because I don’t have the same situation with other countries than France).

Thanks!

easbar · October 25, 2022, 8:39pm

To me this looks like the search does not end for some reason and explores more and more nodes of the routing graph until the memory is exceeded. Is France the biggest country (in terms of nodes) you use? That could be the reason the error does not occur for the other countries. Do you have (some) very long running queries? And can you reproduce the problem for a single request or only when there are many parallel requests? If I was to debug this I would log the visited_nodes.sum parameter (should be available in the response of the routing requests): graphhopper/PtRouterImpl.java at 37380c01ac115b2fd11f4293b964eea3fde64c79 · graphhopper/graphhopper · GitHub and check if there are some requests where this value is very large. If you found one the next question of course would be why this happens. Could be a bug or some rared data constellation that is not handled properly.

Denis_Huvelle · October 26, 2022, 5:14am

Hello easbar,

Thank you for your quick response!

Indeed, France is the biggest (the 2 other main countries are Belgium and Luxembourg, clearly smaller!) but we only do calls for points at maximum 25 km (“as the crow flies”), no far far away points everywhere in the country…

I started to monitor the visited_nodes.sum and I already found different stats: most of the calls are under 100 ms but I can find different calls with long or very long time like these ones

9.5 seconds 1000000 visited_nodes.sum
1.3 seconds 237862 visited_nodes.sum
415ms 144743 visited_nodes.sum

One of the “normal” call:
85ms 1459 visited_nodes.sum

Should I limit the max visited nodes?

easbar · October 26, 2022, 5:56am

Ok, but the ‘as the crow flies’ distance isn’t so important here. If (for some reason) the search algorithm does not find the target it may happen that it simply explores lot and lots of nodes, even ones that are very far from both the start and destination points.

I would try to provoke the OutOfMemoryError by sending the request that visited 1000000 nodes (this is the maximum) many times in parallel. If that works this is possibly the cause of the problem. You could also try to set the max visited nodes to an even higher number to see if even more / all (?) nodes are visited. The next question would then be why this happens for this specific request.

And yes if all this is true you could reduce the max visited nodes to make the memory error less likely.

Denis_Huvelle · October 26, 2022, 1:58pm

Oh ok, good to know for the “as the crow flies” comparing to the search algo behavior!

To check with a heavy request but not the worst, I tried to do a call with a request I found with visited_nodes.sum = 483372 and every 2 or 3 calls on this one (I didn’t tried in parallel, just simple manual call from the browser) the memory consumption grows with ~300 MB! Then I tried the big one with the visited_nodes.sum = 1000000, the memory took 400 MB more just with one request.

I guess I will limit my max visited nodes and monitor if it’s stable after that, thanks

easbar · October 26, 2022, 2:18pm

Ok, it still might be interesting to see if any route can be found at all for the request with visited_nodes.sum=1000000 when you set the maximum to infinity (or anything larger than the absolute number of nodes like maybe 100mio for France). But also just the fact that for some requests visited_nodes.sum is very high, but finishes, like 483372, means that probably there are some edges with very large weight that have to be explored to find the route, but are only explored after all the others, because their weight is so high. And that could be something that needs to be or at least can be fixed.

Denis_Huvelle · October 27, 2022, 9:24am

I re-generated a brand new graph with

routing.max_visited_nodes: 100000

and relaunch the GH instance witth this config.yml

graphhopper:
datareader.file: fr.osm.pbf
gtfs.file: fr-1.gtfs.zip,fr-2.gtfs.zip,fr-3.gtfs.zip,fr-4.gtfs.zip,fr-5.gtfs.zip,fr-6.gtfs.zip
graph.location: graph-cache-all_fr
graph.flag_encoders: car,bike,foot

profiles:
- name: car
vehicle: car
weighting: fastest
- name: bike
vehicle: bike
weighting: shortest
- name: foot
vehicle: foot
weighting: shortest
- name: carnomotorways
vehicle: car
weighting: custom
custom_model_file: carnomotorways.yml

profiles_lm:

prepare.min_network_size: 0

routing.max_visited_nodes: 100000

routing.non_ch.max_waypoint_distance: 100000

graph.dataaccess: RAM_STORE

server:
application_connectors:

type: http
port: 8952
request_log:
appenders:
admin_connectors:

type: http
port: 8953
bind_host: localhost

logging:
appenders:

type: file
time_zone: UTC
current_log_filename: logs/graphhopper_fr_all.log
log_format: “%d{YYYY-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n”
archive: true
archived_log_filename_pattern: ./logs/all-graphhopper_fr-%d.log.gz
archived_file_count: 30
never_block: true

type: console
time_zone: UTC
log_format: “%d{YYYY-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n”

But the long route call is still returning

hints: {
visited_nodes.sum: 1000000,
visited_nodes.average: 1000000
}

My config.yml is using old properties? I created with previous versions of GH, maybe it’s not up to date anymore?

easbar · October 27, 2022, 8:43pm

You can check if the config syntax changed by comparing with the config-example.yml file of the version you are using.
However, in this case I think the problem is that unfortunately routing.max_visited_nodes does not affect the max visited nodes for the pt routing. If I’m not mistaken the value is even hard-coded at the moment and cannot be changed at all without re-compiling the code:

github.com

graphhopper/graphhopper/blob/c943724414c782a1520d09d1e6ea8802f51236c3/reader-gtfs/src/main/java/com/graphhopper/gtfs/Request.java#L31

      
        
            import com.graphhopper.util.Helper;
            import com.graphhopper.util.shapes.GHPoint;
            
            
import java.time.Duration;
            import java.time.Instant;
            import java.util.*;
            
            
public class Request {
                private List<GHLocation> points;
                private Instant earliestDepartureTime;
                private int maxVisitedNodes = 1_000_000;
                private boolean profileQuery;
                private Boolean ignoreTransfers;
                private double betaTransfers = 0.0;
                private double betaStreetTime = 1.0;
                private Integer limitSolutions;
                private boolean arriveBy;
                private double walkSpeedKmH = 5.0;
                private int blockedRouteTypes;
                private Map<Integer, Long> boardingPenaltiesByRouteType = new HashMap<>();
                private Locale locale = Helper.getLocale("en");

Denis_Huvelle · October 28, 2022, 6:54am

Oh, well spotted!

I recompiled it and deployed it again, I got this response for my heavy request:

{“hints”:{“visited_nodes.sum”:163146,“visited_nodes.average”:163146},“info”:{“copyrights”:[“GraphHopper”,“OpenStreetMap contributors”],“took”:455},“paths”:}

When I run it multiple times, I see a small memory increase but clearly smaller, now I will monitor the memory in production to see how it’s behave, thanks

easbar · October 31, 2022, 5:13pm

Did you also try setting the max visited nodes to something like 100mio (larger than the number of nodes for all of France)? It would be interesting to see if the request that stopped at 1mio before explores even more nodes, or even leads to OutOfMemory.

Denis_Huvelle · November 1, 2022, 8:48am

Not yet, still monitoring and fighting an other issue

For the moment, it seems to continue to grow in term of memory consumption but no heap size exception yet but my logs contain a lot of lines for routes where it cannot find connections (example: this very easy situation which is working fine for car profile but not for foot: http://145.239.232.32:8952/maps/?point=48.870886%2C2.305923&point=48.872188%2C2.300294&locale=fr&elevation=false&profile=foot&use_miles=false&layer=OpenStreetMap )
I tried to put prepare.min_network_size: 0 in my config.yml a few weeks ago because GH was not able to find routes on a small island in France so I am a little bit lost on the correct parameter to use to not break “simple” cases like this one in a city like Paris

easbar · November 1, 2022, 9:11am

(example: this very easy situation which is working fine for car profile but not for foot:

I just checked this. The reason there is no route for foot is that the road seems to be tagged like highway=other. You can see this when you choose ‘Local MVT’ from the layers menu and zoom close to the red marker. When you hover the road it says ‘road_class: other’ which is not routeable. The reason it works for car is that in this case the red marker snaps to the nearby primary road instead.

So the question here is why it snaps to the non-routeable road for the foot profile and this could actually be a bug. Can you share the exact GH version and map file you used for this?

I tried to put prepare.min_network_size: 0 in my config.yml a few weeks ago because GH was not able to find routes on a small island in France so I am a little bit lost on the correct parameter to use to not break “simple” cases like this one in a city like Paris

When you set min_network_size: 0 you will be at risk that small isolated road network ‘islands’ will be kept. To prevent cases where no route can be found you need to use a larger value.

Denis_Huvelle · November 2, 2022, 4:18am

I am using latest 6.0 jar release with very recent France OSM file from Geofabrik: I tried to re-create my graph with 50 instead of 0 and now the route can be found for foot profile

I also tried to build a version with gtfs/Request.java private int maxVisitedNodes = 100_000_000;

and this time it finishes at this sum:

hints:{
- visited_nodes.sum: 1544368,
- visited_nodes.average: 1544368},

Denis_Huvelle · January 2, 2023, 1:40pm

Hello easbar,

I took time to optimize some stuff in our side (add caching on calls to GH, etc. to reduce the calls number, etc.) and I also tried to log all the requests before the issue happens (the main symptoms are all cores very busy suddenly) and I found some calls on /route which were taking ~37 seconds and here is the hints:

“hints”: {
“visited_nodes.sum”: 13552931,
“visited_nodes.average”: 13552931
}

And this is happening with a version built with private int maxVisitedNodes = 300_000;
into graphhopper-master\reader-gtfs\src\main\java\com\graphhopper\gtfs\Request.java

Do we have another place to put a max value? It seems this value is correctly used for most of the requests because almost all of them are under 300,000 or exactly on this number except a few of them which are running crazy

sninkovic · June 21, 2023, 10:21am

I am experiencing similar. I created instance with GraphHopperGtfs and provided gtfs files for NL and BE. Although this instance has very light traffic, latencies for normal routing queries (without PT) were much greater than with GraphHopper instance without PT. Eventually, after couple of day GraphHopperGtfs instance got completely stuck (probably due to memory leak):

[2023-06-21T12:16:24.433+0200][331456.174s][info][gc] GC(19725) Pause Full (G1 Evacuation Pause) 120198M->120182M(122880M) 76591.607ms
[2023-06-21T12:16:24.433+0200][331456.174s][info][gc] GC(19726) Concurrent Cycle 76591.861ms
[2023-06-21T12:16:24.440+0200][331456.181s][info][gc] GC(19727) To-space exhausted
[2023-06-21T12:16:24.440+0200][331456.181s][info][gc] GC(19727) Pause Young (Normal) (G1 Evacuation Pause) 120198M->120198M(122880M) 5.500ms

I am using graphhopper-web-bundle 4.0