[solved] Saint Petersburg GTFS (smaller than Berlin's) needs 40 GB of heap memory. Why?

The Berlin & Brandenburg GTFS feed used in the documentation example builds perfectly, even on a laptop with the default parameters (-Xmx8g, i.e. 8 GB of heap).
Then I tried Saint Petersburg, a city of about the same size (5M+ population).
Here’s the feed: https://transitfeeds.com/p/saint-petersburg/826
I cut the OSM .pbf down to Saint Petersburg only, so it’s just 32 MB.

Config (YAML):

    graphhopper:
      datareader.file: /data/spb.osm.pbf  # 32 MB: Russia extract from Geofabrik, clipped to the city boundary (keeping ways and relations)
      gtfs.file: /data/spb-lenobl.zip  # https://transitfeeds.com/p/saint-petersburg/826, latest version (1 July)
      graph.location: /graphs/spb-transit
      graph.flag_encoders: foot

    server:
      application_connectors:
        - type: http
          port: 8989
      admin_connectors:
        - type: http
          port: 8990

I tried running with 12, 24 and 32 GB of heap, and every run failed at the inter-feed transfers stage.

What makes this particular GTFS feed so heavy for GraphHopper?

Comparing the feed files:

Berlin: http://transitfeeds.com/p/verkehrsverbund-berlin-brandenburg
61 MB compressed, 448 MB uncompressed, stop_times.txt 289 MB.

Saint Petersburg: http://transitfeeds.com/p/saint-petersburg/826
35 MB compressed, 282 MB uncompressed, stop_times.txt 234 MB.
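To repeat a comparison like this for any feed, a small program can list every file in a GTFS zip with its uncompressed size and line count; line counts (data rows plus one header line) are arguably a better proxy for import work than megabytes. This is a sketch using only `java.util.zip` (Java 17), not a GraphHopper tool; the class name `GtfsSizes` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: per-file uncompressed size and line count of a GTFS zip. */
public class GtfsSizes {

    /** Uncompressed bytes and number of '\n'-terminated lines of one zip entry. */
    record Stats(long bytes, long lines) {}

    static Map<String, Stats> summarize(InputStream in) throws IOException {
        Map<String, Stats> result = new TreeMap<>(); // sorted by file name
        try (ZipInputStream zip = new ZipInputStream(in)) {
            byte[] buf = new byte[64 * 1024];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                long bytes = 0, lines = 0;
                int n;
                // read() returns the decompressed bytes of the current entry, -1 at its end
                while ((n = zip.read(buf)) != -1) {
                    bytes += n;
                    for (int i = 0; i < n; i++) if (buf[i] == '\n') lines++;
                }
                result.put(entry.getName(), new Stats(bytes, lines));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            summarize(in).forEach((name, s) ->
                System.out.printf("%-25s %10.1f MB %12d lines%n", name, s.bytes() / 1e6, s.lines()));
        }
    }
}
```

Run it with `java GtfsSizes.java spb-lenobl.zip` (single-file source launch) and compare the output for both feeds side by side.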

I thought it must be stop_times.txt, but it’s not: Berlin’s stop_times.txt is larger (289 MB vs. 234 MB) and builds fine.

The last run failed at this stage:

    INFO  [2020-07-07 11:51:25,368] com.graphhopper.gtfs.GraphHopperGtfs: Looking for inter-feed transfers
    web_1  | java.lang.OutOfMemoryError: Java heap space
    web_1  | 	at java.base/java.util.Arrays.copyOf(Unknown Source)
    web_1  | 	at com.carrotsearch.hppc.IntArrayList.ensureBufferSpace(IntArrayList.java:351)
    web_1  | 	at com.carrotsearch.hppc.IntArrayList.insert(IntArrayList.java:173)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$SortedIntSet.addOnce(LocationIndexTree.java:799)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemLeafEntry.addNode(LocationIndexTree.java:766)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:901)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:917)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex$1.set(LocationIndexTree.java:887)
    web_1  | 	at com.graphhopper.storage.index.BresenhamLine$1.set(BresenhamLine.java:79)
    web_1  | 	at com.graphhopper.storage.index.BresenhamLine.bresenham(BresenhamLine.java:45)
    web_1  | 	at com.graphhopper.storage.index.BresenhamLine.calcPoints(BresenhamLine.java:75)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.addNode(LocationIndexTree.java:892)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree$InMemConstructionIndex.prepare(LocationIndexTree.java:870)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree.getPrepareInMemIndex(LocationIndexTree.java:224)
    web_1  | 	at com.graphhopper.storage.index.LocationIndexTree.prepareIndex(LocationIndexTree.java:290)
    web_1  | 	at com.graphhopper.gtfs.GraphHopperGtfs.importPublicTransit(GraphHopperGtfs.java:193)
    web_1  | 	at com.graphhopper.GraphHopper.postProcessing(GraphHopper.java:934)
    web_1  | 	at com.graphhopper.GraphHopper.process(GraphHopper.java:662)
    web_1  | 	at com.graphhopper.GraphHopper.importOrLoad(GraphHopper.java:625)
    web_1  | 	at com.graphhopper.http.GraphHopperManaged.start(GraphHopperManaged.java:125)
    web_1  | 	at io.dropwizard.lifecycle.JettyManaged.doStart(JettyManaged.java:27)
    web_1  | 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    web_1  | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
    web_1  | 	at org.eclipse.jetty.server.Server.start(Server.java:407)
    web_1  | 	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
    web_1  | 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:97)
    web_1  | 	at org.eclipse.jetty.server.Server.doStart(Server.java:371)
    web_1 exited with code 1
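The trace shows where the memory goes: `LocationIndexTree.prepareIndex`, called from `GraphHopperGtfs.importPublicTransit`, rasterizes edges onto a grid via `BresenhamLine.calcPoints`, and each visited point ends up in `InMemConstructionIndex.addNode` and a leaf’s `SortedIntSet`. The sketch below is the textbook integer Bresenham line, not GraphHopper’s `BresenhamLine` (whose API isn’t reproduced here); the class name `BresenhamSketch` is hypothetical. It illustrates that a segment visits max(|dx|, |dy|) + 1 grid cells, so on this reading of the trace the index grows with how many cells the indexed edges cross. Which edges the transit import feeds into the index, and how long they are, isn’t established in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Textbook all-octant integer Bresenham line (illustration only, not GraphHopper's code). */
public class BresenhamSketch {

    /** Returns every grid cell {x, y} on the segment (x0, y0) to (x1, y1), endpoints included. */
    static List<int[]> cells(int x0, int y0, int x1, int y1) {
        List<int[]> out = new ArrayList<>();
        int dx = Math.abs(x1 - x0), dy = -Math.abs(y1 - y0);
        int sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1;
        int err = dx + dy; // error term: how far the current cell is from the ideal line
        while (true) {
            out.add(new int[] {x0, y0});
            if (x0 == x1 && y0 == y1) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; } // step along x
            if (e2 <= dx) { err += dx; y0 += sy; } // step along y
        }
        return out;
    }

    public static void main(String[] args) {
        // Cells touched by a short segment and by one ten times longer in both directions.
        System.out.println(cells(0, 0, 10, 3).size());
        System.out.println(cells(0, 0, 100, 30).size());
    }
}
```

The count scales linearly with segment length, which is why the total number of index entries depends on edge geometry rather than on stop or stop_times counts.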

[addition] Managed to build the graph with -Xmx40g.

    INFO  [2020-07-07 13:28:12,590] com.graphhopper.gtfs.GraphHopperGtfs: flushed graph totalMB:40960, usedMB:31562)

About 40 GB of RAM was in use while it ran (the log reports usedMB:31562 out of totalMB:40960 at flush time).

Solved the issue by fixing the date intervals (some start dates were in 2020 while the corresponding end dates were 2019-12-31, i.e. the intervals ended before they began) and emptying frequencies.txt.
After that, the build used 19 GB of RAM and took 3 minutes.
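Inverted intervals like these can be found automatically. A minimal sketch, assuming the intervals in question are the `start_date`/`end_date` columns of calendar.txt (the post doesn’t say which file) and that the file has no commas inside quoted fields; the class name `InvertedCalendar` is hypothetical. It only reports the offending `service_id`s; how to correct each one depends on the feed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical checker: calendar.txt services whose start_date is after their end_date. */
public class InvertedCalendar {

    static List<String> invertedServices(List<String> lines) {
        // Naive CSV split (assumption: no commas inside quoted fields).
        List<String> header = Arrays.asList(clean(lines.get(0)).split("\\s*,\\s*", -1));
        int id = column(header, "service_id");
        int start = column(header, "start_date");
        int end = column(header, "end_date");
        List<String> bad = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;
            String[] f = clean(line).split("\\s*,\\s*", -1);
            // GTFS dates are YYYYMMDD, which is the BASIC_ISO_DATE format.
            LocalDate s = LocalDate.parse(f[start], DateTimeFormatter.BASIC_ISO_DATE);
            LocalDate e = LocalDate.parse(f[end], DateTimeFormatter.BASIC_ISO_DATE);
            if (s.isAfter(e)) bad.add(f[id]);
        }
        return bad;
    }

    private static int column(List<String> header, String name) {
        int i = header.indexOf(name);
        if (i < 0) throw new IllegalArgumentException("missing column " + name);
        return i;
    }

    /** Strips a UTF-8 byte order mark, double quotes and surrounding whitespace. */
    private static String clean(String s) {
        return s.replace("\uFEFF", "").replace("\"", "").trim();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8);
        invertedServices(lines).forEach(sid -> System.out.println("inverted interval: " + sid));
    }
}
```

Point it at calendar.txt extracted from the feed zip, e.g. `java InvertedCalendar.java calendar.txt`.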


After one more experiment, I can say for sure that frequencies.txt was causing the problem. In Berlin’s GTFS it’s empty.
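frequencies.txt is compact because each row stands for many departures: per the GTFS reference, a row gives a `trip_id`, a `start_time`/`end_time` window and `headway_secs`. A rough sketch to gauge how many departures the file describes, assuming (not verified against GraphHopper’s source) that an importer ends up handling each of them individually: it sums ceil((end_time - start_time) / headway_secs) over all rows. Boundary handling differs between tools, so treat the total as an estimate. The class name `FrequencyExpansion` is hypothetical, and the parser assumes no commas inside quoted fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical estimator: individual departures described by frequencies.txt. */
public class FrequencyExpansion {

    /** Parses GTFS HH:MM:SS into seconds after midnight; hours may exceed 23. */
    static int seconds(String hms) {
        String[] p = hms.trim().split(":");
        return Integer.parseInt(p[0]) * 3600 + Integer.parseInt(p[1]) * 60 + Integer.parseInt(p[2]);
    }

    /** Departures at start, start + headway, ... strictly before end: ceil(span / headway). */
    static long departures(String start, String end, int headwaySecs) {
        if (headwaySecs <= 0) throw new IllegalArgumentException("headway_secs must be positive");
        long span = seconds(end) - seconds(start);
        return span <= 0 ? 0 : (span + headwaySecs - 1) / headwaySecs;
    }

    static long totalDepartures(List<String> lines) {
        // Naive CSV split (assumption: no commas inside quoted fields).
        List<String> header = Arrays.asList(clean(lines.get(0)).split("\\s*,\\s*", -1));
        int s = header.indexOf("start_time"), e = header.indexOf("end_time"), h = header.indexOf("headway_secs");
        if (s < 0 || e < 0 || h < 0) throw new IllegalArgumentException("unexpected header");
        long total = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;
            String[] f = clean(line).split("\\s*,\\s*", -1);
            total += departures(f[s], f[e], Integer.parseInt(f[h]));
        }
        return total;
    }

    /** Strips a UTF-8 byte order mark, double quotes and surrounding whitespace. */
    private static String clean(String line) {
        return line.replace("\uFEFF", "").replace("\"", "").trim();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8);
        System.out.printf("%d rows describe about %d departures%n", lines.size() - 1, totalDepartures(lines));
    }
}
```

Comparing that total with the number of trips in trips.txt gives a feel for how much larger the frequency-based part of the timetable is than its file size suggests.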
