OSM data import performance: PBF vs OSM

I thought I'd share a quick experiment comparing PBF and OSM import performance. It's overly simple, and maybe the results are obvious, but here it is. It's a relatively small map, running on the 0.5 branch:

Import from PBF:

2016-03-13 14:43:13,280 [main] INFO  com.graphhopper.GraphHopper - version 0.5.3.1|2016-03-13T19:23:00+0000 (4,12,3,2,2,1)
2016-03-13 14:43:13,298 [main] INFO  com.graphhopper.GraphHopper - graph car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|,,,,, details:edges:0(0MB), nodes:0(0MB), name:(0MB), geo:0(0MB), bounds:1.7976931348623157E308,-1.7976931348623157E308,1.7976931348623157E308,-1.7976931348623157E308
2016-03-13 14:43:13,317 [main] INFO  com.graphhopper.GraphHopper - start creating graph from chicagoland-obc.pbf
2016-03-13 14:43:13,318 [main] INFO  com.graphhopper.GraphHopper - using car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|,,,,, memory:totalMB:245, usedMB:26
2016-03-13 14:43:40,929 [main] INFO  com.graphhopper.reader.OSMReader - creating graph. Found nodes (pillar+tower):1 002 672, totalMB:2309, usedMB:1102
2016-03-13 14:43:50,842 [main] INFO  com.graphhopper.reader.OSMReader - 7 475 652, now parsing ways
2016-03-13 14:43:57,100 [main] INFO  com.graphhopper.reader.OSMReader - 8 540 415, now parsing relations
2016-03-13 14:43:57,103 [main] INFO  com.graphhopper.reader.OSMReader - finished way processing. nodes: 317600, osmIdMap.size:1002857, osmIdMap:11MB, nodeFlagsMap.size:230, relFlagsMap.size:337, zeroCounter:188 totalMB:2370, usedMB:1464
2016-03-13 14:43:57,104 [main] INFO  com.graphhopper.reader.OSMReader - time(pass1): 27 pass2: 16 total:43
2016-03-13 14:43:57,109 [main] INFO  com.graphhopper.GraphHopper - start finding subnetworks, totalMB:2370, usedMB:1464
2016-03-13 14:43:57,668 [main] INFO  com.graphhopper.routing.util.PrepareRoutingSubnetworks - optimize to remove subnetworks (25363), unvisited-dead-end-nodes (0), maxEdges/node (8)
2016-03-13 14:43:57,720 [main] INFO  com.graphhopper.GraphHopper - edges: 464933, nodes 313137, there were 25363 subnetworks. removed them => 4463 less nodes
2016-03-13 14:43:58,159 [main] INFO  com.graphhopper.storage.index.LocationIndexTree - location index created in 0.4363265s, size:364 712, leafs:31 943, precision:300, depth:4, checksum:313137, entries:[64, 64, 4, 4], entriesPerLeaf:11.417587
2016-03-13 14:43:58,159 [main] INFO  com.graphhopper.GraphHopper - flushing graph car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|4,12,3,2,2, details:edges:464 933(16MB), nodes:313 137(4MB), name:(1MB), geo:1 186 053(5MB), bounds:-88.94159955400823,-87.31720475071381,41.29509571109181,42.154270421436514, totalMB:2370, usedMB:1628)
2016-03-13 14:43:58,287 [main] INFO  com.graphhopper.http.DefaultModule - loaded graph at:D:\temp\chicagoland-obc-pbf-gh, source:chicagoland-obc.pbf, flagEncoders:car,foot,bestparking, class:edges:464 933(16MB), nodes:313 137(4MB), name:(1MB), geo:1 186 053(5MB), bounds:-88.94159955400823,-87.31720475071381,41.29509571109181,42.154270421436514
2016-03-13 14:43:58,332 [main] INFO  com.graphhopper.http.DefaultModule - jsonp disabled
2016-03-13 14:43:58,892 [main] WARN  org.eclipse.jetty.servlets.GzipFilter - GzipFilter is deprecated. Use GzipHandler
2016-03-13 14:43:58,901 [main] INFO  com.graphhopper.http.GHBaseServlet - com.graphhopper.http.TrafficServlet Initialized
2016-03-13 14:43:58,901 [main] INFO  com.graphhopper.http.GHBaseServlet - ParkingServlet Initialized
2016-03-13 14:43:59,135 [main] INFO  com.graphhopper.http.GHServer - Started server at HTTP :8989

Result: 46 seconds from launch to server up.
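The 46-second figure is the difference between the timestamp of the first log line (the GraphHopper version banner) and the "Started server" line. A minimal Java sketch of that arithmetic, assuming the timestamp layout shown in the logs (`yyyy-MM-dd HH:mm:ss,SSS`); the class name `LogTiming` is illustrative only:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LogTiming {
    // Matches the timestamp prefix of the log lines, e.g. "2016-03-13 14:43:13,280".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Parses the first 23 characters of a log line as its timestamp.
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), FMT);
    }

    // Wall-clock time between two log lines, in milliseconds.
    static long elapsedMillis(String firstLine, String lastLine) {
        return Duration.between(timestampOf(firstLine), timestampOf(lastLine)).toMillis();
    }

    public static void main(String[] args) {
        String start = "2016-03-13 14:43:13,280 [main] INFO  com.graphhopper.GraphHopper - version 0.5.3.1";
        String end = "2016-03-13 14:43:59,135 [main] INFO  com.graphhopper.http.GHServer - Started server at HTTP :8989";
        long ms = elapsedMillis(start, end);
        System.out.printf(Locale.ROOT, "launch to server up: %d ms (%.1f s)%n", ms, ms / 1000.0);
    }
}
```

For the PBF run this gives 45,855 ms, i.e. the 46 seconds quoted above; the same two calls work for any pair of lines from either log.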

Import from OSM:

2016-03-13 14:46:01,640 [main] INFO  com.graphhopper.GraphHopper - version 0.5.3.1|2016-03-13T19:23:00+0000 (4,12,3,2,2,1)
2016-03-13 14:46:01,651 [main] INFO  com.graphhopper.GraphHopper - graph car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|,,,,, details:edges:0(0MB), nodes:0(0MB), name:(0MB), geo:0(0MB), bounds:1.7976931348623157E308,-1.7976931348623157E308,1.7976931348623157E308,-1.7976931348623157E308
2016-03-13 14:46:01,669 [main] INFO  com.graphhopper.GraphHopper - start creating graph from chicagoland-obc.osm
2016-03-13 14:46:01,669 [main] INFO  com.graphhopper.GraphHopper - using car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|,,,,, memory:totalMB:245, usedMB:26
2016-03-13 14:46:34,124 [main] INFO  com.graphhopper.reader.OSMReader - creating graph. Found nodes (pillar+tower):1 002 672, totalMB:218, usedMB:73
2016-03-13 14:46:54,712 [main] INFO  com.graphhopper.reader.OSMReader - 7 475 652, now parsing ways
2016-03-13 14:47:07,969 [main] INFO  com.graphhopper.reader.OSMReader - 8 540 415, now parsing relations
2016-03-13 14:47:08,034 [main] INFO  com.graphhopper.reader.OSMReader - finished way processing. nodes: 317600, osmIdMap.size:1002857, osmIdMap:11MB, nodeFlagsMap.size:230, relFlagsMap.size:337, zeroCounter:188 totalMB:202, usedMB:99
2016-03-13 14:47:08,035 [main] INFO  com.graphhopper.reader.OSMReader - time(pass1): 32 pass2: 33 total:66
2016-03-13 14:47:08,038 [main] INFO  com.graphhopper.GraphHopper - start finding subnetworks, totalMB:202, usedMB:99
2016-03-13 14:47:08,605 [main] INFO  com.graphhopper.routing.util.PrepareRoutingSubnetworks - optimize to remove subnetworks (25363), unvisited-dead-end-nodes (0), maxEdges/node (8)
2016-03-13 14:47:08,659 [main] INFO  com.graphhopper.GraphHopper - edges: 464933, nodes 313137, there were 25363 subnetworks. removed them => 4463 less nodes
2016-03-13 14:47:09,090 [main] INFO  com.graphhopper.storage.index.LocationIndexTree - location index created in 0.42731392s, size:364 712, leafs:31 943, precision:300, depth:4, checksum:313137, entries:[64, 64, 4, 4], entriesPerLeaf:11.417587
2016-03-13 14:47:09,091 [main] INFO  com.graphhopper.GraphHopper - flushing graph car,foot,bestparking|RAM_STORE|2D|delayAndTrafficCost|4,12,3,2,2, details:edges:464 933(16MB), nodes:313 137(4MB), name:(1MB), geo:1 186 053(5MB), bounds:-88.94159955400823,-87.31720475071381,41.29509571109181,42.154270421436514, totalMB:209, usedMB:86)
2016-03-13 14:47:09,261 [main] INFO  com.graphhopper.http.DefaultModule - loaded graph at:D:\temp\chicagoland-obc-gh, source:chicagoland-obc.osm, flagEncoders:car,foot,bestparking, class:edges:464 933(16MB), nodes:313 137(4MB), name:(1MB), geo:1 186 053(5MB), bounds:-88.94159955400823,-87.31720475071381,41.29509571109181,42.154270421436514
2016-03-13 14:47:09,305 [main] INFO  com.graphhopper.http.DefaultModule - jsonp disabled
2016-03-13 14:47:09,767 [main] WARN  org.eclipse.jetty.servlets.GzipFilter - GzipFilter is deprecated. Use GzipHandler
2016-03-13 14:47:09,923 [main] INFO  com.graphhopper.http.GHServer - Started server at HTTP :8989

Result: 1 min 8 sec (about 68 seconds) from launch to server up.
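The end-to-end figures also include subnetwork removal, location index creation and server start-up. To compare only the OSM reading itself, the OSMReader summary line (`time(pass1): N pass2: N total:N`, in seconds) can be pulled out of each log. A small sketch, assuming that line keeps exactly the format shown in both logs; `ImportPhases` is a hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportPhases {
    // Matches e.g. "time(pass1): 27 pass2: 16 total:43" (values are seconds).
    private static final Pattern SUMMARY = Pattern.compile(
            "time\\(pass1\\):\\s*(\\d+)\\s+pass2:\\s*(\\d+)\\s+total:\\s*(\\d+)");

    // Returns {pass1, pass2, total} in seconds, or null if the line is not a summary line.
    static int[] parse(String logLine) {
        Matcher m = SUMMARY.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        int[] pbf = parse("OSMReader - time(pass1): 27 pass2: 16 total:43");
        int[] xml = parse("OSMReader - time(pass1): 32 pass2: 33 total:66");
        String[] names = {"pass1", "pass2", "total"};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-6s PBF %3d s   XML %3d s%n", names[i], pbf[i], xml[i]);
        }
    }
}
```

With the two summary lines above, pass 1 goes from 27 s to 32 s while pass 2 roughly doubles from 16 s to 33 s, so most of the gap between the formats shows up in the second pass.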

One other point to consider: the OSM import uses far less memory than the PBF import; when subnetwork finding starts, the OSM run reports usedMB:99 versus usedMB:1464 for PBF.
My assumption is that this scales linearly, which would mean roughly a 30% time saving when processing PBF; however, the higher RAM usage may make that difficult. Maybe others can share their experiences.
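The "30%" estimate and the linear-scaling assumption can be written out as a small calculation. This sketch takes the launch-to-server times from the log timestamps (about 45.9 s and 68.3 s), treats linear scaling as the poster's assumption rather than something these two runs demonstrate, ignores fixed costs such as server start-up, and uses a made-up tenfold larger extract purely for illustration:

```java
import java.util.Locale;

class ScalingEstimate {
    // Fraction of time saved by the faster run relative to the slower one: 1 - fast/slow.
    static double relativeSaving(double fastSeconds, double slowSeconds) {
        return 1.0 - fastSeconds / slowSeconds;
    }

    // Linear model (an assumption, not verified): time grows in proportion to input size.
    static double projectSeconds(double measuredSeconds, double sizeFactor) {
        return measuredSeconds * sizeFactor;
    }

    public static void main(String[] args) {
        double pbf = 45.855;  // launch to server up, PBF run (from log timestamps)
        double xml = 68.283;  // launch to server up, OSM XML run (from log timestamps)
        double factor = 10.0; // hypothetical extract ten times larger

        System.out.printf(Locale.ROOT, "measured saving: %.1f%%%n", 100 * relativeSaving(pbf, xml));

        double pbfBig = projectSeconds(pbf, factor);
        double xmlBig = projectSeconds(xml, factor);
        System.out.printf(Locale.ROOT, "projected: PBF %.0f s, XML %.0f s, saving %.1f%%%n",
                pbfBig, xmlBig, 100 * relativeSaving(pbfBig, xmlBig));

        // Memory in use when subnetwork finding starts: usedMB 1464 (PBF) vs 99 (XML).
        System.out.printf(Locale.ROOT, "memory ratio PBF/XML: %.1fx%n", 1464.0 / 99.0);
    }
}
```

Under a purely linear model the percentage saving is the same at any size (about 33% with these timings); only the absolute number of seconds saved grows, so whether PBF pays off on a bigger map comes down to the RAM question.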

Thanks for sharing. With OSM you mean the XML format, I guess? I think both the higher RAM usage and the faster import come from the fact that we parallelize reading for PBF.
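Why parallel reading costs memory: a PBF file is a sequence of blobs that are each compressed independently (usually with zlib), so a reader can hand blobs to several worker threads and decode them concurrently, while an .osm XML file is one text stream that is typically parsed sequentially. Every decoded blob that has not yet been consumed sits in memory. The following is a simplified Java sketch of that idea using only `java.util.zip` and an `ExecutorService`; it is not GraphHopper's actual PBF reader, and the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ParallelBlobDecoder {
    // Inflates one zlib-compressed blob; it does not depend on any other blob.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported blob");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Decodes all blobs on a fixed thread pool and returns them in file order.
    // Every blob is submitted up front, so decoded results can pile up in their
    // Futures before the caller consumes them: more threads buy speed with RAM.
    static List<byte[]> decodeAll(List<byte[]> blobs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (byte[] blob : blobs) {
                pending.add(pool.submit(() -> inflate(blob)));
            }
            List<byte[]> decoded = new ArrayList<>();
            for (Future<byte[]> f : pending) {
                decoded.add(f.get()); // waiting in submission order preserves file order
            }
            return decoded;
        } finally {
            pool.shutdown();
        }
    }

    // Demo helper: zlib-compresses bytes, standing in for a blob written to a file.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> blobs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            blobs.add(deflate(("block " + i).getBytes(StandardCharsets.UTF_8)));
        }
        for (byte[] b : decodeAll(blobs, 4)) {
            System.out.println(new String(b, StandardCharsets.UTF_8));
        }
    }
}
```

A production reader would bound the number of blobs in flight (for example with a bounded queue), which is the knob that trades import speed against memory.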

(BTW: minor edit to title, as loading != importing)

Yes, exactly: .OSM = XML.
Peter, can you clarify what the two terms mean in the GraphHopper context? Terminology is very important for clear communication.

Do you mean loading vs. importing? Importing is the whole process of reading the OSM data, preparing it, and feeding the graph (takes minutes to hours). Loading a graph, by contrast, means reading only the already imported and prepared GraphHopper storage files from disk or elsewhere (usually takes milliseconds to seconds).
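The distinction can be pictured as the decision a server makes at start-up: if a prepared graph directory (such as the `chicagoland-obc-gh` folder in the logs) already exists, load it; otherwise import from the OSM file and write that directory. GraphHopper's `importOrLoad()` makes a decision of this kind, but the Java below is a hypothetical stand-in with made-up names, not its implementation, and treating "non-empty directory" as "prepared" is an assumption:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class StartupMode {
    enum Mode { IMPORT, LOAD }

    // Assumption: a graph directory counts as "prepared" if it exists and is non-empty.
    static boolean isPrepared(Path graphDir) throws IOException {
        if (!Files.isDirectory(graphDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(graphDir)) {
            return entries.findAny().isPresent();
        }
    }

    // IMPORT: read OSM, prepare, write storage files (minutes to hours).
    // LOAD: read the existing storage files only (milliseconds to seconds).
    static Mode choose(Path graphDir) throws IOException {
        return isPrepared(graphDir) ? Mode.LOAD : Mode.IMPORT;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "chicagoland-obc-gh");
        System.out.println(choose(dir) + " " + dir);
    }
}
```

Run once against an empty location it reports IMPORT; after an import has written the storage files, the same call reports LOAD, which is why the second start of a server is so much faster than the first.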