GraphHopperStorage.optimize causes ArrayIndexOutOfBoundsException

Hello,

I am using the gh 0.10 branch and I am not able to build my vehicle graph with world.pbf [smaller areas (like DACH) are fine]

java.lang.ArrayIndexOutOfBoundsException: -1262
    at com.graphhopper.storage.RAMIntDataAccess.setInt(RAMIntDataAccess.java:200)
    at com.graphhopper.storage.BaseGraph.initNodeRefs(BaseGraph.java:269)
    at com.graphhopper.storage.BaseGraph.inPlaceNodeRemove(BaseGraph.java:725)
    at com.graphhopper.storage.GraphHopperStorage.optimize(GraphHopperStorage.java:242)
    at com.graphhopper.routing.subnetwork.PrepareRoutingSubnetworks.doWork(PrepareRoutingSubnetworks.java:99)
    at com.graphhopper.GraphHopper.cleanUp(GraphHopper.java:1286)

I have run multiple different tests & setups to try to find the root cause of this issue - running the original gh 0.10 branch with the stock 'car' FlagEncoder finishes graph creation without any issues.

So in general this issue seems to be 'self made' - I have written a custom Car FlagEncoder in order to support additional features & functionality within our ors project.

I have added additional logging inside PrepareRoutingSubnetworks & BaseGraph in order to find some evidence of what could be the root cause of my issue.

The PrepareRoutingSubnetworks.doWork() generates the following output:

start finding subnetworks (min:200, min one way:0) totalMB:112640, usedMB:70303
1510204 subnetworks found for car-ors, totalMB:112640, usedMB:75761
optimize to remove subnetworks (1510204), unvisited-dead-end-nodes (0), maxEdges/node (31)

In GraphHopperStorage.optimize() I have added additional logging in order to get some numbers:

getNodes(): 153648072
baseGraph.getCapacity(): 17364418560
((BitSet) removeNodesObject).length(): 153648072
removeNodesObject.getCardinality() / delNodes: 5015126

Then in optimize() the inPlaceNodeRemove(delNodes) method of the BaseGraph is called - there I have added additional logging as well.

I was interested in the 'itemsToMove' object:
itemsToMove: 5015126 removeNodes.length: 153648072 getCardinality: 5015126

also I logged the ‘toMoveSet’:
toMoveSet.length: 153646465 getCardinality: 3968681

And then finally I logged the 'nodeCount' & 'removeNodeCount' shortly before the initNodeRefs(…) method is called:
nodeCount: 153648072 / removeNodeCount: 5015126 / nodeEntryBytes: 20

Then the final method in the stack trace, initNodeRefs(long oldCapacity, long newCapacity), is called, where oldCapacity = (nodeCount - removeNodeCount) * nodeEntryBytes.

And looking at the numbers you might notice that (153648072 - 5015126) * 20 gives us 2.972.658.920… which is well above Integer.MAX_VALUE (2.147.483.647).

Then in initNodeRefs() itself a pointer is generated - starting from oldCapacity + N_EDGE_REF.
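Based on the method signature and the stack trace, the loop presumably looks roughly like this (a simplified sketch from my reading, not the exact 0.10 source):

    void initNodeRefs(long oldCapacity, long newCapacity) {
        // initialize the edge reference of every node slot in the grown region
        for (long pointer = oldCapacity + N_EDGE_REF; pointer < newCapacity; pointer += nodeEntryBytes) {
            // a negative pointer ends up here and blows up in RAMIntDataAccess.setInt
            nodes.setInt(pointer, EdgeIterator.NO_EDGE);
        }
    }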

I have also included additional logging in initNodeRefs() - printing the loop count and the value of the pointer - and already the first time the loop is entered:
loopRun[0] pointer: -1322308376

So oldCapacity (in my run '2972658920') + N_EDGE_REF results in -1322308376 - which looks like an overflow - right?
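To sanity-check this, here is a tiny standalone demo (class and variable names are mine, the numbers are taken from my logs above) showing that plain int arithmetic produces exactly the pointer value I logged:

    public class OverflowDemo {
        public static void main(String[] args) {
            int nodeCount = 153_648_072;
            int removeNodeCount = 5_015_126;
            int nodeEntryBytes = 20;

            // int * int is evaluated in 32-bit arithmetic, even when the
            // result is assigned to a long -> silent overflow
            long overflowed = (nodeCount - removeNodeCount) * nodeEntryBytes;
            System.out.println(overflowed); // -1322308376

            // widening one operand to long first keeps the full 64-bit result
            long correct = ((long) nodeCount - removeNodeCount) * nodeEntryBytes;
            System.out.println(correct); // 2972658920
        }
    }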

My question for now is: why am I not running into a similar problem with stock gh code - or in other words - what is the graph size for the car FlagEncoder?

Are my numbers (nodeCount: 153648072) extremely high? And if 'yes', would it be possible for you to give me hints on where/what to check in my FlagEncoder implementation - or, simpler question - can my FlagEncoder be the root of this problem, or are there other/additional things that have an influence on the final nodeCount?!

We really have to make this customization process simpler… but a custom FlagEncoder should not cause these problems :confused:

It looks like your FlagEncoder is not really polished enough and creates lots of islands, so 1/3 of all nodes are removed - is this expected?

Do you have a simple test case to reproduce this issue?

Hello Peter,

If I had a single clue what the root cause of the issue could be, I would have tried to create a simpler test case - processing the complete world.pbf first in order to go through the optimization process is a pain in the… (each run takes almost 2 days here). We can check what happens with europe.pbf (since DACH was the largest extract we could test successfully) - but this would just make "debugging" faster and does not solve the issue…

Allow me to give you some additional details: the FlagEncoder that we are using for testing right now is the original graphhopper car flag encoder… with the difference that in the ors project (which makes use of a custom gh branch) we have modified the AbstractFlagEncoder class - so it's not 'stock' gh code.

The adjustments in the AbstractFlagEncoder: we have moved/copied the support for reverse speed (reverseSpeedEncoder) & considerElevation from the original Bike2WeightFlagEncoder into the AbstractFlagEncoder class.

I am not sure if you are willing to look into our modified code, but of course it is available via GitHub (here is the link, just in case): https://github.com/GIScience/graphhopper/tree/ors_0.10.1-removed-unused-stuff-003

As I said, we run into this IndexOutOfBounds with our (more complex) CarFlagEncoder, but also with the stock gh car flag encoder (though since we altered the AbstractFlagEncoder class, the stock gh car flag encoder is not so stock any longer!).

What I don't understand - or better, what I want to know: even if it were the case that 89% of the edges are removed in the optimization process (in our case it's "just" 1/3), this cleanup should not generate any index out of bounds - right? We have an object of a certain size, then the code removes parts inside that object, and at the end (at least that's my understanding) the remaining 'tail' is overwritten with zeros (just to clean it up) - right?
The rest of the code is quite able to handle the overall size (BaseGraph) - and since I have added logging to the BaseGraph.initNodeRefs() method I could see that each time the graph had to be extended (even in the last run before the optimization) the method did not generate an index out of bounds (no overflow?)…

That brings me to another question - *is this really an overflow?* I'll repeat my initial question: "so oldCapacity (in my run '2972658920') + N_EDGE_REF results in -1322308376 - which looks like an overflow - right?"

Regards
Matthias

Please have a look if you use integers where longs would be required, or if you do something like:

nodeId*someIntValue

then

(long)nodeId*someIntValue

might be required, as the multiplication of two ints is evaluated in 32-bit arithmetic even if the result is assigned to a long. (The same applies for edgeId.)

So digging in deeper, it seems the problem comes from the

initNodeRefs((nodeCount - removeNodeCount) * nodeEntryBytes, nodeCount * nodeEntryBytes);

line in the BaseGraph.java file, within the inPlaceNodeRemove method. There, because nodeCount, removeNodeCount and nodeEntryBytes are all ints, the multiplication is performed in 32-bit int arithmetic and overflows before the result is widened to the method's long parameters. It seems that line was added in 0.10, so is there a reason why it was introduced?
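If that is the case, presumably the fix is just to force 64-bit arithmetic by widening an operand before the multiplication, something along these lines (a sketch of the shape of the fix, not necessarily the exact commit):

    // widen to long before multiplying so the byte offsets are
    // computed in 64-bit arithmetic:
    initNodeRefs(((long) nodeCount - removeNodeCount) * nodeEntryBytes,
            (long) nodeCount * nodeEntryBytes);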

Ah, this was a bug but is fixed, maybe only in latest master?

Yup, just looked - the fix is applied on the master branch but not on the 0.10 branch.

Thanks, have cherry-picked it and released 0.10.3.