I am using a custom street centerline map from my customer, an ESRI shapefile, which I’ve successfully converted to OSM format and am now using with GraphHopper. I am generally very pleased with the routes, so the data is pretty good. However, today I found (by accident) an orphaned set of roads. Clearly one of the street segments has a start or end coordinate that is not quite the same as the start/end coordinate of the street segment it is supposed to intersect with.
I understand how to fix this in the data. What I don’t know is how I might automate a test to discover “island graphs”: sets of nodes/edges that are disconnected from the rest of the grid. I have not yet studied the GraphHopper implementation of the graph, so I’m not certain whether there is some way to quickly discover “orphan” graphs within that structure.
My plan, barring some better idea, is to take a point on the very first way segment and try routing to an endpoint on every other way segment in my database. I can see that if the routing algorithm cannot get to that exact point, it will route to a point as close as possible. So every segment where the route ends at a point other than the one I asked for indicates that “you can’t get there from here”: that destination segment is off the main graph. There are about 300,000 segments (I’m not working with the whole world here), so it may run for a little while, but I should get my answers.
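Thinking about it a bit more, the routing test really just asks which segments are reachable from the first one, and I could probably compute that directly from the segment endpoints without running the router at all. Here is a rough Python sketch of that idea using union-find. It is only a sketch under my own assumptions: `find_islands` is a name I made up, each segment is reduced to a `(start, end)` pair of coordinate tuples, segments connect only where endpoints match exactly, and one-way restrictions are ignored.

```python
from collections import defaultdict


def find_islands(segments):
    """Group segments into connected components by exactly shared endpoints.

    segments: list of (start, end) pairs, each an (x, y) coordinate tuple.
    Returns a list of components (lists of segment indices), largest first.
    Assumption: direction of travel is ignored.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    # Joining the two endpoints of every segment merges touching segments.
    for start, end in segments:
        root_a, root_b = find(start), find(end)
        if root_a != root_b:
            parent[root_a] = root_b

    components = defaultdict(list)
    for index, (start, _end) in enumerate(segments):
        components[find(start)].append(index)
    return sorted(components.values(), key=len, reverse=True)


if __name__ == "__main__":
    demo = [
        ((0.0, 0.0), (1.0, 0.0)),
        ((1.0, 0.0), (1.0, 1.0)),
        ((1.0, 1.00001), (2.0, 1.0)),  # start is slightly off, so it is orphaned
    ]
    islands = find_islands(demo)
    print("main graph:", islands[0])
    print("islands:", islands[1:])
```

Presumably the first component would be the main grid and everything after it an island to inspect. This should handle 300,000 segments quickly, since each segment is touched only a couple of times.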
I also realize that this will not find every error in the data. There can be an issue where two ways are not connected properly, but by taking a circuitous route, you can still arrive at your final destination.
Maybe a better thing to do is to find all ways that start or end within, say, 3-5 m of another way’s start or end point without actually sharing it. This is more of a shapefile thing: each segment in a shapefile must touch an intersecting segment at its start or endpoint. So I would be testing only for start/end points that are extremely close to each other but not touching, not all the points that make up the segment…
So I guess my question is: do these sound like reasonable ways to test the data? Or does anyone have a simpler way? Perhaps someone has already automated this kind of testing of roadway spatial data and, being new to it, I’m simply not yet aware…