Finding orphaned ways (can't get there from anywhere)

I am using a custom street centerline map from my customer, an ESRI Shape file, which I’ve successfully converted to OSM format and now am using it with Graphhopper. I am generally very pleased with the routes, so the data is pretty good. However today I found (by accident) an orphaned set of roads. Clearly one of the street segments has a start or end coordinate that is not quite the same as the start/end coordinate of the street segment it is supposed to intersect with.

I understand how to fix this in the data. What I don’t know is how I might automate a test to discover “island graphs”, a set of nodes/edges that are disconnected from the rest of the grid. I have not yet studied the Graphhopper implementation of the graph, so I’m not certain if there is some way to quickly discover “orphan” graphs within that structure.

My plan, barring some better idea, is to take a point on the very first way segment, and try routing to an endpoint on every other way segment in my database. I can see that if the routing algorithm cannot get to that exact point, it will route to a point as close as possible. So for every segment where the route ends at a point other than the point I asked for, that will indicate that “you can’t get there from here”, that this destination segment is off of the main graph. There are about 300,000 segments, (I’m not working with the whole world here), so it may run for a little while but I should get my answers.

I also realize that this will not find every error in the data. There can be an issue where two ways are not connected properly, but by taking a circuitous route, you can still arrive at your final destination.

Maybe a better thing to do is to find all ways that start/end within, say 3-5m from another way’s point. This is more a shape file thing - each segment in a shape file must touch an intersecting segment at its start or endpoint. So I would be testing just for start/end points that are extremely close to each other but not touching, and not all the points that make up the segment…

So I guess my question is - do these sound like reasonable ways to test the data? Or does anyone have a simpler way? Perhaps someone has automated such testing of roadway spatial data, and, being new to it, I’m simply not yet aware…

Have a look at the class PrepareRoutingSubnetworks, which is already very fast (ie. optimal) to find such ‘islands’ and removes them if they are smaller than a certain number of nodes:


Also you can pick every point and fetch via LocationIndexTree the neighboring nodes and see if they are connected (via Graph).

Thanks, I will look into using these features to test the data. Meanwhile, it turns out I had an error in my conversion, and some Ways failed to get the ‘highway’ tag. They were not orphaned, they were missing completely from the graph. I’ve fixed my conversion program and will try again…

I’m trying to step through PrepareRoutingSubnetworks and figure out the “orphan” subnetworks.
The output when I just ran it through said:
PrepareRoutingSubnetworks - 312 subnetworks found for car
PrepareRoutingSubnetworks - optimize to remove subnetworks (312), unvisited-dead-end-nodes (0), maxEdges/node (6)

So I’m in the debugger, at the end of findSubnetworks(), and I have a list of integer arrays, 312 of them!
I’m struggling to figure out what is in each integer array - are these nodes? Edges? I’m trying to figure out how to get from one of these integers to a node id or way id in OSM (or at least the lat/long coordinates to see what these are. They will likely turn out to be nothing important, but I need to see what they are :slight_smile: Many are very small, one is very large, I haven’t looked at them all yet. If I figure out how to get from that list of integer arrays to OSM nodes/ways I’ll reply here…

We do not store OSM ids. In the array these are GH node IDs and you can fetch the geometry via graph.getNodeAccess().getLat/Lon(ghNodeId)

I saw there is a file called “names” which seems to have the names of all the edges (from the way’s that they were created with). I assume that is used when you provide the route with turn by turn directions. I was hoping the EdgeIterator might give me some way to access the name given the edgeID, but I haven’t had enough time to figure it out yet :slight_smile: Not sure if the names file stored in the network folder is documented somewhere, I haven’t found it yet. Alternatively, I did see your project at which may help.

BUT since you say these are nodes and not edges, I guess I’ll use that geometry to find out what way it is…

You can use graph.getEdgeIteratorState(edgeId, Integer.MIN_VALUE)

Not sure if the names file stored in the network folder is documented somewhere, I haven’t found it yet.

Internal structure are currently not well documented

So I didn’t realize how easy it was to see the name of the edge if I had an EdgeIterator already - there is the getName() method. So just to give myself some hints, I added the code below to see which edges are being removed, by name. Some have pretty unique names, I was able to simply find them that way… This is in
I may display the node coordinates this way as well, then remove these extra logs when I’m done.

int removeEdges(EdgeExplorer explorer, FlagEncoder encoder, TIntList component, int min) {
    int removedEdges = 0;
    if (component.size() < min)
        for (int i = 0; i < component.size(); i++) {
            EdgeIterator edge = explorer.setBaseNode(component.get(i));
            while ( {
      "removing Edge named (" + edge.getName() + ")" ); // ADDED THIS LINE
                edge.setFlags(encoder.setAccess(edge.getFlags(), false, false));