Suitability for map matching "fuzzy" (cellular) data?

kodonnell · September 7, 2016, 10:02pm

Our use case is that we’re looking at cellular data. In some cases we have a somewhat exact (trilaterated) coordinates, but in other cases we only know the approximate location (specificed by the radial distance from a tower at a known location). I was planning to use Graphhopper to do this manually, until I came upon the map matching, which might suit (or already do this, with a few tweaks). I’m guessing what I’d like to be able to do is, in the situation where we have the approximate location, consider only possible edges which are on this radius. (From what I’ve read, the map-matching finds the closest edge(s?) to a point, so if I replace that with all of the edges a correct radius away, it should just work?)

It’s also further complicated by the fact that there may be many possible edges at the right radius, and I’d need to choose the “best” route across them (or record all possible routes). In addition, we may want to hard-code extra logic (e.g. try to get routes which match the timestamps in the cellular data, and include possible alternate routes). With that in mind, it may be best to manually implement this in Graphhopper.

PS - I’m not very familiar with graphhopper or the map matching.

karussell · September 7, 2016, 10:44pm

Yes, map matching should help you here significantly.

kodonnell · September 7, 2016, 10:56pm

Thanks. It seems like the main thing I need to do is tweak the “candidates” per input point, e.g. here. I will play with that, and afterward can work on e.g. including alternate routes in between match coordinates (if this isn’t already supported out of the box).

kodonnell · September 8, 2016, 3:40am

I’m afraid it looks like the current implementation won’t be suitable - the cellular data is too sparse, and I get “sequence is broken” errors (for points that are e.g. 100km apart).

@karussell, does the above surprise you? If so, I can help to debug/fix. Otherwise, do you know of any other alternatives (already supported in map matching, or some peudo-code)? I’m looking for a map-matching operation on this (sparse) cellular data which

is fast (we’ve got a lot of data)
can use the fuzzy data (as above)
doesn’t have to be super accurate, as we’re aggregating. (That is, it’s OK if we can’t pick the exact route that’s used, but it’s not OK if e.g. we get crazily different routes, especially from systematic errors.)

If needed, I suspect I can start with what you have on your original map matching blog, and tweak it accordingly.

karussell · September 8, 2016, 8:28am

You can increase the maximum allowed nodes … but do you really want to judge over an unknown route of over 100km?

can use the fuzzy data (as above)

I do not understand what the big difference is between your ‘fuzzy’ data and normal fuzzy data coming from GPS devices. Just has a bigger error (?)

I suspect I can start with what you have on your original map matching blog

The current solution will in mostly all situations give you better results, but of course you can try the old method too.

kodonnell · September 8, 2016, 8:36pm

Thanks, I didn’t realise that. To get it running I had to increase the maximum allowed nodes to 100000. Unfortunately, it now takes over 2 seconds to run. Maybe that’s not too bad given that standalone routing takes about 0.1s (on a specified route - interestingly CH is not the fastest), and this is an unusually long route. I’ll have to performance test (and see whether anyone has any ideas about Spark).

We are aware of the risks etc., hence why this will only ever be used for detection of aggregate traffic flows, with a bunch of error detection on top. We’re not aiming for 100% accuracy - but even 50% is infinitely better than not using the data at all (providing no systematic errors).

Not exactly. It’s more that in some cases we only know that the mobile was on a radius e.g. ‘10km from X’. (The equivalent GPS fuzziness would be “at some radius R from X, with standard deviation of d” - here we already know R.) There’s inherent fuzziness on top of that (i.e. it’s actually a fuzzy radius), as well as in the points that are trilaterated … probably of a hundred meters or so.

Agreed it’ll be better - as long as it runs fast enough. I’ll do some more testing, but off the top of your head, how much faster do you think the map matching could be made? And likelihood of this happening? (I could probably help, with some direction.)

karussell · September 8, 2016, 8:44pm

Not sure what you mean here. We have not yet implement switching to CH for the map matching: support CH for matching · Issue #60 · graphhopper/map-matching · GitHub

see whether anyone has any ideas about Spark

No experience with Spark, but you can just call importOrLoad on normal machine and use the created folder there?

but off the top of your head, how much faster do you think the map matching could be made

For your usecase probably with enabling CH there is a huge boost possible, at least factor 10 I guess. Other tunings are possible, have not yet investigate this for the new version. Also you can try the old version (also no CH support) but could be faster than the current one without CH

kodonnell · September 8, 2016, 8:54pm

I used map matching for three points, and it took 2s. If I just use graphhopper to do A → B and B → C, it only took 0.1s (across a few algorithms - and CH wasn’t the fastest).

Sounds good. I’ll do some more testing for our use case. If it looks feasible, then I’d be happy to give the CH implementation a crack (though I can’t promise I’ll be able to help, as I’m pretty new to Java).

karussell · September 8, 2016, 11:04pm

Again, the reason it took just 0.1s is that you used CH and for map matching CH is not yet enabled by default (as it normally makes no big difference for smaller distances). And I’m still not sure what you mean with ‘CH wasn’t the fastest’. Did you disable CH?

kodonnell · September 8, 2016, 11:15pm

I disabled CH and tried other algorithms, and they seemed to sometimes be faster, for my sample size of one. However, I didn’t reimport the data each time, so maybe that’s why.

karussell · September 9, 2016, 7:18am

You can easily compare them, no need for reimport. Try ch.disable=true|false still CH will be always faster