As we’ve discussed before, I’ve started implementing tools to produce test data. The project is here:
So far I’ve implemented the Public Transport Enabler module in the first draft:
This module is under the GPLv3 due to the PTE license, so don’t link to it or derive from it.
This tool takes a CSV file with queries (datetime/from/to), executes them against the PTE provider (e.g. bahn.de), and writes the results to CSV and JSON files.
Assume this is a CSV file with a single query:
id,date_time,from_id,from_name,from_lon,from_lat,to_id,to_name,to_lon,to_lat
c08b2cb1,2016-10-01T00:13:00,255606,"Blumenauer Weg",8.448225,49.555514,242306,Hombuschstraße,8.460743,49.516252
As a result you’ll get:
query_id,query_date_time,query_from_id,query_from_name,query_from_lon,query_from_lat,query_to_id,query_to_name,query_to_lon,query_to_lat,trip_first_departure_date_time,trip_last_arrival_date_time,trip_legs_count
c08b2cb1,2016-10-01T00:13:00,255606,"Blumenauer Weg",8.448225,49.555514,242306,Hombuschstraße,8.460743,49.516252,2016-10-01T09:20:00,2016-10-01T09:53:00,5
(You depart at 9:20, arrive at 9:53 and there are 5 legs in this trip.)
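For sanity checks on the output, such a result row can be consumed with standard CSV tooling. A minimal Python sketch (using only a hypothetical subset of the columns above) that recomputes the trip duration:

```python
import csv
import io
from datetime import datetime

# A subset of the result columns shown above, for illustration only.
result_csv = (
    "query_id,trip_first_departure_date_time,trip_last_arrival_date_time,trip_legs_count\n"
    "c08b2cb1,2016-10-01T09:20:00,2016-10-01T09:53:00,5\n"
)

row = next(csv.DictReader(io.StringIO(result_csv)))
departure = datetime.fromisoformat(row["trip_first_departure_date_time"])
arrival = datetime.fromisoformat(row["trip_last_arrival_date_time"])
duration_min = (arrival - departure).total_seconds() / 60  # 33.0 minutes
```
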
The tool uses a GTFS file as its basis, so the stops are taken from the stops.txt of the GTFS file.
However (at least in the case of RNV), the problem is that bahn.de apparently uses slightly different data than what the GTFS file provides. So to match GTFS data with PTE data, we first need to generate a mapping between GTFS stops and PTE “locations”. The tool can generate this mapping; the resulting file looks like:
stop_id,stop_name,stop_lon,stop_lat,location_type,location_id,location_lon,location_lat,location_place,location_name,location_products,distance
202935,Ebertpark/Fichtestr.,8.42054,49.4917,STATION,507997,8420906,49491721,"Ludwigshafen a","Friesenheim Ebertpark/Fichtestraße",TRAM;BUS,50
This matching is done based on the coordinates; the last field gives the distance in meters, rounded to 50 meters.
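The matching step can be sketched as a great-circle distance check; this is my assumption about the approach, not the tool’s actual code. The PTE location coordinates in the mapping row appear to be in microdegrees (degrees × 1E6), hence the division below, and the exact rounding rule (here: up to the next 50 m) is also an assumption:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def rounded_distance_m(lon1, lat1, lon2, lat2, step=50):
    """Distance rounded up to the next multiple of `step` meters (rounding rule assumed)."""
    return math.ceil(haversine_m(lon1, lat1, lon2, lat2) / step) * step

# Stop 202935 vs. PTE location 507997 from the mapping row above;
# the PTE coordinates are divided by 1E6 (microdegrees).
d = rounded_distance_m(8.42054, 49.4917, 8420906 / 1e6, 49491721 / 1e6)  # 50
```

With these coordinates the raw distance comes out around 25–30 m, which rounds to the 50 in the example row.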
Based on this mapping file, the tool can generate a CSV file with queries, i.e. randomly selected from/to stops and a timestamp from the given range.
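A minimal sketch of how such random queries could be generated; the function name, seeding, and dict layout are illustrative assumptions, only the column names match the query CSV shown above:

```python
import random
import uuid
from datetime import datetime, timedelta

def make_queries(stops, start, end, n, seed=42):
    """Generate n queries with two distinct random stops and a timestamp in [start, end)."""
    rng = random.Random(seed)
    span = int((end - start).total_seconds())
    queries = []
    for _ in range(n):
        frm, to = rng.sample(stops, 2)  # two distinct stops
        dt = start + timedelta(seconds=rng.randrange(span))
        queries.append({
            "id": uuid.uuid4().hex[:8],
            "date_time": dt.isoformat(),
            "from_id": frm["id"], "from_name": frm["name"],
            "from_lon": frm["lon"], "from_lat": frm["lat"],
            "to_id": to["id"], "to_name": to["name"],
            "to_lon": to["lon"], "to_lat": to["lat"],
        })
    return queries
```

Each dict maps one-to-one onto a row of the query CSV (id,date_time,from_id,…,to_lat).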
Finally, given the CSV files with stop/location mappings and queries, the tool executes the specified queries and writes the results to CSV and JSON files. The CSV contains the basic data we’ll need for testing; the JSON files are created one per trip, so we can follow exactly how the query was routed by bahn.de.
Since bahn.de allows only a few trip queries per minute (I think it’s 8), I had to implement a basic retry strategy with progressively longer pauses: if the PTE provider responds with SERVICE_DOWN, we pause and retry, and each time the pause is doubled (starting with 125 ms). It is not particularly fast, but it seems to work well.
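The retry loop can be sketched roughly as follows; only the SERVICE_DOWN trigger, the doubling, and the 125 ms start come from the description above, while the function name, the retry cap, and the failure handling are my assumptions:

```python
import time

def query_with_retry(execute, initial_pause=0.125, max_retries=8):
    """Call execute() until it stops returning "SERVICE_DOWN", doubling the pause each time.

    max_retries is an assumed safety cap, not part of the original description.
    """
    pause = initial_pause
    for _ in range(max_retries):
        result = execute()
        if result != "SERVICE_DOWN":
            return result
        time.sleep(pause)  # 125 ms, 250 ms, 500 ms, ...
        pause *= 2
    raise RuntimeError("PTE provider still down after %d retries" % max_retries)
```
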
I’m now running ca. 1000 queries for the RNV dataset (ca. 50% / 1 h remaining) and can provide the results for testing later on.
I’d be grateful for feedback, e.g. whether we’ll need more data in the results.