Ladies and Gentlemen!
I came here to tell you about the re-use of public transport timetable data in my mobile application called Transportoid.
A few words about this app – it is being developed by two people in their spare time. I’m the project lead, and Piotr Owcarz is my teammate. The application covers 59 cities and agglomerations in Poland. It has been downloaded over two hundred thousand times and is available for Android and Windows Phone devices.
The app is powered by timetable data – in most cases scraped by us from the websites of carriers and town councils. This is done automatically after each timetable update. In some cases – mostly bigger cities – we get the data directly from its owners.
Someone could ask if it’s legal to scrape timetables from a website. The indirect answer is yes, but as far as I know no such case has been tried in court. There are reasons to believe that a timetable itself is not covered by copyright and, moreover, that a timetable – funded by the community and created by a public authority – is subject to re-use pursuant to the act on access to public information. The only doubt could arise on the basis of the database protection act, but on the other hand there are regulations which oblige carriers to publish timetables and make them publicly accessible.
An important note – there has not been a single case of anyone requesting the removal of any timetable covered in my application.
Scraping an online timetable is legal but not sufficient. Here is an example of why it does not provide enough data for a connection search feature. In this picture you can see the housing estate where I live, which is in eastern Wrocław, Poland. The orange line is the route of one of the bus lines that terminate there.
But once per hour this bus terminates by the cemetery.
And twice a day it goes to the nearby village. Well, not nowadays, but it did in the recent past.
When we have only the departure times, we don’t know how long it takes to ride between two stops. In the situation pictured here, we can be almost sure it is one minute, but when the route splits or merges, we get more numbers on the left or right side. We usually have enough data to make a good guess, but again – data processing is not about guessing.
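To make that guessing concrete, here is a minimal sketch in Python. It is only an illustration, not the code Transportoid runs: the function name and the sample times are made up, and it assumes that each stop gives us nothing more than a sorted list of departure times in minutes since midnight. It pairs every departure from the first stop with the next departure from the second stop and lets the most frequent gap win.

```python
from bisect import bisect_left
from collections import Counter


def guess_travel_minutes(departures_a, departures_b):
    """Guess the ride time from stop A to stop B, in minutes.

    Both arguments are sorted lists of departure times (minutes since
    midnight). Returns (minutes, share of A departures that agree),
    or (None, 0.0) when no pairing is possible.
    """
    gaps = Counter()
    for t in departures_a:
        # First departure from B at or after the departure from A.
        i = bisect_left(departures_b, t)
        if i < len(departures_b):
            gaps[departures_b[i] - t] += 1
    if not gaps:
        return None, 0.0
    minutes, votes = gaps.most_common(1)[0]
    return minutes, votes / len(departures_a)


# Simple case: every bus serves both stops (hypothetical times).
print(guess_travel_minutes([360, 380, 400, 420], [361, 381, 401, 421]))

# The route splits: only every second bus reaches stop B.
print(guess_travel_minutes([360, 370, 380, 390], [361, 381]))
```

In the first call every departure agrees on one minute. In the second call the answer is still one minute, but only half of the departures support it, and that is exactly the uncertainty we would avoid if we had the source data with individual vehicle trips.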
To provide a high level of service, we need source data from… well, it depends. Some cities delegate the whole public transport issue to an external entity, like a carrier, and just supervise the level of service. Other cities keep all authority and only subcontract driving the buses from point to point according to the given schedule. There are also many examples of intermediate solutions where it is usually not clear who prepares the timetables and who claims ownership – every time it is a new inquiry for me.
Transportoid covers 55 cities in Poland and it makes no sense to tell you about all of them, but there are two cases worth mentioning.
The first case is Wrocław, my home city, which has been offering raw timetable data for about 10 years now. This is a perfect example of openness and willingness to help – two years ago I asked for bus stop coordinates and since then such a data set has been publicly available on the town council homepage. Their approach cannot be overstated. Many thanks go to Wrocław!
The city of Kraków has adopted the exact opposite approach: MPK Kraków, the municipal carrier, refused my request to share the timetable data in a source format. They claim online timetables are enough but – as I have explained before – these do not contain information about individual vehicle trips. Meanwhile, the company has shared the source data with my direct competitor, breaking the rules of fair play and equal treatment. In early February, I requested re-use of the timetable source database pursuant to the act on access to public information, which directly regulates the re-use of databases. MPK Kraków refused without a valid legal reason, so I brought a lawsuit against this carrier. The case is yet to be heard.
Fortunately, such an unfriendly and unfair approach is an exception, as carriers and authorities are usually quite friendly. As a rule, they want to sign an agreement which defines mutual rights and obligations. There is nothing oppressive in that, so I comply with such requests. People working there understand the value of mobile timetables and provide help and assistance. Sometimes a city or a carrier suffers from an unfavourable agreement with their own IT service provider, but we are making progress in these cases as well.
The whole process of litigation with carriers reveals a defect in the Polish Public Sector Information legislation. There is no tool to force the publication of an up-to-date set of public information intended for automatic processing. I can request the re-use of a timetable database, and the carrier may lawfully delay the delivery for 20 days, until the requested timetable is already outdated. This is important – without suitable regulations many areas of PSI will be effectively excluded from broad re-use.
The second problem is common: the public information request process – with all the appeals, complaints and legal proceedings – may take a very long time.
My experience of re-using timetable data is quite positive. Transportoid is a hobby, the product of two guys working at night, yet we have been able to provide mobile timetables to hundreds of thousands of users living in more than fifty cities.
I believe the tendency towards opening public data will result in many more useful services.