Public transport and the hard-to-catch advanced analytics

You have your data securely in one place. You have managed to draw some conclusions from it. But so far, you’ve done no more than could be done one hundred years ago, although much faster and with more data points. What truly novel insights can modern technology provide? It is time to discuss the ins and outs of advanced analytics. That, and two bears on a fishing trip. Read on.

The crystal ball that can work

One example of how advanced analytics can deliver business value is to predict the future in some capacity. For instance, Stratiteq provides our public transport customers with occupancy predictions for vehicles at specific stops at certain times of the day. The business value is twofold. Firstly, it enables better traffic and maintenance planning. Secondly, you can use it to influence travelers. You present the public with occupancy predictions for a searched journey in an app or on signs at the stations. Travelers can then decide to move their errands away from peak hours, thus smoothing the use of the vehicles.

The input defines the output

How you predict vehicle occupancy depends heavily on the nature of the available data.

  • The most basic approach is to take the mean occupancy as the vehicle leaves a particular stop.
  • A slightly better way is to separate weekends from non-weekends as traveling behavior differs. There is, of course, nothing advanced about that approach. You still only know the occupancy as the vehicle leaves the stop.
  • In many public transport areas, there is no passenger counting except for the passengers (hopefully) validating their tickets when they get on the bus. You don’t know where they get off, making it impossible to see the occupancy at a certain stop.
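
As a rough illustration of the two basic approaches above, the sketch below averages the occupancy on departure per stop and keeps weekdays and weekends apart. The observations and the function name `mean_occupancy` are made up for the example; they are not taken from a real customer data set.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical observations: (service date, stop, occupancy on departure).
observations = [
    (date(2024, 3, 4), "A", 40),   # Monday
    (date(2024, 3, 5), "A", 50),   # Tuesday
    (date(2024, 3, 9), "A", 10),   # Saturday
    (date(2024, 3, 10), "A", 20),  # Sunday
]


def mean_occupancy(observations):
    """Mean occupancy on departure per (stop, day type)."""
    groups = defaultdict(list)
    for day, stop, occupancy in observations:
        # weekday() returns 5 for Saturday and 6 for Sunday.
        day_type = "weekend" if day.weekday() >= 5 else "weekday"
        groups[(stop, day_type)].append(occupancy)
    return {key: mean(values) for key, values in groups.items()}


baseline = mean_occupancy(observations)
print(baseline)
```

The result is still only a historical average per stop. It says nothing about how a particular trip is developing right now.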

Moreover, travelers waiting along the route don’t care how full the vehicle is right now. They only care about how full it is when it reaches their stop. But of course, it’s good to get a heads up.

Ticket-searching data in real-time

We used ticket-searching data for a customer and created a model of where we believed passengers would step off the vehicle, based on the ticket searches. Real-world sample checking proved that this model was quite close to reality. If many people search for A->B and fewer people search for A->C, it turns out that the proportions between those searches match real-world behavior quite well.

  • Armed with the knowledge of where people are likely to get off, we implemented a model that updates in real-time.
  • The model assumes an average number of boarding passengers and uses the statistical prediction of where they will get off.
  • For ongoing trips, it then uses real-time ticket validations to adjust the model to the current circumstances.
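
The three steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production model: the stop names, search counts, average boardings, and the function names `alighting_shares` and `predict_occupancy` are all hypothetical, and the real-time adjustment is reduced to replacing the average boardings with the validations counted so far.

```python
STOPS = ["A", "B", "C", "D"]  # hypothetical route, in travel order

# Hypothetical ticket-search counts per (origin, destination).
searches = {
    ("A", "B"): 60, ("A", "C"): 30, ("A", "D"): 10,
    ("B", "C"): 50, ("B", "D"): 50,
    ("C", "D"): 20,
}

# Hypothetical average number of boarding passengers per stop.
avg_boardings = {"A": 20.0, "B": 10.0, "C": 5.0}


def alighting_shares(searches):
    """Share of passengers from each origin expected to get off at each stop."""
    totals = {}
    for (origin, _dest), count in searches.items():
        totals[origin] = totals.get(origin, 0) + count
    return {(o, d): count / totals[o] for (o, d), count in searches.items()}


def predict_occupancy(shares, avg_boardings, validations=None):
    """Expected occupancy as the vehicle leaves each stop.

    `validations` holds real-time ticket validations for stops already
    served; where present, they replace the average boardings.
    """
    validations = validations or {}
    index = {stop: i for i, stop in enumerate(STOPS)}
    prediction = {}
    for stop in STOPS[:-1]:
        onboard = 0.0
        for (origin, dest), share in shares.items():
            # Count passengers who boarded at or before this stop
            # and get off after it.
            if index[origin] <= index[stop] < index[dest]:
                boarded = validations.get(origin, avg_boardings[origin])
                onboard += boarded * share
        prediction[stop] = onboard
    return prediction


shares = alighting_shares(searches)
planned = predict_occupancy(shares, avg_boardings)
live = predict_occupancy(shares, avg_boardings, validations={"A": 30})
print(planned)
print(live)
```

In this sketch, a busier-than-average start at stop A raises the predicted occupancy for every later stop, which is the heads-up that travelers further down the route care about.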

This method works reasonably well. It runs into problems when real-world behavior strays too far from statistical behavior.

Double sensors

For another customer, we have access to sensors that count passengers boarding and passengers getting off the vehicle. This access is good for two reasons.

  • The most obvious one is that we don’t need to make assumptions about passenger behavior based on ticket searches.
  • There is a bonus, however: covariance between stations. Different people want to go to different places. Passengers from station A might want to go to station E more than they want to go to station B or vice versa.
  • With passenger counting, we can find the covariance between boarding passengers at one stop and the occupancy at another stop. So we know that if extra passengers step on the train at station A, they are unlikely to get off at station B, and the extra load affects the rest of the trip.
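
To make the covariance idea concrete, here is a minimal sketch with invented sensor counts. It computes the sample covariance between boardings at station A and occupancy at station E, and then, as an assumption of this example rather than a description of the delivered solution, uses a simple linear adjustment (covariance divided by variance) to shift the expected occupancy at E when more passengers than usual board at A.

```python
from statistics import mean

# Hypothetical per-trip counts from the door sensors.
boardings_at_a = [10, 20, 30, 40]
occupancy_at_e = [25, 30, 35, 40]


def sample_covariance(xs, ys):
    """Sample covariance of two equally long series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def adjusted_occupancy(boardings_now, xs, ys):
    """Shift the mean occupancy at E by how unusual today's boardings at A are."""
    # The covariance of a series with itself is its variance.
    slope = sample_covariance(xs, ys) / sample_covariance(xs, xs)
    return mean(ys) + slope * (boardings_now - mean(xs))


cov = sample_covariance(boardings_at_a, occupancy_at_e)
estimate = adjusted_occupancy(50, boardings_at_a, occupancy_at_e)
print(cov, estimate)
```

A positive covariance here means that busy trips at A tend to still be busy at E. A value near zero would suggest that most of those passengers have already left the vehicle by then.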

(A further step would be to use Wi-Fi data to track individual passengers and do away with the covariance as well. Then we would have a correlation between stops, which is even more potent than covariance.)

Insights, tools, and two bears on a fishing trip

The clear-sighted reader notes that so far, we mention no cutting-edge technologies such as deep learning, time series analysis, or neural networks. That’s because while these tools are meaningful and can generate value, the true insight always lies in the data.

  • The insight is number one. Its quality, but also the nature of its contents. And the knowledge of where to go and find it.
  • The choice of tools we apply to garner insights is number two. But the duo works well as a team.

Think of them, if you like, as two bears. One knows where all the salmon are but is not so great at fishing. The other is great at fishing but has no clue as to where the salmon are hiding. Both go hungry. Your business will benefit the most from insights that carry the most meaning for your market offering. And you will need to use the most appropriate tools to gather those insights.

My colleague Mario can tell you more about how we at Stratiteq can help you get the maximum number of salmon out of your data. I wish you all the luck!