A traffic data researcher’s perspective: how can you achieve confidence in your data set?

October 24, 2022

We spoke with Associate Professor Ashish Bhaskar — a leading academic focused on transport big data and traffic management research — about the importance of accurate data, what impacts the reliability of data, and why the future of transport depends on overcoming the challenges of integrating data for a network-wide view.

Dr Bhaskar has a PhD in Intelligent Transport Systems, a Master’s in Transport Engineering and a bachelor’s degree in civil engineering. As well as teaching Civil Engineering as an Associate Professor at QUT, he collaborates with a team of researchers on evidence-based decision support, management and control of multi-modal transport systems. He currently co-leads the Business and Engineering systems domain for the QUT Centre for Data Science and is co-chair of the World Conference on Transport Research Society (WCTRS) SIG-C3 on Intelligent Transport Systems (ITS).

“Huge volumes of data are available but they cannot be used blindly.”

Data volumes are growing and many organisations are investing heavily in infrastructure that makes it possible to store and access this data as needed to inform a variety of analytics and data science projects.

Dr Bhaskar has partnered with a number of government and industry organisations to develop his research, and argues that the accuracy and reliability of data, judged against the intended use case, are critical considerations in the data space.

“I mean, no matter how big the data is, if it is not accurate enough for the application it’s useless,” he said.

For example, the reliability of data often depends on the sample size behind it. A small sample of vehicle speed data might be good enough to understand the speed profile during uncongested conditions, but during congested conditions traffic is more variable and a larger sample is needed to have confidence in the estimates. Similarly, information from only certain segments of a road will not tell you accurately what’s happening across the whole road.
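The link between variability and sample size can be made concrete with the standard normal-approximation formula for estimating a mean, n = (z·σ/E)², where σ is the spread of individual speeds and E is the acceptable margin of error. The sketch below assumes independent speed observations and uses made-up standard deviations (5 km/h uncongested, 15 km/h congested); these figures are illustrative only, not measurements from any network.

```python
import math


def required_sample_size(std_dev_kmh: float, margin_kmh: float, z: float = 1.96) -> int:
    """Smallest n for which the CI half-width z * sigma / sqrt(n) is <= margin.

    Assumes independent observations and a normal approximation (z = 1.96 for ~95%).
    """
    return math.ceil((z * std_dev_kmh / margin_kmh) ** 2)


# Hypothetical spreads of individual vehicle speeds (km/h); illustrative only.
uncongested_sd = 5.0
congested_sd = 15.0
margin = 2.0  # we want the mean speed to within +/- 2 km/h

print("uncongested:", required_sample_size(uncongested_sd, margin))
print("congested:", required_sample_size(congested_sd, margin))
```

Tripling the spread of speeds multiplies the required sample roughly ninefold (25 versus 217 vehicles here), which is why congested periods need far more data for the same level of confidence.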

Dr Bhaskar said organisations that wanted to develop effective evidence-based decision support tools from their data could not ignore limitations in the availability of the data sets they use, the science behind them, and external factors that affect comparison and integration.

“We must address the confidence we have in the data set and how calibrated the data source is for our application.”

What impacts the reliability and accuracy of your data?

Determining how confident you can be in your data will also depend on the source of your data set and how ITS technologies are deployed.

Dr Bhaskar gave the example of Bluetooth data, such as that collected by around 2,000 scanners across Brisbane City’s road network, which record the unique MAC (media access control) address of Bluetooth devices within range:

“Scanners placed at signal intersections scan the MAC address of Bluetooth devices. Because those MAC addresses are unique, you can trace the movement of a device — such as mapping the MAC IDs and their detection time between the upstream and downstream of the link to get its travel time. And from a mesh of scanners we can get the travel speeds and congestion profiles over the whole city.”
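The matching step described above can be sketched as follows. This is a simplified illustration, not the method used in Brisbane’s deployment: the `Detection` type and the `match_travel_times` and `speed_kmh` functions are hypothetical, each device is assumed to traverse the link once, and the first detection at each scanner is taken as its timestamp.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    mac: str       # device identifier (in practice usually hashed/anonymised)
    time_s: float  # detection timestamp in seconds


def match_travel_times(upstream, downstream):
    """Pair each MAC seen upstream with its first later sighting downstream."""
    # Keep only the earliest upstream detection per device.
    first_up = {}
    for d in sorted(upstream, key=lambda d: d.time_s):
        first_up.setdefault(d.mac, d.time_s)
    # Travel time = first downstream detection after the upstream one.
    travel = {}
    for d in sorted(downstream, key=lambda d: d.time_s):
        t0 = first_up.get(d.mac)
        if t0 is not None and d.time_s > t0 and d.mac not in travel:
            travel[d.mac] = d.time_s - t0
    return travel


def speed_kmh(link_length_m, travel_time_s):
    """Convert a link length (m) and travel time (s) into km/h."""
    return link_length_m * 3.6 / travel_time_s


up = [Detection("aa", 0.0), Detection("bb", 10.0), Detection("cc", 20.0)]
down = [Detection("aa", 60.0), Detection("bb", 100.0), Detection("dd", 50.0)]
times = match_travel_times(up, down)
print(times)  # {'aa': 60.0, 'bb': 90.0}
print(speed_kmh(1000, times["aa"]))
```

Devices "cc" and "dd" are seen at only one scanner and so contribute nothing; a production system would also need to handle repeated detections within a zone and filter outliers such as vehicles that stop along the link.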

But there are a number of factors that affect the accuracy of this approach:

“The closer the scanners are placed relative to each other, the less accurate you’ll be in the travel time (speed) estimates because of issues with the scanner’s zone of coverage, which might be 100m. If you have two scanners on two intersections that are 300 metres apart, you’re already making a spatial error of let’s say 100m upstream and downstream. So your travel time and travel speed would not be as reliable.”
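A rough worst-case calculation shows why short links suffer. Assuming a device can be detected anywhere within about 100 m of each scanner (the figure from the quote), the distance actually travelled between the two detections can differ from the nominal scanner spacing by up to 200 m. The hypothetical helper below expresses this as a relative error, which carries straight through to the speed estimate if the timestamps themselves are exact; real coverage zones are not this tidy.

```python
def worst_case_relative_error(spacing_m: float, zone_radius_m: float = 100.0) -> float:
    """Worst-case relative error in travelled distance (and hence speed).

    Assumption: detection can occur anywhere within zone_radius_m of each
    scanner, so each end of the link adds up to zone_radius_m of uncertainty.
    """
    return 2 * zone_radius_m / spacing_m


for spacing in (300, 1000, 2000):
    print(f"{spacing} m spacing: +/-{worst_case_relative_error(spacing):.0%}")
```

Under these assumptions a 300 m link can be off by about 67%, while a 2 km link is off by at most 10%, which is the trade-off against sample size discussed next.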

Conversely, placing scanners too far apart can reduce the number of individual vehicle travel times that can be matched, and therefore reduce confidence in the estimate of the population average:

“The moment scanners are 1-2km apart, you might find when observing arterial roads that there is less traffic upstream compared to downstream because vehicles might be moving in and moving out of the area between the two scanners. So your sample size decreases and that will impact on your accuracy and the confidence you might have in your average travel time estimation.”
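The effect of a shrinking matched sample on confidence can be illustrated with a normal-approximation 95% confidence interval for mean travel time. The travel times below are invented, and `mean_travel_time_ci` is a hypothetical helper; for very small samples a t-distribution would give a wider, more honest interval than z = 1.96.

```python
import math
import statistics


def mean_travel_time_ci(travel_times_s, z=1.96):
    """Mean travel time and the half-width of a normal-approximation CI."""
    n = len(travel_times_s)
    if n < 2:
        raise ValueError("need at least two matched vehicles")
    mean = statistics.fmean(travel_times_s)
    half_width = z * statistics.stdev(travel_times_s) / math.sqrt(n)
    return mean, half_width


# Same underlying spread of travel times, different numbers of matched vehicles.
few_matches = [100.0, 120.0, 140.0] * 2    # 6 vehicles matched
many_matches = [100.0, 120.0, 140.0] * 10  # 30 vehicles matched

for label, sample in (("few", few_matches), ("many", many_matches)):
    mean, hw = mean_travel_time_ci(sample)
    print(f"{label}: n={len(sample)}, mean {mean:.0f} s +/- {hw:.1f} s")
```

Both samples give the same mean of 120 s, but the interval with 6 matched vehicles is roughly 14 s wide on each side versus about 6 s with 30, so losing matches between distant scanners directly erodes confidence in the average.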

Understanding how data is gathered helps you identify where potential errors might originate and how much trust to place in the data, which in turn affects the accuracy and reliability of your modelling and analysis.

Data fusion is the key to data confidence

Dr Bhaskar shared two examples of why integrating data from multiple sources is important for organisations that want more reliable insights:

A more complete picture improves confidence:

An organisation seeking to understand traffic speeds might be getting speed data from vehicle detection loops installed in the road at traffic lights, as well as from Bluetooth, telematics and other third-party systems. Different sources might give similar information for a certain indicator, but not every source is equally reliable. Fusion and statistical analysis allow you to gain more certainty despite varying levels of confidence in different sources, and to arrive at a single source of truth that road operators can use.
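Inverse-variance weighting is one simple, widely used way to combine several estimates of the same quantity when some sources are more trustworthy than others. It is shown here only to illustrate the idea, not as netBI’s or Dr Bhaskar’s method; it assumes each source is unbiased, independent and has a known variance, and the speeds and variances below are invented.

```python
def fuse_inverse_variance(estimates):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Lower-variance (more reliable) sources receive proportionally more weight.
    Assumes the sources are independent and unbiased.
    """
    if not estimates:
        raise ValueError("no estimates to fuse")
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical mean speeds (km/h) and variances ((km/h)^2) for one road link.
sources = {
    "loop detectors": (60.0, 1.0),
    "bluetooth": (50.0, 4.0),
    "telematics": (55.0, 4.0),
}
speed, variance = fuse_inverse_variance(list(sources.values()))
print(f"fused speed {speed:.1f} km/h, variance {variance:.2f}")
```

The fused variance (about 0.67 here) is smaller than that of any single source, which is the sense in which combining sources increases certainty. If a source is biased rather than merely noisy, as a poorly calibrated detector might be, this weighting will not correct it.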

Fusion helps you interpret insights better:

Fusion of heterogeneous sources is important for understanding the reasons behind the patterns you observe in a more nuanced way, so you can decide the best course of action. An organisation might have speed profiles from Bluetooth, incident records from an ITS system that identifies the location of accidents, and weather information from the Bureau of Meteorology that shows the amount of rainfall in a certain area over a time period. Fusing these sources gives an organisation a much clearer understanding of the patterns it observes: for example, outlier accident conditions involving very low speeds, which might be overlooked when looking at a single source in isolation.
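Fusing heterogeneous sources usually starts by aligning them on a common spatial and temporal key. The sketch below assumes speeds, incident records and rainfall have already been aggregated to road segment and 15-minute interval (mapping Bureau of Meteorology gauge readings onto road segments is a separate step not shown). All names, segment IDs and thresholds are hypothetical. Low-speed intervals that coincide with neither an incident nor heavy rain are flagged as unexplained, the kind of outlier a single-source view might miss.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    segment: str      # hypothetical road segment ID
    start_min: int    # start of the 15-minute interval, minutes since midnight
    speed_kmh: float  # e.g. mean Bluetooth-derived speed for the interval


def explain_low_speeds(intervals, incidents, rainfall_mm,
                       speed_threshold_kmh=30.0, heavy_rain_mm=5.0):
    """Attach coinciding context (incident, heavy rain) to each low-speed interval."""
    flagged = []
    for iv in intervals:
        if iv.speed_kmh >= speed_threshold_kmh:
            continue  # not a low-speed interval
        key = (iv.segment, iv.start_min)
        reasons = []
        if key in incidents:
            reasons.append("incident")
        if rainfall_mm.get(key, 0.0) >= heavy_rain_mm:
            reasons.append("heavy rain")
        flagged.append((iv.segment, iv.start_min, reasons or ["unexplained"]))
    return flagged


intervals = [
    Interval("A", 480, 25.0),
    Interval("A", 495, 55.0),
    Interval("B", 480, 20.0),
    Interval("B", 495, 18.0),
]
incidents = {("A", 480)}            # (segment, interval) pairs with a recorded incident
rainfall_mm = {("B", 480): 8.0}     # rainfall per (segment, interval)

for row in explain_low_speeds(intervals, incidents, rainfall_mm):
    print(row)
```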

He said the future of transport management hinged on integrating data, to help governments and operators work together to deliver a cohesive multimodal network. He said multimodal visibility and control would support a transition to autonomous vehicles and mobility as a service (MaaS) by making life simpler for travellers who want to seamlessly switch between multiple modes of transport.

“The challenge is how do we utilise data sets from across different jurisdictions and modes of transport, which can be very different both spatially and temporally, to come up with better models that help control the whole network — that’s where I think the future is.”

If you would like help with understanding the level of data quality, accuracy and context across your data sets, consider working with netBI. netBI specialise in contextualising multiple, disparate data sets for reliable and in-depth analysis of transport networks. Learn more about our team and purpose or contact us for more information on how we can assist.

About the author

Dr Ashish Bhaskar

Dr Bhaskar has a PhD in Intelligent Transport Systems, a Master’s in Transport Engineering and a bachelor’s degree in civil engineering. As well as teaching Civil Engineering as an Associate Professor at QUT, he collaborates with a team of researchers on evidence-based decision support, management and control of multi-modal transport systems.

We’re a team of IT professionals, data science experts and consultants, led by a management team with an excellent mix of technical and commercial knowledge.

Acknowledgement
netBI acknowledges and pays respect to the past, present and future Traditional Custodians and Elders of this nation and the continuation of cultural, spiritual, and educational practices of Aboriginal and Torres Strait Islander peoples.