By Walter Weber, Founder & Product Director, netBI

Introduction
Public transport networks are the backbone of urban mobility. The design, reliability and efficiency of these networks can be supercharged by the effective application of high-quality data. However, raw data is often incomplete, inconsistent, and siloed across multiple data-generating systems.
To unlock the true value of this data, public transit agencies and operators must first focus on completeness and quality. The process of data fusion can then yield far more powerful information and insights than each dataset does alone. These context-rich insights can enhance decision-making, optimise operations, and improve passenger experiences.
The Challenges of Raw Data
Data generated by public transport systems is inherently messy and often lacks the context required to be useful.
It comes from various source systems, including ticketing, real-time, telematics, passenger counting, scheduling, traffic sensors and other IoT devices, each with their own limitations.
Key challenges include:
- Siloed Data: Information is often fragmented across different systems, making it difficult to gain a holistic view.
- Lack of Context: Raw data points lack explanatory factors, making interpretation challenging.
- Incompleteness: Missing data can result from sensor failures, manual errors, or system outages.
- External Influences: Weather, events, and infrastructure conditions can introduce discrepancies that are not always captured in isolated datasets.

Understanding and Preparing Data
The first step to improving data usability is understanding how it is generated. This involves:
- Identifying Data Sources and Embedded Logic: Recognising whether data is influenced by human input, system logic, or external factors.
- Defining Data Standards: Establishing clear definitions to ensure consistency and comparability.
- Densification and Repair: Filling gaps, correcting errors, and normalising to improve completeness and accuracy.
- Curation and Governance: Presenting data clearly and implementing protocols to ensure ongoing data integrity and reliability.
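As an illustration of the densification and repair step, the following is a minimal sketch that fills short gaps in a stream of location pings by linear interpolation. The `Ping` type, the `densify` function and the `step` and `max_gap` parameters are hypothetical names chosen for this example, not part of any specific platform, and straight-line interpolation is only one of several possible repair strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    t: int                  # seconds since the start of the service day
    lat: float
    lon: float
    inferred: bool = False  # True when the point was filled in, not observed


def densify(pings: list[Ping], step: int, max_gap: int) -> list[Ping]:
    """Fill gaps between consecutive pings by linear interpolation.

    Gaps longer than max_gap seconds are left unfilled, on the assumption
    that a straight line is no longer a reasonable guess. step must be > 0.
    """
    if not pings:
        return []
    ordered = sorted(pings, key=lambda p: p.t)
    out = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.t - prev.t
        if step < gap <= max_gap:
            t = prev.t + step
            while t < cur.t:
                frac = (t - prev.t) / gap
                out.append(Ping(
                    t,
                    prev.lat + frac * (cur.lat - prev.lat),
                    prev.lon + frac * (cur.lon - prev.lon),
                    inferred=True,
                ))
                t += step
        out.append(cur)
    return out
```

Marking filled points with `inferred=True` keeps repaired values distinguishable from observed ones, which supports later auditing.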
The Power of Data Fusion
Fusing multiple datasets is where real value emerges. By integrating different data streams, transit operators can:
- Create Fault-Tolerant Data Architectures: Using multiple independent data sources, from systems that employ lossless data sharing mechanisms, helps mitigate the potential for missing or erroneous information.
- Provide Corroborating Evidence: Cross-referencing and validating data from independent sources enhance accuracy and reliability.
- Enrich Contextual Understanding: A unified dataset enables deeper insights by providing the context behind raw figures.
For example, fusing GPS location data with scheduling and ticketing data allows for more accurate measurement of on-time running and trip completion.
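A minimal sketch of this kind of fusion follows. It assumes the GPS data has already been matched to stops, so that both the schedule and the observations can be keyed by a (trip, stop) pair with times in seconds. The function names and the early and late tolerances (60 and 300 seconds) are illustrative assumptions, not an agency standard.

```python
StopKey = tuple[str, str]  # (trip_id, stop_id), hypothetical identifiers


def on_time_running(
    scheduled: dict[StopKey, int],
    observed: dict[StopKey, int],
    early_s: int = 60,
    late_s: int = 300,
) -> dict[StopKey, str]:
    """Classify each scheduled stop event by joining it to the observed time."""
    results: dict[StopKey, str] = {}
    for key, sched_t in scheduled.items():
        obs_t = observed.get(key)
        if obs_t is None:
            results[key] = "missing"   # no location evidence for this stop
            continue
        delta = obs_t - sched_t        # positive means the vehicle was late
        if delta < -early_s:
            results[key] = "early"
        elif delta > late_s:
            results[key] = "late"
        else:
            results[key] = "on_time"
    return results


def trip_completion(
    scheduled: dict[StopKey, int],
    observed: dict[StopKey, int],
    trip_id: str,
) -> float:
    """Share of a trip's scheduled stops that have an observed visit."""
    stops = [k for k in scheduled if k[0] == trip_id]
    if not stops:
        return 0.0
    return sum(1 for k in stops if k in observed) / len(stops)
```

Neither dataset can produce these measures alone: the schedule supplies the expectation and the location data supplies the evidence.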

Building Trust in Data
For data-driven decision-making to be effective, transit agencies and operators must have trust and confidence in their data. This means:
- Ensuring Auditability: All data should be traceable, with a clear lineage of how it was collected, processed, and transformed.
- Implementing Quality Assurance Measures: Regular validation and monitoring should be in place to detect anomalies and inconsistencies.
- Backing Insights with Evidence: Decision-makers should have access to transparent methodologies, software tooling and visualisations that support analytical processes and conclusions.
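The quality assurance measures above can be sketched as simple automated checks on a location feed. The example below flags non-increasing timestamps, long gaps and physically implausible speeds between consecutive pings. The function names and the thresholds (a 30 second gap and 35 m/s) are assumptions made for illustration and would need tuning per data source.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * asin(sqrt(a))


def validate_pings(
    pings: list[tuple[int, float, float]],  # (t_seconds, lat, lon)
    max_gap_s: int = 30,
    max_speed_mps: float = 35.0,
) -> list[tuple[int, str]]:
    """Return (index, issue) pairs so each finding traces back to a record."""
    issues: list[tuple[int, str]] = []
    for i in range(1, len(pings)):
        t0, lat0, lon0 = pings[i - 1]
        t1, lat1, lon1 = pings[i]
        dt = t1 - t0
        if dt <= 0:
            issues.append((i, "non_increasing_timestamp"))
            continue
        if dt > max_gap_s:
            issues.append((i, "gap"))
        if haversine_m(lat0, lon0, lat1, lon1) / dt > max_speed_mps:
            issues.append((i, "implausible_speed"))
    return issues
```

Returning the record index alongside each issue keeps every finding traceable to the source row, in line with the auditability point above.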
Unlocking Advanced Analytics and Applications
With high-quality, fused data, transit agencies and operators can apply more advanced analytical models. The same enabling data foundation supports a wide range of applications, including:
- Improved Network Planning: Identifying inefficiencies, inaccurate timetables, congestion points, service gaps and much more with greater accuracy.
- Predictive Analytics: Using historical data and machine learning to anticipate seasonal and non-seasonal travel times, demand fluctuations, congestion and capacity constraints, empowering proactive network planning and management.
- Real-Time Decision Support: Enabling near real-time interventions in response to operational disruptions and incidents.
- Enhanced Customer Experience: Providing passengers with more accurate arrival times, better service reliability, and personalised travel recommendations.

The Future: Real-Time and Granular Insights
Advancements in data-generating technologies are making transit analytics more precise and immediate. For example, location sensors and other IoT devices can now capture and share granular movement data in real time, allowing for:
- Layover and Congestion Analysis: Identifying bottlenecks and optimising turnaround times.
- Real-Time Performance Monitoring: Detecting and responding to service delays dynamically.
- Adaptive Network Management: Using AI-driven insights to continuously refine scheduling and routing.

Harnessing Data Completeness, Quality, and Fusion with the netBI Data Intelligence Platform
Unlocking the full potential of public transport data requires more than just collecting raw information—it demands a platform that can ingest, process, warehouse, curate, and analyse vast volumes of data from varied systems. The netBI Data Intelligence Platform is designed to do exactly that, providing transport authorities, operators, planners, and decision-makers with a single, unified source of truth for optimising their networks.
netBI ingests and integrates a broad spectrum of public transport (PT) and associated datasets, including:
- Operational and Planning Data: Scheduling, route planning, and ticketing system data.
- Real-Time Passenger and Vehicle Data: IoT devices tracking passenger movements, vehicle locations, telematics, and on-board diagnostics (OBD).
- External Influences: Roads and traffic conditions, weather patterns, and satellite system data.
- Demographics and Customer Insights: Passenger demand trends, complaints, and service feedback.
- Financial and Resource Management Data: ERP, HCM, SCM, fuelling, maintenance, and infrastructure upkeep records.
- Additional Contextual Data: Any other relevant datasets that enhance the analytical output and provide actionable insights when fused into a centralised platform.
By ensuring a robust, fault-tolerant data architecture, and then fusing these diverse data sources, netBI transforms fragmented, siloed data into valuable information. This unlocks:
- Auditable, evidence-backed insights.
- Context-rich analytics.
- Advanced modelling and forecasting.
- Confident decision-making.
With data-generating technologies becoming more granular, reliable, and real-time, the netBI platform is built to support the next generation of transport analytics and operational software. Whether it’s reducing delays, optimising routes, improving passenger experiences, or enhancing financial efficiency, netBI ensures public transport networks are smarter, more adaptive, and data-driven.
Conclusion
Data completeness, quality, and fusion are critical pillars in managing and improving public transport networks. By investing in technology that enables robust data management practices, transit agencies and operators can move beyond basic data visualisation to intelligent, predictive, and real-time analytics. This can not only enhance operational efficiency, but also elevate the overall passenger experience, paving the way for smarter and more responsive public transport systems.


Metlink Embraces Data Quality and Granularity to Unlock New Capabilities
netBI’s client, Greater Wellington Regional Council, the steward of the Metlink public transport network in Wellington, has embarked on a programme to lift its data fitness in order to maximise efficiency and deliver enhancements to its customers.
In 2024, Metlink introduced a new on-board announcement system to its bus network. This system not only provides high-quality information to passengers on board each service, but also introduces a very granular, lossless, independent source of real-time location data. netBI now ingests this data into its platform, which is unlocking additional areas of opportunity for Metlink including, but not limited to:
- High-resolution trip completion and on-time running results, lifting KPI accuracy and reducing manual investigations.
- Dwell cluster analysis to identify when buses are continuously stationary and categorise the reasons why, e.g. dwelling at a stop to pick up passengers, congestion, traffic light cycles, level crossings or difficult turns.
- Timetable optimisation that uses machine learning to determine more accurate dwell times.
- More accurate bus emissions modelling that takes into account elevation as well as both in-service trips and out-of-service components such as pull-outs, deadheads, layovers and pull-ins.
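
To make the dwell analysis idea concrete, the following is a minimal sketch of how stationary periods might be detected in high-frequency location pings and then given a first, coarse label. It is not netBI's or Metlink's method. It assumes pings are already projected to local metre coordinates, and the function names, the 5 m radius, the 20 second minimum and the 30 m stop radius are all hypothetical.

```python
from math import hypot

Ping = tuple[int, float, float]            # (t_seconds, x_m, y_m), projected
Dwell = tuple[int, int, float, float]      # (start_t, end_t, x_m, y_m)


def find_dwells(
    pings: list[Ping], radius_m: float = 5.0, min_duration_s: int = 20
) -> list[Dwell]:
    """Find runs of pings that stay within radius_m of the run's first ping."""
    dwells: list[Dwell] = []
    start = 0
    for i in range(1, len(pings) + 1):
        # A run ends at the end of the data or when the vehicle moves away.
        moved = i == len(pings) or hypot(
            pings[i][1] - pings[start][1], pings[i][2] - pings[start][2]
        ) > radius_m
        if moved:
            t0, t1 = pings[start][0], pings[i - 1][0]
            if t1 - t0 >= min_duration_s:
                dwells.append((t0, t1, pings[start][1], pings[start][2]))
            start = i
    return dwells


def label_dwell(
    dwell: Dwell, stops: list[tuple[float, float]], stop_radius_m: float = 30.0
) -> str:
    """Coarse label: a dwell near a known stop versus one elsewhere."""
    _, _, x, y = dwell
    near_stop = any(hypot(x - sx, y - sy) <= stop_radius_m for sx, sy in stops)
    return "at_stop" if near_stop else "elsewhere"
```

Separating dwells at stops from dwells elsewhere is only a first cut. Distinguishing congestion, traffic light cycles or level crossings would need further fused context such as signal locations or traffic data.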

Image 1: Location pings from the real-time system, approximately every 9 seconds

Image 2: Location pings from the onboard telematics system, approximately every 1 second