Research and Publications
Practical Machine Learning for Streaming Data
Gomes, H. M., & Bifet, A. (2024)
Introduction
This paper provides a practical introduction to machine learning for streaming data, outlining the core challenges, algorithms, and tools needed to build models that learn continuously from fast, evolving data streams.
Problems addressed
Traditional batch ML assumes a fixed, finite dataset and cannot keep up with fast, continuous, unbounded data streams.
Batch-trained models degrade when the data distribution changes over time (concept drift).
Models need to learn incrementally with limited memory and compute.
Practitioners lack clear guidance on deploying real‑time, adaptive ML methods.
Methods overview
The paper surveys practical approaches for streaming machine learning, focusing on incremental algorithms, drift‑aware methods, and efficient data structures that support real‑time learning. It highlights how models can be updated instance‑by‑instance, how to detect and adapt to concept drift, and how open‑source tools such as scikit‑multiflow enable practitioners to deploy streaming solutions at scale.
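The instance-by-instance updating described above can be sketched in plain Python. This is a minimal illustration, not code from the paper or from scikit-multiflow: the class `OnlineLogisticRegression`, the helper `synthetic_stream`, and the learning rate are all assumptions made for this example. It uses the test-then-train (prequential) pattern, in which each instance is first used to evaluate the model and only then used to update it.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical binary classifier trained one instance at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # One gradient step on the log loss for this single instance.
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Yield (x, y) pairs; the label is 1 when x0 + x1 > 1 (assumed toy concept)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, int(x[0] + x[1] > 1.0)


def prequential_accuracy(model, stream):
    """Test-then-train: predict on each instance before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total


model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
accuracy = prequential_accuracy(model, synthetic_stream(5000))
print(f"prequential accuracy: {accuracy:.3f}")
```

The model only ever stores its weights and bias, so memory use stays constant no matter how long the stream runs, and no separate hold-out set is needed because every instance is scored before it is learned.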
Key outcomes
Provides a practitioner‑focused overview of incremental and adaptive learning algorithms for data streams.
Explains how to handle concept drift, limited memory, and real‑time constraints.
Demonstrates practical workflows using open‑source streaming ML frameworks.
Offers guidance for deploying streaming models in real‑world systems.
Serves as an accessible entry point for engineers and researchers working with evolving data.
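The concept drift handling mentioned in the outcomes above can be illustrated with a deliberately simple detector. This sketch is an assumption for illustration only: it compares the error rate of a frozen reference window against a sliding recent window, which is a simplification and not one of the established detectors such as ADWIN or DDM. The class name `WindowDriftDetector`, the window size, and the threshold are hypothetical.

```python
from collections import deque


class WindowDriftDetector:
    """Hypothetical detector: flags drift when the recent error rate
    exceeds the reference error rate by more than `threshold`."""

    def __init__(self, window=100, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.reference = []  # frozen once it holds `window` outcomes
        self.recent = deque(maxlen=self.window)  # sliding window

    def update(self, error):
        """Add one 0/1 error outcome; return True if drift is signalled."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        reference_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        if recent_rate - reference_rate > self.threshold:
            self.reset()  # start collecting a new reference after the change
            return True
        return False


# Deterministic error stream: 10% errors for 1000 steps, then 50% errors.
errors = [int(t % 10 == 0) for t in range(1000)]
errors += [int(t % 2 == 0) for t in range(1000)]

detector = WindowDriftDetector(window=100, threshold=0.2)
detections = [t for t, e in enumerate(errors) if detector.update(e)]
print("drift signalled at steps:", detections)
```

In a full pipeline the 0/1 outcomes would come from a model's prediction errors, and a drift signal would typically trigger a model reset or retraining. This sketch reacts only to rising error rates and keeps at most two windows of outcomes in memory.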
Findings
The paper reports that incremental and drift‑aware learning methods consistently outperform traditional batch models on fast, evolving data streams: streaming algorithms maintain accuracy under concept drift, operate efficiently with limited memory, and adapt quickly to changes in the data distribution. Practical evaluations using open‑source tools highlight their scalability and suitability for real‑time applications, reinforcing the need for adaptive, online approaches in modern data environments.
Peer-reviewed Publications
Conference Publications
How to cite this article
APA 7th: Gomes, H. M., & Bifet, A. (2024). Practical machine learning for streaming data. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). https://doi.org/10.1145/3637528.3671442
MLA 9th: Gomes, Heitor M., and Albert Bifet. “Practical Machine Learning for Streaming Data.” Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), 2024, https://doi.org/10.1145/3637528.3671442.
Chicago (Author-Date): Gomes, Heitor M., and Albert Bifet. 2024. “Practical Machine Learning for Streaming Data.” In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). https://doi.org/10.1145/3637528.3671442.