Research and Publications
SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams
Gomes, H. M., Read, J., Grzenda, M., Pfahringer, B., & Bifet, A. (2025). IEEE Transactions on Knowledge and Data Engineering.
Introduction
This publication presents SLEADE, a novel semi‑supervised learning framework designed for real‑world data streams where labels are extremely sparse. By combining disagreement‑based learning with unsupervised drift detection, SLEADE enables machine learning models to adapt effectively to evolving data distributions without relying on frequent labeled examples.
Problems
Many streaming environments (e.g., sensors, monitoring systems, online platforms) produce vast amounts of unlabeled data.
Traditional supervised models degrade quickly when concept drift occurs and labels are scarce.
Existing semi‑supervised methods often fail to adapt reliably in non‑stationary settings.
SLEADE directly targets this gap.
Method
SLEADE maintains an ensemble of classifiers and uses the level of disagreement among them to identify informative unlabeled instances. In parallel, an unsupervised drift detector signals when the model should adapt to changes in the data distribution, so adaptation does not depend on labels arriving. Together, these two components allow SLEADE to learn continuously, efficiently, and robustly from evolving data streams with very few labels.
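To make the idea of disagreement among ensemble members concrete, the following is a minimal Python sketch. It is not the SLEADE algorithm itself: the disagreement score (share of members outside the majority vote) and the tri-training-style rule (the majority teaches the dissenting members) are illustrative assumptions, and the names disagreement, teaching_plan, and the three threshold classifiers are hypothetical.

```python
from collections import Counter


def disagreement(votes):
    """Fraction of ensemble members that do not vote for the majority label."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def teaching_plan(votes):
    """Illustrative rule (an assumption, not SLEADE's exact rule).

    If a strict majority agrees but at least one member dissents, return
    (majority_label, indices_of_dissenting_members) so the dissenters can be
    trained on the majority's pseudo-label. Otherwise return None.
    """
    label, count = Counter(votes).most_common(1)[0]
    if count == len(votes) or count <= len(votes) / 2:
        # Unanimous (nothing new to learn) or no strict majority (too uncertain).
        return None
    dissenters = [i for i, vote in enumerate(votes) if vote != label]
    return label, dissenters


# Hypothetical ensemble: three threshold classifiers on a single feature.
ensemble = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

unlabeled = [0.1, 0.4, 0.6, 0.9]
plans = {}
for x in unlabeled:
    votes = [member(x) for member in ensemble]
    plans[x] = (disagreement(votes), teaching_plan(votes))
```

In this toy setup, the instances at 0.1 and 0.9 receive unanimous votes and are skipped, while 0.4 and 0.6 split the ensemble two to one, so the single dissenting member would be updated with the majority label. No true labels are consulted at any point.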
Key outcomes
Introduces a new semi‑supervised ensemble method tailored for evolving data streams.
Demonstrates that disagreement‑based learning can significantly improve performance when labels are sparse.
Shows that unsupervised drift detection can maintain accuracy without requiring labeled data.
Outperforms several state‑of‑the‑art baselines in both synthetic and real‑world streaming benchmarks.
Provides evidence that semi‑supervised strategies are essential for practical, scalable streaming ML.
Experimental findings
Experiments show that SLEADE consistently outperforms existing supervised and semi‑supervised baselines in scenarios with severe label scarcity, maintaining strong accuracy across both synthetic and real‑world data streams. The method adapts effectively to abrupt and gradual concept drift, demonstrating stable performance and reduced reliance on labeled data, making it well‑suited for practical streaming applications.
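As an illustration of how drift can be flagged without any labels, the sketch below compares the mean of a recent window of feature values against a reference window. This is a deliberately simple stand-in and not the detector used in SLEADE; the class name WindowMeanDriftDetector, the window size, and the threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev


class WindowMeanDriftDetector:
    """Flags drift when the recent window's mean departs from the reference mean.

    Only feature values are used; labels are never required.
    """

    def __init__(self, window_size=50, threshold=3.0):
        self.window_size = window_size
        self.threshold = threshold
        self.reference = []
        self.recent = deque(maxlen=window_size)

    def update(self, value):
        """Add one observation and return True if drift is flagged."""
        if len(self.reference) < self.window_size:
            self.reference.append(value)  # still building the reference window
            return False
        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return False
        spread = pstdev(self.reference) or 1e-12
        # Standard error of a window mean under the reference distribution.
        std_err = spread / self.window_size ** 0.5
        if abs(mean(self.recent) - mean(self.reference)) > self.threshold * std_err:
            # Start over so the next reference is built from post-drift data.
            self.reference = []
            self.recent.clear()
            return True
        return False


# Synthetic single-feature stream with an abrupt shift at index 100.
stream = [float(i % 2) for i in range(100)] + [2.0 + i % 2 for i in range(100)]

detector = WindowMeanDriftDetector()
detections = [i for i, value in enumerate(stream) if detector.update(value)]
```

On this synthetic stream the detector flags a change a few observations after the shift, then rebuilds its reference window from post-drift data. A gradual drift would move the window mean more slowly, so detection would typically be delayed relative to the abrupt case.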
Funding
This research was supported by TAIAO – Time‑Evolving Data Science / Artificial Intelligence for Advanced Open Environmental Science, funded by the New Zealand Ministry of Business, Innovation and Employment (MBIE).
How to cite this article
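APA 7th: Gomes, H. M., Read, J., Grzenda, M., Pfahringer, B., & Bifet, A. (2025). SLEADE: Disagreement‑based semi‑supervised learning for sparsely labeled evolving data streams. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050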
MLA 9th: Gomes, Heitor M., et al. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering, 2025, https://doi.org/10.1109/TKDE.2025.3647050.
Chicago (Author-Date): Gomes, Heitor M., Jesse Read, Maciej Grzenda, Bernhard Pfahringer, and Albert Bifet. 2025. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050.
The University of Waikato
University of Canterbury
The University of Auckland
Victoria University of Wellington
MetService
Beca