SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams

Gomes, H. M., Read, J., Grzenda, M., Pfahringer, B., & Bifet, A. (2025). IEEE Transactions on Knowledge and Data Engineering.

Introduction

This publication presents SLEADE, a novel semi‑supervised learning framework designed for real‑world data streams where labels are extremely sparse. By combining disagreement‑based learning with unsupervised drift detection, SLEADE enables machine learning models to adapt effectively to evolving data distributions without relying on frequent labeled examples.

Problems

  • Many streaming environments (e.g., sensors, monitoring systems, online platforms) produce vast amounts of unlabeled data.

  • Traditional supervised models degrade quickly when concept drift occurs and labels are scarce.

  • Existing semi‑supervised methods often fail to adapt reliably in non‑stationary settings.

SLEADE directly targets this gap.

Method

SLEADE combines disagreement‑based semi‑supervised learning with unsupervised drift detection to handle evolving data streams with very few labels. It maintains an ensemble of classifiers and uses the level of disagreement among them to identify informative unlabeled instances. In parallel, an unsupervised drift detector, which needs no labels, signals when the model should adapt to a change in the data distribution. Together, these two components let SLEADE keep learning from the stream as it evolves, without waiting for labeled examples to arrive.
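The two ingredients named above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `vote_disagreement` and `WindowMeanDriftDetector` are hypothetical, the disagreement score is a simple minority‑vote fraction, and the drift detector is a toy window‑mean comparison standing in for whichever detector SLEADE actually uses. How SLEADE turns the disagreement score into training signal is not specified in this summary, so the sketch stops at computing it.

```python
from collections import Counter, deque
from statistics import mean, pstdev


def vote_disagreement(votes):
    """Return (majority_label, disagreement) for one unlabeled instance.

    `votes` holds one predicted label per ensemble member. Disagreement is
    the fraction of members that did not vote for the majority label, so
    0.0 means the ensemble is unanimous.
    """
    majority_label, majority_count = Counter(votes).most_common(1)[0]
    return majority_label, 1.0 - majority_count / len(votes)


class WindowMeanDriftDetector:
    """Toy unsupervised drift detector (an assumption, not SLEADE's detector).

    It watches a single numeric feature and needs no labels. Drift is flagged
    when the mean of the most recent window differs from the mean of a
    reference window by more than `threshold` reference standard deviations.
    """

    def __init__(self, window_size=50, threshold=3.0):
        self.window_size = window_size
        self.threshold = threshold
        self.reference = []                      # values seen before drift
        self.recent = deque(maxlen=window_size)  # sliding window of new values

    def update(self, value):
        """Feed one value; return True if drift is flagged at this step."""
        # Fill the reference window first.
        if len(self.reference) < self.window_size:
            self.reference.append(value)
            return False

        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return False

        ref_mean = mean(self.reference)
        ref_std = pstdev(self.reference) or 1e-9  # guard against zero spread
        drift = abs(mean(self.recent) - ref_mean) > self.threshold * ref_std

        if drift:
            # Adopt the recent window as the new reference and start over.
            self.reference = list(self.recent)
            self.recent.clear()
        return drift
```

In a streaming loop under these assumptions, each unlabeled instance would be passed to every ensemble member, the predictions scored with `vote_disagreement`, and a monitored feature value fed to `update`. A `True` return from the detector would be the cue to reset or retrain part of the ensemble.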

Key outcomes

  • Introduces a new semi‑supervised ensemble method tailored for evolving data streams.

  • Demonstrates that disagreement‑based learning can significantly improve performance when labels are sparse.

  • Shows that unsupervised drift detection can maintain accuracy without requiring labeled data.

  • Outperforms several state‑of‑the‑art baselines in both synthetic and real‑world streaming benchmarks.

  • Provides evidence that semi‑supervised strategies are essential for practical, scalable streaming ML.

Experimental findings

Experiments show that SLEADE consistently outperforms existing supervised and semi‑supervised baselines in scenarios with severe label scarcity, maintaining strong accuracy across both synthetic and real‑world data streams. The method adapts effectively to abrupt and gradual concept drift, demonstrating stable performance and reduced reliance on labeled data, making it well‑suited for practical streaming applications.

Funding

This research was supported by TAIAO – Time‑Evolving Data Science / Artificial Intelligence for Advanced Open Environmental Science, funded by the New Zealand Ministry of Business, Innovation and Employment (MBIE).

How to cite this article

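APA 7th: Gomes, H. M., Read, J., Grzenda, M., Pfahringer, B., & Bifet, A. (2025). SLEADE: Disagreement‑based semi‑supervised learning for sparsely labeled evolving data streams. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050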

MLA 9th: Gomes, Heitor M., et al. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering, 2025, https://doi.org/10.1109/TKDE.2025.3647050.

Chicago (Author-Date): Gomes, Heitor M., Jesse Read, Maciej Grzenda, Bernhard Pfahringer, and Albert Bifet. 2025. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050.