SLEADE: Disagreement-Based Semi-Supervised Learning for Sparsely Labeled Evolving Data Streams

Gomes, H., Bifet, A., Pfahringer, B., Read, J., & Grzenda, M. (2025)

Introduction

SLEADE is a semi‑supervised learning approach designed for evolving data streams with extremely sparse labels, where traditional supervised models struggle. It leverages ensemble disagreement and unsupervised drift detection to make effective use of unlabeled data and adapt to concept drift in real time.

Problems

  • Real‑world streams often provide very few labeled instances, making supervised learning unreliable

  • Data distributions change over time, requiring models to adapt continuously

  • Using pseudo‑labels can introduce noise; the challenge is to exploit them without degrading performance

  • Drift detection typically relies on labeled data, which is scarce in this setting

  • Ensemble disagreement must be informative enough to guide pseudo‑labeling and adaptation

Method

SLEADE uses an ensemble of classifiers that generate pseudo‑labels when they disagree, following a “majority trains the minority” strategy: if most models agree on a label, that pseudo‑label is used to train the minority models. A confidence‑based weighting function controls how strongly pseudo‑labeled instances influence learning. The system also incorporates unsupervised drift detection, enabling the ensemble to adapt to changes in the data distribution without requiring labeled feedback.
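The "majority trains the minority" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `CountingLearner` is a hypothetical toy learner, `majority_trains_minority` and `min_confidence` are illustrative names, and the fraction of agreeing votes stands in for the confidence-based weighting function, whose actual form is not given here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CountingLearner:
    """Toy incremental learner (hypothetical stand-in for a stream classifier).

    It ignores the features and predicts the class with the largest
    accumulated training weight, or `default` before any training.
    """
    default: int
    weights: Counter = field(default_factory=Counter)

    def predict(self, x):
        if not self.weights:
            return self.default
        return self.weights.most_common(1)[0][0]

    def learn(self, x, y, weight=1.0):
        self.weights[y] += weight


def majority_trains_minority(ensemble, x, min_confidence=0.5):
    """Process one unlabeled instance.

    Returns (pseudo_label, confidence, number_of_members_trained).
    Assumption: confidence is the fraction of members voting for the
    majority label, and it is reused as the training weight.
    """
    votes = [model.predict(x) for model in ensemble]
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)

    # Unanimous vote: no disagreement to learn from.
    # Weak or tied majority: the pseudo-label is too risky to use.
    if count == len(votes) or confidence <= min_confidence:
        return label, confidence, 0

    trained = 0
    for model, vote in zip(ensemble, votes):
        if vote != label:
            # Only dissenting members are trained on the pseudo-label.
            model.learn(x, label, weight=confidence)
            trained += 1
    return label, confidence, trained


# Two members vote 0 and one votes 1, so the dissenter is trained on label 0.
demo_ensemble = [CountingLearner(0), CountingLearner(0), CountingLearner(1)]
print(majority_trains_minority(demo_ensemble, x=[0.1, 0.2]))
```

Training only the dissenting members keeps the majority from reinforcing its own predictions, while the weight limits how much a narrowly won vote can change a model.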

Key outcomes

  • Experiments on real and synthetic streams show SLEADE outperforming several existing semi‑supervised stream learners

  • Pseudo‑labeling with confidence weighting reduces noise and boosts accuracy

  • Unsupervised drift detection enables timely adaptation to evolving concepts

  • Benefits are demonstrated across multiple domains and data types
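To make the idea of drift detection without labels concrete, the sketch below monitors a label-free signal, such as the ensemble's vote agreement on each incoming instance, and flags drift when its recent mean departs from an older reference mean. The detector, its name `WindowedMeanDriftDetector`, and the `window` and `threshold` parameters are assumptions chosen for illustration; this summary does not specify which detector SLEADE uses.

```python
from collections import deque


class WindowedMeanDriftDetector:
    """Label-free drift check on a scalar signal (illustrative only,
    not the detector used in SLEADE)."""

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # older observations
        self.recent = deque(maxlen=window)     # newest observations

    def update(self, value):
        """Add one observation; return True if drift is flagged."""
        if len(self.recent) == self.window:
            # The oldest recent value is about to be evicted,
            # so hand it over to the reference window first.
            self.reference.append(self.recent[0])
        self.recent.append(value)

        if len(self.reference) < self.window:
            return False  # not enough history to compare yet

        recent_mean = sum(self.recent) / self.window
        reference_mean = sum(self.reference) / self.window
        if abs(recent_mean - reference_mean) > self.threshold:
            # Restart so the new concept becomes the baseline.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


# Stable agreement around 0.9, then a drop to 0.5 after a hypothetical drift.
detector = WindowedMeanDriftDetector(window=50, threshold=0.2)
signal = [0.9] * 150 + [0.5] * 60
alarms = [i for i, v in enumerate(signal) if detector.update(v)]
print(alarms)
```

Because the signal is computed from the models' own predictions, no true labels are needed; an alarm could then trigger whatever adaptation the ensemble supports, such as resetting or replacing members.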

Findings

The key finding is that disagreement‑based semi‑supervised learning can overcome the challenges of sparsely labeled evolving data streams. By combining ensemble disagreement, confidence‑weighted pseudo‑labeling, and unsupervised drift detection, SLEADE shows that unlabeled data, traditionally seen as a limitation, can become a valuable resource for maintaining accuracy and adaptability in dynamic environments.

How to cite this article

APA 7th: Gomes, H. M., Read, J., Grzenda, M., Pfahringer, B., & Bifet, A. (2025). SLEADE: Disagreement‑based semi‑supervised learning for sparsely labeled evolving data streams. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050

MLA 9th: Gomes, Heitor M., et al. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering, 2025, https://doi.org/10.1109/TKDE.2025.3647050.

Chicago (Author-Date): Gomes, Heitor M., Jesse Read, Maciej Grzenda, Bernhard Pfahringer, and Albert Bifet. 2025. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050.