Research and Publications

SLEADE: Disagreement-Based Semi-Supervised Learning for Sparsely Labeled Evolving Data Streams

Gomes, H., Bifet, A., Pfahringer, B., Read, J. & Grzenda, M. (2025)

Introduction

SLEADE is a semi‑supervised learning approach for evolving data streams with extremely sparse labels, a setting in which traditional supervised models struggle. It leverages ensemble disagreement to make effective use of unlabeled data and unsupervised drift detection to adapt to concept drift in real time.

Problems

Real‑world streams often provide very few labeled instances, making supervised learning unreliable
Data distributions change over time, requiring models to adapt continuously
Using pseudo‑labels can introduce noise; the challenge is to exploit them without degrading performance
Drift detection typically relies on labeled data, which is scarce in this setting
Ensemble disagreement must carry a reliable signal if it is to guide pseudo‑labeling and adaptation

Method

SLEADE maintains an ensemble of classifiers and exploits their disagreement on unlabeled instances through a “majority trains the minority” strategy: when most members agree on a label, that label becomes a pseudo‑label used to train the minority members that predicted otherwise. A confidence‑based weighting function controls how strongly each pseudo‑labeled instance influences learning. The system also incorporates unsupervised drift detection, enabling the ensemble to adapt to changes in the data distribution without requiring labeled feedback.
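The mechanisms described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the learner interface (predict/learn), the linear confidence_weight function, the min_agreement threshold, and the DisagreementDriftDetector, which monitors the ensemble's disagreement rate as its label‑free signal, are all hypothetical choices made for illustration, since the exact weighting function and drift signal used by SLEADE are not given here.

```python
from collections import Counter, deque
from typing import Hashable, List, Optional, Protocol, Tuple


class IncrementalLearner(Protocol):
    """Hypothetical interface for one ensemble member (not SLEADE's actual API)."""

    def predict(self, x) -> Hashable: ...

    def learn(self, x, y: Hashable, weight: float) -> None: ...


def confidence_weight(agreement: float, min_agreement: float = 0.5) -> float:
    """Hypothetical weighting: 0 at or below min_agreement, rising linearly to 1 at unanimity."""
    if agreement <= min_agreement:
        return 0.0
    return (agreement - min_agreement) / (1.0 - min_agreement)


def process_instance(
    models: List[IncrementalLearner],
    x,
    y: Optional[Hashable] = None,
    min_agreement: float = 0.5,
) -> Tuple[Hashable, float]:
    """Handle one stream instance and return (majority label, disagreement rate)."""
    predictions = [m.predict(x) for m in models]
    majority, votes = Counter(predictions).most_common(1)[0]
    agreement = votes / len(models)
    if y is not None:
        # Labeled instance: every member learns the true label at full weight.
        for m in models:
            m.learn(x, y, 1.0)
    else:
        weight = confidence_weight(agreement, min_agreement)
        if weight > 0.0:
            # Majority trains the minority: only dissenting members get the pseudo-label.
            for m, p in zip(models, predictions):
                if p != majority:
                    m.learn(x, majority, weight)
    return majority, 1.0 - agreement


class DisagreementDriftDetector:
    """Hypothetical label-free detector: flags drift when the mean disagreement in a
    sliding recent window exceeds the mean of a fixed reference window by > delta."""

    def __init__(self, window: int = 100, delta: float = 0.15):
        self.window = window
        self.delta = delta
        self.reference: deque = deque(maxlen=window)
        self.recent: deque = deque(maxlen=window)

    def update(self, disagreement: float) -> bool:
        # Fill the reference window first; no decision until it is complete.
        if len(self.reference) < self.window:
            self.reference.append(disagreement)
            return False
        self.recent.append(disagreement)
        if len(self.recent) < self.window:
            return False
        ref_mean = sum(self.reference) / self.window
        recent_mean = sum(self.recent) / self.window
        drift = (recent_mean - ref_mean) > self.delta
        if drift:
            # Start over so the post-drift stream forms the new reference.
            self.reference.clear()
            self.recent.clear()
        return drift
```

In a stream loop, each instance would be passed to process_instance, the returned disagreement rate fed to the detector's update method, and a True result used as the cue to adapt the ensemble, for example by resetting or replacing members; that adaptation step is also an assumption and is left out of the sketch.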

Key outcomes

Experiments on real and synthetic streams show SLEADE outperforming several existing semi‑supervised stream learners
Pseudo‑labeling with confidence weighting reduces noise and boosts accuracy
Unsupervised drift detection enables timely adaptation to evolving concepts
Demonstrated benefits across multiple domains and data types

Findings

The key finding is that disagreement‑based semi‑supervised learning can overcome the challenges of sparsely labeled evolving data streams. By combining ensemble disagreement, confidence‑weighted pseudo‑labeling, and unsupervised drift detection, SLEADE shows that unlabeled data, traditionally seen as a limitation, can be turned into a powerful resource for maintaining accuracy and adaptability in dynamic environments.


Journal Publications

RMIDDM: an unsupervised and interpretable concept drift detection method for data streams
Linear adaptive filtering for regression in data streams
Accelerated Weka: GPU Machine Learning with Weka Workbench
Automatic species identification from images for Aotearoa
A comparative study of four deep learning algorithms for predicting tree stem radius measured by dendrometer: A case study
Bayesian Stream Tuner: Dynamic Hyperparameter Optimization for Real-Time Data Streams


How to cite this article

APA 7th: Gomes, H. M., Read, J., Bifet, A., Barddal, J. P., Enembreck, F., & Pfahringer, B. (2025). SLEADE: Disagreement‑based semi‑supervised learning for sparsely labeled evolving data streams. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050

MLA 9th: Gomes, Heitor M., et al. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering, 2025, https://doi.org/10.1109/TKDE.2025.3647050.

Chicago (Author-Date): Gomes, Heitor M., Jesse Read, Maciej Grzenda, Bernhard Pfahringer, and Albert Bifet. 2025. “SLEADE: Disagreement‑Based Semi‑Supervised Learning for Sparsely Labeled Evolving Data Streams.” IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2025.3647050.
