Research and Publications

Streaming Isolation Forest

Liu, J. J., Cassales, G. W., Liu, F. T., Pfahringer, B., & Bifet, A. (2025)

Introduction

Streaming Isolation Forest (SIF) is an online anomaly‑detection method designed for evolving data streams, where data arrives continuously and patterns change over time. It extends the classic Isolation Forest algorithm, originally built for static datasets, so it can operate efficiently, adapt to drift, and detect anomalies in real time without storing historical data.

Problems

The original Isolation Forest requires access to the full dataset and cannot be updated incrementally
Real‑world streams change over time, so models must adapt continuously
Storing all past data is impractical in high‑velocity streams
Anomaly detection must work without labeled feedback
Online updates must preserve the randomness and structure that make Isolation Forest effective

Method

The Streaming Isolation Forest method works by updating isolation trees incrementally as new data arrives, rather than rebuilding the entire forest from scratch. Each incoming instance is used to partially refresh or replace parts of the existing trees, allowing the model to adapt continuously to changes in the data stream. To handle concept drift, the algorithm incorporates mechanisms such as sliding windows or decay functions so that older, less relevant data gradually loses influence. Only small samples of data are used during updates, keeping computation fast and memory usage low. As the forest evolves, anomaly scores are recalculated in a way that reflects the current structure of the trees, ensuring that the system can detect unusual or rare patterns in real time without storing historical data.
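The ideas in the paragraph above can be illustrated with a small sketch. This is not the authors' implementation. It is a simplified stand-in that assumes a bounded sliding window as the forgetting mechanism and replaces one whole tree at a time, where the paper describes finer-grained partial updates. The class name WindowedIsolationForest, its parameters (n_trees, sample_size, window_size, refresh_every) and the helper functions are all hypothetical names chosen for illustration. The scoring formula is the standard one from the original Isolation Forest, s = 2^(-E[h(x)] / c(n)).

```python
import math
import random
from collections import deque


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over
    n points; the normalizer in the classic Isolation Forest score."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this attribute, stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus the usual leaf-size correction."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


class WindowedIsolationForest:
    """Illustrative only: a sliding-window forest that swaps out its oldest
    tree every few instances (an assumption, not the published algorithm)."""

    def __init__(self, n_trees=25, sample_size=64, window_size=256,
                 refresh_every=10, seed=0):
        self.sample_size = sample_size
        self.refresh_every = refresh_every
        self.window = deque(maxlen=window_size)  # bounded memory of recent data
        self.trees = deque(maxlen=n_trees)       # appending drops the oldest tree
        self.max_depth = math.ceil(math.log2(sample_size))
        self.rng = random.Random(seed)
        self.seen = 0

    def learn_one(self, x):
        self.window.append(tuple(x))
        self.seen += 1
        if len(self.window) < self.sample_size:
            return  # still warming up
        forest_full = len(self.trees) == self.trees.maxlen
        if not forest_full or self.seen % self.refresh_every == 0:
            sample = self.rng.sample(list(self.window), self.sample_size)
            self.trees.append(build_tree(sample, 0, self.max_depth, self.rng))

    def score(self, x):
        """Anomaly score in (0, 1]; higher means easier to isolate."""
        if not self.trees:
            return 0.0  # no model yet
        mean_path = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c_factor(self.sample_size))


if __name__ == "__main__":
    data_rng = random.Random(1)
    forest = WindowedIsolationForest()
    for _ in range(500):
        x = (data_rng.gauss(0, 1), data_rng.gauss(0, 1))
        forest.score(x)      # score first (no labels needed) ...
        forest.learn_one(x)  # ... then update the model
    print(forest.score((0.0, 0.0)), forest.score((10.0, 10.0)))
```

In this sketch each instance is scored before it is learned, so no labels are needed. Memory stays bounded by window_size and n_trees, and trees built from old data are dropped as new ones are appended, which is one simple way to let older data lose influence.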

Key outcomes

SIF detects anomalies with low latency
The model adapts to changing data distributions
It does not need to store historical data
It performs competitively against other streaming anomaly‑detection methods
It is efficient enough for real‑world, high‑throughput applications

Findings

The core finding is that Isolation Forest can be successfully adapted to streaming environments by using incremental tree updates and adaptive scoring. Streaming Isolation Forest maintains the strengths of the original algorithm while enabling fast, memory‑efficient, and drift‑aware anomaly detection in evolving data streams.

How to cite this article

APA 7th: Liu, J. J., Cassales, G. W., Liu, F. T., Pfahringer, B., & Bifet, A. (2025). Streaming Isolation Forest. In X. Wu et al. (Eds.), Advances in Knowledge Discovery and Data Mining (PAKDD 2025, Lecture Notes in Computer Science, Vol. 15870). Springer. https://doi.org/10.1007/978-981-96-8170-9_8

MLA 9th: Liu, Justin Jia, et al. “Streaming Isolation Forest.” Advances in Knowledge Discovery and Data Mining, edited by Xindong Wu et al., vol. 15870, Lecture Notes in Computer Science, Springer, 2025. https://doi.org/10.1007/978-981-96-8170-9_8.

Chicago (Author-Date): Liu, Justin Jia, G. W. Cassales, F. T. Liu, Bernhard Pfahringer, and Albert Bifet. 2025. “Streaming Isolation Forest.” In Advances in Knowledge Discovery and Data Mining, edited by Xindong Wu et al., Lecture Notes in Computer Science, vol. 15870. Singapore: Springer. https://doi.org/10.1007/978-981-96-8170-9_8.
