Explaining Anomalies with Isolation Forest and SHAP

Isolation Forest is an unsupervised, tree-based anomaly detection method. See how both KernelSHAP and TreeSHAP can be used to explain its output.

Isolation Forest has become a staple in anomaly detection systems [1]. Its advantage is being able to find complex anomalies in large datasets with many features. However, when it comes to explaining those anomalies, this advantage quickly becomes a weakness.

To take action on an anomaly, we often need to understand why it was classified as one. This insight is particularly valuable in real-world applications such as fraud detection, where knowing the reason behind an anomaly is often as important as detecting it.

Unfortunately, with Isolation Forest, these explanations are hidden within the complex model structure. To uncover them, we turn to SHAP.

We will apply SHAP to IsolationForest and interpret its output. Although this is an unsupervised model, we can still use SHAP to explain its anomaly scores (see the sketch after the list below). That is, to understand:

  • how features have contributed to the scores of individual instances,
  • and which features are important in general.
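To make the setup concrete, here is a minimal sketch of the workflow, assuming scikit-learn's IsolationForest and the shap package. The toy data, feature names, and hyperparameters are illustrative assumptions, not the article's actual dataset.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest

# Toy dataset with a few injected anomalies (illustrative only)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "feature_a": rng.normal(0, 1, size=1000),
    "feature_b": rng.normal(0, 1, size=1000),
})
X.iloc[:5, 0] += 8  # make the first five rows clear outliers

# Fit the unsupervised Isolation Forest
iforest = IsolationForest(n_estimators=100, random_state=0)
iforest.fit(X)

# TreeSHAP works directly on the fitted tree ensemble
explainer = shap.TreeExplainer(iforest)
shap_values = explainer.shap_values(X)

# shap_values[i, j] is the contribution of feature j to instance i's score
print(shap_values[:5])
```

For the model-agnostic route, a KernelSHAP explainer can instead be built around the score function, e.g. `shap.KernelExplainer(iforest.decision_function, background)` with `background` a small reference sample of rows; this is much slower but does not rely on the tree structure.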