Finding the right XAI method — a guide for the evaluation and ranking of explainable AI methods in climate science

Text - Accepted Version

Bommer, P. L., Kretschmer, M. ORCID: https://orcid.org/0000-0002-2756-9526, Hedström, A., Bareeva, D. and Höhne, M. M.-C. (2024) Finding the right XAI method — a guide for the evaluation and ranking of explainable AI methods in climate science. Artificial Intelligence for the Earth Systems, 3 (3). ISSN 2769-7525 doi: 10.1175/aies-d-23-0074.1

Abstract/Summary

Explainable artificial intelligence (XAI) methods shed light on the predictions of machine learning algorithms. Several different approaches exist and have already been applied in climate science. However, ground truth explanations are usually missing, which complicates the evaluation and comparison of these methods and in turn impedes the choice of an XAI method. Therefore, in this work, we introduce XAI evaluation in the climate context and discuss different desired explanation properties, namely robustness, faithfulness, randomization, complexity, and localization. To this end, we chose previous work as a case study in which the decade of annual-mean temperature maps is predicted. After training both a multi-layer perceptron (MLP) and a convolutional neural network (CNN), multiple XAI methods are applied and their skill scores in reference to a random uniform explanation are calculated for each property. Independent of the network, we find that the XAI methods Integrated Gradients, layer-wise relevance propagation, and input times gradient exhibit considerable robustness, faithfulness, and complexity while sacrificing randomization performance. Sensitivity methods (gradient, SmoothGrad, NoiseGrad, and FusionGrad) match the robustness skill but sacrifice faithfulness and complexity for randomization skill. We find architecture-dependent performance differences regarding the robustness, complexity, and localization skills of different XAI methods, highlighting the necessity for research task-specific evaluation. Overall, our work offers an overview of different evaluation properties in the climate science context and shows how to compare and benchmark different explanation methods, assessing their suitability for the specific research problem at hand based on their strengths and weaknesses. In doing so, we aim to support climate researchers in the selection of a suitable XAI method.
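The idea of scoring an explanation "in reference to a random uniform explanation" can be illustrated with a minimal Python sketch. It is not the authors' implementation. The skill score below is the generic form used in forecast verification (1 is perfect, 0 is no better than the reference), and the function names, the entropy-based complexity measure, the 8 x 8 map size, and the single-pixel explanation are all hypothetical choices made for illustration; the exact definitions used in the article may differ.

```python
import numpy as np


def skill_score(score, reference_score, perfect_score):
    """Generic skill score (assumed form, not taken from the article).

    1 means a perfect score, 0 means no better than the reference,
    and negative values mean worse than the reference.
    """
    denom = perfect_score - reference_score
    if denom == 0:
        raise ValueError("perfect and reference scores must differ")
    return (score - reference_score) / denom


def entropy_complexity(explanation):
    """Hypothetical complexity measure: Shannon entropy of the normalized
    absolute attributions. Lower entropy means a more concise explanation."""
    a = np.abs(np.asarray(explanation, dtype=float)).ravel()
    p = a / a.sum()
    p = p[p > 0]  # drop zeros so that log is defined
    return float(-(p * np.log(p)).sum())


rng = np.random.default_rng(0)

# Reference: a random uniform "explanation" map.
random_explanation = rng.uniform(0.0, 1.0, size=(8, 8))

# Hypothetical explanation that puts all relevance on one grid cell.
sparse_explanation = np.zeros((8, 8))
sparse_explanation[2, 3] = 1.0

reference = entropy_complexity(random_explanation)
score = entropy_complexity(sparse_explanation)

# For this measure the best possible value is an entropy of 0.
skill = skill_score(score, reference, perfect_score=0.0)
print(f"complexity skill relative to random baseline: {skill:.2f}")
```

Normalizing against a random baseline in this way puts properties measured on different scales onto a common footing, which is what makes ranking several XAI methods across properties possible.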

Item Type	Article
URI	https://reading-clone.eprints-hosting.org/id/eprint/115958
Identification Number/DOI	10.1175/aies-d-23-0074.1
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Meteorology
Publisher	American Meteorological Society

Deposit Details

Date Deposited:	25 Apr 2024 16:52	Date item deposited into CentAUR
Last Modified:	25 Mar 2025 03:00	Date item last modified
