Explainable Machine Learning

Overview

Semester: Winter 2022
Course type: Block Seminar
Lecturer: Jun.-Prof. Dr. Wressnegger
Audience: Informatik Master & Bachelor
Credits: 4 ECTS
Room: 148, Building 50.34
Language: English or German
Link: https://campus.kit.edu/campus/all/event.asp?gguid=0xB5C7C25A3A7C4464A36349B86FFDDA7B
Registration: https://ilias.studium.kit.edu/goto.php?target=crs_1922847&client_id=produktiv

Description

This seminar is concerned with explainable machine learning in computer security. Learning-based systems are often difficult to interpret, and their decisions are opaque to practitioners. This lack of transparency is a considerable problem in computer security, as black-box learning systems are hard to audit and to protect against attacks.

The module introduces students to the emerging field of explainable machine learning and teaches them how to work with results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work to their fellow students at the end of the term.

The topics cover different aspects of the explainability of machine learning methods, with a particular focus on applications in computer security.

Schedule

Tue, 25. Oct, 11:30–13:00: Primer on academic writing, assignment of topics
Thu, 3. Nov: Arrange appointment with assistant
Mon, 7. Nov – Fri, 11. Nov: 1st individual meeting (first overview, ToC)
Mon, 5. Dec – Fri, 9. Dec: 2nd individual meeting (feedback on first draft of the report)
Thu, 22. Dec: Submit final paper
Mon, 9. Jan: Submit review for fellow students
Thu, 12. Jan: End of discussion phase
Fri, 13. Jan: Notification about paper acceptance/rejection
Fri, 27. Jan: Submit camera-ready version of your paper
Fri, 17. Feb: Presentation at final colloquium

Mailing List

News about the seminar, potential updates to the schedule, and additional material are distributed using a separate mailing list. Moreover, the list enables students to discuss topics of the seminar.

You can subscribe here.

Topics

Every student may choose one of the following topics. For each topic, we additionally provide two or three recent top-tier publications that you should use as a starting point for your own research. For the seminar and your final report, you should not merely summarize these papers but try to go beyond them and arrive at your own conclusions.

Moreover, all of these papers come with open-source implementations. Play around with these and include the lessons learned in your report.

  • Propagation-based Explanations

    Propagation-based explanations are generated by backpropagating relevance values from a network's output to its input. To this end, a variety of propagation rules has been proposed. The topic should also consider the individual properties satisfied by the different rules; a short code sketch follows the references below.

    • Sundararajan, Taly, and Yan, "Axiomatic Attribution for Deep Networks", ICML 2017
    • Montavon et al., "Layer-Wise Relevance Propagation: An Overview", in: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019
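
    To give a first impression of what such a propagation looks like in code, here is a minimal PyTorch sketch of Integrated Gradients (Sundararajan et al.); the two-layer network, the random input, and the helper name integrated_gradients are illustrative placeholders rather than the papers' reference implementations.

      # Minimal Integrated Gradients sketch; model and input are placeholders.
      import torch
      import torch.nn as nn

      model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
      model.eval()

      def integrated_gradients(model, x, target, baseline=None, steps=50):
          """Average the gradients along the straight-line path from a
          baseline (default: all zeros) to the input x."""
          if baseline is None:
              baseline = torch.zeros_like(x)
          alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
          path = baseline + alphas * (x - baseline)     # (steps, features)
          path.requires_grad_(True)
          out = model(path)[:, target].sum()
          grads = torch.autograd.grad(out, path)[0]
          # Attribution = (x - baseline) * average gradient along the path.
          return (x - baseline) * grads.mean(dim=0)

      x = torch.randn(1, 20)                     # placeholder input
      target = model(x).argmax(dim=1).item()     # explain the predicted class
      attribution = integrated_gradients(model, x, target)
      print(attribution.shape)                   # one relevance value per feature

    Layer-wise relevance propagation replaces this gradient-based path average with explicit, layer-specific propagation rules; comparing the properties of such rules is the core of this topic.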

  • Generating Counterfactual/Contrastive Explanations

    For human beings, a good explanation is always contrastive: people do not only ask ‘Why A?’, they ask ‘Why A rather than B?’. Towards human-centered AI, research on explainable AI has therefore turned to counterfactual and contrastive explanations, which focus on this alternative scenario B and, in particular, on how it can be generated (see the sketch after the references below).

    • Stepin et al., "A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence", IEEE Access 2021
    • Madumal et al., "Explainable Reinforcement Learning through a Causal Lens", AAAI 2020
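
    As a rough illustration of how the alternative scenario B can be searched for, the sketch below performs a simple gradient-based counterfactual search; the model, the random input, the L1 distance, and the trade-off parameter lam are assumptions made for this example and not the exact formulation of the cited methods.

      # Minimal gradient-based counterfactual search; the model, input, and
      # the trade-off parameter lam are placeholders.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
      model.eval()

      def counterfactual(model, x, target_class, lam=0.1, steps=500, lr=0.05):
          """Search for x' close to x (L1 distance) that the model assigns to
          target_class: the alternative scenario B in 'Why A rather than B?'."""
          x_cf = x.clone().detach().requires_grad_(True)
          opt = torch.optim.Adam([x_cf], lr=lr)
          for _ in range(steps):
              opt.zero_grad()
              loss = (F.cross_entropy(model(x_cf), target_class)
                      + lam * (x_cf - x).abs().sum())
              loss.backward()
              opt.step()
          return x_cf.detach()

      x = torch.randn(1, 20)                    # placeholder input
      pred = model(x).argmax(dim=1)             # current class A
      target = 1 - pred                         # contrast class B (binary case)
      x_cf = counterfactual(model, x, target)
      print("largest feature changes:", (x_cf - x).abs().topk(3).indices)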

  • Concept-based Explanations

    Concept-based explanation methods try to explain a model's decision in terms of human-understandable concepts instead of feature importance values. This topic covers the identification of concepts and the visualization of the associated explanations; a small code sketch follows the references below.

    • Ghorbani et al., "Towards Automatic Concept-Based Explanations", NeurIPS 2019
    • Yeh et al., "On Completeness-Aware Concept-Based Explanations in Deep Neural Networks", NeurIPS 2020
    • Achtibat et al., "From 'Where' to 'What': Towards Human-Understandable Explanations through Concept Relevance Propagation", arXiv:2206.03208 (2022)
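
    As a loose illustration of the idea, the sketch below fits a concept activation vector (CAV) on hidden activations and measures the directional derivative of a class logit along it; the feature extractor, the classification head, and the randomly generated "concept" and "random" example sets are all placeholders, and the cited papers go further by discovering concepts automatically and checking how completely they explain the model.

      # Minimal CAV-style sketch; the feature extractor, the classification
      # head, and the "concept"/"random" example sets are placeholders.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      feature_net = nn.Sequential(nn.Linear(20, 32), nn.ReLU())   # layer to probe
      head = nn.Linear(32, 2)                                     # classification head

      # Placeholder activations for concept examples and random examples.
      with torch.no_grad():
          acts_concept = feature_net(torch.randn(100, 20))
          acts_random = feature_net(torch.randn(100, 20))

      # Fit a linear probe separating concept from random activations; its
      # normalized weight vector is the concept activation vector (CAV).
      probe = nn.Linear(32, 1)
      opt = torch.optim.Adam(probe.parameters(), lr=0.01)
      X = torch.cat([acts_concept, acts_random])
      y = torch.cat([torch.ones(100, 1), torch.zeros(100, 1)])
      for _ in range(200):
          opt.zero_grad()
          loss = F.binary_cross_entropy_with_logits(probe(X), y)
          loss.backward()
          opt.step()
      cav = probe.weight.detach().squeeze()
      cav = cav / cav.norm()

      # Conceptual sensitivity: directional derivative of a class logit
      # along the CAV at the test input's activations.
      x = torch.randn(1, 20)
      acts = feature_net(x)
      logit = head(acts)[0, 1]                    # class of interest: index 1
      grad = torch.autograd.grad(logit, acts)[0].squeeze()
      print("concept sensitivity:", torch.dot(grad, cav).item())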

  • Interactive Explanations

    Explanations between humans are naturally interactive: in a dialog, the person asking points to the parts they want to understand better, and this feedback guides the explanation method.

    • Wexler et al., "The What-If Tool: Interactive Probing of Machine Learning Models", IEEE TVCG 2020
    • Sokol et al., "Glass-Box: Explaining AI Decisions with Counterfactual Statements through Conversation with a Voice-Enabled Virtual Assistant", IJCAI 2018

  • Measuring the Quality of Explanations

    A variety of explanation methods has been proposed in recent years. But which method is best suited for a given task? Answering this question is part of this topic (a code sketch of one such check follows the references below).

    • Adebayo et al., "Sanity Checks for Saliency Maps", NeurIPS 2018
    • Arras, Osman, and Samek, "CLEVR-XAI: A Benchmark Dataset for the Ground Truth Evaluation of Neural Network Explanations", Information Fusion 2022
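
    One concrete evaluation idea from the cited work is the parameter-randomization test: an explanation that barely changes after the model's weights are randomized cannot be explaining the model. The sketch below illustrates this for plain gradient saliency; the model, the input, and the use of cosine similarity (instead of the rank correlations used in the paper) are simplifications for this example.

      # Minimal parameter-randomization sanity check; the model, input, and
      # the similarity measure are simplifications for this sketch.
      import copy
      import torch
      import torch.nn as nn

      model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
      model.eval()

      def saliency(model, x, target):
          """Plain gradient saliency: d(logit of target class) / dx."""
          x = x.clone().detach().requires_grad_(True)
          model(x)[0, target].backward()
          return x.grad.detach().flatten()

      x = torch.randn(1, 20)
      target = model(x).argmax(dim=1).item()
      expl_original = saliency(model, x, target)

      # Randomize the last layer's weights and recompute the explanation.
      randomized = copy.deepcopy(model)
      nn.init.normal_(randomized[-1].weight)
      nn.init.normal_(randomized[-1].bias)
      expl_randomized = saliency(randomized, x, target)

      # If the explanation barely changes although the parameters did, the
      # explanation method fails the sanity check.
      similarity = torch.cosine_similarity(expl_original.abs(),
                                           expl_randomized.abs(), dim=0)
      print("similarity after randomization:", similarity.item())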

  • Using User Studies to Evaluate Explanations

    This topic specifically considers user studies as a way to evaluate the quality of explanations. The report should discuss the limitations, problems, and results of such evaluations.

    • Hendricks et al., "Grounding Visual Explanations", ECCV 2018
    • Lucic, Haned, and de Rijke, "Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting", FAT 2020

  • Information Leakage through Explanations

    State-of-the-art explanation methods often leak sensitive information about the model's parameters as well as its training data, and this additional information can be abused by adversaries. How can we prevent such leakage while preserving the quality of the explanations? A small code sketch follows the references below.

    • Milli et al., "Model Reconstruction from Model Explanations", FAT 2019
    • Shokri et al., "On the Privacy Risks of Model Explanations", AIES 2021
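
    To see why explanations can leak model internals, consider the extreme case of a linear model: its input gradient is the weight vector itself, so a single gradient-based explanation reveals all parameters. The sketch below illustrates this; the "secret" model and the helper explanation_api are hypothetical.

      # Minimal illustration of model reconstruction from explanations: for a
      # linear model, the input gradient equals the weight vector.
      import torch
      import torch.nn as nn

      secret_model = nn.Linear(20, 1)        # parameters the adversary wants

      def explanation_api(x):
          """Black-box explanation service returning the input gradient."""
          x = x.clone().detach().requires_grad_(True)
          secret_model(x).sum().backward()
          return x.grad.detach()

      # A single explanation query at an arbitrary input recovers the weights.
      stolen = explanation_api(torch.randn(1, 20))
      print(torch.allclose(stolen, secret_model.weight.detach()))   # True

    For non-linear models, reconstruction requires more queries, and the cited papers additionally study what explanations reveal about the training data.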

  • Attacking Explanations

    Similar to adversarial examples (which attack the prediction), adversaries can also fool explanation methods. This includes producing useless or wrong explanations as well as forcing a specific, targeted explanation (see the sketch after the references below).

    • Zhang et al., "Interpretable Deep Learning under Fire", USENIX Security 2020
    • Dombrowski et al., "Explanations Can Be Manipulated and Geometry Is to Blame", NeurIPS 2019
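
    The sketch below illustrates the targeted variant of such an attack: the input is perturbed so that a gradient explanation moves towards an attacker-chosen relevance map while the model's output stays close to the original. The model, the input, the target map, the loss weights, and the use of Softplus instead of ReLU (so that second derivatives do not vanish) are assumptions for this example rather than the cited papers' exact setups.

      # Minimal explanation-manipulation sketch; the model, input, target
      # map, and loss weights are placeholders.
      import torch
      import torch.nn as nn

      model = nn.Sequential(nn.Linear(20, 64), nn.Softplus(), nn.Linear(64, 2))
      model.eval()

      def gradient_explanation(x, cls):
          """Input-gradient explanation for class cls, kept differentiable
          w.r.t. x so that the explanation itself can be attacked."""
          out = model(x)
          grad = torch.autograd.grad(out[0, cls], x, create_graph=True)[0]
          return grad, out

      x = torch.randn(1, 20)                            # placeholder input
      cls = model(x).argmax(dim=1).item()               # class to explain
      with torch.no_grad():
          out_orig = model(x)                           # output to preserve
      target_map = torch.zeros(1, 20)
      target_map[0, :5] = 1.0                           # attacker-chosen relevance

      x_adv = x.clone().detach().requires_grad_(True)
      opt = torch.optim.Adam([x_adv], lr=0.01)
      for _ in range(300):
          opt.zero_grad()
          expl, out = gradient_explanation(x_adv, cls)
          loss = (((expl - target_map) ** 2).sum()
                  + 10.0 * ((out - out_orig) ** 2).sum())
          loss.backward()
          opt.step()

      final_expl, _ = gradient_explanation(x_adv, cls)
      print("distance to target map:",
            ((final_expl - target_map) ** 2).sum().item())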