| | |
| --- | --- |
| Semester | Winter 2021 |
| Course type | Block Seminar |
| Lecturer | TT.-Prof. Dr. Wressnegger |
| Audience | Computer Science Master & Bachelor |
| Credits | 4 ECTS |
| Room | 148, Building 50.34 and online |
| Language | English or German |
| Link | https://ilias.studium.kit.edu/goto_produktiv_crs_1600111.html |
| Registration | Please register for the course in ILIAS |
Due to the ongoing COVID-19 pandemic, this course will start off remotely, meaning the kick-off meeting will happen online. The final colloquium, however, will hopefully be an in-person meeting again (this time we might indeed have a chance).
To receive all the necessary information, please subscribe to the mailing list here.
This seminar is concerned with explainable machine learning in computer security. Learning-based systems are often difficult to interpret, and their decisions are opaque to practitioners. This lack of transparency is a considerable problem in computer security, as black-box learning systems are hard to audit and protect from attacks.
The module introduces students to the emerging field of explainable machine learning and teaches them how to work with results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work to their colleagues at the end of the term.
The topics cover different aspects of the explainability of machine learning methods, with a particular focus on applications in computer security.
| Date | Step |
| --- | --- |
| Tue, 19 Oct, 14:00–15:30 | Primer on academic writing, assignment of topics |
| Thu, 28 Oct | Arrange an appointment with the assistant |
| Mon, 1 Nov – Fri, 5 Nov | Individual meetings with the assistant |
| Wed, 1 Dec | Submit final paper |
| Wed, 22 Dec | Submit reviews for fellow students |
| Fri, 7 Jan | End of discussion phase |
| Fri, 21 Jan | Submit camera-ready version of your paper |
| Thu, 10 Feb | Presentation at the final colloquium |
News about the seminar, potential updates to the schedule, and additional material are distributed using a separate mailing list. Moreover, the list enables students to discuss topics of the seminar.
You can subscribe here.
Every student may choose one of the following topics. For each of these, we additionally provide a recent top-tier publication that you should use as a starting point for your own research. For the seminar and your final report, you should not merely summarize that paper, but try to go beyond and arrive at your own conclusions.
Moreover, all of these papers come with open-source implementations. Play around with these and include the lessons learned in your report.
Every "why" question implicitly contains a contrast. Humans are not asking just "why?", we ask "why A rather than B?". The research field of counterfactual and contrastive explanation focuses on this alternative scenario B and especially how it can be generated.
Some authors claim that explanations for humans should be interactive. In an interactive dialog, the human points to the part they want to understand.
The seminar report should present approaches that use user studies to evaluate the quality of explanations. The limitations, problems, and results of such evaluations are important.
It turns out there is a vast amount of philosophical work and papers from the social sciences on explanation and on how humans provide and understand explanations. These lines of research already use a detailed taxonomy of explanations, causes of effects, and (human) behavior.
Explainable active learning is a novel paradigm that introduces XAI into an active learning (AL) setting. Its benefits include supporting trust calibration and enabling rich forms of teaching feedback; potential drawbacks are an anchoring effect on the model's judgment and additional cognitive workload.
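As a hedged sketch of what such a setting might look like (not the assigned paper's method), the loop below performs uncertainty sampling and shows the annotator a simple explanation for each queried sample, here the per-feature contributions of a logistic-regression model. All names and the explanation style are our own illustrative assumptions.

```python
# Toy "explainable active learning" loop: query the most uncertain sample and
# show the annotator a per-feature explanation alongside it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y_true = (X[:, 0] - X[:, 2] > 0).astype(int)        # stands in for the annotator

labelled = list(range(20))                           # small initial label set
pool = [i for i in range(len(X)) if i not in labelled]

for round_ in range(5):
    clf = LogisticRegression().fit(X[labelled], y_true[labelled])
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]   # most uncertain sample
    contributions = clf.coef_[0] * X[query]             # shown to the annotator
    print(f"round {round_}: query {query}, explanation {contributions.round(2)}")
    labelled.append(query)                               # annotator provides the label
    pool.remove(query)
```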
Most ML explanation methods revolve around the importance of individual features or pixels, which suffers from several drawbacks. Therefore, multiple works focus on high-level, concept-based explanations for humans, which should be studied as part of this seminar topic.
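For contrast, the sketch below shows the kind of per-feature importance explanation that concept-based methods argue against: an occlusion-style score obtained by replacing one feature at a time with its training mean. This is a simplified stand-in for pixel-level attribution; the data and names are assumptions for illustration.

```python
# Per-feature importance of a single prediction via feature occlusion:
# how much does the predicted probability drop when a feature is "removed"?
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] - 2 * X[:, 3] > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def occlusion_importance(x, baseline):
    """Importance score for each feature of a single sample x."""
    cls = clf.predict(x[None])[0]
    p_full = clf.predict_proba(x[None])[0, cls]
    scores = []
    for i in range(len(x)):
        x_occ = x.copy()
        x_occ[i] = baseline[i]                       # "remove" feature i
        scores.append(p_full - clf.predict_proba(x_occ[None])[0, cls])
    return np.array(scores)

print(occlusion_importance(X[0], X.mean(axis=0)).round(3))
```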
This seminar report should discuss and relate the possibilities for comparing explanation methods with each other.
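One simple, illustrative criterion for such a comparison is the agreement of the top-k most important features that two methods assign to the same sample. The sketch below is our own minimal example, not a metric from a specific paper; faithfulness tests such as deletion curves are further options worth discussing.

```python
# Compare two explanations of the same sample by their top-k feature agreement.
import numpy as np

def topk_agreement(scores_a, scores_b, k=3):
    """Fraction of the k most important features on which two methods agree."""
    top_a = set(np.argsort(-np.abs(scores_a))[:k])
    top_b = set(np.argsort(-np.abs(scores_b))[:k])
    return len(top_a & top_b) / k

expl_method_1 = np.array([0.7, 0.1, -0.4, 0.0, 0.2])   # e.g. gradient-based scores
expl_method_2 = np.array([0.6, -0.3, 0.1, 0.0, 0.3])   # e.g. occlusion-based scores
print(topk_agreement(expl_method_1, expl_method_2))
```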
Similar to adversarial examples (which attack the classifier), explanation methods can be fooled as well. This includes producing useless or wrong explanations, or producing a specific, targeted explanation.
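The toy sketch below illustrates the idea for a gradient-based saliency explanation of a small, smooth network: a perturbation of the input is optimised so that the saliency concentrates on a chosen feature while the input, and hence the prediction, barely changes. This is an assumed, simplified setup for illustration, not the construction from the assigned paper.

```python
# Toy attack on a gradient saliency explanation: shift the explanation towards
# feature 0 while keeping the input (and prediction) almost unchanged.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Softplus(), torch.nn.Linear(8, 1)
)

def saliency(x):
    """Input gradient of the model output, a simple explanation."""
    x = x.clone().detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(model(x).sum(), x)
    return grad

x = torch.randn(4)
x_adv = x.clone().detach().requires_grad_(True)
opt = torch.optim.Adam([x_adv], lr=0.05)
target = torch.tensor([1.0, 0.0, 0.0, 0.0])          # desired (fake) saliency pattern

for _ in range(300):
    opt.zero_grad()
    (grad,) = torch.autograd.grad(model(x_adv).sum(), x_adv, create_graph=True)
    expl = grad.abs() / grad.abs().sum()             # normalised saliency
    # push the explanation towards the target, penalise changing the input
    loss = ((expl - target) ** 2).sum() + 10.0 * ((x_adv - x) ** 2).sum()
    loss.backward()
    opt.step()

print("prediction shift:", (model(x_adv) - model(x)).item())
print("saliency before: ", saliency(x).abs())
print("saliency after:  ", saliency(x_adv.detach()).abs())
```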