| Semester | Summer 2023 |
|---|---|
| Course type | Block Seminar |
| Lecturer | Jun.-Prof. Dr. Wressnegger |
| Audience | Informatik Master & Bachelor |
| Credits | 4 ECTS |
| Room | 148, Building 50.34 |
| Language | English |
| Link | TBA |
| Registration | https://ilias.studium.kit.edu/goto.php?target=crs%5F2081074&client_id=produktiv |
This seminar is concerned with explainable machine learning in computer security. Learning-based systems are often difficult to interpret, and their decisions are opaque to practitioners. This lack of transparency is a considerable problem in computer security, as black-box learning systems are hard to audit and protect from attacks.
The module introduces students to the emerging field of explainable machine learning and teaches them to work with results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work to their colleagues at the end of the term.
Topics cover different aspects of the explainability of machine learning methods, with a particular focus on applications in computer security.
| Date | Step |
|---|---|
| Tue, 18. April, 9:45–11:15 | Primer on academic writing, assignment of topics |
| Thu, 27. April | Arrange appointments with assistant |
| Tue, 02. May - Fri, 05. May | 1st individual meeting (first overview, ToC) |
| Mon, 05. June - Fri, 09. June | 2nd individual meeting (feedback on first draft of the report) |
| Wed, 28. June | Submit final paper |
| Mon, 10. July | Submit reviews for fellow students |
| Fri, 14. July | End of discussion phase |
| Fri, 21. July | Submit camera-ready version of your paper |
| Fri, 28. July | Presentation at final colloquium |
News about the seminar, potential updates to the schedule, and additional material are distributed via the course's Matrix room. Moreover, Matrix enables students to discuss topics and solution approaches.
You can find the link to the Matrix room on ILIAS.
Every student may choose one of the following topics. For each of these, we additionally provide two recent top-tier publications that you should use as a starting point for your own research. For the seminar and your final report, you should not merely summarize these papers, but try to go beyond them and arrive at your own conclusions.
Moreover, all of these papers come with open-source implementations. Play around with these and include the lessons learned in your report.
Generative adversarial networks (GANs) and transformers have led to great advances in many tasks. However, the interpretability of generative models is less explored than that of discriminative models. This topic would investigate and taxonomize existing explanation methods for generative models, point out the differences between explanations for discriminative and generative models, and discuss their limitations.
Concept-based explanations characterize the global behaviour of a DNN with high-level, human-understandable concepts. A few recent studies have proposed methods to discover post-hoc concept-based explanations of trained models based on different assumptions. This work would summarize existing post-hoc methods, discuss their limitations, and point out how concept-based explanations relate to other human-interpretable representations.
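For intuition, the following sketch follows the concept-activation-vector idea: a linear boundary is fitted between intermediate-layer activations of concept examples and of random examples, and its normal serves as the concept direction. The layer sizes, the random activations, and the placeholder gradients are hypothetical and only illustrate the mechanism, not any particular paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical intermediate-layer activations (in practice collected via forward hooks).
rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(50, 16))  # inputs that show the concept
random_acts = rng.normal(loc=0.0, size=(50, 16))   # random inputs

# The concept activation vector (CAV) is the normal of a linear boundary
# separating the two sets of activations.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)
cav = LogisticRegression().fit(X, y).coef_[0]
cav /= np.linalg.norm(cav)

# TCAV-style score: fraction of samples whose class score increases along the
# concept direction. The per-sample gradients are placeholders here; in practice
# they are gradients of the class logit w.r.t. the layer activations.
grads = rng.normal(size=(100, 16))
tcav_score = float(np.mean(grads @ cav > 0))
print(f"concept sensitivity: {tcav_score:.2f}")
```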
Knowledge graphs (KGs) have been widely applied in various domains and for different purposes. They can help make machine learning systems more explainable and interpretable. This topic would systematize current knowledge-graph-based explanations, describe their application domains, and discuss the remaining challenges.
XAI helps to understand which input features contribute to a neural network's output. A variety of explanation methods have been proposed in recent years, and often different methods point to different input features. How can we choose the best method for a given task? How can we measure the quality of explanations?
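To make "which input features contribute" concrete, here is a minimal sketch of a gradient-based saliency map for a toy PyTorch model; the model architecture and the random input are assumptions purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier; any differentiable model works the same way.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(1, 10, requires_grad=True)

logits = model(x)
cls = logits.argmax(dim=1).item()
logits[0, cls].backward()          # gradient of the predicted class score w.r.t. the input

saliency = x.grad.abs().squeeze()  # one attribution score per input feature
print(saliency)
```

A common sanity check for explanation quality is to remove the highest-scoring features and compare how strongly the prediction degrades relative to removing random features.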
Adversarial learning is a method to make machine learning models robust against attacks. Still, the model's decisions need to be explainable for the model to be trustworthy. How does adversarial learning affect explainability? Are robust models easy to explain and understand? Can explainability be used to make models robust?
Explainable systems are deployed to support transparency, fairness, and trust in AI. However, recent works propose to fairwash such systems: the adversarial aim is to obtain a seemingly fair system that passes auditing or validation processes, even though the system behaves unfairly and its deployment is thus highly questionable.
It turns out that, similar to adversarial examples, which attack the classification result, explanations can be fooled as well. Discuss the malicious goals of attacking explanations by outlining general approaches. How can these attacks be defended against, and how can the robustness of XAI methods be measured?
Black-box explanations of a machine learning model are generated without using its internal parameters or model-specific characteristics. Are model-agnostic black-box explanation methods trustworthy? How do they compare against white-box explanation methods in terms of robustness?
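As a contrast to white-box methods, the sketch below derives an attribution purely from model queries by occluding one feature at a time; `predict_fn`, the baseline value, and the toy linear scorer are hypothetical and only stand in for an arbitrary black-box model.

```python
import numpy as np

def occlusion_importance(predict_fn, x, baseline=0.0):
    """Score each feature by how much the model's output drops when that feature
    is replaced by a neutral baseline value. Uses only black-box queries."""
    base_score = predict_fn(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline
        scores[i] = base_score - predict_fn(x_pert)  # positive = feature supported the output
    return scores

# Hypothetical black-box model that we can only query, not inspect.
w = np.array([2.0, -1.0, 0.5, 0.0])
predict_fn = lambda x: float(w @ x)

print(occlusion_importance(predict_fn, np.ones(4)))  # [ 2.  -1.   0.5  0. ]
```

Perturbation-based methods such as LIME or SHAP refine this idea with local surrogate models; their robustness compared to white-box gradient methods is exactly what this topic should examine.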