Knowledge and bias extraction from Large Language Models (Project #8)

University of Oslo, Department of Informatics 

Three-year PhD position
(An extension of the appointment by up to twelve additional months may be considered; the extension would be devoted to career-enhancing duties, e.g. teaching or supervision, and will depend on the qualifications of the applicant and the specific teaching needs of the department.)

Description

A major concern when working with complex machine learning models is determining what influences their outcomes: the well-known explainability challenge. Several approaches exist for extracting interpretable abstractions from machine learning models, in particular Large Language Models (LLMs). These approaches contribute substantially to surfacing useful information about language models, such as the harmful biases they may encode. However, they typically tackle the problem from a purely practical point of view and therefore lack theoretical guarantees. This project aims to employ techniques from computational learning theory to extract rules, including rules that may express harmful biases, from language models, while providing theoretical guarantees for the extracted rules.
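
To make the flavour of such guarantees concrete, the sketch below shows one classical technique from computational learning theory: exact learning of a monotone conjunction via membership queries, with a language model standing in as the oracle. This is an illustrative instance of query-based learning, not the project's actual method; `llm_oracle`, the attribute names, and the hidden rule are hypothetical placeholders, and a real oracle would prompt an LLM and parse its yes/no answer.

```python
# Minimal sketch, assuming the LLM can be wrapped as a yes/no membership
# oracle: exact learning of a monotone conjunction (Angluin-style query
# learning). A variable belongs to the hidden rule iff setting it to False,
# with every other variable True, makes the oracle reject, so the rule is
# recovered exactly with |variables| queries.
from typing import Callable, FrozenSet, Set

Assignment = FrozenSet[str]  # attributes assigned True; all others are False


def learn_monotone_conjunction(
    variables: Set[str],
    member: Callable[[Assignment], bool],
) -> Set[str]:
    """Recover the target monotone conjunction with one membership query
    per variable."""
    rule: Set[str] = set()
    for v in sorted(variables):
        probe = frozenset(variables - {v})  # everything True except v
        if not member(probe):               # rejection => v is in the rule
            rule.add(v)
    return rule


if __name__ == "__main__":
    # Hypothetical stand-in for the LLM oracle: a hidden rule the model
    # implicitly follows, e.g. the biased rule "male AND engineer => hired".
    hidden = {"male", "engineer"}
    attributes = {"male", "female", "engineer", "nurse"}

    def llm_oracle(assignment: Assignment) -> bool:
        # A real oracle would prompt the language model with this
        # assignment and parse its answer.
        return hidden <= assignment

    print(learn_monotone_conjunction(attributes, llm_oracle))
    # -> {'engineer', 'male'}  (set order may vary)
```

The guarantee here is exactness under a known query budget: if the oracle's behaviour really is a monotone conjunction, the extracted rule matches it after |variables| queries. Guarantees of this kind, for richer propositional rule languages, are what purely empirical extraction methods typically lack.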

Specific project requirements

  • Master's degree in computer science, machine learning, or another relevant quantitative field. 

  • Experience with natural language processing, in particular large language models (LLMs). 

  • Experience with propositional logic and logical reasoning is an advantage. 

Supervisors
