Knowledge and bias extraction from Large Language Models (Project #8)

University of Oslo, Department of Informatics 

Three-year PhD position
(An extension of the appointment by up to twelve additional months may be considered; the extension would be devoted to career-enhancing duties, e.g. teaching or supervision, and will depend on the qualifications of the applicant and the specific teaching needs of the department.)

Description

A major concern when working with complex machine learning models is determining what influences their outcomes: the well-known explainability challenge. Several approaches exist for extracting interpretable abstractions from machine learning models, in particular Large Language Models (LLMs). These approaches contribute substantially to surfacing useful information about language models, such as the harmful biases they may encode. However, they typically tackle the problem from a purely practical point of view and therefore lack theoretical guarantees. This project aims to employ techniques from computational learning theory to extract rules, including rules that may express harmful biases, from language models, while providing theoretical guarantees for the extracted rules.
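
To make the flavour of such guarantees concrete, the sketch below shows one classical technique from computational learning theory: exact learning of a monotone conjunction via membership queries, with a language model standing in as the oracle. This is an illustrative instance of query-based learning, not the project's actual method; `llm_oracle`, the attribute names, and the hidden rule are hypothetical placeholders, and a real oracle would prompt an LLM and parse its yes/no answer.

```python
# Minimal sketch, assuming the LLM can be wrapped as a yes/no membership
# oracle: exact learning of a monotone conjunction (Angluin-style query
# learning). A variable belongs to the hidden rule iff setting it to False,
# with every other variable True, makes the oracle reject, so the rule is
# recovered exactly with |variables| queries.
from typing import Callable, FrozenSet, Set

Assignment = FrozenSet[str]  # attributes assigned True; all others are False


def learn_monotone_conjunction(
    variables: Set[str],
    member: Callable[[Assignment], bool],
) -> Set[str]:
    """Recover the target monotone conjunction with one membership query
    per variable."""
    rule: Set[str] = set()
    for v in sorted(variables):
        probe = frozenset(variables - {v})  # everything True except v
        if not member(probe):               # rejection => v is in the rule
            rule.add(v)
    return rule


if __name__ == "__main__":
    # Hypothetical stand-in for the LLM oracle: a hidden rule the model
    # implicitly follows, e.g. the biased rule "male AND engineer => hired".
    hidden = {"male", "engineer"}
    attributes = {"male", "female", "engineer", "nurse"}

    def llm_oracle(assignment: Assignment) -> bool:
        # A real oracle would prompt the language model with this
        # assignment and parse its answer.
        return hidden <= assignment

    print(learn_monotone_conjunction(attributes, llm_oracle))
    # -> {'engineer', 'male'}  (set order may vary)
```

The guarantee here is exactness under a known query budget: if the oracle's behaviour really is a monotone conjunction, the extracted rule matches it after |variables| queries. Guarantees of this kind, for richer propositional rule languages, are what purely empirical extraction methods typically lack.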

Specific project requirements

  • Master's degree in computer science, machine learning, or another relevant quantitative field. 

  • Experience with natural language processing, in particular large language models (LLMs). 

  • Experience with propositional logic and logical reasoning is an advantage. 

Supervisors
