An increasing body of work has demonstrated that behavior in neural networks can often be explained and controlled through linear operations on their hidden activations. Model steering, for instance, is a technique in which a specific target concept is elicited in the predictions of a generative model by adding a concept vector to its activations.
Recent work suggests that these concept vectors can be transferred between models. In this PhD internship, you will investigate the feasibility of cross-model feature transfer for improving the performance and reliability of machine learning models.
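The steering idea described above can be sketched in a few lines. The snippet below is a minimal, illustrative example (the function name, the toy data, and the scaling factor `alpha` are our own assumptions, not part of any specific method from the literature): it adds a unit-norm concept vector to a batch of hidden states and checks that each state's projection onto the concept direction increases.

```python
# Illustrative sketch of activation steering: shifting hidden activations
# along a "concept direction". All names and values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def steer(hidden, concept_vector, alpha=1.0):
    """Shift hidden activations toward a concept direction by strength alpha."""
    return hidden + alpha * concept_vector

# Toy data: a batch of hidden states and a unit-norm concept vector.
hidden = rng.normal(size=(4, 16))   # (batch, hidden_dim)
concept = rng.normal(size=16)
concept /= np.linalg.norm(concept)  # normalize so alpha has a clear meaning

steered = steer(hidden, concept, alpha=2.0)

# Since the concept vector has unit norm, each hidden state's projection
# onto the concept direction grows by exactly alpha.
before = hidden @ concept
after = steered @ concept
print(np.all(after > before))  # True
```

In a real model, the same additive update would typically be applied inside the network (for example via a forward hook on a chosen layer) rather than to a standalone array; cross-model transfer then asks whether a concept vector extracted in one model remains effective in another.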
You will review the relevant scientific literature
You will design and implement one or more cross-model feature transfer methods
You will evaluate the effectiveness of the proposed techniques on representative problems
You will consolidate your research in a scientific publication and/or a patent application
Location: Antwerp (Belgium)
Qualifications
Student enrolled in a Ph.D. program in Computer Science/Engineering with a focus on Machine Learning
Strong programming skills in Python
Language skills: English
Experience with representation engineering, mechanistic interpretability, or explainable AI is a big plus.
A strong publication record is also a big plus.