A growing body of work has demonstrated that the behavior of neural networks can often be explained and controlled through linear operations on their hidden activations. Model steering, for instance, is a technique in which a target concept is elicited in the predictions of a generative model by adding a concept vector to its activations.
Recent work suggests that these concept vectors can be transferred between models. In this PhD internship, you will investigate the feasibility of cross-model feature transfer for improving the performance and reliability of machine learning models.
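To make the steering idea concrete, here is a minimal sketch in NumPy. It is an illustration under stated assumptions and not the method used in the project: the activations and the concept vector are random toy values, and the names `steer`, `projection`, and `strength` are hypothetical.

```python
import numpy as np

def steer(hidden, concept_vector, strength=1.0):
    """Add a scaled concept vector to every hidden activation (one per row)."""
    return hidden + strength * concept_vector

def projection(hidden, concept_vector):
    """Scalar projection of each activation onto the unit concept direction."""
    unit = concept_vector / np.linalg.norm(concept_vector)
    return hidden @ unit

# Toy stand-ins (assumptions): 4 token positions, hidden size 8.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
concept_vector = rng.normal(size=8)  # hypothetical concept direction

steered = steer(hidden, concept_vector, strength=2.0)

# Steering shifts every activation along the concept direction by
# strength * ||concept_vector|| and leaves orthogonal components unchanged.
shift = projection(steered, concept_vector) - projection(hidden, concept_vector)
```

In a real generative model the addition would be applied to the activations of a chosen layer during the forward pass, and the concept vector would be estimated from data instead of sampled at random.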
Location: Antwerp (Belgium)