Protein Sequence Annotation using Language Models

Kumaresh Krishnan, Postdoctoral Fellow, The Eddy Laboratory, Harvard University, USA

05 August 2024

Talk summary: Protein function inference relies on annotating protein domains via sequence similarity, often modelled through profile hidden Markov models (profile HMMs), which capture evolutionary diversity within related domains. However, profile HMMs make strong simplifying independence assumptions when modelling residues in a sequence. In this talk, the speaker introduced PSALM (protein sequence annotation with language models), a hierarchical approach that relaxes these assumptions and uses representations of protein sequences learned by protein language models to enable high-sensitivity, high-specificity residue-level protein sequence annotation. He validated PSALM’s performance on a curated set of ‘ground truth’ annotations determined by a profile HMM-based method and highlighted PSALM as a promising alternative for protein sequence annotation.

Speaker bio: Kumaresh Krishnan is a Postdoctoral Fellow in The Eddy Laboratory at Harvard University. He works on machine learning models for annotating, understanding, and analysing protein sequences. He has a PhD from Harvard University, where he worked in systems neuroscience, building models of decision making and attentional switching using zebrafish as a model organism. Krishnan’s undergraduate and Master’s training is in Computer Science and Electrical Engineering from the International Institute of Information Technology (IIIT) Bangalore, and he brings this strong computational background to tackle complex real-world biological problems.

[Talk organised in collaboration with the Department of Computational and Data Sciences]