Neil Thomas

I work at the intersection of AI and biology. I am currently at a startup building AI tools for protein design. Previously, I was a Research Scientist at Google X.

I completed my PhD at UC Berkeley in 2022, advised by Yun S. Song. Prior to that, I was an AI Resident at Google X and a Software Engineer at 23andMe. I received my BS in Engineering Mathematics and Statistics from UC Berkeley.

When I'm not being humbled by biology, I like to be humbled by a variety of hobbies. I like cooking recipes from Alison Roman, climbing rocks, skiing, playing ultimate frisbee, cycling, playing piano, watching comedy, and watering my plants.

email / twitter / github / scholar / linkedin

Research Highlights

My research focuses on learning meaningful representations of proteins, with the aim of enabling applications in protein design, functional annotation, and structure prediction. My thesis talk "Browsing in the Library of Babel" serves as an accessible introduction.

Engineering highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening
Neil Thomas, David Belanger, Chenling Xu, Hanson Lee, Kathleen Hirano, Kosuke Iwai, Vanja Polic, Kendra Nyberg, Kevin Hoff, Lucas Frenz, Charlie Emrich, Jun W Kim, Mariya Chavarha, Abi Ramanan, Jeremy Agresti, Lucy J Colwell
bioRxiv, 2024
paper / code
Designed thousands of highly active, diverse nuclease enzymes using neural network models trained on experimental screening data, outperforming a traditional in vitro directed evolution campaign.

Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design
Neil Thomas, Atish Agarwala, David Belanger, Yun S. Song, Lucy J. Colwell
bioRxiv, 2022
paper / code / tweetorial
Tunable, realistic, synthetic fitness landscapes for benchmarking protein design.

Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention
Nicholas Bhattacharya, Neil Thomas, Roshan Rao, Justas Dauparas, Peter K. Koo, David Baker, Yun S. Song, Sergey Ovchinnikov
Pacific Symposium on Biocomputing, 2022
paper / code / tweetorial / talk
Introduces “factored attention,” a simplified attention layer that we use to compare and contrast Potts models and Transformers.

Evaluating Protein Transfer Learning with TAPE
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song
Advances in Neural Information Processing Systems (Spotlight), 2019
paper / code / tweetorial / talk / podcast / blog
A suite of benchmarking tasks for protein language models.

Research

For an up-to-date list, see Google Scholar.

Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation
Shaohua Fan, Jeffrey P. Spence, ..., Neil Thomas, ... Yun S. Song, Sarah A. Tishkoff, et al.
Cell, 2023
paper

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Samantha Petti, Nicholas Bhattacharya, Roshan Rao, Justas Dauparas, Neil Thomas, Juannan Zhou, Alexander M. Rush, Peter K. Koo, Sergey Ovchinnikov
Bioinformatics, 2022
paper

Functional genomics of OCTN2 variants informs protein-specific variant effect predictor for Carnitine Transporter Deficiency
Megan L. Koleske, Gregory McInnes, Julia E. H. Brown, Neil Thomas, ... Yun S. Song, Russ B. Altman, Kathleen M. Giacomini, et al.
PNAS, 2022
paper

Minding the gaps: The importance of navigating holes in protein fitness landscapes
Neil Thomas, Lucy Colwell
Cell Systems (Preview), 2021
paper

Teaching

During my graduate studies at Berkeley I had the privilege of teaching:

Summer 2022 CS 188: Introduction to Artificial Intelligence
Fall 2020 Stat 135: Concepts of Statistics

My teaching statement.

Built on Leonid Keselman's Jekyll fork of Jon Barron's website
