John Vivian

Principal Bioinformatics Scientist

Bioinformatics PhD from UC Santa Cruz under David Haussler and Benedict Paten at the Genomics Institute. During my thesis work I helped develop Toil, an open-source distributed workflow system designed for massive scalability, and a Bayesian method for identifying gene expression outliers in individual patient samples. I then worked on oncology antibody therapeutics at Atreca, specializing in RNA-seq, scRNA-seq, software architecture, data visualization, and statistical analysis. I have also contracted with several startups, providing bioinformatics and statistical expertise, analysis, software development, and AWS infrastructure.

I am currently serving as a Principal Research Scientist, specializing in bioinformatics, at Ideaya Biosciences. My work focuses on leveraging synthetic lethality to develop innovative treatments aimed at specific cancer vulnerabilities.

I am also collaborating on a research paper with a colleague at the US Treasury Department that models the economic impacts of COVID-19 by examining variations in state pandemic responses. Using statistical methods such as Bayesian dynamic factor models and synthetic control models, we aim to analyze the interplay between health and economic dynamics during the pandemic.


Experience

Principal Research Scientist / Bioinformatician

Ideaya

I currently hold the position of Principal Research Scientist at Ideaya Biosciences, focusing on bioinformatics for cancer therapeutics. My primary responsibilities are synthetic lethality in MTAP-deleted cancers, internal bioinformatics infrastructure, and analysis. I hope to advance our team's statistical modeling, drawing on my background in Bayesian inference and my interest in neural networks.

Apr. 2024 - Present

Senior Bioinformatics Scientist

TeikoBio

Building bioinformatics pipelines to visualize flow cytometry data for customers on personalized dashboards. Frontend/backend, database, software development, algorithmic refinement, data visualization, research, and statistical analysis.

Jan. 2024 - Mar. 2024

Senior Bioinformatics Scientist

Atreca

Applying computational expertise to generate insights into the immune responses of oncology patients from one of the world’s largest databases of precise, comprehensive antibody and T cell receptor repertoires. Performing analyses on hundreds of millions of antibody sequences, writing scalable bioinformatics pipelines in Python/Nextflow, and developing methods for assays.

Feb. 2023 - Sep. 2023

Bioinformatics Scientist II

Atreca
Jan. 2021 - Feb. 2023

Bioinformatics Scientist I

Atreca
Jun. 2019 - Jan. 2021

Graduate Student Researcher

UC Santa Cruz Genomics Institute

Worked for the Computational Genomics Lab with a focus on distributed computing for large-scale genomics projects and applications. Developed a series of analysis pipelines with emphases on portability, reproducibility, efficiency, and scalability.

Jan. 2015 - Jun. 2019

Summer Workshop Teacher

UC Santa Cruz

Created and taught a week-long bioinformatics bootcamp for undergraduates from around the country. Taught the basics of Git, cloud computing, the shell, Python, notebooks, and visualization, and gave lectures on evolution, mutation, and next-generation sequencing.

Summer 2016

Teaching Assistant

UC Santa Cruz Nanopore Lab

Python Programming for Biologists (BME-160). Taught six hours of section per week, held office hours, and graded homework.

Spring 2014

Graduate Student Researcher

UC Santa Cruz Nanopore Lab

Ran and analyzed single-molecule biological nanopore experiments to evaluate how much “rereading” improves accuracy. Created a hidden Markov model to detect differences in methylation state and quantified the accuracy gained by rereading.

Sep. 2013 - Dec. 2014

Junior Specialist

UC Santa Cruz Nanopore Lab

Ran and analyzed single-molecule biological nanopore experiments for distinguishing epigenetic modifications on cytosine. Created scientific figures and videos for publications. Designed and modeled a prototype nanopore device in AutoCAD.

Jun. 2019 - Jan. 2021

Education

University of California Santa Cruz

Doctor of Philosophy

Bioinformatics

Jan. 2015 - Jun. 2019

University of California Santa Cruz

Master of Science

Biomolecular Engineering & Bioinformatics

Sep. 2013 - Dec. 2014

University of California Santa Cruz

Bachelor of Science

Biomolecular Engineering

Jan. 2010 - Jun. 2012

Skills

Programming Languages & Tools
  • Python
  • Anaconda
  • Jupyter
  • Rust
  • Git
  • Docker
  • R
  • Nextflow
  • Toil
Data Science & Visualization
  • Pandas
  • Polars
  • NumPy
  • Matplotlib
  • Seaborn
  • Plotly
  • Altair
  • Streamlit
Statistics & Machine Learning
  • Bayesian Inference
  • Kalman Filters
  • PyMC3
  • SciPy
  • scikit-learn
  • Keras
  • Statsmodels
Cloud
  • Amazon Web Services (AWS)
  • Microsoft Azure
  • Google Cloud Platform (GCP)
Markup & Web
  • Markdown
  • LaTeX
  • HTML
  • CSS

Software

A collection of open-source software I have written, maintain, or contributed to significantly.

Scalable & Efficient Workflow Engine

A scalable, efficient, cross-platform pipeline management system written entirely in Python and designed around the principles of functional programming.

RNA-Seq Quantification

A scalable and modular RNA-seq quantification workflow. Used to process over 20,000 samples on 32,000 cores.

Gene Expression Outlier Detection

A Bayesian model for identifying gene expression outliers in individual samples (N-of-1) compared against a cohort of background datasets.

FASTQ Pairing

Fast FASTQ pairing algorithms written in Rust, used to correct improperly paired FASTQ files.

Interests

My academic interests include Bayesian inference, machine learning / NNs / LLMs, idiomatic programming & languages, data science, and visualization.

My other interests include drumming, piano, Stable Diffusion, game [ theory | s ], kitty cats, Brazilian Jiu Jitsu, and technical progressive death metal.