Our Research 
Research Interests
Computational Genomics
Statistical Genetics
Machine Learning
Algorithms

ML in medicine

The Big Data and Genomics Lab at UCLA is led by Eran Halperin. Our lab is affiliated with the Computer Science Department in the School of Engineering and with the Departments of Human Genetics, Computational Medicine, and Anesthesiology in the School of Medicine. Students are welcome to join our lab through the following graduate programs: Bioinformatics, Computer Science, Biomath, and MD-PhD.
Research

Our lab aims to improve the understanding and treatment of human disease by analyzing big data collected in relation to these diseases. In recent years our main focus has been the analysis of different types of genomic data, including genetics, methylation, RNA expression, single-cell and single-nucleus RNA, and microbiome data. Another major effort in the lab is the development of predictive methods for clinical outcomes using a combination of genomic data and clinical data made available by the UCLA hospital and by other collaborators.

 

The methodology we apply and develop involves a combination of machine learning, optimization algorithms, combinatorial optimization, and Bayesian statistics. In particular, our lab is currently focusing on the development of methods in the following domains:

 

  • Methods for high-throughput genomic data in heterogeneous tissues. We develop deconvolution and decomposition methods for bulk methylation and RNA data using information from sorted cells and from single-cell and single-nucleus RNA expression data. This provides opportunities to study cell-type-specific biology using bulk measurements.

  • Methods for microbial source tracking. We are developing methods for the inference of the sources contributing to a microbiome sample across a large number of contexts.

  • Methods for the analysis of time-series genomic data. We are developing statistical models for time-series data, particularly microbiome data. Such models can help in understanding the dynamics of genomic data under different scenarios.

  • Data Mining of Medical Records. One of the current efforts of our lab is the prediction of health-related outcomes (e.g., adverse surgery outcomes) using the large collection of medical records, genetic data, methylation data, clinical images, and physiological waveforms in the UCLA hospital, as well as other data provided through our collaborators.
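
To make the source tracking problem in the list above concrete, here is a minimal sketch of expectation-maximization for mixture proportions. It is not the FEAST implementation: it assumes the source profiles are fully known, ignores any unknown source, and the function name, parameters, and toy numbers are all hypothetical.

```python
import numpy as np


def estimate_source_proportions(sink_counts, source_profiles, n_iter=500, tol=1e-10):
    """Toy EM: estimate how much each known source contributes to a sink sample.

    sink_counts:     taxa counts observed in the sink, shape (n_taxa,)
    source_profiles: one row of taxa abundances per source, shape (n_sources, n_taxa)
    Simplifying assumption (not from the lab's methods): sources are known exactly.
    """
    x = np.asarray(sink_counts, dtype=float)
    S = np.asarray(source_profiles, dtype=float)
    S = S / S.sum(axis=1, keepdims=True)      # normalize each source to a distribution
    n_sources = S.shape[0]
    alpha = np.full(n_sources, 1.0 / n_sources)  # start from equal proportions

    for _ in range(n_iter):
        # E-step: probability that a read of taxon t came from source k
        mix = alpha @ S
        resp = (alpha[:, None] * S) / np.maximum(mix, 1e-300)
        # M-step: proportions are the expected fraction of reads per source
        new_alpha = (resp * x).sum(axis=1) / x.sum()
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


if __name__ == "__main__":
    # Hypothetical example: the sink is an exact 70/30 mix of two sources.
    sources = [[0.6, 0.3, 0.1, 0.0],
               [0.0, 0.1, 0.3, 0.6]]
    sink = [420, 240, 160, 180]
    print(estimate_source_proportions(sink, sources))  # approximately [0.7, 0.3]
```

In this toy setting the estimate recovers the mixing proportions because the sink counts are an exact mixture of the two profiles; real samples are noisy and may include sources that were never measured.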

 

In addition to methods development, we work closely with groups around the world to study specific diseases. In particular, we have been working on studies of non-Hodgkin lymphoma, leukemia, rheumatoid arthritis, asthma, coronary artery disease, myocardial infarction (heart attack), metabolic syndrome, and obesity. These studies shed light on the biological mechanisms of these diseases, and we hope they will pave the way for improved diagnosis and personalized treatment based on an individual's genomic data.

Selected Publications 

"FEAST: fast expectation-maximization for microbial source tracking"

Nature Methods, 2019

"Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology"

Nature Communications, 2019

"CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets"

Genome Biology, 2019

"Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests"

Nature Communications, 2018

"BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference."

Genome Biology, 2018

"Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation"

Nature Methods, 2017

"Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies"
Nature Methods, 2016

"Fast and accurate construction of confidence intervals for heritability"

The American Journal of Human Genetics, 2016

 

"A genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns." 
PNAS, 2015.

 

"Characterization of whole-genome autosomal differences of DNA methylation between men and women."
Epigenetics & Chromatin, 2015.


"Identifying personal genomes by surname inference" 
Science, 2013.

 

"A model-based approach for analysis of spatial structure in genetic data" 
Nature Genetics, 2012.

 

"Joint analysis of multiple metagenomic samples"
PLoS Computational Biology, 2012.

 

"Genome-wide association study of follicular lymphoma identifies a risk locus at 6p21.32" 
Nature Genetics, 2010.

 

"Genomic privacy and limits of individual detection in a pool" 
Nature Genetics, 2009.

 

"Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma" 
Nature Genetics, 2009.

 

"SNP imputation in association studies"
Nature Biotechnology, 27(4), 349-51, 2009.

 

"Maximizing power in association studies"
Nature Biotechnology, 27(3), 255-6, 2009.

"Estimating local ancestry in admixed populations"

The American Journal of Human Genetics, 2008.
 

Our lab is currently looking for data scientists. Please click here for more details.
Members
Funding
We thank the National Science Foundation and the National Institutes of Health for their current support. We also thank the Israeli Science Foundation, the German-Israeli Science Foundation, IBM, the Blavatnik Research Foundation, the Juludan Research Foundation, the National Institutes of Health, and The Edmond J. Safra Center for Bioinformatics for their past support, and hopefully their support in the future.
Software
Software developed and maintained by our group.
  • FEAST: Microbial source tracking using Expectation Maximization

  • ReFACTor: Correcting cell-type heterogeneity in whole-genome methylation data.

  • TCA: A deconvolution method for cell-type specific analysis of methylation data.

  • Bisque: A method for the inference of cell-type composition in heterogeneous tissues using RNA data.  

  • MTV-LMM: A method for the analysis and prediction of temporal microbiome data using linear mixed models.

  • CONFINED: A method for the detection of biological confounders (vs. technical) in methylation data.

  • GLINT: A user-friendly command-line tool for fast analysis of genome-wide DNA methylation data (EWAS). The package includes implementations of ReFACTor, EPIStructure, linear mixed model association testing, and reference-based cell-type estimation.

  • LAMP: Estimating Locus Specific Ancestry

  • LAMP-LD: Leveraging LD in the estimation of Locus Specific Ancestry

Software developed by our group, maintained elsewhere.
  • GEVALT: Selecting the most predictive tag SNPs (maintained in Ron Shamir's group).

  • RECYCLER: Detecting plasmids from de-novo assembly (maintained in Ron Shamir's group)
     

Retired Software

  The following are software packages that we developed in the past but no longer maintain:

  • SecureGenome: Maintaining your genomic privacy

  • SEQEM: Assigning short reads to homologous genes in RNA-SEQ experiments. 

  • CAMP: Coalescent based Association Mapping

  • WHAP: Weighted Haplotype Association Server. 

  • LOCO-LD: LD corrected spatial analysis 

  • SPA: Spatial Ancestry Analysis in genetic data

  • BARCODE: Compression of sequence data using Bloom filters 

 

News

May 2016 - Yedael Waldman, a joint post-doc between our lab and Alon Keinan's lab at Cornell, publishes a paper about the genetic history of Bene-Israel Indian Jews. Science Daily, The Times of India, Haaretz.

November 2015 - Our paper on epigenetic differences between men and women was featured in Nature Reviews Genetics.

October 2015 - Our PNAS paper finds that people choose their mates based on their genomes.
NDTV, Daily Mail, Business Standard, Science Daily.

 

January 2013 - Our Science paper appeared on the front page of the print edition of the Israeli newspaper Ha'aretz, two days before the national parliamentary elections.
Ha'aretz paper version, Ha'aretz first page

 

July 2012 - Israeli business newspaper (The Marker):
40 under 40 
The Marker paper version, The Marker online version

 

June 2012 - Genetic GPS 
Wired Magazine, Haaretz Israeli News, Haaretz (Hebrew version), Times of Israel, London Jewish Chronicle, International Business Times (Italian)

 

May 2011 - Profiles in Computer Science 
Biomedical Computation Review

 

September 2009 - From DNA Data to Disease Diagnosis
Frontier Economy, Spanish version

 

August 2009 - Researchers Claim New Software Can Skirt Privacy Challenges of GWAS Data-Sharing 
GenomeWeb, Innovation report, Physorg, The Medical News (Sydney, Australia), Jerusalem Post, Ynet (Israel), Biopharma.

 

July 2009 - Gene linked to increasingly common type of blood cancer.
EurekAlert

296A Engineering VI

University of California, Los Angeles 90095-1600

 

ehalperin@cs.ucla.edu

Contact