Applications of machine learning and algorithms to problems in biology
Professor Eskin has helped develop multiple software and Web-based programs that improve understanding of the genetic basis of complex diseases by modeling human variation. He co-developed HAP, a highly accurate method for haplotype resolution from genotype data. Eskin has written extensively on haplotypes, including a paper on "Large Scale Reconstruction of Haplotypes from Genotype Data" [Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology 2003]. Another paper, with CSE professor Pavel Pevzner, focused on finding composite regulatory patterns in DNA sequences and led to the creation of the MITRA algorithm for finding regulatory motifs in DNA sequences. HAP and MITRA are available to other scientists over the Web. Eskin also helped develop a method for classifying proteins into families using sparse Markov transducers (SMTs).
Eskin's research in computational biology derived from earlier work on sequence models in data mining and machine learning. Such models were previously used in language parsing and analysis, and more recently in computer security to detect intruders. Many of them can be characterized as 'sparse,' i.e., only a fraction of the elements of the sequence carry meaningful value. This is the case in the analysis of DNA sequences, where only about 1%-3% of the sequence has any biological significance. Eskin helped develop a new, efficient framework for approaching sparse DNA sequence modeling problems.
Prior to joining the CSE department in 2003, Eleazar Eskin was a post-doctoral researcher in computer science at the Hebrew University of Jerusalem, working with professors Yoram Singer and Nir Friedman. He completed his Ph.D. in Computer Science at Columbia University in 2001, where he wrote his thesis on "Sparse Sequence Modeling with Applications to Computational Biology and Intrusion Detection." At Columbia, Eskin was a member of the Natural Language Processing group before joining the Data Mining Lab, and was later in the Computational Biology group.