Bioinformatics, MS

Online

Bioinformatics

Rapidly increasing knowledge of the molecular biology of the cell has led to bioinformatics, a new discipline that transforms traditional biology into an information science. The School of Engineering's Master of Science in Bioinformatics program prepares students to manage and study data from the explosion of information from the life sciences.

The program meets industry's demand for professionals with solid foundations in genomics, proteomics, programming (Python, Perl, and R), and sequence and pathway analysis, as well as a host of genome informatics tools and algorithms such as BLAST, Biopython, BioPerl, Bioconductor, and the UCSC Genome Browser.

Students who earn a Bioinformatics Advanced Certificate may apply those credits towards the Bioinformatics Master's Degree. Note that only 9 credits from the Advanced Certificate can be used towards the Bioinformatics Master's Degree program.

    About Bioinformatics

    With the medical and life sciences industries growing at a rapid pace, there is an increasing need for advanced computational techniques to handle this expansion of complex data and information.

    The global bioinformatics market is projected to reach USD 16.18 billion by 2021, up from USD 6.21 billion in 2016, a compound annual growth rate (CAGR) of 21.1% over the forecast period. Demand for bioinformatics professionals has risen accordingly.
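    The growth rate quoted above can be checked with a few lines of Python. This is a minimal sketch, assuming the standard compound annual growth rate formula and a five-year period (2016 to 2021); the function name `cagr` is illustrative and does not come from the market report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.21 means 21%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: USD 6.21 billion (2016) to USD 16.18 billion (2021).
rate = cagr(6.21, 16.18, 2021 - 2016)
print(f"Implied CAGR: {rate:.1%}")
```

    The implied rate should land close to the 21.1% figure cited for the forecast period.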

    Bioinformatics is an academic field that seeks to create and advance algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.


    Our 30-credit program offers you a refined skill set including, but not limited to, functional annotation, statistical analysis, algorithm development, genomics, and proteomics.


    • Agilent Technologies
    • Genentech
    • Google
    • Memorial Sloan Kettering Cancer Center
    • Microsoft
    • New York Genome Center
    • NYU Langone Health
    • Philips
    • Regeneron
    • Roche
    • Weill Cornell Medicine
    • Yale New Haven Medical Center

    Admission Requirements

    In order to be eligible to apply for any of our Master’s programs, you must meet the following criteria:

    You must hold a Bachelor's Degree from an accredited institution, which includes a minimum of four years of full-time study. Bachelor of Engineering degrees (based on 180+ ECTS credits) may also be considered. Attention will be given to the programs accredited by ABET and programs accredited/approved by other various regional accrediting associations.


    This program requires graduate status and certain prerequisite courses, depending on your background. If you have a background in computer science or a similar field, you are required to take a course on the chemical and biological foundations of bioinformatics. If you have a chemical or biological science background, you are required to take Introduction to Programming and Problem Solving and Data Structures and Algorithms.


    The following is a list of all action items required to apply.

    • Application
    • Application Fee
    • Personal Statement
    • Resume
    • Official Transcripts
    • Letters of Recommendation
    • GRE or GMAT Score

      The GRE is required for full-time applicants to this program and is not required for part-time applicants. It cannot be substituted with the GMAT.

    • English Language Proficiency Testing

    For more details on the above list, please review the Master’s and Advanced Certificate Application Checklist section.


    Curriculum

    Degree Requirements: 30 Credits

    Algorithms and Data Structures for Bioinformatics BI-GY 7453

    3 Credits Proteomics for Bioinformatics BI-GY7543
    This online proteomics course adds an application-focused specialty class to the bioinformatics curriculum. It is a tour de force of modern proteomics methods and analysis in the context of practical research and clinical applications. The course teaches fundamentals, applications, experiments, and predictions in parallel. Each week therefore includes a mix of interactive approaches, from background learning, to understanding the pros and cons of experimental methodology, to software usage and sophisticated bioinformatics approaches to prediction. The limitations and complementarity of prediction methods are emphasized. It is desirable (but not required) for students to complete a Biochemistry course before taking this course.
    Prerequisites: Bioinformatics I.
    3 Credits Bioinformatics III: Functional Prediction BI-GY7553
    The course covers functional classifications of proteins; prediction of function from sequence and structure; orthologs and paralogs; representations of biological pathways; and available systems for the analysis of whole genomes and for human-assisted and automatic functional prediction.
    Prerequisites: Bioinformatics II


    3 Credits Biological Foundation for Bioinformatics BI-GY7523
    This course intensively reviews the aspects of biochemistry, molecular biology, and cell biology necessary to begin research in bioinformatics and to enter graduate courses in biology. The areas covered include cell structure, intracellular sorting, cellular signaling (i.e., receptors), the cytoskeleton, the cell cycle, DNA replication, transcription, and translation. This course makes extensive use of computer approaches to convey the essential computational and visual nature of the material.
    Prerequisites: General Chemistry, General Physics, Organic Chemistry, Calculus or permission of instructor.
    3 Credits Special Topics in “Informatics in Chemical and Biological Sciences” BI-GY7573
    This course covers various advanced or specialized topics in chemo- or bioinformatics, presented at intervals.
    3 Credits Introduction to Systems Biology BI-GY7613
    This course explains the functioning of basic circuit elements in transcription regulation, signal transduction, and developmental networks of living cells, using simplified mathematical models. The course focuses on design principles and information processing in biological circuits. It discusses network motifs, modularity, robustness, evolutionary optimization, and error minimization by kinetic proofreading, with specific applications to bacterial chemotaxis, developmental patterning, neuronal circuits, and immune recognition in several well-studied biological systems.
    Prerequisites: Bioinformatics II
    3 Credits Systems Biology: -omes and -omics BI-GY7623
    This course summarizes knowledge in genomics, proteomics, transcriptomics, metabolomics, and related molecular technologies. Topics include an overview of technologies in functional genomics (DNA chip arrays); whole-genome expression analysis (EST, MPSS, SAGE, arrays); proteome analysis technology (2-D electrophoresis, protein in situ digestion for mass spectrometric analysis, yeast two-hybrid analysis, 2-D PAGE, MALDI-TOF mass spectrometry); and the principles of Nuclear Magnetic Resonance Spectroscopy and Mass Spectrometry technologies for metabolomics, including the strengths and weaknesses of each technique, the requirements for sample preparation, and the options for managing output data. The course explains how to exploit different -ome database resources for investigations through practical tasks that accompany the lectures. Special attention is given to nutrigenomics, a multidisciplinary science that uses genomics, transcriptomics, and proteomics to study metabolic health. This relatively new area has the potential to contribute significantly to advances in nutrition and health.
    Prerequisites: Bioinformatics II, Bioinformatics III
    3 Credits Transcriptomics BI-GY7633
    Screening for differential gene expression using microarray technology creates opportunities for personalized medicine, which is expected to converge with medical informatics and the health care system. The course starts with a discussion of gene expression biology, then presents microarray platforms, design of experiments, and Affymetrix file structures and data storage. R programming is introduced for preprocessing Affymetrix data: image analysis, quality control, array normalization, log transformation, and assembling the data. Bioconductor software is used for data importing, filtering, annotation, and analysis. Machine learning concepts and tools for statistical genomics are addressed, along with distance measures, cluster analysis, heat maps, and class discovery. Case studies link the methodology to biomolecular pathways, gene ontology, genome browsing, and drug signatures.
    3 Credits Next Generation Sequence Analysis for Bioinformatics BI-GY7653
    This online course aims to develop practical bioinformatics skills in next generation sequencing analysis. Students are introduced to current best practices in high-throughput sequence data analysis and have the opportunity to analyze real data in a high-performance Unix-based computing environment. Special attention is given to understanding the advantages, limitations, and assumptions of the most widely used bioinformatics methods and the challenges involved in analyzing large-scale datasets. Topics covered include current sequencing platforms, data formats (FASTA, SAM, BAM, VCF), sequence alignment, sequence assembly, variant calling, RNA-seq analysis, and their biological applications. Students enrolling in this course should have a basic knowledge of programming, Unix tools, and shell scripting.

    Problem Solving for Bioinformatics: Programming and Prototyping BI-GY 7663

    Biology and Biotechnology for Bioinformatics BI-GY 7683

    Statistics and Mathematics for Bioinformatics BI-GY 7723


    To satisfy the Capstone requirement, students may take either the Guided Studies or the Thesis courses.

    Guided Studies (maximum 6 Credits)

    3 Credits Guided Studies in Bioinformatics I BI-GY7583
    This research/case course can be handled in different ways at the faculty adviser’s discretion. The course may involve a series of cases that are dissected and analyzed, or it may involve teaming students with industry personnel for proprietary or non-proprietary research projects. Generally, the student works under faculty supervision, but the course is intended to be largely self-directed within the guidelines established by the supervising faculty member. Master’s degree candidates must submit an unbound copy of their report to adviser/s one week before the last day of classes.
    Prerequisite: degree status.
    3 Credits Guided Studies in Bioinformatics II BI-GY7593
    This research/case course can be handled in different ways at the faculty adviser’s discretion. The course may involve a series of cases that are dissected and analyzed, or it may involve teaming students with industry personnel for proprietary or non-proprietary research projects. Generally, the student works under faculty supervision, but the course is intended to be largely self-directed within the guidelines established by the supervising faculty member. Master’s degree candidates must submit an unbound copy of their report to adviser/s one week before the last day of classes.
    Prerequisite: degree status.

    (OR)

    Thesis Course (maximum 9 Credits)

    You can register for the Thesis course each semester, up to three times, for a maximum of 9 credits.

    MS Thesis in Bioinformatics BI-GY997X
    Original research, which serves as basis for master’s degree. Minimum research registration requirements for the master’s thesis: 12 units. Registration for research required each semester consecutively until students have completed adequate research projects and acceptable theses and have passed required oral examinations. Research credits registered for each semester realistically reflect time devoted to research.
    Prerequisites for MS candidates: Degree status and consent of graduate adviser and thesis director.


    Suggested Courses

    Either thesis, 9 credits cumulatively over three semesters, or Guided Studies, six credits over two semesters, is a Capstone requirement for completion of the MS in Bioinformatics.

    All courses are subject to change.


    Algorithms and Data Structures for Bioinformatics BI-GY 7453 3 Credits

    Problem Solving for Bioinformatics  BI-GY 7663 3 Credits

    Biology and Biotechnology for Bioinformatics BI-GY 7683 3 Credits


    Bioinformatics III: Functional Prediction BI-GY 7553 3 Credits

    Next Generation Sequence Analysis for Bioinformatics BI-GY 7653 3 Credits

    Statistics and Mathematics for Bioinformatics BI-GY 7723 3 Credits

    Applied Biostatistics for Bioinformatics  BI-GY 7673 3 Credits


    Guided Studies in Bioinformatics I BI-GY 7583 3 Credits

    Transcriptomics BI-GY 7633 3 Credits

    Proteomics for Bioinformatics BI-GY 7543 3 Credits


    Guided Studies in Bioinformatics II BI-GY 7593 3 Credits

    Translational Genomics and Computational Biology BI-GY 7733 3 Credits

    Population Genetics and Evolutionary Biology for Bioinformatics BI-GY 7693 3 Credits

    Special Topics in “Informatics in Chemical and Biological Sciences” BI-GY 7573 3 Credits


    More About the Program

    The faculty at NYU Tandon School of Engineering are highly regarded for their extensive knowledge and professional industry experience.


    Hear from our current students and alumni why they chose the Bioinformatics Master's Degree at NYU Tandon School of Engineering.

    Attending the bioinformatics graduate program at the School of Engineering has greatly benefited my career, and I would advise anyone interested in the field of bioinformatics to consider applying. The intensive curriculum provided me with a very broad skill set including, but not limited to, functional annotation, statistical analysis, algorithm development and analysis, as well as genomics and proteomics.

    Since graduating in 2011, I have held two positions as a Bioinformatics Software Developer. The skills I acquired while attending NYU allowed me to excel in both positions. I can say with the utmost certainty that my bioinformatics career would not be where it is today, had I not attended this program.

    -Matt Shoa-Azar, Class of 2011

    The extensive and well designed curriculum for bioinformatics at the School of Engineering helped me understand the core concepts in the field of bioinformatics. Practical experience in analysis of next generation sequence data, object oriented approach to data mining and projects in various enterprise data systems were essential to gear up for the challenges in the industry as a person from a purely biological background with no experience in computational tools or methods.

    -Sunil Puranik, Class of 2013

    The bioinformatics program at NYU School of Engineering has been a unique and rewarding experience. From the moment you start classes, the professors waste no time in indulging you in the challenging yet interesting curriculum, which is comprised of a combination of well-developed courses in computer programming, statistics, and biological applications. I found that, the further I progressed in my studies, all the pieces began to come together and I could see how much I had truly learned since my first day.

    The bioinformatics program even helped to provide me with field experience in bioinformatics and data analysis, so I could obtain some exposure to the nature of an industrial bioinformatics environment. I am grateful for this program; for its thorough curriculum, and diligent educators; and feel prepared for whatever challenges I may face.

    -Michael D’Eletto, Class of 2014

    The bioinformatics program at NYU Tandon School of Engineering focuses on current techniques in Next Generation Sequence Analysis and offers tailored courses suited to each student's strengths and interests. Personally, this has helped me transform my skills to industry-readiness and make significant connections to important people in academia.

    -Ramakrishnan Srinivasan, Class of 2014

    I’m working for Leidos Biomedical Research, Inc. as a bioinformatics analyst. I work in a group that provides NGS sequencing services for the National Cancer Institute. My job is to develop analysis pipelines for NGS data (i.e., Illumina HiSeq/MiSeq data) that perform data management, sequence QC, alignment, coverage analysis, and variant identification. We are using a lot of programming languages and tools that were introduced or taught to me when I was in the MS bioinformatics program at the School of Engineering. I feel lucky that I was given the opportunity to study in this program, and I learned so many skills that have benefitted my career path.

    -Wen Luo, Class of 2011


    Below is a showcase of current NYU Tandon School of Engineering student projects from courses related to our Bioinformatics master's degree program. Please check back often to learn more about our new student projects.

    Check out the new BioStar Handbook!

    Danny Simpson, MS 2016

    From Mollusks to Medicine: A Venomics Approach for the Discovery and Characterization of Therapeutics from Terebridae Peptide Toxins

    Rajeeva Lochan Musunuri, MS 2015

    Validating somatic structural variants with local assembly
    Interning at the New York Genome Center

    Detecting structural variants (SVs) from sequencing data is complex and prone to a high false-negative rate. It is therefore necessary to use multiple orthogonal methodologies (such as read depth, read pairs, and split reads) to detect structural variants. When searching for somatic SVs in cancer samples (tumor/normal paired analysis), a false-negative call in the normal sample will lead to a false-positive somatic call in the tumor. This can be problematic because SVs are known to be highly relevant in cancer development and metastasis.
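    The effect of a missed call in the normal sample can be illustrated with a small Python sketch. It assumes a deliberately naive paired analysis in which somatic calls are simply the tumor calls that are absent from the normal; the coordinates, the tuple layout, and the function name `somatic_calls` are hypothetical and are not taken from the project's framework.

```python
# Hypothetical structural-variant calls: (chromosome, start, end, type).
true_germline_sv = ("chr1", 10_000, 12_500, "DEL")  # present in tumor and normal
true_somatic_sv = ("chr7", 55_000, 58_000, "DUP")   # present only in the tumor

tumor_calls = {true_germline_sv, true_somatic_sv}


def somatic_calls(tumor: set, normal: set) -> set:
    """Naive paired analysis: keep tumor calls that are absent from the normal."""
    return tumor - normal


# Case 1: the caller detects the germline SV in the normal sample.
normal_complete = {true_germline_sv}
# Case 2: the caller misses it (a false negative in the normal sample).
normal_missed = set()

good = somatic_calls(tumor_calls, normal_complete)
bad = somatic_calls(tumor_calls, normal_missed)

# Anything reported as somatic that is not the true somatic SV is a false positive.
false_positives = bad - {true_somatic_sv}
print("Somatic calls with complete normal:", good)
print("False-positive somatic calls after a missed normal call:", false_positives)
```

    In the second case the germline deletion is reported as somatic only because it was missed in the normal sample, which is the failure mode described above.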

    Previous studies have shown that assembly based methods have the highest resolution in determining the SV breakpoints with base-pair precision. In this project, I have created a modular framework for validating and also identifying SV calls by performing local assembly of the reads around the breakpoints with different assembly tools such as TIGRA, SGA, SPAdes, CORTEX, FERMI. The framework provides a way to obtain a high quality clinically actionable set of structural variant calls.

    Marina Hoashi, MS Class of 2015

    Mammals have evolved to nourish their offspring exclusively with maternal milk for around half of the lactation period, a crucial infant developmental window. In view of the oral-breast contact during lactation and the altered oral microbiota in Caesarean section (C-section) born infants, we expected differences in milk composition by delivery mode. Here we performed a cross-sectional study of microbes and glycosylation patterns in human milk at different times postpartum, and found differences by time after birth only in women who delivered vaginally. These results warrant further research into the role of microbes in milk glycosylation and its developmental functions.

    Rama Srinivasan

    About the Project
    Identification of Novel Peptides from the Venom Duct Transcriptome of Marine Snail Cinguloterebra anilis

    Abstract
    Molecules produced in nature that are biologically active continue to be the source and inspiration for a vast number of drugs, diagnostics, and pharmacological tools. However, it remains challenging not only to find new organisms that produce natural products, but also to identify all of the bioactive molecules produced by these organisms.

    Marine snails have proven to be good sources of neuroactive peptides in the past. Whereas toxins from species like cone snails have been moderately well categorized, toxins from the vermivorous Terebrid snails remain more poorly characterized.

    Working in collaboration with the Holford Lab at Hunter College of CUNY, I focus on discovering neuroactive peptides from the venom tissues of the snail Cinguloterebra anilis. We are working on Illumina RNA-Seq data from the C. anilis venom duct, and aim to assemble, annotate, and filter our way to discovering new toxins, later progressing to physiological assays.

    Oscar L Rodriguez

    About the Project
    Joint Automated Genome Annotation of 73 Human Cell Types

    Abstract
    The ENCODE consortium produced functional genomics data in many cell types. Our goal is to annotate the active genomic functional elements in this diverse set of cell types. The challenge is that many of these cell types have little data available. We aim to leverage existing high quality annotations from six well-studied cell types in the production of annotations for the remaining cell types.

    Novel classification and visualization of genome-wide expression patterns in known breast cancer subtypes | Alexander R. Mankovich, Class of 2014

    Introduction to cancer subtyping and signatures for outcome prediction:
    Breast cancer research, while making steady advances in the disease's diagnosis and the discovery of new therapies, is still limited in its capacity to characterize disease subtypes in full. Five molecular subtypes have been described in the past: HER2+/ERBB2+, basal-like, Luminal A, Luminal B, and normal-like. There are several approaches used to classify these subtypes: histopathology, arising from the examination of tissue to assign a grade and particular physiological manifestation of the tumor; molecular pathology, which measures key proteins expressed by the majority of tumor cells; genetic analysis, which identifies genome-wide changes in tumor cells (such as copy number alterations); and gene expression analysis, which examines particular genes driving tumor biology. These four approaches are used together to delineate a patient's tumor into a detailed subclassification driving clinical outlook, such as risk of metastasis, likelihood of recurrence, and potential curative therapies.

    Utilizing various analytical, statistical, and visual methods, RNA-seq expression signatures can more precisely guide clinical understanding of the driving forces behind tumor biology and further demarcate diverse breast cancer subtypes based on signature motifs and their associated prognostic or predictive factors - such as possible therapies, metastatic potential, recurrence risk, and survival probability. I propose to create a framework which generates long-range expression signatures from tumor samples, selects signatures which are alike, identifies significant correlating prognostic and predictive factors, and visualizes those relationships in a biologically intuitive manner.

    STAT-GPS: a complete functional genome annotation tool focusing on extensive downstream analysis of genes | Michael D’Eletto, Class of 2014

    After generations of sequencing the genomes of various organisms, there exists an abundance of sequencing data that must be analyzed and annotated. Bioinformaticians are left with the challenge of using open-source programs to align and assemble these millions of reads. From these genome assemblies, functional properties of individual genes must be annotated before being loaded into databases like GenBank. Numerous annotation pipelines have been developed; however, emphasis on extensive downstream functional annotation has been lacking. Software such as the MAKER pipeline provides gene models based on multiple sources of evidence, but stops short of providing any functional information. Other tools, such as DAVID, are accessible only via a web site and hence would require submitting large amounts of data over the web, something many companies are not comfortable with. Tools such as AutoFACT are not currently maintained and are primarily aimed at RNA transcript annotation. Corporations also face special needs in that they (1) require high levels of security for their information and (2) are not always able to pay for software that may be free for academics. In addition, the level of support, documentation, maintenance, and integration for bioinformatics tools varies greatly and is often at too low a level for a small bioinformatics group to deal with.

    This thesis is a continuation of a graduate project that revolved around the development of an extensive functional annotation pipeline which emphasizes downstream analysis of genes. Initial development of the pipeline focused on primary annotations involving ab initio gene prediction and protein/EST alignment to known hits in various databases. These primary annotations merely scratched the surface of the overall function of each annotated gene. Continued development of the pipeline has delved into the functional and structural analyses of each gene and its proteins, as well as prediction of regulatory, non-coding elements in the DNA. These analyses include, but are not limited to: (1) automated homology modeling, (2) pathway assignment, (3) ncRNA prediction, and (4) de novo promoter element discovery.

    This pipeline, known as STAT-GPS (Solazyme Total Annotation Tool for Genomic and Protein Sequences), utilizes a combination of both open-source software and remote servers to attain the most reliable, accurate, and thorough functional annotation possible. This program, which is developed in the Python language, is intended for both genomic and RNA transcripts, although genomic transcripts are the main goal. The source code is available for download and redistribution on GitHub. A formal paper intended for publication in the Bioinformatics journal is being written concurrently and will include supplementary data about the efficiency of this pipeline.

    Malcolm Houtz, Class of 2015

    In 2011, Gan et al. published work indicating that different accessions of Arabidopsis thaliana use alternate gene models to those annotated in the reference genome. An implication of this finding is that a large proportion of genes predicted to be damaged or knocked out (using the reference genome annotation) in non-reference accessions were in fact not influenced by these mutations. The transcriptomes were reassembled for 18 accessions, and new annotation files were created.

    Using RNA-Seq data already sequenced and assembled by the Purugganan Laboratory, we propose to study and potentially re-annotate the transcriptomes of 4 rice accessions.

    The first phase of the project involves matching gene IDs with known polymorphisms or indels to a large FPKM matrix. A summarized categorization of expressed and unexpressed genes will be delivered. The summary will give an indication of expression levels for genes predicted to be damaged. Each accession was tested under many different conditions, so summaries at different levels may be useful. This piece of the project is intended to extend Malcolm’s very basic R skills.
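    The categorization step described above can be sketched as follows. This is an illustrative Python example rather than the project's own code (the project itself planned to use R); the gene IDs, FPKM values, the cutoff of 1.0 FPKM, and the function name `categorize` are all assumptions made for the example.

```python
# Assumed cutoff: a gene counts as expressed if it reaches this FPKM
# in at least one condition. The real project may use a different rule.
EXPRESSION_THRESHOLD = 1.0

# Hypothetical FPKM matrix: gene ID -> FPKM value per condition.
fpkm = {
    "gene_A": [12.4, 8.9, 15.1],
    "gene_B": [0.0, 0.2, 0.1],
    "gene_C": [0.3, 4.7, 0.0],
}

# Hypothetical set of genes predicted to be damaged by polymorphisms or indels.
predicted_damaged = {"gene_A", "gene_B"}


def categorize(fpkm, gene_ids, threshold=EXPRESSION_THRESHOLD):
    """Split the given genes into expressed and unexpressed lists."""
    summary = {"expressed": [], "unexpressed": []}
    for gene_id in sorted(gene_ids):
        values = fpkm.get(gene_id, [])  # genes missing from the matrix count as unexpressed
        key = "expressed" if any(v >= threshold for v in values) else "unexpressed"
        summary[key].append(gene_id)
    return summary


summary = categorize(fpkm, predicted_damaged)
print(summary)
```

    A gene such as the hypothetical gene_A, predicted to be damaged yet clearly expressed, is the kind of case that would motivate reassembling and re-annotating the transcriptome.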

    If a significant number of genes which are predicted to be damaged are in fact expressed, transcriptomes will be reassembled and annotated. Using an existing General Feature Format (GFF) file, we will find additional, novel transcripts and create new GFFs for each of the 4 sequenced accessions.

    Although familiar software (Cufflinks) does allow the discovery of novel transcripts, the method for updating an existing GFF with additional transcripts is currently unclear.

    Final deliverables will be pipelines for transcript reassembly and updating GFFs with additional annotations.

    Oscar Rodriguez, Undergrad Class of 2014

    Background: 
    The ENCODE consortium produced functional genomics data in many cell types. Our goal is to annotate the active genomic functional elements in this diverse set of cell types. The challenge is that many of these cell types have little data available. We aim to leverage existing high quality annotations from six well-studied cell types in the production of annotations for the remaining cell types.

    Approach:
    We use the genome annotation software Segway to perform annotations, augmented with entropic graph-based regularization (EGBR) to leverage existing annotations. We chose cell types that had at least two out of four distinct types of assays (DNase-seq, RNA-seq, histone modification ChIP-seq, and transcription factor ChIP-seq).
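    The cell type selection rule above can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions: the cell type names and their assay sets are invented for illustration, and `select_cell_types` is a hypothetical helper that is not part of Segway.

```python
# The four assay types named in the approach.
ASSAY_TYPES = {"DNase-seq", "RNA-seq", "histone ChIP-seq", "TF ChIP-seq"}

# Hypothetical inventory: cell type -> assay types with data available.
available = {
    "cell_type_1": {"DNase-seq", "RNA-seq", "histone ChIP-seq"},
    "cell_type_2": {"RNA-seq"},
    "cell_type_3": {"DNase-seq", "TF ChIP-seq"},
}


def select_cell_types(available, min_assay_types=2):
    """Return cell types covered by at least `min_assay_types` of the four assay types."""
    return sorted(
        name
        for name, assays in available.items()
        if len(assays & ASSAY_TYPES) >= min_assay_types
    )


selected = select_cell_types(available)
print(selected)
```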

    Results:
    We will produce functional annotations of 73 cell types.  These annotations will be made publicly available on the UCSC Genome Browser.  In addition, the project has successfully migrated the Segway+EGBR annotation software to the DNAnexus cloud computing platform.


    The faculty for the online Bioinformatics Master's Degree program is drawn from across NYU and the Tandon School of Engineering. The dedicated faculty focus on the careful study and practice of bioinformatics, engaging students day to day while participating in research.


    The NYU Tandon School of Engineering's Advisory Board is composed of experienced leaders from several industries and academia who provide valuable insights and recommendations to NYU Tandon Online, the online learning unit. The Board meets twice a year to review the program's curriculum and progress and to consider new ideas needed to meet industry demands.
