Seminar - Identification of Post-translational Modifications of Proteins from Tandem Mass Spectra Data using Genetic Programming

ECS PhD Proposal

Speaker: Samaneh Azari
Time: Wednesday 28th June 2017 at 03:00 PM - 04:00 PM
Location: Cotton Club, Cotton 350

Abstract

In proteomics, peptide identification refers to finding the most likely amino acid sequences that correspond to tandem mass spectra (MS/MS). Despite improvements in mass spectrometry (MS) instrumentation and peptide identification methods, a significant number of MS/MS spectra remain unassigned. This is mainly because the presence of post-translational modifications (PTMs) is not considered in the identification process. A PTM is a chemical modification of amino acids in the polypeptide chain of a protein. Multiple PTMs can be associated with a disease or a drug-treatment state, so it is important to identify all PTMs in the target protein. Identifying peptides and PTMs is a challenging task. One major challenge is noise in MS/MS spectra: the number of signal peaks is generally small compared to the number of noise peaks, which makes the MS/MS data highly unbalanced. Missing peaks, caused by incomplete MS fragmentation and the low sensitivity of mass spectrometers, also make it more difficult to infer a full-length peptide sequence. Moreover, the identification algorithm needs to explore a large search space of all possible amino acid sequences for each spectrum, which leads to a high false discovery rate. PTMs further increase the complexity of the peptide identification problem. Genetic Programming (GP) is an effective global search algorithm derived from biological principles. This work will apply GP and its problem-solving abilities to automatic peptide and PTM identification. The goal of this thesis is to develop a GP-based multi-stage peptide and PTM identification system that aims to increase the number of identified peptides that contain PTMs. GP will be used to perform different tasks such as classification, symbolic regression and optimization to improve the overall performance of the identification system.
