|
Abstracts
Invited speakers
Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich and Faculty of Science, University of Zurich
Searching and Mining of Proteomic SWATH-MS datasets
Recently we introduced a new data-independent acquisition (DIA) method termed SWATH-MS (1). This method is, in effect, a time-and-mass segmented acquisition method in which complex, high-specificity fragment ion maps of all precursor ions within a user-defined precursor RT and m/z space are generated and recorded. This is accomplished by stepping the isolation window of a specifically tuned quadrupole time-of-flight (QqTOF) instrument in discrete increments, recursively, throughout the duration of the LC separation. The data acquired by SWATH-MS are not searchable by conventional database search engines, because each fragment ion spectrum is a composite of multiple, concurrently fragmented precursor ions. In this presentation we will describe an automatic pipeline for peptide identification and quantification from SWATH-MS datasets. It is conceptually related to the mProphet algorithm developed for the analysis of S/MRM datasets (2). The algorithm applies a targeted search strategy, whereby peak groups uniquely identifying a particular peptide are extracted from the SWATH-MS dataset and assigned a probability of being correctly associated with the target peptide. The algorithm uses a system of individual feature score rankings that are then combined into a composite score. The performance of the method will be illustrated with selected examples that indicate the power of the approach for the reproducible analysis of proteomes, the detection of modified peptides and the estimation of the absolute quantity of proteins and proteomes.
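The idea of combining individual feature score rankings into a composite score can be illustrated with a minimal sketch. This is not the pipeline described in the abstract, which additionally assigns probabilities to peak groups; the feature names (`coelution`, `intensity_correlation`, `rt_agreement`), the toy values and the plain averaging of ranks are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_scores(values: List[float]) -> List[float]:
    """Return ranks scaled to [0, 1]; the highest value gets 1.0, ties share the mean rank."""
    n = len(values)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


def composite_scores(features: Dict[str, List[float]]) -> List[float]:
    """Average the per-feature scaled ranks of each candidate peak group."""
    per_feature = [rank_scores(v) for v in features.values()]
    n = len(per_feature[0])
    return [sum(r[i] for r in per_feature) / len(per_feature) for i in range(n)]


# Three hypothetical candidate peak groups for one target peptide.
candidates = {
    "coelution": [0.9, 0.4, 0.6],
    "intensity_correlation": [0.8, 0.3, 0.7],
    "rt_agreement": [0.7, 0.2, 0.9],
}
scores = composite_scores(candidates)
best = max(range(len(scores)), key=scores.__getitem__)
```

Working on ranks rather than raw values keeps features with different scales comparable, which is one plausible reason to rank before combining.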
Ingemar André, Center for Molecular Protein Science, Biochemistry and Structural Biology, Lund University, ingemar.andre@biochemistry.lu.se
Design and Prediction of Protein Self-assembly
Many of the largest protein complexes in biology are composed of a single type of subunit that is repeated a large number of times to generate a functional assembly. Such homomeric structures are often assembled spontaneously from individual components through the process of self-assembly. Research in our group is focused on the prediction of the three-dimensional structure of homomeric assemblies and the rational design of novel self-assembling proteins and peptides. Over the last several years we have developed computational methods to model the structure of homomeric assemblies using the powerful constraint of molecular symmetry. In this presentation I will illustrate how these prediction methods, in conjunction with limited experimental constraints, can be used to tackle important problems in structural biology. The second part of the talk will deal with the rational design of self-assembling proteins and peptides. We combine the powerful design template of self-assembly with structural modeling and computational protein design to design protein assemblies at the atomic level.
Samuel Flores, Uppsala University
A structural and dynamical model of human telomerase
Mutations in the telomerase complex disrupt either nucleic acid binding or catalysis, and are the cause of numerous human diseases. Despite its importance, the structure of the human telomerase complex has not been observed crystallographically, nor are its dynamics understood in detail. Fragments of this complex from Tetrahymena thermophila and, more controversially, Tribolium castaneum have been crystallized. Biochemical probes provide important insight into dynamics. In this work we use available structural fragments to build a homology model of human TERT, and validate the result with functional assays. We then generate a trajectory of telomere elongation following a “typewriter” mechanism: the RNA template moves to keep the end of the growing telomere in the active site, disengaging after every 6-residue extension to execute a “carriage return” and go back to its starting position. A hairpin can easily form in the telomere, from DNA residues leaving the telomere-template duplex. The trajectory is consistent with available experimental evidence and suggests focused biochemical experiments for further validation.
Jan Gorodkin, Center for non-coding RNA in Technology and Health, Denmark
Towards the search for RNA-RNA interaction based networks
In recent years the awareness of non-coding RNAs has increased rapidly, and experimental as well as in silico results reveal their large potential. Here, the motivation stems from the thousands of in silico generated RNA structure candidates in the genome. A major challenge is to assign function to these. The first step is to search for interactions between these RNAs and other RNAs (or DNA or proteins). Searching for RNA-RNA interactions is in general a time-consuming task. As a first step we have developed a method that searches only for near-complement interactions (ignoring intramolecular base pairs). We show that this approach is faster than existing methods while maintaining accuracy, and that the method can be used as a filter (on existing methods) for microRNA target search. In a case study on microRNAs, we combined target predictions (conserved in human and mouse) to protein-coding genes with literature mining and obtained a combined enrichment for transcription factors (TFs) only, and subsequently found that TFs are also enriched for targeting microRNAs. Our results suggest a network of mutual activating and suppressive regulation.
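A search restricted to near-complement interactions, ignoring intramolecular base pairs, can be sketched as a sliding-window scan. The code below is only an illustration of that idea and not the method presented in the talk; the pairing rules (Watson-Crick plus G-U wobble), the mismatch threshold and the toy sequences are assumptions.

```python
# Allowed intermolecular base pairs: Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def near_complement_sites(query, target, max_mismatches=1):
    """Return (start, mismatches) for target windows that pair with the query antiparallel."""
    m = len(query)
    hits = []
    for start in range(len(target) - m + 1):
        window = target[start:start + m]
        # Antiparallel duplex: query position i faces window position m - 1 - i.
        mismatches = sum(
            (query[i], window[m - 1 - i]) not in PAIRS for i in range(m)
        )
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical 8-nt query and a toy target containing its reverse complement.
query = "UGAGGUAG"
target = "AAACUACCUCAGGG"
hits = near_complement_sites(query, target)
```

Because no intramolecular structure is evaluated, each window costs only a linear pass over the query, which is what makes such a scan usable as a fast pre-filter before a more expensive interaction predictor.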
Ivo Gut, Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028 Barcelona, Spain
High-resolution whole-genome analysis and cancer
The International Cancer Genome Consortium (ICGC) aims to exhaustively characterize 50 tumour/normal sample pairs for each of the 50 most common forms of cancer, and then to validate the observations in a further 450 samples. The first three years of this project have seen huge advances in the development, implementation and standardisation of the methods for characterising samples, ethical approval, whole-genome sequencing, exome sequencing, RNA sequencing, epigenetic analysis, methods for validation, informatics analysis and databasing. The Spanish contribution to the ICGC is on Chronic Lymphocytic Leukaemia (CLL). Our main responsibility has been whole-genome sequence analysis, exome analysis, RNA sequence analysis and epigenetic analysis. Complete genome sequencing of many samples requires bringing together many different elements, starting from samples, preparation for sequencing and sequencing itself, through data analysis, to verification of results and translation of a result into biological knowledge. Thorough examination of the first 4 tumour/normal pairs and follow-up in a large replication set allowed us to identify four recurrent mutations in the NOTCH1, XPO1, MYD88 and KLHL6 genes. In an extension we analysed 100 tumour/normal pairs by exome sequencing, which allowed the identification of further recurrent somatic mutations, the most frequent being in SF3B1 and POT1. Interestingly, the two recognised subtypes of CLL, immunoglobulin modulated and not, are not completely reflected in the recurrent mutations. The methods and findings will be discussed.
Paul Horton, CBRC
Excavating human NUMTs
NUMTs (Nuclear mtDNA) are partial copies of the mitochondrial genome found in the nuclear genome.
They are sometimes referred to as molecular fossils and, due to the higher mutation rate of mtDNA, can in some cases be more similar to parts of our ancestral mtDNA than our extant mtDNA genome is. The existence of NUMTs has been known for decades, and many informatics studies on NUMTs have attempted to elucidate the characteristics of their insertion sites. By showing that NUMTs are typically very clean insertions with only minimal deletion or duplication of the surrounding nuclear DNA, these studies have led to a consensus opinion that most NUMTs are likely inserted as filler DNA via NHEJ (Non-Homologous End Joining). Previous informatics studies have not shed much light upon the preferred insertion sites of NUMTs. Most of them conclude that NUMT insertion is random, except for contradictory reports that NUMTs correlate positively, or negatively, with retrotransposons. By employing more careful methodology, we were able to uncover several previously unknown aspects of this phenomenon. We found that inferred NUMT insertion sites strongly correlate with predicted physical properties of DNA (curvature and bendability) and with A+T rich oligomers. Moreover, recently inserted NUMTs correlate strongly with nucleosome-free regions as measured by DNase-seq and FAIRE-seq. We also firmly establish that NUMTs do indeed tend to co-occur with retrotransposons. As for the source mtDNA which is copied to create NUMTs, we find that part of the mtDNA D-loop region is very seldom copied. Relating these facts to concrete hypotheses regarding the mechanism of NUMT insertion proved very challenging, but also fascinating, as it touched upon diverse topics in molecular biology: from retrotransposon activity and DNA repair to evolutionary conservation of chromatin structure and the packaging of mtDNA.
Joakim Lundeberg, SciLifeLab, Sweden and The Spruce genome project
Sequencing and assembly of the largest and most complex genome to date - the Norway spruce (Picea abies)
Conifers are the dominant plant species in many ecosystems, including large areas in Sweden. Despite this, no conifer genome has yet been published, mainly owing to their large size and complexity. The lack of a genome sequence has hampered our understanding of conifer biology and evolution, as well as the development of potential novel breeding strategies for these economically important species. We are currently performing whole-genome sequencing and assembly of the 20 Gbp Norway spruce genome. This genome contains huge amounts of repeated elements, with an estimated gene density of only one gene per 500 kbp. In common with other tree genomes, heterozygosity is high, which further complicates the assembly process. The Spruce Genome Project is addressing questions of genome size, content and evolution, including analyses of gene families and repeats, and will establish Norway spruce as a prime model species for conifer research. In this talk, we will present our main strategies concerning sequencing and assembly of this de novo genome, and give an update on the results obtained so far. In brief, we use a combination of whole-genome shotgun and fosmid pool sequencing, followed by scaffolding and merging of the separate assemblies. This is complemented by a manually curated spruce-specific repeat library, sequencing of random fosmid clones for assembly benchmarking, as well as assemblies of the chloroplast and mitochondrial genomes.
Lennart Martens, VIB, Gent, Belgium
Snakes and ladders: where do proteomics assays fail and how can we fix them?
Proteomics assays increasingly rely on two distinct and largely independent informatics processing steps: identification and quantification.
Both processing steps can rely on a plethora of available algorithms and tools, but the maturity of these algorithms is quite distinct. Whereas identification is typically handled by venerable algorithms called search engines, which have been in use for many years, quantification algorithms are still continuously evolving to accommodate the increasing resolution and sensitivity of modern mass spectrometers. Despite this difference in maturity, both steps can be improved. Indeed, the performance of current quantitative workflows can be boosted by simply combining several of them into a single, joint analysis, making the most of the specific sensitivities of each of the algorithms used. On the other hand, the long-serving search engines have also reached crucial limits in terms of specificity, effectively preventing proteomics from reaching a central status in the life sciences. Fortunately, this inherent limitation of current search engines can be fixed by improving the way in which we use the measurements provided by the mass spectrometer. We will here discuss these developments, and highlight how both quantification and identification can be improved; the former by incremental advances, the latter by a more radical change in approach.
Jens Nielsen, Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
Genome-Scale Metabolic Models: A Bridge between Bioinformatics and Systems Biology
We are currently working on building a Human Metabolic Atlas, a novel web-based database and modelling tool that can be used by medical and pharmaceutical researchers to analyse clinical data with the objectives of identifying biomarkers associated with disease development and improving health care. The central technology in the Human Metabolic Atlas is genome-scale metabolic models (GEMs), which will be made tissue-specific by using different types of experimental data, e.g. from the Human Protein Atlas. These models allow for context-dependent analysis of clinical data, providing much more information than traditional statistical correlation analysis, and hence advance the identification of biomarkers from high-throughput experimental data that can be used for early diagnosis of metabolism-related diseases. As part of the Human Metabolic Atlas we are developing GEMs for the gut microbiome. In this context we are using metagenomics to identify different metabolic functions that are associated with human diseases. Here we are using metagenomic sequencing data from the gut microbiome of patients with different diseases, e.g. arteriosclerosis and type 2 diabetes. Through the combination of the bacterial GEMs and metagenomics data we have identified enriched metabolic functions in the microbiome, and based on this we point to novel prospective biomarkers for disease development. We are further integrating metagenomics information into predictive metabolic models that have the prospect of simulating how the gut microbiome will respond to diet.
Raymond Stevens, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037
Understanding Human G-protein Coupled Receptor Structural Diversity and Modularity
GPCRs constitute one of the largest protein families in the human genome and play essential roles in normal cell processes, most notably in cell signaling. The human GPCR family contains more than 800 members, recognizes thousands of different ligands, and activates a number of signaling pathways through interactions with a small number of binding partners. GPCRs have also been implicated in numerous human diseases, and represent more than 40% of drug targets. Delivering GPCR structures in close collaboration with experts on specific receptor systems is of immense value to the basic science community interested in cell signaling and molecular recognition, as well as the applied science community interested in drug discovery. This work is being followed up with additional biophysical characterization including NMR spectroscopy, HDX mass spectrometry, medicinal chemistry and community-wide assessments with computational biology groups throughout the world. Crystal structures are now available for rhodopsin, adrenergic, and adenosine receptors in both inactive and activated forms, as well as for chemokine, dopamine, histamine, S1P1, muscarinic, and opioid receptors in inactive conformations. The common structural features seen in these receptors will be reviewed; the scope of structural diversity of GPCRs at different levels of homology provides insight into our growing understanding of the biology of GPCR action and its impact on drug discovery. Given the current set of GPCR structural data, a distinct modularity is now being observed between the extracellular (ligand-binding) and intracellular (signaling) regions.
The rapidly expanding repertoire of GPCR structures provides a solid framework for experimental and molecular modeling studies, and helps to chart a roadmap for comprehensive structural coverage of the whole superfamily and an understanding of GPCR biological and therapeutic mechanisms. The long range goal is to understand GPCR molecular recognition and evolution in relation to human cognition. This work was supported by NIGMS PSI:Biology for GPCR structure processing (U54GM094618) and the NIH Roadmap Initiative (JCIMPT) for technology development (GM073197).
Burkhard Rost, TU Munich
Evolution teaches protein prediction
The objective of our group is to predict aspects of protein function from sequence. The only reason why we can pursue such an ambitious goal is the wealth of evolutionary information available through the comparison of the whole bio-diversity of species. Many approaches have benefited substantially from using evolutionary information; for some of these methods learning from evolution made the difference between possible and impossible. In my talk I will present examples of methods that target the prediction of protein interactions, of protein disorder, and of the effect of single residue mutations upon protein structure and function.
Krishnan A, Almén MS, Nordström KJ, Fredriksson R, Schiöth HB
The origin of GPCRs, the largest family of membrane bound proteins
G protein-coupled receptors (GPCRs) are the largest superfamily among membrane-bound proteins. The GPCRs in humans are classified into five main families named Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin according to the GRAFS classification. Several families of GPCRs, however, show no apparent sequence similarity to each other, and it has been debated which of them share a common origin. Mining of early vertebrates including lancelet (Branchiostoma floridae) and one of the most primitive animals, the cnidarian sea anemone (Nematostella vectensis), provided considerable evidence suggesting that the Adhesion family is ancestral to the peptide hormone binding Secretin family of GPCRs. We also used integrated and independent HHsearch, Needleman-Wunsch-based and motif analyses to determine the relationships of the other main families. We found strong evidence that the Adhesion and Frizzled families are children of the cyclic AMP (cAMP) family, while the large Rhodopsin family is likely a child of the cAMP family. We suggest that the Adhesion and Frizzled families originated from the cAMP family in an event close to that which gave rise to the Rhodopsin family. We also found convincing evidence that the Rhodopsin family is parent to the important sensory families Taste 2 and Vomeronasal type 1, as well as the Nematode chemoreceptor families. The insect odorant, gustatory, and Trehalose receptors, frequently referred to as GPCRs, form a separate cluster without relationship to the other families, and we propose, based on these and other results, that these families are ligand-gated ion channels rather than GPCRs. We suggest common descent of at least 97% of the GPCR sequences found in humans.
Moreover, we provide the first evidence that four of the five main mammalian families of GPCRs, namely Rhodopsin, Adhesion, Glutamate and Frizzled, are present in Fungi. The unicellular relatives of the Metazoan lineage, Salpingoeca rosetta and Capsaspora owczarzaki, have a rich group of both the Adhesion and Glutamate families, which in particular provided insight into the early emergence of the N-terminal domains of the Adhesion family. Further mining of Dictyostelium discoideum suggests that the Glutamate family is as ancient as the cAMP receptor family. Together, these studies clarify the early evolutionary history of the GPCR superfamily and indicate that its emergence can be traced back at least 1400 MYA.
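For readers unfamiliar with the Needleman-Wunsch-based comparison mentioned in the abstract above, the underlying global alignment recurrence can be sketched as follows. The scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability; the abstract does not state the scoring scheme used, and analyses of protein families would normally rely on a substitution matrix.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the current prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


identical = needleman_wunsch_score("GATTACA", "GATTACA")
one_gap = needleman_wunsch_score("ACGT", "AGT")
```

Only two rows of the dynamic programming table are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full table or traceback pointers.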
Gert Vriend, Radboud University Nijmegen Medical Centre, Netherlands
What can we (not yet) learn from 70 GPCR structures?
Headed by the next speaker, the crystallography community has cracked the GPCR crystallisation problem, and in the past years we have seen at least one new GPCR structure enter the PDB each month. These structures are in an active state, semi-active state, inactive state, or sometimes also an artefactual state. We have been comparing all available structures, trying to average out the things done to make the GPCRs crystallize (mutation of crucial residues; adding llama antibodies; adding funny salts and lipids; cloning-in lysozyme). The sheer volume of data now allows us to extract the beginning of a coherent story about the activation of GPCRs. Not surprisingly, this story agrees more with basic laws of physics and thermodynamics, and less with the myriads of funny activation schemas that include distinct states like R, R*, etc., that have entered the literature over the years.
Martin Weigt, University of Sorbonne, France
From sequence variability to protein (complex) structure prediction
Many families of homologous proteins show a remarkable degree of structural and functional conservation, despite their large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach to link this variability (easy to observe) to structure (hard to obtain), i.e. to infer directly co-evolving residue pairs which turn out to form native contacts in the folded protein with high accuracy. The gained information is used to guide tertiary and quaternary structure prediction. As a specific example, I will discuss the auto-phosphorylation complex of histidine kinases, which are involved in the majority of signal transduction systems in bacteria. Only a multidisciplinary approach integrating statistical genomics, biophysical protein simulation, and mutagenesis experiments allows us to predict and verify the so far unknown active kinase structure.
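The notion of co-evolving residue pairs can be made concrete with a deliberately simpler stand-in. The sketch below computes the mutual information between two columns of a multiple sequence alignment; it is not the statistical-mechanics inference approach of the abstract, which aims at directly coupled pairs, whereas plain mutual information also picks up indirect correlations. The toy alignment and function names are assumptions.

```python
import math
from collections import Counter


def column(alignment, k):
    """Return the residues found at position k of every aligned sequence."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment: positions 0 and 1 vary together, position 2 is fully conserved.
alignment = ["AKL", "AKL", "GEL", "GEL"]
covarying = mutual_information(column(alignment, 0), column(alignment, 1))
conserved = mutual_information(column(alignment, 0), column(alignment, 2))
```

In the toy data the covarying pair reaches the maximum possible value for two equally frequent states, while a conserved column carries no covariation signal at all.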
Eric Westhof, Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire, CNRS, 15 rue René Descartes, 67084 Strasbourg, France
The Detection of the Architectural Modules of RNA and Recent Progress in RNA Modelling
RNA architecture can be viewed as the hierarchical assembly of preformed double-stranded helices defined by Watson-Crick base pairs and RNA modules maintained by non-Watson-Crick base pairs. RNA modules are recurrent ensembles of ordered non-Watson-Crick base pairs. Such RNA modules constitute a signal for detecting non-coding RNAs with specific biological functions. It is, therefore, important to be able to recognize such elements within genomes. Through systematic comparisons between homologous sequences and x-ray structures, followed by automatic clustering, the whole range of sequence diversity in recurrent RNA modules has been characterized. These data permitted the construction of a computational pipeline for identifying known 3D structural modules in single and multiple RNA sequences in the absence of any other information. Any module can in principle be searched, but four can be searched automatically: the G-bulged loop, the Kink-turn, the C-loop and the tandem GA loop. The present pipeline can be used for RNA 2D structure refinement, 3D model assembly, and for searching and annotating structured RNAs in genomic data. Following the recent dramatic advances in tools aimed at RNA 3D modelling, a first, collective, blind experiment in RNA three-dimensional structure prediction has been performed. The goals are to assess the leading edge of RNA structure prediction techniques, compare existing methods and tools, and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity.
The results should give potential users insight into the suitability of available methods for different applications and support the RNA structure prediction community in its efforts to improve its tools.
Roman A. Zubarev, Division of Physiological Chemistry I, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, S-171 77 Stockholm, Sweden
Pathway Analysis in Expression Proteomics
Proteomics studies have revealed the unexpected plasticity and dynamic nature of the human proteome. The paradigm that the time evolution of a biological system can be described by abundance variation of relatively few “regulated” proteins has been shattered, being replaced by the growing understanding that the whole proteome is regulated, and virtually no protein remains unaffected when the system undergoes transition from one state to another. This finding underlines the importance of systems biology analysis of expression proteomics data. Systems biology shifts the analytical focus from thousands of proteins to hundreds of signaling pathways, thus reducing the number of entities to be analyzed. Application of these methods required the development of novel systems biology tools, such as the pathway search engine (PSE [1-3]). These tools can only be effective when they are quantitative, i.e. predict not only the activated pathway, but also the relative degree of its activation. Introducing the quantitative aspect in systems biology is one of the greatest challenges this field is facing today, since the final goal of pathway analysis is the creation of a quantitative predictive model of the biological process under investigation.
Posters |