Friday, January 30, 2015

Metabolic Modeling for Validation of Genome Annotations

A major advance in bioinformatics in the last decade is the rapidity with which we can now create quantitative metabolic models from sequenced genomes.  In this and future blog posts we will examine several applications of metabolic modeling.  This post introduces metabolic modeling, considers its use for validation of genome annotations, and proposes that construction of metabolic models can form a routine part of the genome annotation process.

Introduction to Steady-State Metabolic Modeling

In these blog posts we describe steady-state metabolic models (as opposed to kinetic models).  Steady-state models describe a cell whose metabolic machinery is in a steady state, churning out energy and the end products of biosynthesis from nutrients that the cell takes in at constant rates.  At steady state, the total flux producing each cellular metabolite equals the total flux consuming it.  The fluxes are balanced, hence the name for a major modeling technique in this field: flux-balance analysis.
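The balance condition can be written compactly.  The notation below is assumed here for illustration and is not taken from the paper: S_ij is the stoichiometric coefficient of metabolite i in reaction j (positive when produced, negative when consumed), and v_j is the flux through reaction j.  The second line is a made-up numeric instance for a single metabolite that is produced by one reaction and consumed by two.

```latex
\sum_{j=1}^{n} S_{ij}\, v_j = 0 \qquad \text{for every internal metabolite } i

% Illustrative numbers only: one producing flux of 2.0, two consuming fluxes of 1.5 and 0.5
(+1)(2.0) + (-1)(1.5) + (-1)(0.5) = 0
```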

Unlike kinetic models, the other main approach to metabolic modeling, steady-state models do not predict how the metabolic state of the cell changes over time.  That drawback is offset by the fact that steady-state models are orders of magnitude easier to create than kinetic models, because they do not require the large number of difficult-to-measure quantitative parameters that kinetic models require.  Thus it is practical to create steady-state models at the genome scale, based on genome annotations.

Steady-state metabolic flux models have five components.  Examples of each can be found in our recent paper on EcoCyc as an E. coli metabolic model [1]:
1.     The set of nutrients available as inputs to the metabolism.  These include one or more sources of carbon, nitrogen, phosphorus, and sulfur.  Our E. coli model uses 14 nutrients.
2.     The set of metabolites created as end products of metabolism, called the biomass metabolites.  These include amino acids, nucleotides, lipids, polysaccharides, and other cell constituents; our E. coli model produces 83 biomass metabolites.  The relative molar amount of each biomass metabolite can be provided to model the cell composition in detail.
3.     The set of waste products secreted by the cellular metabolism.  Examples include carbon dioxide, methane, hydrogen gas, excess water and protons, and fermentation products such as acetate and ethanol.
4.     The set of reactions constituting the metabolic network. Our genome-scale model of E. coli covers more than 2000 reactions.
5.     An optional set of constraints on the fluxes within the network, such as constraints on the uptake rates of different nutrients, and limits on the flux rates of reactions within the network.
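To make the five components concrete, here is a minimal Python sketch of a toy model.  Everything in it is hypothetical: the class name ToyModel, the three reactions, and the flux values are invented for illustration and have nothing to do with the EcoCyc model or the MetaFlux file format.  The sketch only checks whether a given flux assignment is balanced; it does not solve for fluxes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    nutrients: set    # 1. metabolites the cell may take up
    biomass: dict     # 2. biomass metabolite -> relative molar amount (stored, not enforced here)
    secretions: set   # 3. metabolites the cell may secrete
    reactions: dict   # 4. reaction name -> {metabolite: stoichiometric coefficient}
    bounds: dict = field(default_factory=dict)  # 5. reaction name -> (min flux, max flux)


def net_production(model, fluxes):
    """Net production rate of each metabolite under the given fluxes."""
    net = {}
    for name, stoich in model.reactions.items():
        for met, coeff in stoich.items():
            net[met] = net.get(met, 0.0) + coeff * fluxes.get(name, 0.0)
    return net


def is_steady_state(model, fluxes, tol=1e-9):
    """True if every flux respects its bounds and every metabolite is balanced."""
    for name in model.reactions:
        lo, hi = model.bounds.get(name, (0.0, float("inf")))
        if not lo - tol <= fluxes.get(name, 0.0) <= hi + tol:
            return False
    for met, rate in net_production(model, fluxes).items():
        if met in model.nutrients:
            ok = rate <= tol        # nutrients may only be consumed overall
        elif met in model.biomass or met in model.secretions:
            ok = rate >= -tol       # end products may only be produced overall
        else:
            ok = abs(rate) <= tol   # internal metabolites must balance exactly
        if not ok:
            return False
    return True


# Hypothetical three-reaction network (coefficients: negative = consumed).
toy = ToyModel(
    nutrients={"glucose", "ammonia"},
    biomass={"alanine": 1.0},
    secretions={"CO2", "acetate"},
    reactions={
        "glycolysis": {"glucose": -1, "pyruvate": 2},
        "make_alanine": {"pyruvate": -1, "ammonia": -1, "alanine": 1},
        "decarboxylate": {"pyruvate": -1, "acetate": 1, "CO2": 1},
    },
    bounds={"glycolysis": (0.0, 10.0)},  # caps glucose use in this toy
)

balanced = {"glycolysis": 1.0, "make_alanine": 2.0, "decarboxylate": 0.0}
active = [name for name, flux in balanced.items() if flux != 0.0]
print(is_steady_state(toy, balanced), active)
```

In this toy assignment the decarboxylate reaction carries no flux, which mirrors the point that a model can contain reactions that go unused under a given nutrient set.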

Given the preceding inputs, a steady-state metabolic model predicts the steady-state specific flux rate (the number of moles of reaction products created per gram of cells per second) of every reaction within the network.  For many of those reactions, the flux rate will be zero because the cell does not use them during growth on the specified set of nutrients.  For simulations of E. coli growth on glucose under aerobic conditions, only about 20% of the reactions in the model carry flux [1].
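Flux-balance analysis is commonly posed as a linear program.  The formulation below is the textbook form, stated here as an assumption rather than as a description of any particular tool: S is the stoichiometric matrix, v is the vector of reaction fluxes, l_j and u_j are the lower and upper bounds on each flux, and the objective is the flux through a biomass reaction that drains the biomass metabolites in their specified proportions.

```latex
\begin{aligned}
\max_{v} \quad & v_{\text{biomass}} \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for each reaction } j
\end{aligned}
```

Solving this program yields one flux value per reaction.  Nutrient uptake limits enter as bounds on the corresponding transport fluxes.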

Validation of Genome Annotation and of Genome-Based Metabolic Reconstruction

Metabolic models can be developed to varying levels of accuracy and validation.  The simplest level of validation underlies our use case here, namely verifying that the model can produce all biomass metabolites from the input nutrients, or put another way, demonstrating that “the model can grow.”  In our experience, metabolic models never exhibit growth the first time they are run, just as computer programs rarely work the first time they are run.  The most frequent reason that models fail to grow is that they are incomplete: they lack one or more critical metabolic reactions.  That incompleteness is typically due to incompleteness in the genome annotation.  Genes whose functions were not predicted at all, or were predicted incorrectly, during sequence analysis lead to missing reactions in the metabolic network, referred to as network gaps.

Not all gaps prevent model growth, because the cell can circumvent some gaps using other routes through the metabolic network.  But a gap that prevents the network from producing even one biomass metabolite will prevent model growth.  Thus, model growth becomes a test for the validity of the genome annotation, and of the metabolic reconstruction (the metabolic reaction set) derived from the genome annotation by software such as SRI’s Pathway Tools.
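The idea of growth as a test can be sketched in a few lines of Python.  This is a deliberately simplified, qualitative check and not flux-balance analysis: it ignores stoichiometry and flux bounds, and simply expands the set of reachable metabolites until nothing new can be made.  The function names and the toy reactions are hypothetical.

```python
def producible(nutrients, reactions):
    """Metabolites reachable from the nutrients.

    reactions maps a reaction name to (set of substrates, set of products).
    A reaction can fire once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def missing_biomass(nutrients, reactions, biomass):
    """Biomass metabolites the network cannot make; empty means the model 'grows'."""
    return set(biomass) - producible(nutrients, reactions)


# Hypothetical toy network with a gap: nothing produces acetyl-CoA.
nutrients = {"glucose", "ammonia"}
biomass = {"alanine", "lipid"}
reactions = {
    "glycolysis": ({"glucose"}, {"pyruvate"}),
    "make_alanine": ({"pyruvate", "ammonia"}, {"alanine"}),
    "lipid_synthesis": ({"acetyl-CoA"}, {"lipid"}),
}
blocked = missing_biomass(nutrients, reactions, biomass)
print(blocked)

# Adding the missing step closes the gap.
repaired = dict(reactions, pdh=({"pyruvate"}, {"acetyl-CoA"}))
print(missing_biomass(nutrients, repaired, biomass))
```

A single missing step blocks one biomass metabolite here, and that alone is enough for the growth test to fail.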

Identifying the missing reactions in a metabolic network is quite a difficult problem when approached manually.  Therefore, the MetaFlux modeling tool within Pathway Tools provides a gap filler module that automatically suggests what reactions are missing.  More precisely, the gap filler computes a minimal set of reactions from our MetaCyc database that, if added to the organism’s metabolic model, will enable growth of the model (meaning production of all biomass metabolites).  Given the suggested set of reaction gap fillers, you can use Pathway Tools’ Pathway Hole Filler module to search for genes within the genome that may code for the enzymes catalyzing those reactions, or you can use your own sequence-analysis methodology to search for those enzymes.
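To illustrate what "a minimal set of reactions that enables growth" means, here is a brute-force Python sketch.  It is not how MetaFlux is implemented (this post does not describe its algorithm): it tries candidate sets from a reference collection in order of increasing size and returns the first set that makes a simplified, stoichiometry-free growth check pass.  All names and reactions are hypothetical, and the search is exponential, so it is only workable on toy inputs.

```python
from itertools import combinations


def can_grow(nutrients, reactions, biomass):
    """Qualitative growth check: can every biomass metabolite be reached?"""
    available = set(nutrients)
    grew = True
    while grew:
        grew = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                grew = True
    return set(biomass) <= available


def fill_gaps(nutrients, reactions, biomass, reference):
    """Smallest set of reference reactions whose addition enables growth, or None."""
    candidates = [name for name in reference if name not in reactions]
    for size in range(len(candidates) + 1):
        for added in combinations(candidates, size):
            trial = dict(reactions)
            trial.update({name: reference[name] for name in added})
            if can_grow(nutrients, trial, biomass):
                return set(added)
    return None


# Hypothetical organism model that cannot make acetyl-CoA.
nutrients = {"glucose", "ammonia"}
biomass = {"alanine", "lipid"}
model = {
    "glycolysis": ({"glucose"}, {"pyruvate"}),
    "make_alanine": ({"pyruvate", "ammonia"}, {"alanine"}),
    "lipid_synthesis": ({"acetyl-CoA"}, {"lipid"}),
}
# Hypothetical reference collection standing in for a reaction database.
reference = {
    "pdh": ({"pyruvate"}, {"acetyl-CoA"}),
    "ldh": ({"pyruvate"}, {"lactate"}),
    "pox": ({"pyruvate"}, {"acetate"}),
    "acs": ({"acetate"}, {"acetyl-CoA"}),
}
suggestion = fill_gaps(nutrients, model, biomass, reference)
print(suggestion)
```

In this toy case a one-reaction fix exists, so the longer two-step route through acetate is never suggested.  Each suggested reaction would then be the starting point for a search for a gene that encodes the corresponding enzyme.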

In our experience, even highly curated genome annotations contain network gaps that can be identified using metabolic modeling.  A metabolic model that simply shows growth under a given nutrient set is still at an early stage of development and probably requires further development before it will produce accurate quantitative predictions (such as predicting the growth rate accurately).  However, this approach is likely to improve the quality of genome annotations, and would set a new bar for publications on completely sequenced genomes.

Learn More about MetaFlux

MetaFlux and the full Pathway Tools software are freely available from SRI for academic use.  You can learn more about how to use MetaFlux by attending SRI’s metabolic modeling tutorials (the next tutorial is scheduled for March 18-19, 2015 in Menlo Park, CA), and by reading the Pathway Tools User’s Guide.

[1]  Weaver DS, Keseler IM, Mackie A, Paulsen IT, Karp PD. A genome-scale metabolic flux model of Escherichia coli K-12 derived from the EcoCyc database.  BMC Syst Biol. 2014 Jun 30;8:79. doi: 10.1186/1752-0509-8-79.
