Friday, March 6, 2015

Procedure for Creating Metabolic Models from Sequenced Genomes




In the past, constructing a quantitative metabolic flux model was an extremely time-consuming process: creating a bacterial model required 12-18 months.  One of our main goals in designing MetaFlux, the Pathway Tools module for creating metabolic models, has been to speed up this process by automating as many of its steps as possible and by providing software power tools for debugging metabolic models (a viewpoint put forward by our colleague Jeremy Zucker).  We can now create a metabolic model using MetaFlux in approximately 1 month.
This post surveys our recommended procedure for creating metabolic models from sequenced genomes using Pathway Tools.


 We assume that the starting point for model construction is an annotated genome for the organism of interest (see our earlier blog post for an overview of metabolic modeling and of how the modeling process can be used to validate an annotated genome).  Model construction proceeds in two phases: qualitative reconstruction of the metabolic network, and development of the quantitative metabolic model.


Phase I: Qualitative Metabolic Network Reconstruction


Phase I is performed by the PathoLogic module of Pathway Tools.  It consists of the following steps:

1.     Reactome Inference: Given the set of proteins within an annotated genome, reactome inference computes the set of biochemical reactions catalyzed by the metabolic enzymes within that protein set.  PathoLogic considers the annotations available for each protein: its Gene Ontology (GO) terms, EC number, and enzyme name are all queried within the MetaCyc database, and the metabolic reactions corresponding to those queries are retrieved and inserted into the Pathway/Genome Database (PGDB) for the organism.  Although the controlled vocabularies used by GO and the EC number system are preferred for reactome inference because of the precision with which they identify reactions, many genomes lack them.  Searches based on enzyme name are therefore critical; our enzyme name matcher attempts to match many syntactic variants of the enzyme names (such as removing various prefixes and suffixes that can obscure the catalytic activity described by the name) against the extensive enzyme synonym lists stored in MetaCyc.
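
To make the enzyme-name matching idea concrete, here is a minimal sketch in Python. It is not PathoLogic's matcher: the prefix and suffix lists, the function names, and the toy synonym index with its reaction IDs are all assumptions made for illustration. It only shows the general idea of stripping qualifiers before looking a name up in a synonym list.

```python
import re

# Hypothetical qualifier lists; PathoLogic's actual rules are not given in this post.
PREFIXES = ("putative ", "probable ", "predicted ", "possible ")
SUFFIX_PATTERN = re.compile(
    r"(,? (alpha|beta|gamma|large|small) (subunit|chain)"
    r"| (precursor|homolog|family protein))$"
)


def normalize_enzyme_name(name):
    """Lower-case a name and strip qualifiers that obscure the catalytic activity."""
    result = " ".join(name.lower().split())
    changed = True
    while changed:  # repeat until no prefix or suffix can be removed
        changed = False
        for prefix in PREFIXES:
            if result.startswith(prefix):
                result = result[len(prefix):]
                changed = True
        stripped = SUFFIX_PATTERN.sub("", result)
        if stripped != result:
            result = stripped
            changed = True
    return result


def match_reactions(name, synonym_index):
    """Look up the normalized name in a synonym -> reaction-ID index."""
    return synonym_index.get(normalize_enzyme_name(name), [])


# Toy index standing in for MetaCyc's enzyme synonym lists (made-up reaction IDs).
SYNONYMS = {
    "hexokinase": ["RXN-A"],
    "tryptophan synthase": ["RXN-B"],
}

print(match_reactions("Putative tryptophan synthase, beta subunit", SYNONYMS))
```

A real matcher would need far more variants than this, which is why the synonym lists in MetaCyc matter as much as the normalization rules.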

2.     Metabolic Pathway Inference: PathoLogic next infers metabolic pathways in the organism.  Pathways in MetaCyc are scored as to their likely presence in the organism based on features such as the number of reactions from the pathway present in the organism’s reactome, and on whether the pathway is expected to occur in the organism based on the organism’s taxonomic group.  Note that when a pathway is inferred as present, PathoLogic creates in the organism PGDB not only the pathway, but all reactions that are present in the pathway in MetaCyc but were not previously inferred in the reactome of the organism.  Thus, pathway inference fills significant numbers of missing (gap) reactions.
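
The scoring idea can be sketched as follows. The thresholds, the decision rule, and the function name are assumptions chosen for illustration, not PathoLogic's actual scoring scheme. The sketch uses the two features named above (fraction of pathway reactions present, and taxonomic expectation) and returns the gap reactions that pathway inference would add.

```python
def infer_pathway(pathway_reactions, reactome, expected_in_taxon,
                  relaxed_threshold=0.5, strict_threshold=0.8):
    """Return (is_present, gap_reactions) for one candidate pathway.

    Hypothetical rule: a pathway expected in the organism's taxonomic group
    needs a smaller fraction of its reactions present than an unexpected one.
    """
    present = [r for r in pathway_reactions if r in reactome]
    fraction = len(present) / len(pathway_reactions)
    required = relaxed_threshold if expected_in_taxon else strict_threshold
    if fraction >= required:
        # Reactions in the pathway but not yet in the reactome are the gaps
        # that get filled when the pathway is accepted.
        gaps = [r for r in pathway_reactions if r not in reactome]
        return True, gaps
    return False, []


pathway = ["R1", "R2", "R3", "R4"]
reactome = {"R1", "R2", "R4"}
print(infer_pathway(pathway, reactome, expected_in_taxon=True))
print(infer_pathway(pathway, reactome, expected_in_taxon=False))
```

In this toy case three of four reactions are present, so the pathway is accepted only when the taxonomic evidence agrees, and accepting it adds R3 to the organism's reaction set.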

3.     Pathway Hole Filling: Pathway holes are reactions that have no identified enzyme in the genome.  The PathoLogic pathway hole filler uses sequence analysis to search the genome for enzymes that catalyze those reactions.

A final, manual stage of metabolic reconstruction is typically performed by the user: reviewing the pathways predicted by PathoLogic for false-positive predictions (predicted pathways that are in fact thought to be absent from the organism) and for pathways that should have been predicted by PathoLogic but were not.  In addition, the user reviews enzyme activities that were not identified by PathoLogic and enters the associated metabolic reactions manually.

Phase II: Quantitative Metabolic Model Development


Steady-state metabolic models consist of a large system of linear equations derived from the reaction network of the organism as stored in a PGDB.  The system of equations is fed to a linear optimization program, with the objective of determining assignments of steady-state fluxes (rates) to reactions that optimize the production of cellular biomass.  MetaFlux uses a linear solver called SCIP that is fast and free to academic institutions.
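
For readers who want the optimization problem written out, the standard flux-balance formulation looks like the following. The symbols are the usual textbook notation and are assumptions here, not taken from this post: S is the stoichiometric matrix (one row per metabolite, one column per reaction), v is the vector of reaction fluxes, and l and u are lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & v_{\text{biomass}} \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 is the steady-state condition: every metabolite is produced exactly as fast as it is consumed. Nutrient uptake and secretion enter the model as reactions whose bounds describe the growth conditions.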
What happens if no path through the metabolic network exists from the supplied nutrients to every biomass metabolite (biosynthetic product) specified for the model?  In this case the solver cannot find a solution: its output simply reports that no solution exists, with no explanation of why.  Such network gaps are common because of unidentified gene functions in genome annotations, and they are a serious problem: how do you identify a handful of missing reactions among 1,000 reactions?  Or what if a necessary nutrient has been omitted from the starting conditions of the model?  That omission will also prevent model growth.  Another impediment to model growth is an omitted secreted compound: missing secretions can block reactions in the model from carrying flux, preventing the model from solving.  The difficulty of solving these problems is one reason it can take more than a year to develop a metabolic model.
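
A rough way to see which biomass metabolites are cut off is a qualitative reachability check, sketched below. This is not what the linear solver does: it ignores stoichiometry, reversibility, and cofactor cycles, and the data layout and function names are assumptions made for this example. It only asks which metabolites can be reached from the nutrients by repeatedly firing reactions whose substrates are all available.

```python
def producible_metabolites(reactions, nutrients):
    """Expand the set of available metabolites until no new reaction can fire.

    `reactions` maps a reaction ID to a (substrates, products) pair.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def missing_biomass(reactions, nutrients, biomass):
    """List the biomass metabolites that cannot be reached from the nutrients."""
    return sorted(set(biomass) - producible_metabolites(reactions, nutrients))


# Toy network with made-up reaction IDs.
toy_reactions = {
    "R1": (["glucose"], ["g6p"]),
    "R2": (["g6p"], ["pyruvate"]),
    "R3": (["pyruvate", "nh3"], ["alanine"]),
}
print(missing_biomass(toy_reactions, ["glucose"], ["pyruvate", "alanine"]))
print(missing_biomass(toy_reactions, ["glucose", "nh3"], ["pyruvate", "alanine"]))
```

In the first call the ammonia source was left out of the nutrient set, so alanine is reported as unreachable; supplying it in the second call clears the list. This is the omitted-nutrient failure described above in miniature.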

One approach to these issues is to make model building more tractable by focusing on one small piece of the model at a time.  Begin with a few core pathways, solve a model for those pathways, then add a few more pathways and solve that larger model.  At each step, any problems with the model must be confined to the increment of reactions that was just added; that locality makes problems easier to find.  But this process is still quite time-consuming.

A second approach is to build software tools to aid us in debugging the model, namely the development mode of MetaFlux.  Development mode (also called the gap filler) solves an optimization problem to determine how to transform an unsolvable model to a solvable model.  That optimization problem seeks a minimal-cost set of modifications to the model that will render the model solvable, as follows:

·      Find a minimal number of additional reactions from MetaCyc to add to the model
·      Find a minimal number of reactions in the model whose direction should be reversed
·      Find a minimal number of nutrients from a user-specified “try-nutrients” set to add to the model
·      Find a minimal number of secreted compounds from a user-specified “try-secretions” set to add to the model
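
The flavor of this minimal-modification search can be shown with a toy sketch. MetaFlux solves an optimization problem; the sketch below swaps in brute-force enumeration over a qualitative reachability test, which is only workable for tiny examples. It covers added reactions and added nutrients only (not reversals or secretions), charges every modification the same cost of 1, and uses function names and a data layout invented for this illustration.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Metabolites reachable from the nutrients (qualitative, ignores stoichiometry)."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def smallest_fix(model, nutrients, biomass, candidate_reactions, try_nutrients):
    """Return the smallest (added_reactions, added_nutrients) that makes every
    biomass metabolite reachable, or None if no combination works."""
    options = [("reaction", rid) for rid in sorted(candidate_reactions)]
    options += [("nutrient", n) for n in sorted(try_nutrients)]
    for size in range(len(options) + 1):  # try the cheapest fixes first
        for combo in combinations(options, size):
            added_rxns = {rid: candidate_reactions[rid]
                          for kind, rid in combo if kind == "reaction"}
            added_nuts = [n for kind, n in combo if kind == "nutrient"]
            trial_model = {**model, **added_rxns}
            if set(biomass) <= producible(trial_model, list(nutrients) + added_nuts):
                return sorted(added_rxns), sorted(added_nuts)
    return None


# Toy data with made-up IDs: the candidates stand in for MetaCyc reactions.
model = {"R1": (["glucose"], ["pyruvate"])}
candidates = {
    "C1": (["pyruvate", "nh3"], ["alanine"]),
    "C2": (["glucose"], ["lactate"]),
}
print(smallest_fix(model, ["glucose"], ["alanine"], candidates, ["nh3", "sulfate"]))
```

Here no single change is enough: the model needs both candidate reaction C1 and the try-nutrient nh3, and the search reports exactly that pair while leaving the irrelevant candidates out.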

During development of our human metabolic model the gap filler suggested the addition of eight new reactions to the model; we researched these reactions in the literature and found that four of them were known to occur in humans, but had been overlooked by our earlier curation efforts.
In some cases, the gap filler is unable to compute a set of transformations that renders a model solvable.  It then tells the user which biomass metabolites the model can produce and which it cannot, so that the user can focus development efforts on producing the missing metabolites.  Furthermore, MetaFlux produces a “blocked reaction report” that identifies, for each biomass metabolite that cannot be produced, the chain of blocked reactions that prevents its production.
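
To illustrate what tracing a chain of blocked reactions can mean, here is a small sketch. It is not the MetaFlux report: the output format, the function name, and the assumption that the set of producible metabolites has already been computed are all choices made for this example.

```python
def blocked_chain(target, reactions, available, seen=None):
    """Return report lines explaining why `target` cannot be produced.

    `reactions` maps a reaction ID to (substrates, products); `available` is
    the set of metabolites the model can already produce (assumed given).
    """
    seen = set() if seen is None else seen
    if target in available or target in seen:
        return []
    seen.add(target)  # guard against cycles in the network
    producers = [(rid, subs) for rid, (subs, prods) in sorted(reactions.items())
                 if target in prods]
    if not producers:
        return [f"{target}: no producing reaction"]
    lines = []
    for rid, subs in producers:
        missing = [s for s in subs if s not in available]
        if not missing:
            continue
        lines.append(f"{target}: {rid} blocked, missing {', '.join(missing)}")
        for metabolite in missing:
            # Follow each missing substrate further upstream.
            lines.extend(blocked_chain(metabolite, reactions, available, seen))
    return lines


toy = {"R3": (["pyruvate", "nh3"], ["alanine"])}
for line in blocked_chain("alanine", toy, {"glucose", "pyruvate"}):
    print(line)
```

The trace walks upstream from the unproducible biomass metabolite until it reaches a metabolite with no producing reaction, which is typically where a missing reaction or nutrient has to be supplied.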

Learn More about MetaFlux

 MetaFlux and the full Pathway Tools software are freely available from SRI for academic use.  You can learn more about how to use MetaFlux by attending SRI’s metabolic modeling tutorials (the next tutorial is scheduled for March 18-19, 2015 in Menlo Park, CA), and by reading the Pathway Tools User’s Guide.

