In the past, construction of quantitative metabolic flux
models has been an extremely time-consuming process, requiring 12-18 months to
create a bacterial model. One of our
main goals in designing the MetaFlux module for creating metabolic models
within Pathway Tools has been to speed up this process by automating as many of
its steps as possible, and by providing software power tools for debugging
metabolic models (a viewpoint that was put forward by our colleague Jeremy
Zucker). We can now create metabolic
models using MetaFlux in approximately 1 month.
This blog post surveys our recommended procedure for creating
metabolic models from sequenced genomes using Pathway Tools.
We assume that the starting point for model construction is
an annotated genome for the organism of interest (see our earlier
blog post for an overview of metabolic modeling and of how the modeling
process can be used to validate an annotated genome). Model construction proceeds in two phases:
qualitative reconstruction of the metabolic network, and development of the
quantitative metabolic model.
Phase I: Qualitative Metabolic Network Reconstruction
Phase I is performed by the PathoLogic module of Pathway
Tools. It consists of the following
steps:
1. Reactome Inference: Given the set of proteins
within an annotated genome, reactome inference computes the set of biochemical
reactions catalyzed by the metabolic enzymes within that protein set. PathoLogic considers the annotations
available for each protein: its Gene Ontology (GO) terms, EC number, and enzyme
name are all queried within the MetaCyc database, and the metabolic reactions
corresponding to those queries are retrieved and inserted into the
Pathway/Genome Database (PGDB) for the organism. Although the controlled vocabularies used by
GO and the EC number system are preferred for reactome inference because of the
precision with which they identify reactions, many genomes lack them. Searches based on enzyme name are therefore
critical; our enzyme name matcher attempts to match many syntactic variants of
the enzyme names (such as removing various prefixes and suffixes that can
obscure the catalytic activity described by the name) against the extensive
enzyme synonym lists stored in MetaCyc.
2. Metabolic Pathway Inference: PathoLogic next
infers metabolic pathways in the organism.
Pathways in MetaCyc are scored as to their likely presence in the
organism based on features such as the number of reactions from the pathway
present in the organism’s reactome, and on whether the pathway is expected to
occur in the organism based on the organism’s taxonomic group. Note that when a pathway is inferred as
present, PathoLogic creates in the organism PGDB not only the pathway, but all
reactions that are present in the pathway in MetaCyc but were not previously
inferred in the reactome of the organism.
Thus, pathway inference fills in a significant number of missing (gap)
reactions.
3. Pathway Hole Filling: The PathoLogic pathway
hole filler searches the genome using sequence analysis for enzymes that
catalyze those reactions that have no identified enzymes in the genome (those
reactions are called pathway holes).
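The first two steps above can be illustrated with a toy sketch. It is not PathoLogic's actual algorithm: the lookup tables stand in for MetaCyc, and the reaction identifiers, the stripped prefixes and suffixes, the 0.25 taxonomy bonus, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical stand-ins for MetaCyc lookups (all identifiers are made up).
EC_TO_REACTION = {"2.7.1.1": "RXN-HEXOKINASE"}
NAME_TO_REACTION = {
    "hexokinase": "RXN-HEXOKINASE",
    "phosphoglucose isomerase": "RXN-PGI",
}
PATHWAYS = {
    "glycolysis-fragment": {
        "reactions": {"RXN-HEXOKINASE", "RXN-PGI", "RXN-PFK"},
        "expected_taxa": {"Bacteria"},
    },
    "other-pathway": {
        "reactions": {"RXN-A", "RXN-B", "RXN-C"},
        "expected_taxa": {"Archaea"},
    },
}


def normalize_name(name):
    """Strip prefixes and suffixes that obscure the catalytic activity."""
    name = name.lower().strip()
    name = re.sub(r"^(putative|probable|predicted)\s+", "", name)
    name = re.sub(r"\s+(subunit|chain|domain|family protein)\b.*$", "", name)
    return name


def infer_reactome(proteins):
    """Map each protein to a reaction, preferring EC number over enzyme name."""
    reactions = set()
    for protein in proteins:
        ec = protein.get("ec")
        if ec in EC_TO_REACTION:
            reactions.add(EC_TO_REACTION[ec])
            continue
        rxn = NAME_TO_REACTION.get(normalize_name(protein.get("name", "")))
        if rxn:
            reactions.add(rxn)
    return reactions


def infer_pathways(reactome, taxon, threshold=0.5):
    """Score each pathway; return inferred pathways with the reactions they add."""
    present = {}
    for pathway_id, pathway in PATHWAYS.items():
        score = len(pathway["reactions"] & reactome) / len(pathway["reactions"])
        if taxon in pathway["expected_taxa"]:
            score += 0.25  # assumed bonus for a taxonomically expected pathway
        if score >= threshold:
            # Reactions of the pathway that were not in the reactome.
            present[pathway_id] = pathway["reactions"] - reactome
    return present


proteins = [
    {"name": "Putative hexokinase", "ec": None},
    {"name": "phosphoglucose isomerase subunit A", "ec": None},
]
reactome = infer_reactome(proteins)
pathways = infer_pathways(reactome, "Bacteria")
print(reactome)
print(pathways)
```

In this toy case the pathway is inferred as present even though one of its three reactions has no enzyme, so that reaction is added to the model and becomes a pathway hole for step 3 to investigate.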
The final stage of metabolic reconstruction is manual and is typically
performed by the user: reviewing the pathways predicted by PathoLogic for
false-positive predictions (predicted pathways that are
thought to be absent from the organism) and for false negatives (pathways that should
have been predicted by PathoLogic but were not).
In addition, the user reviews enzyme activities that PathoLogic did not identify
and enters the associated metabolic reactions manually.
Phase II: Quantitative Metabolic Model Development
Steady-state metabolic models consist of a large system of
linear equations derived from the reaction network of the organism as stored in
a PGDB. The system of equations is fed
to a linear optimization program, with the objective of determining assignments
of steady-state fluxes (rates) to reactions that optimize the production of
cellular biomass. MetaFlux uses a linear
solver called SCIP that is fast and free to academic institutions.
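For readers who prefer to see the optimization written out, the standard textbook form of such a steady-state problem is shown below. The notation is an assumption, not taken from MetaFlux: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, l and u the lower and upper flux bounds, and v_biomass the flux through the biomass reaction.

```latex
\begin{aligned}
\max_{v} \quad & v_{\text{biomass}} \\
\text{subject to} \quad & S\,v = 0 \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption: every internal metabolite is produced exactly as fast as it is consumed. Irreversible reactions take l_j = 0, and nutrient uptake is limited through the bounds on the corresponding exchange fluxes.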
What happens if no path through the metabolic network exists
from the supplied nutrients to every biomass metabolite (biosynthetic product)
specified for the model? In this case
the solver cannot find a solution: its output simply says no solution, with no
explanation of why. Such network gaps
are common because of unidentified gene functions in genome annotations. And they are a serious problem: How do you
identify the handful of missing reactions among 1,000 reactions? Or, what if a necessary nutrient has been
omitted from the starting conditions of the model? That omission will also prevent model
growth. A further impediment to model
growth is an omitted secreted compound: missing secreted
compounds can block reactions in the model from carrying flux, preventing the
model from solving. The difficulty of
solving these problems is one reason it can take more than a year to develop a
metabolic model.
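A rough way to picture the "no path from nutrients to biomass" failure is a reachability check over the network. The sketch below is a simplification under stated assumptions: the reactions, metabolite names, and the function `producible_metabolites` are hypothetical, and reachability is only a necessary condition for a steady-state solution (it ignores stoichiometric balance and secretions).

```python
def producible_metabolites(reactions, nutrients):
    """Forward-propagate availability: a reaction can fire once all of its
    substrates are available, which makes its products available too."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


# Toy network: reaction id -> (substrates, products). All names are made up.
reactions = {
    "R1": (["glucose"], ["g6p"]),
    "R2": (["g6p"], ["pyruvate"]),
    "R3": (["pyruvate", "nh4"], ["alanine"]),
    "R4": (["x"], ["lipid"]),
}
biomass = ["alanine", "lipid"]

# With glucose alone, neither biomass metabolite is reachable.
available = producible_metabolites(reactions, ["glucose"])
missing = [m for m in biomass if m not in available]
print(missing)

# Supplying the omitted nutrient fixes alanine; lipid still needs a gap reaction.
available_with_nh4 = producible_metabolites(reactions, ["glucose", "nh4"])
missing_with_nh4 = [m for m in biomass if m not in available_with_nh4]
print(missing_with_nh4)
```

Even in this four-reaction example the two failures have different causes (an omitted nutrient and a missing reaction), which hints at why locating them among 1,000 reactions is hard.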
One approach to these issues is to make model building more
tractable by focusing on one small piece of the model at a time. Begin with a few core pathways, solve a model
for those pathways, then add a few more pathways and solve that larger
model. At each step any problems with
the model must be confined to the new increment of reactions that were just
added; that locality makes problems easier to find. But this process is still quite
time-consuming.
A second approach is to build software tools to aid us in
debugging the model, namely the development mode of MetaFlux. Development mode (also called the gap filler)
solves an optimization problem to determine how to transform an unsolvable
model to a solvable model. That
optimization problem seeks a minimal-cost set of modifications to the model
that will render the model solvable, as follows:
· Find a minimal number of additional reactions from MetaCyc to add to the model
· Find a minimal number of reactions in the model whose direction should be reversed
· Find a minimal number of nutrients from a user-specified “try-nutrients” set to add to the model
· Find a minimal number of secreted compounds from a user-specified “try-secretions” set to add to the model
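The idea of a minimal-cost set of modifications can be sketched with a brute-force search over a toy network. This is an illustration only, not how MetaFlux solves the problem: it uses a simple reachability test in place of a flux solver, covers only added reactions and try-nutrients (reaction reversals and try-secretions are left out), and the function names and the costs of 10 per reaction and 1 per nutrient are assumptions.

```python
from itertools import combinations


def producible(reactions, nutrients):
    """Metabolites reachable from the nutrients (simplified solvability test)."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def fill_gaps(model, nutrients, biomass, candidate_rxns, try_nutrients,
              rxn_cost=10, nutrient_cost=1):
    """Try every subset of candidate additions; keep the cheapest one that
    makes all biomass metabolites producible. Exponential, toy-sized only."""
    options = [("reaction", r) for r in sorted(candidate_rxns)]
    options += [("nutrient", n) for n in sorted(try_nutrients)]
    best, best_cost = None, None
    for size in range(len(options) + 1):
        for combo in combinations(options, size):
            cost = sum(rxn_cost if kind == "reaction" else nutrient_cost
                       for kind, _ in combo)
            if best_cost is not None and cost >= best_cost:
                continue  # cannot improve on the current best
            trial = dict(model)
            trial.update({name: candidate_rxns[name]
                          for kind, name in combo if kind == "reaction"})
            feed = set(nutrients) | {name for kind, name in combo
                                     if kind == "nutrient"}
            if set(biomass) <= producible(trial, feed):
                best, best_cost = list(combo), cost
    return best, best_cost


# Toy inputs; every identifier here is hypothetical.
model = {
    "R1": (["glucose"], ["g6p"]),
    "R3": (["pyruvate", "nh4"], ["alanine"]),
}
candidate_rxns = {            # stand-in for reactions available in MetaCyc
    "R2": (["g6p"], ["pyruvate"]),
    "R7": (["x"], ["y"]),
}
solution, cost = fill_gaps(model, ["glucose"], ["alanine"],
                           candidate_rxns, try_nutrients=["nh4", "o2"])
print(solution, cost)
```

Here the cheapest repair adds one reaction and one nutrient; the irrelevant candidates are never selected because they only raise the cost. A real gap filler must pose this as an optimization problem for a solver, since enumerating subsets is infeasible for thousands of candidate reactions.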
During development of our human metabolic model the gap
filler suggested the addition of eight new reactions to the model; we
researched these reactions in the literature and found that four of them were
known to occur in humans, but had been overlooked by our earlier curation
efforts.
In some cases, the gap filler is unable to compute a set of
transformations to render a model solvable.
In this case it will tell the user which biomass metabolites can be
produced by the model and which cannot be produced so that the user can focus
their development efforts on the production of those missing metabolites. Furthermore, MetaFlux will produce a “blocked
reaction report” that identifies, for each biomass metabolite that cannot be
produced, the chain of blocked reactions that prevents its production.
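To give a feel for what tracing a chain of blocked reactions involves, here is a small sketch that walks backward from an unproducible metabolite. It is not the format or the algorithm of the MetaFlux report; the function names, the toy network, and the wording of the output lines are all assumptions.

```python
def producible(reactions, nutrients):
    """Metabolites reachable from the nutrients by firing reactions forward."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def explain(target, reactions, available, seen=None, depth=0):
    """Recursively list the blocked reactions that prevent producing target."""
    seen = set() if seen is None else seen
    if target in available or target in seen:
        return []
    seen.add(target)  # guard against cycles in the network
    pad = "  " * depth
    producers = [(rid, subs) for rid, (subs, prods) in sorted(reactions.items())
                 if target in prods]
    if not producers:
        return [f"{pad}{target}: not produced by any reaction and not a nutrient"]
    lines = []
    for rid, subs in producers:
        missing = [s for s in subs if s not in available]
        lines.append(f"{pad}{target}: {rid} is blocked, missing {', '.join(missing)}")
        for substrate in missing:
            lines.extend(explain(substrate, reactions, available, seen, depth + 1))
    return lines


# Toy network; all identifiers are hypothetical.
reactions = {
    "R1": (["glucose"], ["g6p"]),
    "R3": (["pyruvate", "nh4"], ["alanine"]),
}
available = producible(reactions, ["glucose"])
report = explain("alanine", reactions, available)
print("\n".join(report))
```

Each indented line is one step further upstream, so the deepest lines point at the root causes: a metabolite with no producing reaction or a nutrient that was never supplied.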
Learn More about MetaFlux
MetaFlux and the full Pathway Tools
software are freely available from SRI for academic use. You can learn more about how to use MetaFlux
by attending SRI’s metabolic modeling tutorials (the next
tutorial is scheduled for March 18-19, 2015 in
Menlo Park, CA), and by reading the Pathway Tools User’s Guide.