Wednesday, July 13, 2016

PythonCyc: Using the Pathway Tools Python API

Pathway Tools is implemented in the Common Lisp (CL) programming language, but the PythonCyc package creates a bridge between Python and CL. That is, the PythonCyc package allows you to interact with Pathway Tools using the Python language. With PythonCyc you can write Python programs to execute Pathway Tools metabolic models, as well as to extract and modify data stored in Pathway/Genome Databases (PGDBs). It is also possible to call from Python many functions defined in Pathway Tools that manipulate genes, pathways, reactions, proteins, and more.

Wednesday, June 8, 2016

Bulk Updates to Your PGDB

One question that we frequently receive is about how to apply bulk updates to a PGDB. This kind of situation can come about for several reasons:
  • When a group maintains and curates organism data on an ongoing basis using their own software or database environment, and then wants to update a PGDB with all their changes in a single batch operation.
  • When a revised annotation for an organism is made available, and a user wishes to update their PGDB with the new data without losing any existing curation.
  • When a user has some systematic change that they want to apply to a large number of objects, such as a change to the locus id format, the addition of a new set of synonyms, or the addition of links to a new external database.
  • When a user wants to import a large dataset obtained via a high-throughput experiment or computational prediction, such as for protein cellular location or transcription factor binding sites.
Because these are all common scenarios, it seems worthwhile to provide an overview of the various ways that Pathway Tools supports bulk updating of PGDBs.  Note that none of the features discussed here are particularly new, and all have been supported by Pathway Tools for several years.  All User Guide section numbers referenced below are for version 20.0.

It should first be noted that Pathway Tools comes with a full suite of editing and curation tools, so if you have only a handful of changes to make, you should use those to make the edits interactively. The techniques described in this article would normally only be used if you have so many updates that it would be tedious to make the edits manually. 

Wednesday, April 13, 2016

BioCyc to Adopt Subscription Model

BioCyc seeks the support of the scientific community as we begin a new chapter in the development of this bioinformatics resource.

We plan to upgrade the curation level and quality of many BioCyc databases to provide scientists with higher quality information resources for many important microbes, and for Homo sapiens. Such an effort requires large financial resources that -- despite numerous attempts over many years -- have not been forthcoming from government funding agencies. Thus, we plan to transition BioCyc to a community-supported non-profit subscription model in the coming months.

Our Goal

Our goal at BioCyc is to provide scientists with the highest quality microbial genome and metabolic pathway web portal in the world by coupling unique and high-quality database content with powerful and user-friendly bioinformatics tools. Our work on EcoCyc has demonstrated the way forward. EcoCyc is an incredibly rich and detailed information resource whose contents have been derived from 30,000 E. coli publications. EcoCyc is an online electronic encyclopedia, a highly structured queryable database, a bioinformatics platform for omics data analysis, and an executable metabolic model. EcoCyc is highly used by the life-sciences community, demonstrating the need and value of such a resource.

Our goal is to develop similar high-quality databases for other organisms. BioCyc now contains 7,600 databases, but only 42 of them have undergone any literature-based curation, and that curation occurs irregularly. Although bioinformatics algorithms have undergone amazing advances in the past two decades, their accuracy is still limited, and no bioinformatics inference algorithms exist for many types of biological information. The experimental literature contains vast troves of valuable information, and despite advances in text mining algorithms, curation by experienced biologists is the only way to accurately extract that information. EcoCyc curators extract a wide range of information on protein function; on metabolic pathways; and on regulation at the transcriptional, translational, and post-translational levels.

In the past year SRI has performed significant curation on the BioCyc databases for Saccharomyces cerevisiae, Bacillus subtilis, Mycobacterium tuberculosis, Clostridium difficile, and (to be released shortly) Corynebacterium glutamicum. All told, BioCyc databases have been curated from 66,000 publications, and constitute a unique resource in the microbial informatics landscape. Yet much more information remains untapped in the biomedical literature, and new information is published at a rapid pace. That information can be extracted only by professional curators who understand both the biology and the methods for encoding that biology in structured databases. Without adequate financial resources, we cannot hire these curators, whose efforts are needed on an ongoing basis.

Why Do We Seek Financial Support from the Scientific Community?

The EcoCyc project has been fortunate to receive government funding for its development since 1992. Similar government-supported databases exist for a handful of biomedical model organisms, such as fly, yeast, worm, and zebrafish. For the past twenty years, Peter Karp has been advocating that the government fund similar efforts for other important microbes, such as pathogens, biotechnology workhorses, model organisms, and synthetic-biology chassis for biofuels development. He has developed the Pathway Tools software as a platform to enable the development of curated EcoCyc-like databases for other organisms, and the software has been used by many groups. However, not only has government support for databases not kept pace with the relentless increases in experimental data generation, but the government is funding few new databases, is cutting funding for some existing databases (such as for EcoCyc, for BioCyc, and for TAIR), and is encouraging the development of other funding models for supporting databases [1]. Funding for BioCyc was cut by 27% at our last renewal, even though we are managing five times as many genomes as we were five years ago. We also find that even when government agencies want to support databases, review panels score database proposals with low enthusiasm and misunderstanding, despite the obvious demand for high-quality databases by the scientific community.

Put another way: the Haemophilus influenzae genome was sequenced in 1995. Now, twenty years later, no curated database that is updated on an ongoing basis exists for this important human pathogen. Mycobacterium tuberculosis was sequenced in 1998, and now, eighteen years later, no comprehensive curated database exists for the genes, metabolism, and regulatory network of this killer of 1.5 million human beings per year. No curated database exists for the important gram-positive model organism Bacillus subtilis. How much longer shall we wait for modern resources that integrate the titanic amounts of information available about critical microbes with powerful bioinformatics tools to turbocharge life-science research?

How it Will Work and How You Can Support BioCyc

The tradition whereby scientific journals receive financial support from scientists in the form of subscriptions is a long one. We are now turning to a similar model to support the curation and operation of BioCyc. We seek individual and institutional subscriptions from those who receive the most value from BioCyc, and who are best positioned to direct its future evolution. We have developed a subscription-pricing model that is on par with journal pricing, although we find that many of our users consult BioCyc on a daily basis -- more frequently than they consult most journals. We hope that this subscription model, drawing on our wide user base in academic, corporate, and government institutions around the world, will allow us to raise more funds, more sustainably, than is possible through government grants. We will also be exploring other possible revenue sources, and additional ways of partnering with the scientific community.

BioCyc is collaborating with Phoenix Bioinformatics to develop our community-supported subscription model. Phoenix is a nonprofit that already manages community financial support for the TAIR Arabidopsis database, which was previously funded by the NSF and is now fully supported [2] by users. Phoenix Bioinformatics will collect BioCyc subscriptions on behalf of SRI International, which like Phoenix is a non-profit institution. Subscription revenues will be invested into curation, operation, and marketing of the BioCyc resource.

We plan to make this transition slowly to give our users time to adapt. We’ll begin requiring subscriptions for access to BioCyc databases other than EcoCyc and MetaCyc starting in July 2016.

Access to the EcoCyc and MetaCyc databases will remain free for now. Subscriptions to the other 7,600 BioCyc databases will be available to institutions (e.g., libraries), and to individuals. One subscription will grant access to all of BioCyc. To encourage your institutional library to sign up, please contact your science librarian and let him or her know that continued access to BioCyc is important for your research and/or teaching.

Subscription prices will be based on website usage levels, and we hope to keep them affordable so that everyone who needs these databases will still be able to access them. We are finalizing the academic library and individual prices and will follow up soon with more information, including details on how to sign up. We will make provisions to ensure that underprivileged scientists and students in third-world countries aren’t locked out.

Please spread the word to your colleagues -- the more groups who subscribe, the better quality resource we can build for the scientific community.

Thursday, March 10, 2016

Introducing Pathway Collages...

Figure 1
Pathway Tools has long been recognized for the quality of our automatically generated individual metabolic pathway diagrams, which are intuitive to biologists, can be shown at varying levels of detail, and can be customized in various ways, including with the overlay of omics data. When a more global view is called for, our cellular overview diagram depicts the entire metabolic network for an organism, with capabilities for selective highlighting and overlay of omics data. However, to understand some biochemical situations, viewing a single pathway is insufficient, whereas viewing the entire metabolic network results in information overload. Pathway Collages, new in Pathway Tools version 19.5, are an attempt to bridge this gap, allowing users to create high-quality, customized, user-manipulable diagrams containing collections of user-specified pathways.

Pathway Collages can be explored and edited via the Pathway Collage Viewer web browser application. This application, implemented using the Cytoscape.js open-source JavaScript graph visualization library, supports panning, zooming, and all the editing and customization operations described in this post and the documentation embedded within the Pathway Collage Viewer itself. Feel free to experiment yourself with the example pathway collage online at http://biocyc.org/cytoscape-js/ovsubset.html?graph=example1&showHelp=T, or create your own following the instructions below.

Figure 2
Three example Pathway Collage figures are illustrated here. Figure 1 depicts a Pathway Collage consisting of four E. coli pathways overlaid with gene expression data. This diagram has already been manually adjusted by repositioning the pathways relative to each other and tweaking node font sizes and shapes. Metabolites that are shared between pathways are indicated by drawing connecting lines between them. 

Figure 2 shows a collage consisting of two E. coli pathways overlaid with predicted reaction flux data. In this diagram, rather than drawing connecting lines, compounds that are shared between the two pathways are merged, showing glycolysis flowing seamlessly into fermentation.

Figure 3

Figure 3 depicts a collage containing a larger number of pathways at a lower zoom level, so metabolite, enzyme and gene names are automatically suppressed (the font size of the pathway labels has been increased so those labels remain visible). In addition to manually repositioning pathways, merging some common nodes, and changing the default colors, some metabolites of interest have been highlighted in purple.
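
Both treatments of shared metabolites described above (connecting lines in Figure 1, merged nodes in Figure 2) start from knowing which metabolites occur in more than one pathway. The sketch below illustrates that idea only; it is not how Pathway Tools or the Pathway Collage Viewer computes it. The function name `shared_metabolites` and the abbreviated metabolite lists are hypothetical.

```python
from collections import defaultdict


def shared_metabolites(pathways):
    """Return {metabolite: sorted pathway names} for metabolites in 2+ pathways."""
    seen = defaultdict(set)
    for pathway_name, metabolites in pathways.items():
        for metabolite in metabolites:
            seen[metabolite].add(pathway_name)
    return {m: sorted(names) for m, names in seen.items() if len(names) > 1}


# Hypothetical, heavily abbreviated metabolite lists for two pathways.
pathways = {
    "glycolysis": ["glucose", "phosphoenolpyruvate", "pyruvate"],
    "fermentation": ["pyruvate", "acetyl-CoA", "ethanol"],
}

shared = shared_metabolites(pathways)
print(shared)
```

Each metabolite returned here is a candidate either for a connecting line between its occurrences or for merging into a single node.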

Now that you've seen what you can do with a Pathway Collage, how can you create one for yourself? Pathway Collages can be created from either the BioCyc website (or other Pathway Tools-based website) or from desktop Pathway Tools. There are five basic steps.
  1. Specify the set of pathways to be included. The simplest and most reliable way to specify a set of pathways is to generate a SmartTable containing the desired pathways, and then export the SmartTable to a Pathway Collage. This works both for the desktop and web versions of Pathway Tools, and enables you to keep your list of pathways around in case you ever want to edit it or regenerate your collage. There are other ways to specify a set of pathways, such as by interactively clicking on them in the cellular overview diagram (desktop only), from an omics dataset (web only), or by creating a seed collage from a single pathway and then interactively adding more (web only). We may add additional options to specify pathways in the future. Consult the documentation for more details.
  2. Export to Pathway Collage Viewer. Pathway Tools will compute automatic layouts of the individual pathways within the collage, then position those diagrams next to one another horizontally, and send that initial layout of the collage to the Pathway Collage Viewer application in your web browser.
  3. Interactively refine and customize the collage. This can involve repositioning items, showing connections, adding, deleting or merging elements, editing labels, highlighting elements of interest, and/or customizing node and edge styles. By default, only the metabolites along the main backbone of a pathway are included in the diagrams, but side metabolites can be added interactively. Additional pathways involving a metabolite of interest can also be added interactively.
  4. Import omics data to be visualized on the collage (optional). Omics data can be added either before or after the collage is generated. The collage can display omics data associated with genes, metabolites, or reactions. When multi-timepoint gene expression data is displayed, the display of enzyme names is suppressed.
  5. Save or export the collage. At any time, a pathway collage can be saved as a JSON-format graph file on your computer; that file can later be loaded back into the collage viewer (not all browsers support this operation; we recommend using Chrome or Firefox). A pathway collage can also be exported to a PNG-format image file for use in presentations or publications. The image will be generated with a resolution comparable to that of the display at the time the image is created (up to some maximum); therefore, the highest-quality images are obtained if the collage is displayed at a high zoom level when exporting.
For more information on Pathway Collages, see the Pathway Tools Website User Guide or the help documentation within the Pathway Collage Viewer itself.
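
Because a saved collage is a JSON-format graph file (step 5), it can in principle be inspected outside the viewer. This post does not document the file layout, so the sketch below assumes the common Cytoscape.js convention of an `elements` object holding `nodes` and `edges` arrays, each item carrying a `data` dictionary. The real file may be organized differently, and the sample content and the `summarize` helper are hypothetical.

```python
import json

# Hypothetical stand-in for the text of a saved collage file.
# Assumed layout: Cytoscape.js-style "elements" with "nodes" and "edges".
saved_text = """
{
  "elements": {
    "nodes": [
      {"data": {"id": "n1", "label": "pyruvate"}},
      {"data": {"id": "n2", "label": "ethanol"}}
    ],
    "edges": [
      {"data": {"id": "e1", "source": "n1", "target": "n2"}}
    ]
  }
}
"""


def summarize(graph_text):
    """Count nodes and edges and collect node labels from the assumed layout."""
    graph = json.loads(graph_text)
    elements = graph.get("elements", {})
    nodes = elements.get("nodes", [])
    edges = elements.get("edges", [])
    # Fall back to the node id when a node has no label.
    labels = sorted(n["data"].get("label", n["data"]["id"]) for n in nodes)
    return {"nodes": len(nodes), "edges": len(edges), "labels": labels}


summary = summarize(saved_text)
print(summary)
```

To try this on a real file, replace `saved_text` with the file's contents and adjust the key names to whatever the saved file actually uses.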