This post describes a way to gather the identifiers
associated with a gene, which are stored under several different object properties
in BioCyc (in some cases referred to as slots).
These identifiers are useful for verifying the identity of gene
references between EcoCyc and other gene databases and catalogs. Use of these identifiers is more reliable
than depending on the gene names.
BioCyc PGDBs store identifiers for genes in several
places. These identifiers include the
PGDB’s own BioCyc identifier, unification links to other databases (stored as PGDB
database links), and locus tags from the GenBank entry for the genome. These additional identifiers are stored in
properties (slots) of the gene frame called Accession-1 and Accession-2. Different PGDBs may assign different sets of
identifiers to these slots, but the slots provide a consistent way to
access them. In this post, I’ll discuss
how to use SmartTables to build a list of identifiers associated with a set of
genes. I’ll use EcoCyc as an example
PGDB, both because it uses both accession slots, and because the demonstration
doesn’t require an active subscription.
Since it uses SmartTables, you will need a free BioCyc account to follow
this demonstration. Here’s a screenshot
of the final table. The final, full table is also linked here.
This is the step-by-step procedure.
1.
Go to EcoCyc.org and log in. If you are already logged in to BioCyc, change
your organism to E. coli K-12 substr. MG1655.
2.
Now open the ‘Smart Tables’ menu and choose the
‘Special Smart Tables’ command.
3.
This will take you to a page with a list of
special smart tables corresponding to many types of entities a BioCyc user may
find useful, such as all compounds, genes, or enzymes in E. coli. Click on “All genes of E. coli...”, which
will be the second row in the list.
4.
Since you are logged in, this will create an
editable copy of the special SmartTable, which lists all the genes in E. coli, including the gene’s name and the Accession-1
property, as well as the left and right boundaries in the genome and the gene’s
product. However, as I mentioned, there
are additional properties with alternative identifiers.
5.
Above the table, there are three drop-down
boxes. The middle one is labeled ‘Add
Property Column’. Additional gene
identifiers are available in two columns: Object ID and Accession-2. Add an Object ID column by clicking on the
drop-down list and selecting the column by name. The column will appear at the far right. This is EcoCyc’s own internal
identifier. Repeat the process for
Accession-2. The Accession-1 and
Accession-2 identifiers for E. coli are locus tags from two different naming
systems. As I mentioned, the particular
identifiers used will be different in different PGDBs.
6.
You can also add identifiers from one or more
external databases in the same way. Use
the Add Property Column dropdown and choose ‘Database Links’. Now a window with a list of external
databases appears. You can select one,
or several while holding the appropriate key (Control, or Command on Macs). In this example, I selected the EchoBASE
database because it has alternate identifiers for many, though not all, of the
genes in EcoCyc. Click the ‘Go’ button
to add the column(s).
7.
Once you have the columns, you can use the
Operations Menu in the right sidebar to export the SmartTable to a file. However, if you’re not interested in saving
the coordinates or gene product columns, you can delete those columns by selecting
them (click in the colored space immediately above the column name), then choosing
the ‘delete column’ command, which appears in both the delete and column
submenus in the Operations Menu.
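Once the table has been exported, the identifier columns can be turned into a lookup that resolves any identifier to its gene. The following is only a minimal sketch in Python, assuming the export is a tab-delimited file with a header row. The column headers ("Gene Name", "Accession-1", "Object ID", "Accession-2") and the identifier values in the sample are placeholders, not real EcoCyc data, and the function name build_identifier_index is hypothetical; adjust the headers to match whatever appears in your own exported file.

```python
import csv
import io


def build_identifier_index(tsv_text, id_columns, name_column="Gene Name"):
    """Map every non-empty identifier in id_columns to the gene name of its row.

    The column names are assumptions about the exported file's header row.
    """
    index = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for column in id_columns:
            # Some genes have no value in a given column (e.g. no database link).
            value = (row.get(column) or "").strip()
            if value:
                # Keep the first gene seen if an identifier is ever repeated.
                index.setdefault(value, row[name_column])
    return index


# Placeholder rows standing in for an exported SmartTable (not real identifiers).
SAMPLE = (
    "Gene Name\tAccession-1\tObject ID\tAccession-2\n"
    "geneA\tX0001\tOBJ-1\tY0001\n"
    "geneB\tX0002\tOBJ-2\t\n"
)

index = build_identifier_index(SAMPLE, ["Accession-1", "Object ID", "Accession-2"])
print(index.get("Y0001"))  # prints "geneA"
```

For a real export, read the file's contents into a string (for example with open(path, newline="", encoding="utf-8").read()) and pass that in place of SAMPLE. Matching on identifiers in this way avoids depending on gene names when comparing against another database.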
I hope you have found this discussion useful and I welcome
questions and comments.