In most annotations, about half of the ORFs in Synechocystis sp. PCC 6803 code for "unknown" proteins, and for the known proteins relatively little is understood about their regulation and activity in vivo. Bioinformatics approaches are being used to predict gene function, and such predictions can then be tested experimentally. In this way, the number of ORFs without an assigned function has been reduced drastically (Table 1).

Metabolic reconstruction in Synechocystis sp. PCC 6803

  • A large number of microbial genomes were compared in ERGO to obtain a draft metabolic reconstruction of Synechocystis sp. PCC 6803.
  • A complete draft has been obtained for central carbon metabolism, biogenesis of photosystems I and II, NAD biosynthesis, carotenoid biosynthesis, and membrane transport.
  • In-depth curation of respiration, biosynthesis of photosynthetic pigments, amino acids, vitamins/cofactors, and other metabolic systems is under way.

Toward mathematical modeling of cell metabolism

     The metabolic reconstruction (above) is a first step towards mathematical modeling of cellular metabolism. Integrated Genomics has developed software tools to use the metabolic reconstruction to develop a reaction network suitable for modeling. Common problems that we are trying to resolve include:

       1. redundancy and imprecision of compound names (synonyms and typos);
       2. inconsistent formula representation (such as salt versus free acid);
       3. lack of convention in treating generic names (such as alcohol);
       4. non-balanced reactions (lost reactants, wrong stoichiometry, etc.);
       5. undefined direction/reversibility;
       6. abundance of irrelevant reactions that are not supported by reconstruction;
       7. lost relevant reactions that are supported experimentally or by reconstruction;
       8. inconsistency in enzyme names, function and EC numbers;
       9. inconsistency in coding of complex reactions (polymerization, degradation, etc.).
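     As an illustration of the compound-name problem (synonyms, typos, salt versus free acid), the following minimal Python sketch maps name variants to a single canonical name. It is not the Integrated Genomics software; the synonym table, the function names, and the normalization rule (lowercase, letters and digits only) are assumptions chosen for the example.

```python
import re

def normalize_name(name: str) -> str:
    """Lowercase the name and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Hypothetical synonym table: canonical name -> known variants.
SYNONYMS = {
    "pyruvate": ["Pyruvate", "pyruvic acid", "Pyruvic-acid"],
    "D-glucose 6-phosphate": ["glucose-6-phosphate", "Glucose 6-phosphate", "G6P"],
}

def build_index(synonyms: dict) -> dict:
    """Map every normalized variant to its canonical compound name."""
    index = {}
    for canonical, variants in synonyms.items():
        for variant in [canonical, *variants]:
            key = normalize_name(variant)
            # The same key under two canonical names is a curation conflict.
            if index.setdefault(key, canonical) != canonical:
                raise ValueError(
                    f"{variant!r} maps to both {index[key]!r} and {canonical!r}"
                )
    return index

def canonicalize(name: str, index: dict):
    """Return the canonical name, or None if the name needs manual curation."""
    return index.get(normalize_name(name))

INDEX = build_index(SYNONYMS)
print(canonicalize("PYRUVIC ACID", INDEX))
print(canonicalize("glucose 6 phosphate", INDEX))
print(canonicalize("alcohol", INDEX))
```

     Names that return None (such as the generic name "alcohol" here) are the ones left for a curator. Stripping all punctuation is a deliberately crude rule and could merge names that differ only in locants or separators, so a real table would need review.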

     We are now building a complete and curated collection of compounds, reactions and functional roles relevant for Synechocystis sp. PCC 6803. Our curation effort, necessary for network construction and modeling, includes:

       1. extraction from ERGO of subsets of enzymes, reactions, and compounds covering selected sections of Synechocystis sp. PCC 6803 metabolism;
       2. manual curation of compounds, adding, reconciling and refining names/synonyms, atomic formulas, and structure representations;
       3. manual curation of reactions in terms of mass balance, reversibility/direction, assignment of EC numbers, and hierarchical placement;
       4. manual curation of functional roles of enzymes and genes (names and EC numbers, annotation, and identification of missing genes).
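     The mass-balance part of reaction curation can be sketched in a few lines of Python. This is an illustrative check, not the curation tool itself: the function names are hypothetical, formulas are assumed to be simple neutral atomic formulas without parentheses, and charge balance is not checked. Stoichiometric coefficients are negative for substrates and positive for products.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def element_imbalance(reaction: dict, formulas: dict) -> dict:
    """Return net atom counts (products minus substrates) that are non-zero.

    An empty dict means the reaction is balanced for every element.
    """
    total = Counter()
    for compound, coeff in reaction.items():
        for element, n in parse_formula(formulas[compound]).items():
            total[element] += coeff * n
    return {el: n for el, n in total.items() if n != 0}

# Neutral (fully protonated) formulas for the hexokinase reaction.
FORMULAS = {
    "glucose": "C6H12O6",
    "ATP": "C10H16N5O13P3",
    "glucose 6-phosphate": "C6H13O9P",
    "ADP": "C10H15N5O10P2",
}

balanced = {"glucose": -1, "ATP": -1, "glucose 6-phosphate": 1, "ADP": 1}
lost_product = {"glucose": -1, "ATP": -1, "glucose 6-phosphate": 1}  # ADP missing

print(element_imbalance(balanced, FORMULAS))
print(element_imbalance(lost_product, FORMULAS))
```

     The second reaction reports a deficit that matches the atoms of the missing ADP, which is the kind of "lost reactant" error that the curation is meant to catch.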

     Figure 1 illustrates the web-based user interface designed and implemented at Integrated Genomics (IG) to support this curation effort. Following a system-by-system strategy, we began our curation with large, well-studied systems such as central carbon metabolism, amino acid biosynthesis, and nucleotide and cofactor biosynthesis in Synechocystis sp. PCC 6803.

Functional predictions

     About 1/3 of the ORFs in the Synechocystis sp. PCC 6803 genome do not have an assigned function (Table 1). We initiated a large-scale effort to predict potential functions for these ORFs based on a combination of comparative genomics techniques: gene coupling on the chromosome, phylogenetic signatures, shared regulatory sites, and protein fusion events (Galperin and Koonin 2000; Huynen et al. 2000; Marcotte et al. 1999; Mellor et al. 2002; Osterman and Overbeek 2003). Software was developed to determine, for each hypothetical ORF, the presence and strength of functional indications from functional coupling, co-occurrence of the ORF with other genes in various genomes, and gene fusion events.
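     One simple way to quantify co-occurrence of an ORF with other genes across genomes is to compare presence/absence profiles. The sketch below uses the Jaccard index as the score; this is an assumption for illustration and not necessarily the measure used in the software described above. All gene and genome names are placeholders.

```python
def jaccard(a: set, b: set) -> float:
    """Shared genomes divided by genomes containing either gene."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_partners(query: str, profiles: dict, min_score: float = 0.5) -> list:
    """Rank other genes by profile similarity to the query, best first."""
    q = profiles[query]
    scored = [
        (gene, jaccard(q, genomes))
        for gene, genomes in profiles.items()
        if gene != query
    ]
    kept = [(gene, score) for gene, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: -pair[1])

# Toy phylogenetic profiles: gene family -> genomes in which it occurs.
PROFILES = {
    "hypothetical_orf": {"A", "B", "C", "D"},
    "enzyme_X": {"A", "B", "C", "D", "E"},
    "enzyme_Y": {"A", "F"},
    "enzyme_Z": {"A", "B", "C", "D"},
}

for gene, score in rank_partners("hypothetical_orf", PROFILES):
    print(gene, round(score, 2))
```

     Genes whose profiles closely track that of the hypothetical ORF become candidates for a shared pathway. On real data the score would need to account for phylogenetic relatedness of the genomes, which this sketch ignores.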

     Hypothetical ORFs that show strong connections with functionally characterized genes in various microbial genomes have been analyzed further (exemplified in Figures 2 and 3). Specific, testable functional predictions may then be experimentally verified and incorporated into the metabolic reconstruction of Synechocystis sp. PCC 6803.

Predictions based on gene clustering: example of lycopene cyclase

     Reconstruction of the carotenoid biosynthesis pathway of Synechocystis sp. PCC 6803 revealed "missing genes" (functions not connected to any sequenced gene), including the gene for lycopene cyclase (Figure 3). We have identified a candidate ORF for a lycopene cyclase based on gene clustering on the chromosome (Figure 2). In many cyanobacterial genomes the orthologues of this hypothetical gene, slr0941, are adjacent to orthologues of ζ-carotene desaturase (slr0940 in Synechocystis sp. PCC 6803), which catalyzes formation of trans-lycopene. Distant sequence homology of Slr0941 with polyketide cyclases, revealed by PSI-BLAST, indirectly supports this functional prediction. This prediction is now being tested experimentally.
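     The clustering evidence can be summarized as a count: in how many genomes that contain both gene families are the two families neighbors? The following Python sketch computes that count from ordered gene lists. The genomes and gene orders are toy data, the function names are hypothetical, and strand and chromosome circularity are ignored for simplicity.

```python
def are_clustered(gene_order: list, fam_a: str, fam_b: str, max_gap: int = 1) -> bool:
    """True if some copy of fam_a lies within max_gap positions of fam_b."""
    pos_a = [i for i, fam in enumerate(gene_order) if fam == fam_a]
    pos_b = [i for i, fam in enumerate(gene_order) if fam == fam_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

def clustering_support(genomes: dict, fam_a: str, fam_b: str, max_gap: int = 1) -> tuple:
    """Return (genomes where the families are clustered, genomes with both)."""
    both = [
        name for name, order in genomes.items()
        if fam_a in order and fam_b in order
    ]
    clustered = [
        name for name in both
        if are_clustered(genomes[name], fam_a, fam_b, max_gap)
    ]
    return len(clustered), len(both)

# Toy gene orders; each label stands for an orthologue family.
GENOMES = {
    "genome_1": ["x1", "desaturase", "candidate", "x2"],
    "genome_2": ["candidate", "desaturase", "x3"],
    "genome_3": ["desaturase", "x4", "x5", "candidate"],
    "genome_4": ["x6", "desaturase"],
}

clustered, total = clustering_support(GENOMES, "desaturase", "candidate")
print(f"adjacent in {clustered} of {total} genomes containing both families")
```

     A high ratio across distantly related genomes is what makes adjacency informative; adjacency in closely related genomes alone may simply reflect shared ancestry.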

Predictions based on shared regulatory sites: example of a cobalt transporter

     We have started a systematic analysis of regulatory sites in the genome of Synechocystis sp. PCC 6803 based on literature data and on prediction of novel regulatory elements and regulons (see Rodionov et al. 2002). An example of functional prediction based on this approach is illustrated in Figure 4. The cobalamin (vitamin B12), thiamin and riboflavin/FMN biosynthetic genes are regulated by direct binding of a vitamin molecule to the enzyme-encoding mRNA, without the need for a protein cofactor. We have identified conserved B12-binding sites in the genome of Synechocystis sp. PCC 6803 and other cyanobacteria (Figure 4). Based on the presence of one of these sites upstream of hypothetical hupE gene orthologues in Synechocystis sp. PCC 6803, Synechococcus sp. WH 8102, and Prochlorococcus marinus, the hupE orthologue in these organisms (slr2135 in Synechocystis sp. PCC 6803) may be involved in cobalt transport.
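     A much simplified version of such a site search is to scan the regions upstream of genes for a degenerate consensus sequence. The Python sketch below does this with IUPAC nucleotide codes. The consensus "GCCRNTG" and the sequences are placeholders invented for the example and are not the real B12-binding element, which is a structured RNA that is normally detected with profile or structure-aware methods. Only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus: str):
    """Compile a consensus into a lookahead regex so overlapping hits count."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?={body})")

def find_sites(upstream: dict, consensus: str) -> dict:
    """Return {gene: [0-based start positions]} for genes with at least one hit."""
    pattern = consensus_to_regex(consensus)
    hits = {}
    for gene, seq in upstream.items():
        positions = [m.start() for m in pattern.finditer(seq.upper())]
        if positions:
            hits[gene] = positions
    return hits

# Placeholder upstream sequences and consensus (not real data).
UPSTREAM = {
    "gene_a": "TTGCCACTGAA",
    "gene_b": "AAAAAAA",
    "gene_c": "GCCGTTGGCCAGTG",
}

print(find_sites(UPSTREAM, "GCCRNTG"))
```

     Genes that share a site with known members of a regulon, as hupE does with the B12-regulated genes above, are then candidates for a related function.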

Experimental approaches

Example 1: Carotenoids

  • The carotenoids present in Synechocystis sp. PCC 6803 are shown in Figure 5.
  • The enzymes involved in the myxoxanthophyll biosynthesis pathway and the lycopene cyclase reaction are still unknown in Synechocystis.
  • Candidate Synechocystis ORFs involved in myxoxanthophyll biosynthesis have been identified, and interruptions have been introduced into these ORFs.
  • Pigment composition is determined initially by HPLC. The spectral characteristics of individual HPLC peaks help in the assignment.
  • HPLC fractions are also analyzed by MALDI-TOF to obtain the mass of the corresponding pigments.
  • Figure 6 illustrates the effect of one of these interruptions on the pigment composition of the resulting mutant.

Example 2: Carbon utilization and sequestration.

     Mutants have been generated that are altered in photosynthesis and carbon metabolism. Examples include mutants lacking one or more of the photosystems, mutants lacking one or more of the respiratory oxidases, and a mutant named “Cyanorubrum” (Synechocystis in which ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is replaced by the enzyme from Rhodospirillum rubrum, a purple bacterium, which has a lower affinity for CO2 relative to O2; a kind gift from Dr. Michael Gurevitz at Tel Aviv University). One of the basic questions that we address is what effects such changes have on the presence and levels of metabolites.

  • Methodology: GC/MS (gas chromatography / mass spectrometry)
  • GC/MS limitations: carboxy and hydroxy groups must be derivatized before the sample is applied to the GC/MS instrument (to increase volatility), and resolution of isomers (for example, various sugar phosphates) is limited.

     The next challenge will be to determine metabolic fluxes through pathways:

  • What is the flux rate of materials flowing through the main pathways?
  • Methodologies involving in vivo isotope labeling are being developed and evaluated.

  • Galperin MY, Koonin EV (2000) Who's your neighbor? New computational approaches for functional genomics. Nature Biotechnol 18: 609-613

  • Huynen M, Snel B, Lathe W, Bork P (2000) Exploitation of gene context. Curr Opin Struct Biol 10: 366-370

  • Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D (1999) A combined algorithm for genome-wide prediction of protein function. Nature 402: 83-86

  • Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res 30: 306-309

  • Osterman A, Overbeek R (2003) Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol 7: 238-251


Last Modified May-5-2003 © All Rights Reserved
