Groups Coordinate Gene Sequencing

When several research teams around the world announced plans in the fall of 1996 for full-length cDNA (gene) sequencing, investigators felt that the highly beneficial infrastructure provided since 1994 by the international Integrated Molecular Analysis of Genome Expression (I.M.A.G.E.) consortium [HGN 6(6), 3] should be extended to the challenges of complete cDNA sequencing. A subsequent workshop for I.M.A.G.E. participants was held in May 1997 in Gaithersburg, Maryland. The meeting was organized and chaired by Greg Lennon [then at Lawrence Livermore National Laboratory (LLNL) and now at Gene Logic Inc.] with Marvin Stodolsky coordinating for the meeting sponsor, the DOE Office of Biological and Environmental Research. Scientists attended from France, Germany, Italy, Japan, Sweden, the United Kingdom, and the United States.

Several workshop participants are members of the subgroup EURO-IMAGE, whose goals include generating and sequencing a master set of unique full-length cDNA clones (based on I.M.A.G.E. consortium resources) representing 3000 transcripts and 6 Mb of finished sequence. Other EURO-IMAGE goals are to obtain high-resolution and comparative functional mapping in human and model organisms of 1000 master-set genes and to develop the I.M.A.G.E. consortium database for easy access to an integrated view of the sequence, map, and expression data generated.

U.S. funding agencies represented at the workshop included DOE, NIH, and the recently established nonprofit Merck Genome Research Institute [HGN 8(3-4), 9]. Selected highlights follow of technical progress in complete cDNA sequencing, as reported at the workshop.

Highlights of Technical Progress

Attendees addressed a wide range of topics, including the status of cDNA sequencing projects, future targets, data- and clone-release policies, quality criteria and assessment, and mouse and other model organism cDNAs. Speakers projected that, with adequate support from funding agencies, participating laboratories could generate up to 15,000 full-length cDNA sequences in the following year. With average cDNA lengths of 2 kb, this represents some 30 Mb of total sequence.

Researchers have long recognized that expression of a single gene may culminate in the production of several different messenger RNA (mRNA) transcripts, depending both on the gene and the source tissue. Added to this biological complexity are the technical challenges of converting fragile mRNAs to the sturdier cDNAs. Standard methods involve use of poly dT as a primer on the 3' poly A end of purified mRNAs, with reverse transcriptase enzymes of viral origin polymerizing the synthesis of a single-stranded DNA complement of the mRNA. These initial DNA transcripts often fail to extend to the 5' end of longer mRNAs. With the use of more routine biochemistries, the single-stranded DNA is converted into duplex DNA and combined with a DNA vector to support its propagation and maintenance as a DNA clone. The double-stranded DNAs produced are much more stable and less susceptible to degradative processes than their single-stranded mRNA predecessors. However, because the initial reverse transcription is often shortened, cDNA libraries with abundant truncated products are the common result, particularly for the longer source mRNAs. Strategies devised for alleviating this truncation problem were described by Takao Isogai (Helix Research Institute, Japan), Nobuo Nomura (Kazusa DNA Research Institute, Japan), John Quackenbush [The Institute for Genomic Research (TIGR)], and M. Bento Soares (University of Iowa).

A protocol that takes advantage of the unusual nucleotide "cap" on the 5' end of mRNAs requires that the first cDNA strand's extension be long enough to protect the cap as a contingency for final cDNA clone production. Soares reported, however, that about one-third of cDNA transcripts begin within the mRNA, as contrasted with preferred starts at the mRNA's 3' end, thus giving rise to 3' truncations. This problem can be alleviated substantially by size fractionating the mRNAs and later selecting out the cDNA products with lengths equal to the size-sorted mRNA templates. Hans Lehrach (Max Planck Institut für Molekulare Genetik, Germany) related the value of massively parallel oligomer fingerprinting of cDNAs. This is an economical way to screen a library for novel and longer, potentially full-length cDNAs. Optimal candidate cDNAs chosen by the Lehrach team at the Resource Center of the German Genome Project are being sequenced in the laboratory of Annemarie Poustka (Deutsches Krebsforschungszentrum).

More than one sequencing read commonly is necessary to display the complete sequence for cDNAs longer than a few hundred bases. Strategies for economical full-length sequencing were discussed by Lennon and Richard Gibbs (Baylor College of Medicine). Sequence reads beyond 1000 bases now are being obtained with improvements to sequencing systems by Wilhelm Ansorge's team at the European Molecular Biology Laboratory. Ansorge suggested that, for cDNAs shorter than 2 kb, good coverage could be achieved by two overlapping reads on complementary strands.

Giuseppe Borsani (Telethon Institute of Genetics and Medicine) reported on the benefits of the easily manipulated Drosophila model for studies of development and function to reveal roles represented by human cDNAs.

Mark Boguski (National Center for Biotechnology Information) discussed the status of the dbEST cDNA sequence database and made recommendations for the evolution needed to meet the impending new demands of complete DNA sequencing. He observed that each group will have its own selection criteria and sequencing priorities, such as finding cancer genes, genes with Drosophila homologs, or genes that already have been mapped. Boguski coined the expression "the slicing problem" to describe the difficulties in avoiding undesirable duplication and redundancy due to overlapping choice categories. A possible solution would be to establish a registration and tracking database modeled after the successful European Bioinformatics Institute's (EBI) RHAlloc-RHdb approach used in constructing the human transcript map. Patricia Rodriguez-Tomé (EBI) has accepted this responsibility. This data will include an investigator or center name and contact information, identifiers for the physical cDNA clones being sequenced and associated EST accession numbers, and sequencing status. When participants registered a clone that they intended to sequence, the database would detect and report overlaps with clones selected by other groups.

Attendees agreed that the I.M.A.G.E. consortium should convene every 6 months to maintain necessary coordination and efficiency. A subsequent meeting, organized by Quackenbush, was held in September 1997 in conjunction with the Ninth International Genome Sequencing and Analysis Conference in Hilton Head, South Carolina. Washington University scientists will organize the next meeting, tentatively planned to concur with the May 1998 Human Genome Workshop at Cold Spring Harbor Laboratory.
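
As a rough illustration of Ansorge's suggestion that two reads on complementary strands can cover cDNAs shorter than 2 kb, the arithmetic can be sketched as follows. The function name and the simplifying assumption that each read starts exactly at one end of the insert and runs its full length are ours, not the workshop's.

```python
def two_read_overlap(insert_length, read_length):
    """Bases covered by both end reads of an insert.

    Assumes (hypothetically) one read from each end on complementary
    strands, each running its full length. A negative result is the
    size of the gap left uncovered in the middle.
    """
    return 2 * read_length - insert_length


# An 1800-base cDNA with 1000-base reads leaves a 200-base overlap;
# a 2500-base cDNA leaves a 500-base gap that needs further reads.
print(two_read_overlap(1800, 1000))
print(two_read_overlap(2500, 1000))
```

Under these assumptions, reads of about 1000 bases span any insert shorter than 2 kb, which matches the threshold quoted at the workshop.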
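
The registration and tracking database proposed as a remedy for "the slicing problem" can be pictured with a minimal sketch. This is not the RHAlloc-RHdb design or the EBI implementation; the class names, field names, and status values are hypothetical, and only the recorded items (center name, contact, clone identifier, EST accessions, sequencing status) come from the workshop description.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    center: str                # investigator or center name
    contact: str               # contact information
    clone_id: str              # identifier of the physical cDNA clone
    est_accessions: frozenset  # associated EST accession numbers
    status: str = "planned"    # sequencing status (hypothetical values)


@dataclass
class CloneRegistry:
    entries: list = field(default_factory=list)

    def register(self, new):
        """Store a registration and return earlier entries from other
        centers that share its clone identifier or an EST accession."""
        overlaps = [
            old for old in self.entries
            if old.center != new.center
            and (old.clone_id == new.clone_id
                 or old.est_accessions & new.est_accessions)
        ]
        self.entries.append(new)
        return overlaps


# Placeholder identifiers, not real clone or accession numbers.
registry = CloneRegistry()
registry.register(Registration("Center A", "a@example.org",
                               "clone-0001", frozenset({"EST-1"})))
clashes = registry.register(Registration("Center B", "b@example.org",
                                         "clone-0002", frozenset({"EST-1"})))
for entry in clashes:
    print("Overlap with", entry.center, entry.clone_id)
```

A linear scan is enough for a sketch; a registry covering thousands of clones would more plausibly index entries by clone identifier and accession number.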