Grad student pieces together gene map
by MIKE MARTIN, UPI Science Correspondent
SANTA CRUZ, Calif., August 13 (UPI) -- Without the Herculean effort of a graduate student, the human genome map might still not be assembled.
James Kent, a graduate student at the University of California, Santa Cruz, wrote the computer program that assembled a massive collection of DNA pieces into the first public draft of the human genome. That assembly was the final step the Human Genome Project needed to declare completion on June 26, 2000.
"The pieces of the map came from the International Human Genome Sequencing
Program as well as many other labs," Kent told United Press International.
"We used the computer program to assemble them into a nearly complete whole."
Kent and colleague David Haussler describe the creation of that computer program
as a "surprisingly simple" approach to "the most important puzzle-solving
exercise in recent history" in the August issue of the journal Genome Research.
The computer program, called GigAssembler, had to trim and assemble the nearly
400,000 pieces of human DNA generated by the Human Genome Project over a decade.
"You have 10 to 20 pieces of DNA on each gene, and you have one thousand
to two thousand genes per chromosome," Kent told UPI. "There are 23
chromosome pairs in the human genome."
GigAssembler used a so-called "greedy" algorithm that assembles sequence
pieces according to best fit first, Kent explained.
"It turns out that in order for a computer to solve how to best pack luggage
in your car trunk, the computer actually has to try every possible packing method,"
Kent said. "That's clearly impossible with so many DNA pieces, so we designed
the algorithm to pick out the best way to piece them together without having
to try every possible combination."
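The "best fit first" idea Kent describes can be illustrated with a toy sketch. This is not GigAssembler's code, and the function names, the sample fragments and the minimum-overlap threshold are all hypothetical. It assumes that "fit" simply means how many letters at the end of one fragment match the start of another, whereas the real program weighs many more kinds of evidence.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembly: join the best-fitting pair first, then repeat.

    Hypothetical illustration only; it never backtracks, so it can miss
    the best overall layout, which is the trade-off of a greedy approach.
    """
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_a, best_b = 0, None, None
        # Score every ordered pair and remember the single best fit.
        for a, b in permutations(pieces, 2):
            size = overlap(a, b)
            if size > best_size:
                best_size, best_a, best_b = size, a, b
        if best_size < min_overlap:
            break  # nothing left fits well enough; leave the gaps
        pieces.remove(best_a)
        pieces.remove(best_b)
        pieces.append(best_a + best_b[best_size:])  # merge, keeping the overlap once
    return pieces


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Each pass compares only pairs of pieces rather than every possible ordering of all of them, which is what keeps the work manageable. Fragments that never fit well enough stay separate, much like the "holes" in a draft assembly.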
GigAssembler can consult a wide variety of information to determine how pieces
fit. For example, if two sequence segments code for parts of the same gene, GigAssembler
scores a fit. Using the algorithm, GigAssembler successfully assembled the first
public genome draft containing 2.7 billion base pairs -- 88% of the genome.
Since then, GigAssembler has performed further assemblies incorporating up to
92% of the human genome.
"The human genome map, in its assembled form, provides signposts, or markers,
that guide researchers in identifying the genetic basis of many diseases,"
Fiona Crawford, associate director of the Roskamp Institute for Research of
Neuropsychiatric Disorders, told United Press International from the University
of South Florida in Tampa. "Reading the assembled map is very much like
being able to tell whether you are between New York and Tampa or New York and
Boston. It allows you to narrow your search area considerably."
Crawford, whose research includes molecular genetic analysis of Alzheimer's
disease, told UPI that had the human genome map been available 20 years ago,
it would have vastly accelerated Alzheimer's research.
"Early-onset Alzheimer's is almost entirely genetic," Crawford said.
"Having the assembled genome map would have greatly sped us on our way
to identifying the amyloid protein gene responsible for the disease."
The human genome has been mapped by two competing -- and different -- computer
programs, Kent told UPI: the so-called "Solara assembler," used by
J. Craig Venter's privately held Solara Genomic Research, and GigAssembler.
"The Venter map has more holes, but each hole is smaller," Kent said.
"Our map has fewer holes, but each hole is larger. Also -- our map is free,
and theirs costs money."