Grad student pieces together gene map

by MIKE MARTIN, UPI Science Correspondent

SANTA CRUZ, Calif., Aug. 13 (UPI) -- Without the Herculean effort of a graduate student, the human genome map might still not have been assembled.

James Kent, a student at the University of California, Santa Cruz, wrote the computer program that assembled a massive collection of DNA pieces into the first public draft of the human genome, the final step necessary for the Human Genome Project to declare completion on June 26, 2000.

"The pieces of the map came from the International Human Genome Sequencing Program as well as many other labs," Kent told United Press International. "We used the computer program to assemble them into a nearly complete whole."

Kent and colleague David Haussler describe the creation of that computer program as a "surprisingly simple" approach to "the most important puzzle-solving exercise in recent history" in the August issue of the journal Genome Research. The computer program, called GigAssembler, had to trim and assemble the nearly 400,000 pieces of human DNA generated by the Human Genome Project over a decade.

"You have 10 to 20 pieces of DNA on each gene, and you have one thousand to two thousand genes per chromosome," Kent told UPI. "There are 23 chromosome pairs in the human genome."
GigAssembler used a so-called "greedy" algorithm that assembles sequence pieces according to best fit first, Kent explained.

"It turns out that in order for a computer to solve how to best pack luggage in your car trunk, the computer actually has to try every possible packing method," Kent said. "That's clearly impossible with so many DNA pieces, so we designed the algorithm to pick out the best way to piece them together without having to try every possible combination."

GigAssembler can consult a wide variety of information to determine how pieces fit. For example, if two sequence segments code for parts of the same gene, GigAssembler scores a fit. Using this algorithm, GigAssembler successfully assembled the first public genome draft containing 2.7 billion base pairs -- 88% of the genome. Since then, GigAssembler has performed further assemblies incorporating up to 92% of the human genome.

"The human genome map, in its assembled form, provides signposts, or markers that guide researchers in identifying the genetic basis of many diseases," Fiona Crawford, associate director of the Roskamp Institute for Research of Neuropsychiatric Disorders, told United Press International from the University of South Florida in Tampa. "Reading the assembled map is very much like being able to tell whether you are between New York and Tampa or New York and Boston. It allows you to narrow your search area considerably."

Crawford, whose research includes molecular genetic analysis of Alzheimer's disease, told UPI that had the human genome map been available 20 years ago, it would have vastly accelerated Alzheimer's research.

"Early onset Alzheimer's is almost entirely genetic," Crawford said. "Having the assembled genome map would have greatly sped us on our way to identifying the amyloid protein gene responsible for the disease"

The human genome has been mapped by two competing -- and different -- computer programs, Kent told UPI: the so-called "Celera assembler," used by J. Craig Venter's private company, Celera Genomics, and GigAssembler.

"The Venter map has more holes, but each hole is smaller," Kent said. "Our map has fewer holes, but each hole is larger. Also -- our map is free, and theirs costs money."