Studying the human genome, the complete set of human DNA, is a way of studying fundamental details about ourselves. The three billion letters of the human genome are written using the four-letter alphabet of DNA. The DNA is divided among 23 pairs of chromosomes that are found in each of the trillions of cells in our bodies. In 2003, the Human Genome Project produced a complete representative sequence of the human genome. Of course, people are not identical, and DNA sequences do differ subtly between individuals. Currently, a number of separate projects are charting sequence variations found in human populations.
The representative sequence is a composite from several people who donated blood samples. Originally, close to 100 people volunteered to give a sample of their blood. Each person provided their informed consent, affirming that they agreed to the study of their DNA. No names were attached to the blood samples and ultimately scientists used only a few of them. These measures ensured that the DNA sequences remained anonymous; not even the donors knew whether their samples were actually used or not.
The main goal of the Human Genome Project was to read, letter by letter, the three billion bases of human DNA. Before starting to sequence the human genome, scientists built maps of the chromosomes and developed and refined techniques for analyzing DNA. With the tools in place, project scientists began large-scale DNA sequencing in 1999. In just one year, they had amassed sequence data covering more than 80 percent of the genome.
The human genome is a massive text. If the three billion letters (or bases) of the genome were printed in telephone books, they would require a stack of books nearly as tall as the Washington Monument.
To accurately determine the sequence of every base in the genome, scientists needed to read the three billion bases not just once, but at least six to ten times. Individual sequencing reactions could only reveal the order of a few hundred bases of DNA at a time, amounting to a fraction of a page. This meant that to place all of the DNA bases in order, it was necessary to produce many thousands of overlapping segments of DNA sequence.
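To get a feel for the scale involved, the arithmetic above can be sketched in a few lines of Python. This is only a rough illustration, not the project's actual method: the read length of 500 bases is an assumed stand-in for "a few hundred bases", and the function names `reads_needed` and `merge_overlapping` are hypothetical, chosen for this example. The first function counts individual sequencing reactions; the second shows, on two toy reads, how an overlap lets neighboring pieces be joined in order.

```python
from typing import Optional

GENOME_BASES = 3_000_000_000  # three billion bases, as stated in the text


def reads_needed(genome_bases: int, coverage: int, read_length: int) -> int:
    """Smallest number of reads whose combined length is coverage x genome size."""
    total_bases = genome_bases * coverage
    # Integer ceiling division, so a partial read still counts as one read.
    return (total_bases + read_length - 1) // read_length


def merge_overlapping(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads if the end of `left` matches the start of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` letters exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


if __name__ == "__main__":
    # Assumed read length of 500 bases; the text only says "a few hundred".
    for coverage in (6, 10):
        print(coverage, reads_needed(GENOME_BASES, coverage, read_length=500))

    # Two toy reads that share the four letters "TGCA".
    print(merge_overlapping("ACGTTGCA", "TGCAGGTC"))
```

Under these assumptions, reading the genome six to ten times over calls for tens of millions of individual reads. Real assembly must also cope with sequencing errors and repeated stretches of DNA, which this toy overlap check ignores.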