This is the age of genome sequencing. We already know the complete nucleotide sequences of many prokaryotic organisms and of one eukaryotic organism (the budding yeast, S. cerevisiae). During the next few years, many more complete genome sequences will become available--including that of Homo sapiens.
In theory, the availability of complete genome sequences should permit interesting questions to be asked and answered at the level of the whole genome rather than at the level of the individual gene. However, the large number of nucleotides in eukaryotic genomes (3 billion in the human genome) implies that conventional experimental procedures will need to be drastically speeded up and/or dramatically scaled down if genome-wide analyses are to become practical.
Fortunately, laboratories at Stanford University and at the nearby Affymetrix Corporation have been developing procedures whereby stretches of DNA of known sequence can be attached to known (but very small) locations on glass microscope slides, with the result that very large numbers of different sequences can be arrayed and tested on a single slide. As we shall see in this lecture, these "DNA chips" (also known as high density DNA microarrays) have already proved capable of supporting genome-wide analyses of gene expression and of DNA sequence polymorphisms, and additional uses of DNA chip technology are likely to appear in the future.
Two different techniques have been developed for attaching DNA to small spots on glass slides.
The technology developed in the laboratories of Patrick Brown and Ron Davis at Stanford has the advantage that all details of the procedure are publicly available and can be reproduced in any laboratory (Lashkari et al. 1997; DeRisi et al. 1997). The following steps are involved:
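The second technique, developed at Affymetrix, builds oligonucleotides directly on the glass by light-directed synthesis. The general principles of light-directed oligonucleotide synthesis in an array on a glass slide are illustrated in this diagram (Pease et al. 1994):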
This diagram, "Light-directed synthesis of oligonucleotides," is from the paper by Pease et al. (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91:5022-5026.
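The combinatorial logic of light-directed synthesis can be sketched in a few lines of Python. This is a minimal sketch, not the Affymetrix procedure: each cycle is modeled as a mask (the set of sites exposed to light) followed by coupling of one nucleotide to every exposed site, and photochemistry, coupling yields and mask geometry are ignored. The function names `plan_masks` and `synthesize` are hypothetical. Because every site needs exactly one of the four bases at each position, a complete set of 4^n oligonucleotides of length n can be built in only 4n cycles.

```python
from itertools import product

BASES = "ACGT"


def plan_masks(targets):
    """Plan (base, exposed_sites) cycles that build every target oligo.

    Assumes all targets are equal-length strings over ACGT, one per site.
    For each position along the oligo, one cycle per base exposes exactly
    the sites whose target carries that base at that position.
    """
    length = len(targets[0])
    cycles = []
    for position in range(length):
        for base in BASES:
            exposed = {i for i, seq in enumerate(targets) if seq[position] == base}
            if exposed:  # a mask that exposes nothing is skipped
                cycles.append((base, exposed))
    return cycles


def synthesize(n_sites, cycles):
    """Apply the cycles in order and return the oligo grown at each site."""
    grown = [""] * n_sites
    for base, exposed in cycles:
        for site in exposed:
            grown[site] += base  # couple one nucleotide to each exposed site
    return grown


# Every possible trinucleotide, one per site: 4**3 = 64 sites.
targets = ["".join(p) for p in product(BASES, repeat=3)]
cycles = plan_masks(targets)
print(len(targets), "sites built in", len(cycles), "cycles")
```

Running it builds all 64 trinucleotides in 12 cycles; a target set that does not use every base at every position needs even fewer cycles, because masks that expose no sites are skipped.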
One of the reasons that the budding yeast, Saccharomyces cerevisiae, is so useful to humans is that it can convert glucose to ethanol. In fact, conversion of glucose to ethanol by "fermentation" is budding yeast's preferred method of metabolism. Budding yeast cells utilize other energy/carbon sources only after the supply of glucose is exhausted. The preferred energy/carbon source after glucose is exhausted is ethanol, which by then has usually accumulated to a high concentration.
The shift from fermentation of glucose to aerobic respiration of ethanol is called the "diauxic shift," and it is known to be accompanied by major changes in gene expression, which is not surprising, since whole metabolic pathways need to be activated or deactivated.
Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686. The horizontal axis represents time in hours after the start of the culture. One can see that the cell number (proportional to OD600; black line) increases with time, while the glucose concentration (red line) decreases with time. The diauxic shift occurs as the glucose is being depleted.
DeRisi, Iyer and Brown (DeRisi et al. 1997) used the complete nucleotide sequence of the budding yeast genome to construct gene arrays containing all budding yeast ORFs. They then hybridized the arrays to fluorescently labeled cDNA prepared from mRNA isolated from yeast cells at various times before and after the diauxic shift. Variations in the resulting hybridization signals allowed them to identify and categorize the genome-wide transcriptional response to the diauxic shift. Here are some sample data:
Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.
The pictures above show magnified views of the same small area of the complete gene array (which contains over 6100 genes and measures 18 mm by 18 mm). Spots corresponding to certain known genes (such as ACO1) are identified. cDNA from the first time point (Growth OD 0.14; 9 hours after the start of the culture) was labeled with a green fluorescent dye and served as the reference, while cDNA from each time point was labeled with a red fluorescent dye. For each picture shown above, the reference cDNA sample (green) was mixed with an experimental cDNA sample (red), and the mixture was hybridized to a gene array. At the initial time point, the red and green samples came from the same cells, so the two signals were equal and all spots appear yellow. At later time points, a red spot indicates that expression of the gene increased relative to the reference, while a green spot indicates that it decreased. Complete images for each time point can be viewed at the Brown laboratory web site, and a complete database showing changes in expression of each gene at each time point is also available at the same site.
Many genes whose function is unknown changed expression by a factor of 2 or more after glucose exhaustion. The information about the direction and magnitude of their changes in expression will aid in determining their function. Of course, many of the genes encoding enzymes relevant to glucose and ethanol metabolism showed significant changes in expression. These changes are shown in the diagram below, where red boxes identify genes whose expression is increased, and green boxes show genes whose expression is decreased.
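The color coding described above amounts to a ratio test on the two fluorescence channels. The sketch below is a minimal illustration, assuming background-subtracted intensities that have already been normalized between the two channels; `classify_spot`, its parameters and the spot intensities are hypothetical and are not values from DeRisi et al. It computes log2(red/green) for a spot and applies the 2-fold cutoff mentioned above.

```python
import math


def classify_spot(red, green, background=0.0, fold=2.0):
    """Classify one array spot from its red and green intensities.

    red:   signal from the experimental cDNA (a later time point)
    green: signal from the reference cDNA (the first time point)
    Returns (log2 ratio, label): 'induced' (red spot), 'repressed'
    (green spot) or 'unchanged' (yellow spot), using a fold-change cutoff.
    """
    r = max(red - background, 1e-9)  # clamp to avoid log of zero or negatives
    g = max(green - background, 1e-9)
    log_ratio = math.log2(r / g)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return log_ratio, "induced"
    if log_ratio <= -cutoff:
        return log_ratio, "repressed"
    return log_ratio, "unchanged"


# Hypothetical (red, green) intensities; not data from DeRisi et al.
spots = {"spot_1": (4000.0, 1000.0), "spot_2": (900.0, 1000.0), "spot_3": (300.0, 1200.0)}
for name, (red, green) in spots.items():
    log_ratio, label = classify_spot(red, green)
    print(f"{name}: log2(R/G) = {log_ratio:+.2f} -> {label}")
```

Working in log2 space makes induction and repression symmetric: a 4-fold increase gives +2 and a 4-fold decrease gives -2.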
Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.
The broad gray arrows in the diagram show major changes in flow of metabolites after the diauxic shift. It is evident that the induction of certain key enzymes (and down-regulation of other key enzymes) after the diauxic shift promotes the conversion of ethanol into acetyl-CoA (which is used for energy production in the TCA and glyoxylate cycles) and oxaloacetate, which is ultimately converted into the carbohydrate storage compounds, trehalose and glycogen. This diagram provides a view of energy metabolism that nicely complements what has been learned from classic biochemical studies of these enzymes.
Another interesting feature in the diauxic shift data is the evidence for families of coordinately regulated genes. Several such families are illustrated in this diagram:
Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686. Coordinate regulation of groups of genes in response to diauxic shift. (A) Changes in cell density (as measured by OD600) and glucose concentration with time. (B) Genes strongly induced only at the last time point (20.5 hours). (C) Genes with peaks in mRNA levels at 18.5 hours. (D) Genes induced at the time of the diauxic shift. There are at least 17 genes in this class; only some are shown. (E) For these genes, repression begins before the diauxic shift. (F) Ribosomal protein genes are repressed when glucose is depleted.
Inspection of the sequences upstream of these coordinately regulated genes shows that they usually contain common regulatory motifs. For example, the upstream regions of the ribosomal protein genes all contain binding sites for Rap1p, whose concentration diminishes after the diauxic shift (DeRisi et al. 1997).
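A search for a shared upstream motif can be sketched as a scan of each upstream region, on both strands, for matches to a consensus written in IUPAC ambiguity codes. This is a minimal sketch under stated assumptions: the consensus `ACCCANNCA` and the three upstream sequences are hypothetical placeholders, not a verified Rap1p binding site or real yeast promoters, and `find_motif` is an illustrative name. Exact consensus matching is the simplest possible approach; weighted scoring of candidate sites is a common alternative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# IUPAC nucleotide ambiguity codes (subset sufficient for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def matches(site, consensus):
    """True if every base of `site` is allowed by the matching IUPAC code."""
    return len(site) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(site, consensus)
    )


def find_motif(upstream, consensus):
    """Return (offset, strand) for each consensus match in an upstream region.

    Offsets are 0-based on the given (forward) sequence; a '-' hit means the
    reverse complement of that window matches the consensus.
    """
    k = len(consensus)
    hits = []
    for i in range(len(upstream) - k + 1):
        window = upstream[i:i + k]
        if matches(window, consensus):
            hits.append((i, "+"))
        if matches(reverse_complement(window), consensus):
            hits.append((i, "-"))
    return hits


# Hypothetical consensus and toy upstream regions for one co-regulated cluster.
consensus = "ACCCANNCA"
upstream = {
    "gene_a": "TTACCCATGCATT",
    "gene_b": "GGGGGGGGGGGG",
    "gene_c": "AATGCATGGGTAA",
}
for gene, seq in upstream.items():
    print(gene, find_motif(seq, consensus))
```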
Although gene arrays like those described in the previous section can be used to detect major sequence differences (such as gross deletions and insertions) between individuals and/or strains within a species, the strength of hybridization signals to gene-sized DNAs is essentially unaffected by single-nucleotide differences. In contrast, hybridization to oligonucleotides is strongly affected by single-nucleotide mismatches. Thus oligonucleotide arrays are capable of detecting single-nucleotide polymorphisms, while gene arrays are not.
Chee et al. (1996) used what they call a "4L tiled array" of oligonucleotides (four probes for each of the L positions of a target sequence) to detect single-nucleotide polymorphisms. They applied their method to the study of polymorphisms in human mitochondrial DNA, whose complete sequence of 16,569 bp is known.
Illustration from Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614.
This diagram shows a small section of a tiled array representing the complete human mitochondrial genome, version mt1. 15-mer oligonucleotides representing the mt1 sequence plus the 3 possible mismatches at the 7th position are arranged in columns of 4. Adjacent columns represent overlapping sequences shifted by a single nucleotide. DNA from mt1 (A and top panel of C) hybridizes primarily to the correct oligonucleotide in each set of 4. However, DNA from mt2, which contains a C rather than a T at position 16,493, hybridizes primarily to the C-mismatch-containing sequence at 16,493. In addition, hybridization to adjacent oligonucleotides is suppressed due to the single-base mismatches (compare the signal strengths under the red lines in the lower panel of C with the corresponding regions in the upper panel).
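The layout described above can be generated mechanically. The sketch below is illustrative, assuming a single strand, probes of `probe_len` = 15 nucleotides and an interrogation position at the 7th base (`query_offset` = 6, counting from 0), as in the diagram; `tile_4l` and `call_bases` are hypothetical names. For each reference position it makes four probes that differ only at the interrogated base, and it calls the sample's base as the one whose probe gives the strongest signal.

```python
BASES = "ACGT"


def tile_4l(reference, probe_len=15, query_offset=6):
    """Design a single-strand 4L tiled array for `reference`.

    For every position that can sit at `query_offset` inside a full-length
    probe, make four probes identical to the reference window except at that
    offset, where A, C, G and T are substituted in turn. The probe carrying
    the reference base is the perfect match.
    Returns {position: {base: probe_sequence}}.
    """
    tiles = {}
    last = len(reference) - probe_len + query_offset
    for pos in range(query_offset, last + 1):
        start = pos - query_offset
        window = reference[start:start + probe_len]
        tiles[pos] = {
            base: window[:query_offset] + base + window[query_offset + 1:]
            for base in BASES
        }
    return tiles


def call_bases(intensities):
    """Call each position as the base whose probe gave the strongest signal.

    `intensities` maps position -> {base: hybridization signal}.
    """
    return {pos: max(signals, key=signals.get) for pos, signals in intensities.items()}


reference = "ACGTTGCAACGTAGCTAGGA"  # hypothetical 20-nt reference
tiles = tile_4l(reference)
print(len(tiles), "positions,", 4 * len(tiles), "probes")
# Hypothetical signals at one position: the C probe hybridizes best.
print(call_bases({0: {"A": 120.0, "C": 900.0, "G": 80.0, "T": 150.0}}))
```

The name "4L" reflects the probe count: four probes for each of the L positions tiled. A real base caller would also need to handle weak or ambiguous signals, which this sketch ignores.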
The depression of hybridization surrounding the mismatch creates a type of "footprint" in the hybridization signal. The footprint spans as many positions as the oligonucleotide is long: 15 in the above case. In the diagram below, 20-mers were used, and the footprint for a single mismatch (panel A) extends for 20 bp. Longer footprints, as in panels B and C, indicate multiple mismatches that create overlapping footprints. The mismatches within such closely spaced footprints cannot be deciphered unambiguously with this technology and must be resolved by direct nucleotide sequencing, but the chip does reveal that the mismatches exist.
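The footprint length follows directly from the tiling geometry. In this sketch, which assumes 15-mer probes interrogated at their 7th base as in the mitochondrial example above (`footprint` is a hypothetical name), a position belongs to the footprint if the probe that interrogates it overlaps the mismatch.

```python
def footprint(mismatch_pos, positions, probe_len=15, query_offset=6):
    """Return the interrogated positions disturbed by one sample mismatch.

    The probe set interrogating position p covers p - query_offset through
    p - query_offset + probe_len - 1. At mismatch_pos itself the signal shifts
    to the probe carrying the sample's base; at every other position whose
    probes overlap mismatch_pos, all four probes carry an extra internal
    mismatch, so hybridization is depressed.
    """
    hit = []
    for p in positions:
        start = p - query_offset
        if start <= mismatch_pos < start + probe_len:
            hit.append(p)
    return hit


positions = range(200)
single = footprint(100, positions)
double = sorted(set(footprint(100, positions)) | set(footprint(105, positions)))
print(len(single), len(double))  # 15 20
```

Two mismatches 5 bp apart produce one merged footprint of 20 positions rather than two separate footprints of 15, which is why closely spaced mismatches cannot be resolved individually.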
Diagram from Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614.
Because oligonucleotide arrays make it possible to interrogate the complete human mitochondrial genome at once, far more thorough analyses of human mitochondrial genetics are now possible than before. And Chee et al. point out that, if the size of the spot in the array corresponding to a single oligonucleotide can be reduced to 1 square micrometer, then all 100,000 genes of the human genome could be displayed with 4L tiling in a single 2 cm by 2 cm array, permitting unprecedented accuracy and power in human genetic analysis. Below I shall discuss a paper that presents preliminary results on single-nucleotide mismatches in the human genome. But first I would like to discuss an extensive analysis of single-nucleotide mismatches in the yeast genome.
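A back-of-the-envelope check of this estimate, assuming that only one strand is tiled, that each 1 square micrometer feature holds one probe, and that no area is lost to spacing or controls (assumptions not stated in the source), shows that the array would cover about 1,000 bases per gene:

```latex
\begin{aligned}
\text{array area} &= (2\ \text{cm})^2 = (2 \times 10^{4}\ \mu\text{m})^2 = 4 \times 10^{8}\ \mu\text{m}^2 \\
\text{probes} &= \frac{4 \times 10^{8}\ \mu\text{m}^2}{1\ \mu\text{m}^2\ \text{per probe}} = 4 \times 10^{8} \\
\text{bases tiled} &= \frac{4 \times 10^{8}\ \text{probes}}{4\ \text{probes per base}} = 10^{8} \\
\text{bases per gene} &= \frac{10^{8}\ \text{bases}}{10^{5}\ \text{genes}} = 10^{3}
\end{aligned}
```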
In their analysis of the yeast genome, Winzeler et al. employ a different method for detecting sequence polymorphisms, as illustrated below.
Illustration from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197.
In this case, the oligonucleotides employed do not give complete genome coverage, just dense coverage. At least 20 oligos, 25 nucleotides long, were designed for every ORF in the reference (S288c) yeast genome and arranged on the array in the order of their chromosome position. Below each oligo is placed a second oligo that differs from the first by a single mismatch in a central position. Thus this could be considered a 2L mostly untiled array. As diagrammed in A, some mismatches in a different strain (YJM789) might be expected to reduce hybridization to both the S288c reference oligo and to the mismatch oligo. In other cases, the YJM789 sequence may be identical to the mismatch oligo. An example of the latter is shown in the bottom part of panel B. Panel B shows portions of the overall microarray resulting from hybridization of S96 (nearly identical to S288c) in red and YJM789 in green. Regions of sequence identity produce varying intensities of yellow, while regions of mismatching are preferentially red or green.
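The comparison described above can be sketched as a per-probe ratio test between the reference strain and the strain being scanned. This is a minimal sketch, not the statistical procedure Winzeler et al. used: it assumes normalized intensities for the reference-sequence (S288c) probes, flags a probe when its signal with the scanned strain's DNA falls at least 2-fold below its signal with S96 DNA (an arbitrary, hypothetical threshold), and keeps only probes flagged in every replicate hybridization. `flag_polymorphisms`, `reproducible_markers` and the intensities used are hypothetical.

```python
import math


def flag_polymorphisms(ref_pm, test_pm, min_log2_drop=1.0):
    """Return indices of probes whose signal drops in the scanned strain.

    ref_pm:  signals of the reference-sequence probes hybridized with S96 DNA
    test_pm: signals of the same probes hybridized with the scanned strain
    Both lists are in chromosome order and assumed to be normalized.
    A probe is flagged when log2(ref / test) >= min_log2_drop.
    """
    flagged = []
    for i, (ref, test) in enumerate(zip(ref_pm, test_pm)):
        drop = math.log2(max(ref, 1e-9) / max(test, 1e-9))  # clamp zeros
        if drop >= min_log2_drop:
            flagged.append(i)
    return flagged


def reproducible_markers(ref_pm, test_replicates, min_log2_drop=1.0):
    """Keep only probes flagged in every replicate (at least one required)."""
    flagged_sets = [
        set(flag_polymorphisms(ref_pm, rep, min_log2_drop)) for rep in test_replicates
    ]
    return sorted(set.intersection(*flagged_sets))


# Hypothetical signals for four probes and two replicate hybridizations.
ref = [1000.0, 1000.0, 1000.0, 1000.0]
replicates = [[1000.0, 300.0, 900.0, 450.0], [950.0, 350.0, 400.0, 600.0]]
print(reproducible_markers(ref, replicates))
```

Requiring agreement across replicates trades some sensitivity for fewer false markers.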
The 3714 polymorphisms that were detected reproducibly provided a set of markers useful for analysis of genetic crosses between S96 and YJM789. The average spacing between these markers was 3.5 kb. This spacing is so close that, when the DNA from the 4 spore colonies generated from a single meiosis was analyzed by hybridization to microarrays, each of the 97 crossovers that occurred in this meiosis could be mapped with high resolution. Here is an example for the 900 kb of chromosome XIII:
Diagram from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197.
Notice that, for each of the 4 spore colonies, sequences hybridizing to oligos in the fashion of strain S96 (greenish) occur in runs, as do sequences hybridizing in the fashion of YJM789 (reddish), allowing the locations of crossovers to be deduced (white lines in the background). The information about crossover positions is far more complete, and therefore permits far higher resolution, than would have been possible with classic genetic techniques, in which at most a few tens of markers could be analyzed simultaneously.
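Deducing crossover positions from runs of parental genotypes can be sketched as follows. The function `crossovers` is hypothetical; it assumes that each marker has already been called as S96-like or YJM789-like (or left uncalled) for one spore colony, and that markers are sorted by chromosome position. Each crossover is reported as the interval between the last marker of one parental run and the first marker of the next, so its resolution is set by the marker spacing.

```python
def crossovers(positions, genotypes):
    """Locate crossovers along one chromosome of one spore colony.

    positions: marker coordinates in bp, ascending.
    genotypes: parental call at each marker, e.g. "S96" or "YJM789",
               or None for a marker that could not be called.
    Returns a list of (left_bp, right_bp) intervals, one per change of
    parental origin between consecutive called markers.
    """
    intervals = []
    last_pos, last_call = None, None
    for pos, call in zip(positions, genotypes):
        if call is None:
            continue  # skip uninformative markers
        if last_call is not None and call != last_call:
            intervals.append((last_pos, pos))
        last_pos, last_call = pos, call
    return intervals


# Hypothetical markers spaced 3.5 kb apart on one chromosome.
positions = [10_000, 13_500, 17_000, 20_500, 24_000, 27_500]
calls = ["S96", "S96", "YJM789", "YJM789", None, "S96"]
print(crossovers(positions, calls))
```

In this simple form, a single discordant marker inside a long run is reported as two closely spaced crossovers; distinguishing such cases from gene conversion or miscalled markers would need additional rules.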
The high density of markers makes it possible to identify genes responsible for mysterious phenotypes. The example I'll describe here, from the work of Winzeler et al. (1998), comes from yeast, but similar techniques are likely to be used in the future to help localize genes responsible for cancer phenotypes in humans.
From a cross between S96 (MATa LYS2 lys5 ho) and YJM789 (MATalpha lys2 LYS5 ho:hisG cyh), 17 MATalpha lys2 LYS5 ho cyh segregants were selected, and DNAs from 10 of these were hybridized to microarrays and analyzed. Note that all the markers corresponded to known genes except cyh (cycloheximide sensitivity). The summary diagram below shows that there were only 5 regions in which all 10 analyzed segregants carried DNA from the same parent. Four of these corresponded to the known locations of MATalpha, lys2, LYS5, and ho. The fifth coincided with the known location of the PDR5 gene, which encodes a multidrug resistance pump.
Diagram from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197. Information for 5 yeast chromosomes is shown here. Information for the other chromosomes can be obtained from www.sciencemag.org/feature/data/980398.shl.
Note that, in the above diagram, the vertical ticks indicate the positions of polymorphism probes. These probes are colored reddish if DNA from the segregant hybridizes to them like YJM789 and greenish if it hybridizes like S96. The pink and green horizontal stripes show the authors' guesses as to the origin of the DNA in the indicated region (pink from YJM789; green from S96).
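The logic of this mapping experiment is that, at a locus unlinked to any selected trait, each segregant inherits DNA from either parent with probability 1/2, so all 10 segregants share the same parent by chance with probability only 2 × (1/2)^10, or about 0.002. Regions linked to a selected marker, by contrast, are homogeneous. A minimal sketch of scanning one chromosome for such regions follows; it assumes per-marker parental calls for each segregant, and `shared_regions` and the calls used are hypothetical.

```python
def shared_regions(positions, segregant_calls):
    """Find runs of markers at which every segregant has the same parental call.

    positions:       ordered marker coordinates (bp) on one chromosome.
    segregant_calls: one list of calls ("S96", "YJM789" or None) per
                     segregant, aligned with positions.
    Returns (start_bp, end_bp, parent) for each run of consecutive markers
    that all segregants share; a change of parent or any disagreement
    (including an uncalled marker) ends the run.
    """
    runs = []
    current = None  # [start, end, parent] of the run being extended
    for i, pos in enumerate(positions):
        calls = {seg[i] for seg in segregant_calls}
        shared = calls.pop() if len(calls) == 1 else None
        if shared is not None and current is not None and current[2] == shared:
            current[1] = pos  # same parent as the open run: extend it
            continue
        if current is not None:
            runs.append(tuple(current))
            current = None
        if shared is not None:
            current = [pos, pos, shared]
    if current is not None:
        runs.append(tuple(current))
    return runs


S, Y = "S96", "YJM789"
positions = [1000, 2000, 3000, 4000, 5000, 6000]
segregants = [
    [S, Y, Y, Y, S, Y],
    [S, Y, Y, S, S, Y],
    [S, Y, Y, Y, S, Y],
]
print(shared_regions(positions, segregants))
```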
Clearly a similarly dense array of sequence polymorphisms would be of great assistance in assigning mysterious human phenotypes, such as cancer, to specific genes. The human situation is complicated by the large size of the genome and by the fact that human cells are diploid, so that at a heterozygous position the array sees a mixture of two sequences. A recent publication by Wang et al. (1998) shows that these complications can be overcome. These investigators used a 4L tiled array approach, applied to both strands. They applied it to nucleotide sequence information from 16,725 human "sequence tagged sites" (STSs; sites shown by earlier investigators of the human genome to be uniquely amplifiable with specific PCR primers) covering about 2 Mb of sequence. Oligonucleotide arrays providing 4L tiling for both strands were constructed according to reference DNA sequences for each STS. The corresponding STSs were then amplified from the DNAs of seven individuals and hybridized to the arrays. Computers were used to scan the results and determine, for each position, whether the individual was homozygous for the reference sequence, heterozygous, or homozygous for a different sequence. In this way, nearly 3000 single-nucleotide polymorphisms (SNPs) were identified, corresponding to one for every 721 bp of sequence surveyed. These SNPs provide a fairly good set of markers. Their positions with respect to other markers on human chromosomes can be viewed at the Whitehead Institute web site. They will be useful in future mapping projects, especially since they can be analyzed quickly and simultaneously.
To make these SNPs even more useful, Wang et al. (1998) are in the process of developing "genotyping arrays" for each SNP. These arrays consist of short 4L tiled arrays for each of the two alternative sequences anticipated at each SNP. Comparison of hybridization patterns in the example below permits easy distinction between A/A homozygotes, A/C heterozygotes, and C/C homozygotes:
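A simple way to turn the two allele-specific signals into a diploid genotype call is to look at the fraction of the total signal that comes from one allele: close to 1 for one homozygote, close to 0 for the other, and intermediate for a heterozygote. The sketch below illustrates that idea only; it is not the algorithm used by Wang et al., and `call_genotype`, its cutoffs of 0.2 and 0.8, and the intensities used are hypothetical.

```python
def call_genotype(signal_a, signal_b, allele_a="A", allele_b="C", low=0.2, high=0.8):
    """Call a diploid genotype at a biallelic SNP from two summed signals.

    signal_a: total hybridization to the probes matching allele_a
    signal_b: total hybridization to the probes matching allele_b
    The fraction of signal from allele_a separates the three classes;
    the low/high cutoffs are illustrative assumptions.
    Returns e.g. "A/A", "A/C" or "C/C", or None if there is no signal.
    """
    total = signal_a + signal_b
    if total <= 0:
        return None  # no usable signal
    frac_a = signal_a / total
    if frac_a >= high:
        return f"{allele_a}/{allele_a}"
    if frac_a <= low:
        return f"{allele_b}/{allele_b}"
    return f"{allele_a}/{allele_b}"


# Hypothetical summed signals for three individuals at an A/C SNP.
for a, c in [(900.0, 100.0), (500.0, 450.0), (80.0, 920.0)]:
    print(a, c, call_genotype(a, c))
```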
Illustration from Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, Lander ES (1998) Large-scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082.
The next goal is to search the protein coding regions of all human genes (only 40 times more sequence than analyzed in the present study) for SNPs. The resulting map should be extremely useful in mapping human genetic traits and disease susceptibilities.
Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614
DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686
Fodor SPA, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251:767-773
Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, Brown PO, Davis RW (1997) Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94:13057-13062
Lashkari DA, Hunicke-Smith SP, Norgren RM, Davis RW, Brennan T (1995) An automated multiplex oligonucleotide synthesizer: development of high-throughput, low-cost DNA synthesis. Proc Natl Acad Sci USA 92:7912-7915
Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SPA (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci USA 91:5022-5026
Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, Lander ES (1998) Large-scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082
Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2:65-73
Wodicka L, Dong H, Mittmann M, Ho M-H, Lockhart DJ (1997) Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat Biotechnol 15:1359-1367