Notes from the September 8, 1998, lecture:

DNA microarrays for global genome analysis


Note: This lecture contains references to the current literature. If you click on a reference citation, you will be taken to the bibliography (at the end of the lecture). If you click on any of the articles in the bibliography, you will be taken to the PubMed site for that article, where you will find its abstract. In many cases, you will also find a link to the publisher's web site from which you may be able to download the complete text and figures of the article.

DNA Chips

This is the age of genome sequencing. We already know the complete nucleotide sequences of many prokaryotic organisms and of one eukaryotic organism (the budding yeast, S. cerevisiae). During the next few years, many more complete genome sequences will become available--including that of Homo sapiens.

In theory, the availability of complete genome sequences should permit interesting questions to be asked and answered at the genome level rather than at the level of the individual gene. However, the large number of nucleotides in eukaryotic genomes (3 billion in the human genome) implies that conventional experimental procedures will need to be drastically sped up and/or dramatically scaled down if genome-wide analyses are to become practical.

Fortunately, laboratories at Stanford University and at the nearby Affymetrix corporation have been developing procedures whereby stretches of DNA of known sequence can be attached to known (but very small) locations on glass microscope slides, with the result that very large numbers of different sequences can be arrayed and tested on a single slide. As we shall see in this lecture, these "DNA chips" (also known as high density DNA microarrays) have already proved capable of supporting genome-wide analyses of gene expression and of sequence polymorphisms, and additional uses of DNA chip technology are likely to appear in the future.

 

Attaching DNA to glass: Two technologies

Two different techniques have been developed for attaching DNA to small spots on glass slides.

 

How arrays of genes can be attached to glass slides

The technology developed in the laboratories of Patrick Brown and Ron Davis at Stanford has the advantage that all details of the procedure are publicly available and can be reproduced in any laboratory (Lashkari et al. 1997; DeRisi et al. 1997). The following steps are involved:

  1. Design primer pairs for PCR amplification of desired DNA sequences. In the case where one wants to amplify all of the open reading frames (ORFs) in a genome, computer programs are available that identify the ORFs and design the primer pairs (Lashkari et al. 1997). In some cases, the desired primer pairs may be available from commercial sources (DeRisi et al. 1997).
  2. Synthesize primers. Use of an automated, multiplex oligonucleotide synthesizer (Lashkari et al. 1995) can increase speed and reduce costs.
  3. PCR.
  4. Attach the PCR products to designated locations within a microarray on a glass slide. Poly-L-lysine (positively charged) is first adsorbed to the slide, then the DNA samples are positioned with a robotic device (which you can build for only $23,000 using the designs available from the Brown laboratory web site). Finally, the DNA samples are covalently linked to the slide with UV irradiation, the remaining poly-L-lysine adhesive is inactivated by acylation, the DNA on the slides is denatured by heating, and the slides are dehydrated in ethanol, dried, and stored at room temperature until use (DeRisi et al. 1997).
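Step 1 above can be made concrete with a small sketch. This is not the primer-design software cited in the lecture; it is a deliberately naive illustration that takes the first bases of an ORF as the forward primer and the reverse complement of its last bases as the reverse primer. The function names, the primer length, and the example sequence are all assumptions made for illustration. Real primer design also considers melting temperature, self-complementarity, and uniqueness within the genome.

```python
# Naive primer-pair sketch (hypothetical helper names, made-up ORF sequence).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def naive_primer_pair(orf, length=20):
    """Pick a forward and a reverse primer that flank the whole ORF.

    The forward primer is identical to the start of the coding strand.
    The reverse primer is the reverse complement of the end of the coding
    strand, so that it anneals to the coding strand and points inward.
    """
    if len(orf) < 2 * length:
        raise ValueError("ORF is too short for two non-overlapping primers")
    forward = orf[:length]
    reverse = reverse_complement(orf[-length:])
    return forward, reverse


# Made-up ORF used only to exercise the function.
orf = "ATGGCTAGCTTGACCGATCGATCGGATCCATTGCAGGCTAACGTTAGCTAA"
fwd, rev = naive_primer_pair(orf, length=12)
print("forward:", fwd)
print("reverse:", rev)
```

In a genome-scale project, a loop over all predicted ORFs would call such a function once per ORF and write the resulting primer list to the synthesizer.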

 

How arrays of oligonucleotides can be attached to glass slides

The general principles of light-directed oligonucleotide synthesis in an array on a glass slide are illustrated in this diagram (Pease et al. 1994):

This diagram, "Light-directed synthesis of oligonucleotides," is from the paper by Pease et al. (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91:5022-5026.

  1. Attach protected reactive groups to glass slide. In the example shown here, the reactive groups are hydroxyls (O) protected by X.
  2. Irradiate the slide through a photolithographic mask (M1). The protective groups (X) are removed from the hydroxyls in the irradiated regions.
  3. Attach protected, but specific, nucleosides (in this case, T-X) to the hydroxyl groups.
  4. Irradiate through a second mask (M2).
  5. Attach the next specific, protected nucleosides.
  6. Repeat with appropriate masks and nucleosides to build up the desired oligonucleotide sequences at the desired locations.
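The masking cycle in the steps above can be simulated in a few lines. The sketch below is only a bookkeeping model, under the assumption that every illuminated site couples the offered nucleoside with perfect efficiency; the function name, the site numbering, and the example masks are invented for illustration and do not come from Pease et al.

```python
# Bookkeeping model of light-directed synthesis (illustrative assumptions only).

def synthesize(n_sites, cycles):
    """Grow one oligonucleotide per site.

    cycles is a list of (mask, base) pairs. Each mask is the set of site
    indices exposed to light in that cycle; only those sites lose their
    protecting group and therefore couple the nucleoside offered next.
    """
    oligos = [""] * n_sites
    for mask, base in cycles:
        for site in mask:
            oligos[site] += base
    return oligos


# Four sites, four mask/nucleoside cycles (made-up example).
cycles = [
    ({0, 1}, "T"),  # mask M1 exposes sites 0 and 1, then T is coupled
    ({2, 3}, "C"),  # mask M2 exposes sites 2 and 3, then C is coupled
    ({0, 2}, "A"),
    ({1, 3}, "G"),
]
oligos = synthesize(4, cycles)
print(oligos)  # ['TA', 'TG', 'CA', 'CG']
```

Note that four cycles were enough to make all four dinucleotides of this small family. In general, with one cycle per nucleoside per position, oligonucleotides of length N require at most 4N cycles regardless of how many different sequences are on the slide.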

Example of the use of gene arrays to monitor genome-wide changes in gene expression

One of the reasons that the budding yeast, Saccharomyces cerevisiae, is so useful to humans is that it can convert glucose to ethanol. In fact, conversion of glucose to ethanol by "fermentation" is budding yeast's preferred method of metabolism. Budding yeast cells utilize other energy/carbon sources only after the supply of glucose is exhausted. The preferred energy/carbon source after glucose is exhausted is ethanol, which is usually available in high concentration at that point.

The shift from anaerobic fermentation of glucose to aerobic respiration of ethanol is called the "diauxic shift," and it is known to be accompanied by major changes in gene expression--not surprising since whole metabolic pathways need to be activated or deactivated.

Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686. The horizontal axis represents time in hours after start of the culture. One can see that the cell number (proportional to OD600; black line) increases with time, while the glucose concentration (red line) decreases with time. The diauxic shift occurs as the glucose is being depleted.

DeRisi, Iyer and Brown (DeRisi et al. 1997) have used the complete nucleotide sequence of the budding yeast genome to develop gene arrays containing all budding yeast ORFs. Then they hybridized the genes in the arrays to cDNA isolated from yeast cells at various times before and after the diauxic shift. Variations in the resulting hybridization signals permitted them to identify and categorize the genome-wide transcriptional response to the diauxic shift. Here are some sample data:

Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.

The pictures above show magnified views of the same small area of the total array of genes (which contains over 6100 genes and measures 18 mm by 18 mm). Spots corresponding to certain known genes (such as ACO1) are identified. cDNA from the first time point (Growth OD 0.14; 9 hours after start of culture) was labeled with a green fluorescent dye and served as reference, while cDNA from all time points was labeled with a red fluorescent dye. For each picture shown above, a reference cDNA sample (green) was mixed with an experimental cDNA sample (red), and the mixture was hybridized to a gene array. At the initial time point, the green and red signals were equal, and all spots appear yellow. At later time points, red color indicates gene expression increased relative to the reference, while green color indicates gene expression decreased relative to the reference. Complete images for each time point can be viewed at the Brown laboratory web site, and a complete database showing changes in expression of each gene at each time point is also available at the same site.
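The color coding described above amounts to comparing the red and green intensities at each spot. Here is a minimal sketch of that comparison, assuming background-corrected intensities are already available as plain numbers. The function names and the two-fold cutoff are illustrative choices, not the authors' image-analysis procedure.

```python
import math

# Two-color spot interpretation (hypothetical names, illustrative cutoff).

def log2_ratio(red, green):
    """log2 of experimental (red) over reference (green) signal."""
    return math.log2(red / green)


def spot_color(red, green, threshold=1.0):
    """Classify a spot by its log2 ratio.

    threshold=1.0 corresponds to a two-fold change in either direction.
    """
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "red"      # expression increased relative to reference
    if ratio <= -threshold:
        return "green"    # expression decreased relative to reference
    return "yellow"       # roughly equal signals


# Made-up intensities for three spots.
for red, green in [(1000, 1000), (4000, 1000), (500, 2000)]:
    print(red, green, spot_color(red, green))
```

Working with log ratios has the convenient property that a two-fold increase (+1) and a two-fold decrease (-1) are symmetric around zero.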

Many genes whose function is unknown changed expression by a factor of 2 or more after glucose exhaustion. The information about the direction and magnitude of their changes in expression will aid in determining their function. Of course, many of the genes encoding enzymes relevant to glucose and ethanol metabolism showed significant changes in expression. These changes are shown in the diagram below, where red boxes identify genes whose expression is increased, and green boxes show genes whose expression is decreased.

Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.

The broad gray arrows in the diagram show major changes in flow of metabolites after the diauxic shift. It is evident that the induction of certain key enzymes (and down-regulation of other key enzymes) after the diauxic shift promotes the conversion of ethanol into acetyl-coA (which is used for energy production in the TCA and glyoxylate cycles) and oxaloacetate, which is ultimately converted into the carbohydrate storage compounds, trehalose and glycogen. This diagram provides a view of energy metabolism that nicely complements what has been learned from classic biochemical studies of these enzymes.

Another interesting feature in the diauxic shift data is the evidence for families of coordinately regulated genes. Several such families are illustrated in this diagram:

Diagram from DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686. Coordinate regulation of groups of genes in response to diauxic shift. (A) Changes in cell density (as measured by OD600) and glucose concentration with time. (B) Genes strongly induced only at the last timepoint (20.5 hours). (C) Genes with peaks in mRNA levels at 18.5 hours. (D) Genes induced at the time of the diauxic shift. There are at least 17 genes in this class; only some are shown. (E) For these genes, repression begins before the diauxic shift. (F) Ribosomal protein genes are repressed when glucose is depleted.

Inspection of the sequences upstream of these coordinately regulated genes shows that they usually contain common regulatory motifs. For example, the upstream regions of the ribosomal protein genes all contain binding sites for Rap1p, whose concentration diminishes after the diauxic shift (DeRisi et al. 1997).
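The kind of upstream inspection described above can be sketched as a simple string search on both strands. Everything specific in this example is made up: the gene names, the upstream sequences, and the motif are placeholders, and the motif is not claimed to be the Rap1p binding site. Real motif searches usually allow degenerate positions and use position weight matrices.

```python
# Upstream motif scan (placeholder genes, sequences, and motif).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def genes_sharing_motif(upstream_regions, motif):
    """Return the genes whose upstream region contains the motif on either strand."""
    hits = []
    for gene, seq in upstream_regions.items():
        if motif in seq or reverse_complement(motif) in seq:
            hits.append(gene)
    return sorted(hits)


upstream_regions = {
    "geneA": "TTGACCCATACGT",  # motif on the given strand
    "geneB": "GGATGGGTCAAAT",  # motif on the opposite strand
    "geneC": "TTTTAAAACCGGT",  # no motif
}
MOTIF = "ACCCAT"  # placeholder motif for illustration only

print(genes_sharing_motif(upstream_regions, MOTIF))  # ['geneA', 'geneB']
```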

Use of oligonucleotide arrays to detect and utilize sequence polymorphisms

Although gene arrays like those described in the previous section can be used to detect major sequence differences (such as gross deletions and insertions) between individuals and/or strains within a species, the strength of hybridization signals to gene-sized DNAs is not affected by single nucleotide differences. In contrast, hybridization to oligonucleotides is strongly affected by single-nucleotide mismatches. Thus oligonucleotide arrays are capable of detecting single nucleotide polymorphisms, while gene arrays are not.

Chee et al. (Chee et al. 1996) use what they call a "4L tiled array" of oligonucleotides to detect single-nucleotide polymorphisms. They have applied their method to the study of polymorphisms in human mitochondrial DNA, whose sequence of 16,569 bp is known.

Illustration from Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614.

This diagram shows a small section of a tiled array representing the complete human mitochondrial genome, version mt1. 15-mer oligonucleotides representing the mt1 sequence plus the 3 possible mismatches at the 7th position are arranged in columns of 4. Adjacent columns represent overlapping sequences shifted by a single nucleotide. DNA from mt1 (A and top panel of C) hybridizes primarily to the correct oligonucleotide in each set of 4. However, DNA from mt2, which contains a C rather than a T at position 16,493, hybridizes primarily to the C-mismatch-containing sequence at 16,493. In addition, hybridization to adjacent oligonucleotides is suppressed due to the single-base mismatches (compare the signal strengths under the red lines in the lower panel of C with the corresponding regions in the upper panel).
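The layout of a 4L tiled array follows directly from the description above: for a reference of length L, each position is interrogated by a column of 4 probes, giving roughly 4L probes in total. The sketch below generates such columns and makes a base call from a column of intensities. It is a simplified illustration. The probes are written in the sense of the reference for readability (the probes on a real chip must be complementary to the labeled target), the reference string and intensities are made up, and picking the brightest probe is an assumed calling rule rather than the authors' algorithm.

```python
# 4L tiling sketch (made-up reference, simplified base calling).
BASES = "ACGT"


def tile_4l(reference, probe_len=15, sub_pos=7):
    """Build one column of 4 probes for each window of the reference.

    Within a column the probes differ only at position sub_pos (1-based),
    which carries A, C, G, or T. Returns a list of
    (interrogated_reference_index, {base: probe}) pairs.
    """
    offset = sub_pos - 1
    columns = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        probes = {b: window[:offset] + b + window[offset + 1:] for b in BASES}
        columns.append((start + offset, probes))
    return columns


def call_base(intensities):
    """Call the base whose probe gave the strongest hybridization signal."""
    return max(intensities, key=intensities.get)


reference = "ACGTTGCAAGGCTTACGATCGGATC"  # made-up 25-nucleotide reference
columns = tile_4l(reference)
print(len(columns), "columns,", 4 * len(columns), "probes")

# Made-up intensities for one column in which the sample carries a C.
print(call_base({"A": 10, "C": 250, "G": 12, "T": 40}))  # C
```

Because adjacent windows are shifted by one nucleotide, every interior position of the reference is interrogated by exactly one column.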

The depression of hybridization surrounding the mismatch creates a type of "footprint" in the hybridization signal. The footprint extends for the length of the oligonucleotide, which is 15 nucleotides in the above case. In the diagram below, 20-mers were used and the footprint for a single mismatch (panel A) extends for 20 bp. Longer footprints, as in panels B and C, indicate multiple mismatches that create overlapping footprints. The mismatches within these closely spaced footprints cannot be unambiguously deciphered by this technology; direct nucleotide sequencing is required. Nevertheless, the chip technology reveals that the mismatches exist.

Diagram from Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614.
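The footprint lengths discussed above follow from simple interval arithmetic: a probe of length n that starts at position s covers positions s through s + n - 1, so a mismatch at position p affects the n probes that start between p - n + 1 and p. Two mismatches closer together than n produce footprints that merge. The sketch below works this out, with hypothetical function names and made-up positions.

```python
# Footprint arithmetic for tiled probes (hypothetical names, made-up positions).

def footprint(variant_pos, probe_len):
    """Range of probe start positions (inclusive) that overlap the variant."""
    return (variant_pos - probe_len + 1, variant_pos)


def merged_footprints(variant_positions, probe_len):
    """Merge the footprints of several variants where they overlap."""
    merged = []
    for lo, hi in sorted(footprint(p, probe_len) for p in variant_positions):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous footprint: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# One mismatch with 20-mers: a footprint of 20 probes.
print(merged_footprints([100], 20))       # [(81, 100)]
# Two mismatches 10 nucleotides apart: one longer, merged footprint.
print(merged_footprints([100, 110], 20))  # [(81, 110)]
# Two well-separated mismatches: two distinct footprints.
print(merged_footprints([100, 200], 20))  # [(81, 100), (181, 200)]
```

A merged footprint longer than the probe length is therefore a signal that more than one mismatch is present, even though the individual mismatches cannot be read off directly.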

Use of the complete human mitochondrial genome, made possible by oligonucleotide array technology, now permits far more thorough analyses of human mitochondrial genetics than previously possible. And Chee et al. point out that, if the size of the spot in the array corresponding to a single oligonucleotide can be reduced to 1 square micrometer, then all 100,000 genes of the human genome could be displayed with 4L tiling in a single 2 cm by 2 cm array, permitting unprecedented accuracy and power in human genetic analysis. Below I shall discuss a paper that presents preliminary results on single-nucleotide mismatches in the human genome. But first I would like to discuss an extensive analysis of single-nucleotide mismatches in the yeast genome.

In their analysis of the yeast genome, Winzeler et al. (Winzeler et al. 1998) employ a different method for detecting sequence polymorphisms, as illustrated below.

Illustration from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197.

In this case, the oligonucleotides employed do not give complete genome coverage, just dense coverage. At least 20 oligos, 25 nucleotides long, were designed for every ORF in the reference (S288c) yeast genome and arranged on the array in the order of their chromosome position. Below each oligo is placed a second oligo that differs from the first by a single mismatch in a central position. Thus this could be considered a 2L mostly untiled array. As diagrammed in A, some mismatches in a different strain (YJM789) might be expected to reduce hybridization to both the S288c reference oligo and to the mismatch oligo. In other cases, the YJM789 sequence may be identical to the mismatch oligo. An example of the latter is shown in the bottom part of panel B. Panel B shows portions of the overall microarray resulting from hybridization of S96 (nearly identical to S288c) in red and YJM789 in green. Regions of sequence identity produce varying intensities of yellow, while regions of mismatching are preferentially red or green.

The 3714 mismatches that proved reproducible provided a set of markers useful for analysis of genetic crosses between S96 and YJM789. The average spacing between these markers was 3.5 kb. This spacing is so close that, when the DNA from the 4 spore colonies generated from a single meiosis was analyzed by hybridization to microarrays, each of the 97 crossovers that occurred in this meiosis could be mapped with high resolution. Here is an example for the 900 kb of chromosome XIII:

Diagram from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197.

Notice that, for each of the 4 spore colonies, sequences hybridizing to oligos in the fashion of strain S96 (greenish) occur in runs, as do sequences hybridizing in the fashion of YJM789 (reddish), allowing the locations of crossovers to be deduced (white lines in the background). Notice that the information about crossover positions is far more complete--permitting far higher resolution--than would have been possible with the classic genetic techniques, in which at most a few tens of markers would be analyzed simultaneously.
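The deduction of crossover positions from runs of markers can be sketched as follows. The marker positions and parental assignments are made up, and the function name is hypothetical. The sketch simply reports every interval between adjacent markers at which the parental origin switches. A real analysis would also have to deal with miscalled markers and with short switched runs that may reflect gene conversion rather than crossing over.

```python
# Crossover intervals from ordered marker calls (made-up data).

def find_crossovers(markers):
    """Return the intervals in which the parental origin changes.

    markers is a list of (position_kb, parent) pairs sorted by position
    along one chromosome of one spore colony.
    """
    crossovers = []
    for (pos_a, parent_a), (pos_b, parent_b) in zip(markers, markers[1:]):
        if parent_a != parent_b:
            crossovers.append((pos_a, pos_b))
    return crossovers


spore_markers = [
    (10, "S96"),
    (14, "S96"),
    (17, "YJM789"),
    (21, "YJM789"),
    (25, "S96"),
]
print(find_crossovers(spore_markers))  # [(14, 17), (21, 25)]
```

The width of each reported interval is the mapping resolution, which is why a dense marker set gives such precise crossover positions.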

The high density of markers makes it possible to identify genes responsible for mysterious phenotypes. The example I'll describe here, from the work of Winzeler et al. (Winzeler et al. 1998), comes from yeast, but similar techniques will be used in the future to help localize genes responsible for cancer phenotypes in humans.

From a cross between S96 (MATa LYS2 lys5 ho) and YJM789 (MATalpha lys2 LYS5 ho:hisG cyh), 17 MATalpha lys2 LYS5 ho cyh segregants were selected. DNAs from 10 of these were hybridized to microarrays and analyzed. Note that all the markers corresponded to known genes except cyh (cycloheximide sensitivity). The summary diagram below shows that there were only 5 regions where all 10 analyzed segregants were homogeneous. Four of these corresponded to the known locations of MATalpha, lys2, LYS5, and ho. The fifth correlated with the known location of the PDR5 gene, which encodes a multidrug resistance pump.

Diagram from Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197. Information for 5 yeast chromosomes is shown here. Information for the other chromosomes can be obtained from www.sciencemag.org/feature/data/980398.shl.

Note that, in the above diagram, the vertical ticks indicate the positions of polymorphism probes. These probes are colored reddish if DNA from the segregant hybridizes to them like YJM789 and greenish if it hybridizes like S96. The pink and green horizontal stripes show the authors' guesses as to the origin of the DNA in the indicated region (pink from YJM789; green from S96).
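The search for regions where all analyzed segregants are homogeneous can be sketched as a column-wise comparison of marker calls. The data below are invented (3 segregants, 6 markers) and the function names are hypothetical. The idea is only that a marker linked to a selected trait must show the same parental origin in every selected segregant.

```python
# Homogeneous regions across selected segregants (made-up data).

def homogeneous_markers(genotypes):
    """Indices of markers at which every segregant has the same parental call.

    genotypes maps each segregant name to a list of calls ("S" or "Y"),
    one per marker, all in the same marker order.
    """
    calls = list(genotypes.values())
    n_markers = len(calls[0])
    return [i for i in range(n_markers) if len({c[i] for c in calls}) == 1]


def group_runs(indices):
    """Group consecutive marker indices into (first, last) regions."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


genotypes = {
    "segregant1": ["Y", "Y", "S", "S", "Y", "S"],
    "segregant2": ["Y", "Y", "S", "Y", "Y", "Y"],
    "segregant3": ["Y", "Y", "Y", "S", "Y", "S"],
}
shared = homogeneous_markers(genotypes)
print(shared)              # [0, 1, 4]
print(group_runs(shared))  # [(0, 1), (4, 4)]
```

With only a few segregants, some markers will be homogeneous by chance. Analyzing more segregants makes chance agreement at unlinked markers progressively less likely.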

Clearly a similarly dense array of sequence polymorphisms would be of great assistance in assigning mysterious human phenotypes, such as cancer, to specific genes. The human situation is complicated by the large size of the genome and by the fact that human cells are usually diploid. A recent publication by Wang et al. (Wang et al. 1998) shows that these complications can be overcome. These investigators used a 4L tiled array approach, and they applied it to both strands. The approach was applied to nucleotide sequence information from 16,725 human "sequence tagged sites" (sites shown by earlier investigators of the human genome to be uniquely amplifiable from specific PCR primers) covering about 2 Mb of sequence. Oligonucleotide arrays providing 4L tiling for both strands were constructed according to reference DNA sequences for each STS. Then the corresponding STSs were amplified from the DNAs of seven individuals and hybridized to the arrays. Computers were used to scan the results and determine, for each position, whether the individual was homozygous for the reference sequence, heterozygous, or homozygous for a different sequence. In this way, nearly 3000 single-nucleotide polymorphisms (SNPs) were identified, corresponding to one for every 721 bp of sequence surveyed. These SNPs provide a fairly good set of markers. Their positions with respect to other markers on human chromosomes can be viewed at the Whitehead Institute web site. They will be useful in future mapping projects, especially considering that they can be analyzed quickly and simultaneously.

To make these SNPs even more useful, Wang et al. (Wang et al. 1998) are in the process of developing "genotyping arrays" for each SNP. These arrays consist of short 4L tiled arrays for each of the two alternative sequences anticipated at each SNP. Comparison of hybridization patterns in the example below permits easy distinction between A/A homozygotes, A/C heterozygotes, and C/C homozygotes:

Illustration from Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, Lander ES (1998) Large-scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082.
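The distinction among the three genotypes can be illustrated with a very simple rule based on the relative signals from the two allele-specific probe sets. This is a sketch under stated assumptions, not the calling method of Wang et al.: the function name, the use of a single summary signal per allele, and the three-fold ratio are all arbitrary illustrative choices.

```python
# Toy genotype call from two allele-specific signals (illustrative rule only).

def call_genotype(signal_a, signal_c, ratio=3.0):
    """Call A/A, A/C, or C/C from summary signals for the A and C alleles.

    If one allele's signal exceeds the other's by at least `ratio`, the
    sample is called homozygous for that allele; otherwise heterozygous.
    """
    if signal_a >= ratio * signal_c:
        return "A/A"
    if signal_c >= ratio * signal_a:
        return "C/C"
    return "A/C"


# Made-up signals for three individuals.
for a, c in [(900, 100), (500, 450), (80, 1000)]:
    print(a, c, call_genotype(a, c))
```

A production method would also need a "no call" outcome for spots where both signals are too weak to interpret.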

The next goal is to search the protein coding regions of all human genes (only 40 times more sequence than analyzed in the present study) for SNPs. The resulting map should be extremely useful in mapping human genetic traits and disease susceptibilities.


References

Note: If you click on any of the articles in the bibliography, you will be taken to the PubMed site for that article, where you will find its abstract. In many cases, you will also find a link to the publisher's web site from which you may be able to download the complete text and figures of the article.

Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SPA (1996) Accessing genetic information with high-density DNA arrays. Science 274:610-614

DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686

Fodor SPA, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251:767-773

Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, Brown PO, Davis RW (1997) Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94:13057-13062

Lashkari DA, Hunicke-Smith SP, Norgren RM, Davis RW, Brennan T (1995) An automated multiplex oligonucleotide synthesizer: development of high-throughput, low-cost DNA synthesis. Proc Natl Acad Sci USA 92:7912-7915

Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SPA (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci USA 91:5022-5026

Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, Lander ES (1998) Large-scale identification, mapping and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082

Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW (1998) Direct allelic variation scanning of the yeast genome. Science 281:1194-1197


Some additional references you may find of interest

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 2:65-73

Wodicka L, Dong H, Mittmann M, Ho M-H, Lockhart DJ (1997) Genome-wide expression monitoring in Saccharomyces cerevisiae. Nature Biotech 15:1359-1367

