Written for biologists and medical researchers who don't have any special training in data analysis and statistics, "Guide to Analysis of DNA Microarray Data, Second Edition" begins where DNA array equipment leaves off: the image produced by the microarray. The text deals with the questions that arise starting at this point, providing an introduction to microarray technology, then moving on to image analysis, data analysis, cluster analysis, and beyond. With all chapters rewritten, updated, and expanded to include the latest generation of technology and methods, "Guide to Analysis of DNA Microarray Data, Second Edition" offers practitioners reliable information using concrete examples and a clear, comprehensible style.
This Second Edition features entirely new chapters on image analysis; experiment design; automated analysis, integrated analysis, and systems biology; and interpretation of results.
Intended for readers seeking practical applications, this text covers a broad spectrum of proven approaches in this rapidly growing technology. Additional features include further reading suggestions for each chapter, as well as a thorough review of available analysis software.
Steen Knudsen received his Ph.D. from the University of Copenhagen in 1990 and was among the pioneers of bioinformatics with a publication in Nature the same year. His postdoctoral training in bioinformatics at Harvard University concerned computational gene finding in the human genome. Since 1998 he has been working with methods for analysis of DNA microarray data. He currently heads the DNA microarray group at the Technical University of Denmark.
Preface.
Acknowledgments.
1. Introduction to DNA Microarray Technology. 1.1 Hybridization. 1.2 Gold Rush? 1.3 The Technology behind DNA Microarrays. 1.3.1 Affymetrix GeneChip Technology. 1.3.2 Spotted Arrays. 1.3.3 Digital Micromirror Arrays. 1.3.4 Inkjet Arrays. 1.3.5 Bead Arrays. 1.3.6 Serial Analysis of Gene Expression (SAGE). 1.4 Parallel Sequencing on Microbead Arrays. 1.4.1 Emerging Technologies. 1.5 Example: Affymetrix vs. Spotted Arrays. 1.6 Summary. 1.7 Further Reading.
2. Overview of Data Analysis.
3. Image Analysis. 3.1 Gridding. 3.2 Segmentation. 3.3 Intensity Extraction. 3.4 Background Correction. 3.5 Software. 3.5.1 Free Software for Array Image Analysis. 3.5.2 Commercial Software for Array Image Analysis. 3.6 Summary. 3.7 Further Reading.
4. Basic Data Analysis. 4.1 Normalization. 4.1.1 One or More Genes are Assumed Expressed at Constant Rate. 4.1.2 Sum of Genes is Assumed Constant. 4.1.3 Subset of Genes is Assumed Constant. 4.1.4 Majority of Genes Assumed Constant. 4.1.5 Spike Controls. 4.2 Dye Bias, Spatial Bias, Print Tip Bias. 4.3 Expression Indices. 4.3.1 Average Difference. 4.3.2 Signal. 4.3.3 Model-Based Expression Index. 4.3.4 Robust Multiarray Average. 4.3.5 Position Dependent Nearest Neighbor Model. 4.4 Detection of Outliers. 4.5 Fold Change. 4.6 Significance. 4.6.1 Multiple Conditions. 4.6.2 Nonparametric Tests. 4.6.3 Correction for Multiple Testing. 4.6.4 Example I: t-Test and ANOVA. 4.6.5 Example II: Number of Replicates. 4.7 Mixed Cell Populations. 4.8 Summary. 4.9 Further Reading.
5. Visualization by Reduction of Dimensionality. 5.1 Principal Component Analysis. 5.2 Example 1: PCA on Small Data Matrix. 5.3 Example 2: PCA on Real Data. 5.4 Summary. 5.5 Further Reading.
6. Cluster Analysis. 6.1 Hierarchical Clustering. 6.2 K-means Clustering. 6.3 Self-Organizing Maps. 6.4 Distance Measures. 6.4.1 Example: Comparison of Distance Measures. 6.5 Gene Normalization. 6.6 Visualization of Clusters. 6.6.1 Example: Visualization of Gene Clusters in Bladder Cancer. 6.7 Summary. 6.8 Further Reading.
7. Beyond Cluster Analysis. 7.1 Function Prediction. 7.2 Discovery of Regulatory Elements in Promoter Regions. 7.2.1 Example 1: Discovery of Proteasomal Element. 7.2.2 Example 2: Rediscovery of Mlu Cell Cycle Box (MCB). 7.3 Summary. 7.4 Further Reading.
8. Automated Analysis, Integrated Analysis and Systems Biology. 8.1 Integrated Analysis. 8.2 Systems Biology. 8.3 Further Reading.
9. Reverse Engineering of Regulatory Networks. 9.1 The Time-Series Approach. 9.2 The Steady-State Approach. 9.3 Limitations of Network Modeling. 9.4 Example 1: Steady-State Model. 9.5 Example 2: Steady-State Model on Bacillus Data. 9.6 Example 3: Linear Time-Series Model. 9.7 Further Reading.
10. Molecular Classifiers. 10.1 Feature Selection. 10.2 Validation. 10.3.1 Nearest Neighbor. 10.3.2 Nearest Centroid. 10.3.3 Neural Networks. 10.3.4 Support Vector Machine. 10.4 Performance Evaluation. 10.5 Example I: Classification of Bladder Cancer Subtypes. 10.6 Example II: Classification of SRBCT Cancer Subtypes. 10.7 Summary. 10.8 Further Reading.
11. The Design of Probes for Arrays. 11.1 Selection of Genes for an Array. 11.2 Gene Finding. 11.3 Selection of Regions Within Genes. 11.4 Selection of Primers for PCR. 11.4.1 Example: Finding PCR Primers for Gene AF105374. 11.5 Selection of Unique Oligomer Probes. 11.6 Remapping of Probes. 11.7 Further Reading.
12. Genotyping and Resequencing Chips. 12.1 Example: Neural Networks for GeneChip Prediction. 12.2 Further Reading.
13. Experiment Design and Interpretation of Results. 13.1 Factorial Designs. 13.2 Designs for Two-Channel Arrays. 13.3 Hypothesis Driven Experiments. 13.4 Independent Verification. 13.5 Interpretation of Results. 13.6 Limitations of Expression Analysis. 13.6.1 Relative Versus Absolute RNA Quantification. 13.7 Further Reading.
14. Software Issues and Data Formats. 14.1 Standardization Efforts. 14.2 Databases. 14.3 Standard File Format. 14.4 Software for Clustering. 14.4.1 Example: Clustering with ClustArray. 14.5 Software for Statistical Analysis. 14.5.1 Example: Statistical Analysis with R. 14.5.2 The Affy Package of Bioconductor. 14.5.3 Commercial Statistics Packages. 14.6 Summary. 14.7 Further Reading.
Appendix A: Web Resources: Commercial Software Packages.
References.
Index.