Computational and Statistical Methods for Protein Quantification by Mass Spectrometry


By: Ingvar Eidhammer, Harald Barsnes, Lennart Martens, and Geir Egil Eide (authors)

Hardback

Availability: 1-2 weeks

Description

The definitive introduction to data analysis in quantitative proteomics

This book provides all the necessary knowledge about mass spectrometry-based proteomics methods and the computational and statistical approaches needed to plan, design, and analyze quantitative proteomics experiments. The authors' carefully constructed approach allows readers to make the transition into the field of quantitative proteomics easily. Through detailed descriptions of wet-lab methods, computational approaches, and statistical tools, this book covers the full scope of a quantitative experiment, helping newcomers acquire new knowledge while also serving as a useful reference for more advanced readers.

Computational and Statistical Methods for Protein Quantification by Mass Spectrometry:

  • Introduces the use of mass spectrometry in protein quantification and shows how the bioinformatics challenges in this field can be solved using statistical methods and various software programs.
  • Is illustrated by a large number of figures and examples, as well as numerous exercises.
  • Provides both clear and rigorous descriptions of methods and approaches.
  • Is thoroughly indexed and cross-referenced, combining the strengths of a textbook with the utility of a reference work.
  • Features detailed discussions of both wet-lab approaches and statistical and computational methods.

With clear and thorough descriptions of the various methods and approaches, this book is accessible to biologists, informaticians, and statisticians alike, and is aimed at readers across the academic spectrum, from advanced undergraduate students to postdoctoral researchers entering the field.


Contents

Preface
Terminology
Acknowledgements
1 Introduction
  1.1 The composition of an organism
    1.1.1 A simple model of an organism
    1.1.2 Composition of cells
  1.2 Homeostasis, physiology, and pathology
  1.3 Protein synthesis
  1.4 Site, sample, state, and environment
  1.5 Abundance and expression: protein and proteome profiles
    1.5.1 The protein dynamic range
  1.6 The importance of exact specification of sites and states
    1.6.1 Biological features
    1.6.2 Physiological and pathological features
    1.6.3 Input features
    1.6.4 External features
    1.6.5 Activity features
    1.6.6 The cell cycle
  1.7 Relative and absolute quantification
    1.7.1 Relative quantification
    1.7.2 Absolute quantification
  1.8 In vivo and in vitro experiments
  1.9 Goals for quantitative protein experiments
  1.10 Exercises
2 Correlations of mRNA and protein abundances
  2.1 Investigating the correlation
  2.2 Codon bias
  2.3 Main results from experiments
  2.4 The ideal case for mRNA-protein comparison
  2.5 Exploring correlation across genes
  2.6 Exploring correlation within one gene
  2.7 Correlation across subsets
  2.8 Comparing mRNA and protein abundances across genes from two situations
  2.9 Exercises
  2.10 Bibliographic notes
3 Protein level quantification
  3.1 Two-dimensional gels
    3.1.1 Comparing results from different experiments: DIGE
  3.2 Protein arrays
    3.2.1 Forward arrays
    3.2.2 Reverse arrays
    3.2.3 Detection of binding molecules
    3.2.4 Analysis of protein array readouts
  3.3 Western blotting
  3.4 ELISA (Enzyme-Linked Immunosorbent Assay)
  3.5 Bibliographic notes
4 Mass spectrometry and protein identification
  4.1 Mass spectrometry
    4.1.1 Peptide mass fingerprinting (PMF)
    4.1.2 MS/MS: tandem MS
    4.1.3 Mass spectrometers
  4.2 Isotope composition of peptides
    4.2.1 Predicting the isotope intensity distribution
    4.2.2 Estimating the charge
    4.2.3 Revealing isotope patterns
  4.3 Presenting the intensities: the spectra
  4.4 Peak intensity calculation
  4.5 Peptide identification by MS/MS spectra
    4.5.1 Spectral comparison
    4.5.2 Sequential comparison
    4.5.3 Scoring
    4.5.4 Statistical significance
  4.6 The protein inference problem
    4.6.1 Determining maximal explanatory sets
    4.6.2 Determining minimal explanatory sets
  4.7 False discovery rate for the identifications
    4.7.1 Constructing the decoy database
    4.7.2 Separate or composite search
  4.8 Exercises
  4.9 Bibliographic notes
5 Protein quantification by mass spectrometry
  5.1 Situations, protein, and peptide variants
    5.1.1 Situation
    5.1.2 Protein variants: peptide variants
  5.2 Replicates
  5.3 Run, experiment, project
    5.3.1 LC-MS/MS run
    5.3.2 Quantification run
    5.3.3 Quantification experiment
    5.3.4 Quantification project
    5.3.5 Planning quantification experiments
  5.4 Comparing quantification approaches/methods
    5.4.1 Accuracy
    5.4.2 Precision
    5.4.3 Repeatability and reproducibility
    5.4.4 Dynamic range and linear dynamic range
    5.4.5 Limit of blank (LOB)
    5.4.6 Limit of detection (LOD)
    5.4.7 Limit of quantification (LOQ)
    5.4.8 Sensitivity
    5.4.9 Selectivity
  5.5 Classification of approaches for quantification using LC-MS/MS
    5.5.1 Discovery or targeted protein quantification
    5.5.2 Label based vs. label free quantification
    5.5.3 Abundance determination: ion current vs. peptide identification
    5.5.4 Classification
  5.6 The peptide (occurrence) space
  5.7 Ion chromatograms
  5.8 From peptides to protein abundances
    5.8.1 Combined single abundance from single abundances
    5.8.2 Relative abundance from single abundances
    5.8.3 Combined relative abundance from relative abundances
  5.9 Protein inference and protein abundance calculation
    5.9.1 Use of the peptides in protein abundance calculation
    5.9.2 Classifying the proteins
    5.9.3 Can shared peptides be used for quantification?
  5.10 Peptide tables
  5.11 Assumptions for relative quantification
  5.12 Analysis for differentially abundant proteins
  5.13 Normalization of data
  5.14 Exercises
  5.15 Bibliographic notes
6 Statistical normalization
  6.1 Some illustrative examples
  6.2 Non-normally distributed populations
    6.2.1 Skewed distributions
    6.2.2 Measures of skewness
    6.2.3 Steepness of the peak: kurtosis
  6.3 Testing for normality
    6.3.1 Normal probability plot
    6.3.2 Some test statistics for normality testing
  6.4 Outliers
    6.4.1 Test statistics for the identification of a single outlier
    6.4.2 Testing for more than one outlier
    6.4.3 Robust statistics for mean and standard deviation
    6.4.4 Outliers in regression
  6.5 Variance inequality
  6.6 Normalization and logarithmic transformation
    6.6.1 The logarithmic function
    6.6.2 Choosing the base
    6.6.3 Logarithmic normalization of peptide/protein ratios
    6.6.4 Pitfalls of logarithmic transformations
    6.6.5 Variance stabilization by logarithmic transformation
    6.6.6 Logarithmic scale for presentation
  6.7 Exercises
  6.8 Bibliographic notes
7 Experimental normalization
  7.1 Sources of variation and level of normalization
  7.2 Spectral normalization
    7.2.1 Scale based normalization
    7.2.2 Rank based normalization
    7.2.3 Combining scale based and rank based normalization
    7.2.4 Reproducibility of the normalization methods
  7.3 Normalization at the peptide and protein level
  7.4 Normalizing using sum, mean, and median
  7.5 MA-plot for normalization
    7.5.1 Global intensity normalization
    7.5.2 Linear regression normalization
  7.6 Local regression normalization (LOWESS)
  7.7 Quantile normalization
  7.8 Overfitting
  7.9 Exercises
  7.10 Bibliographic notes
8 Statistical analysis
  8.1 Use of replicates for statistical analysis
  8.2 Using a set of proteins for statistical analysis
    8.2.1 Z-variable
    8.2.2 G-statistic
    8.2.3 Fisher-Irwin exact test
  8.3 Missing values
    8.3.1 Reasons for missing values
    8.3.2 Handling missing values
  8.4 Prediction and hypothesis testing
    8.4.1 Prediction errors
    8.4.2 Hypothesis testing
  8.5 Statistical significance for multiple testing
    8.5.1 False positive rate control
    8.5.2 False discovery rate control
  8.6 Exercises
  8.7 Bibliographic notes
9 Label based quantification
  9.1 Labeling techniques for label based quantification
  9.2 Label requirements
  9.3 Labels and labeling properties
    9.3.1 Quantification level
    9.3.2 Label incorporation
    9.3.3 Incorporation level
    9.3.4 Number of compared samples
    9.3.5 Common labels
  9.4 Experimental requirements
  9.5 Recognizing corresponding peptide variants
    9.5.1 Recognizing peptide variants in MS spectra
    9.5.2 Recognizing peptide variants in MS/MS spectra
  9.6 Reference free vs. reference based
    9.6.1 Reference free quantification
    9.6.2 Reference based quantification
  9.7 Labeling considerations
  9.8 Exercises
  9.9 Bibliographic notes
10 Reporter based MS/MS quantification
  10.1 Isobaric labels
  10.2 iTRAQ
    10.2.1 Fragmentation
    10.2.2 Reporter ion intensities
    10.2.3 iTRAQ 8-plex
  10.3 TMT (Tandem Mass Tag)
  10.4 Reporter based quantification runs
  10.5 Identification and quantification
  10.6 Peptide table
  10.7 Reporter based quantification experiments
    10.7.1 Normalization across LC-MS/MS runs: use of a reference sample
    10.7.2 Normalizing within an LC-MS/MS run
    10.7.3 From reporter intensities to protein abundances
    10.7.4 Finding differentially abundant proteins
    10.7.5 Distributing the replicates on the quantification runs
    10.7.6 Protocols
  10.8 Exercises
  10.9 Bibliographic notes
11 Fragment based MS/MS quantification
  11.1 The label masses
  11.2 Identification
  11.3 Peptide and protein quantification
  11.4 Exercises
  11.5 Bibliographic notes
12 Label based quantification by MS spectra
  12.1 Different labeling techniques
    12.1.1 Metabolic labeling: SILAC
    12.1.2 Chemical labeling
    12.1.3 Enzymatic labeling: ¹⁸O
  12.2 Experimental setup
  12.3 MaxQuant as a model
    12.3.1 HL-pairs
    12.3.2 Reliability of HL-pairs
    12.3.3 Reliable protein results
  12.4 The MaxQuant procedure
    12.4.1 Recognize HL-pairs
    12.4.2 Estimate HL-ratios
    12.4.3 Identify HL-pairs by database search
    12.4.4 Infer protein data
  12.5 Exercises
  12.6 Bibliographic notes
13 Label free quantification by MS spectra
  13.1 An ideal case: two protein samples
  13.2 The real world
    13.2.1 Multiple samples
  13.3 Experimental setup
  13.4 Forms
  13.5 The quantification process
  13.6 Form detection
  13.7 Pair-wise retention time correction
    13.7.1 Determining potentially corresponding forms
    13.7.2 Linear corrections
    13.7.3 Nonlinear corrections
  13.8 Approaches for form tuple detection
  13.9 Pair-wise alignment
    13.9.1 Distance between forms
    13.9.2 Finding an optimal alignment
  13.10 Using a reference run for alignment
  13.11 Complete pair-wise alignment
  13.12 Hierarchical progressive alignment
    13.12.1 Measuring the similarity or the distance of two runs
    13.12.2 Constructing static guide trees
    13.12.3 Constructing dynamic guide trees
    13.12.4 Aligning subalignments
    13.12.5 SuperHirn
  13.13 Simultaneous iterative alignment
    13.13.1 Constructing the initial alignment in XCMS
    13.13.2 Changing the initial alignment
  13.14 The end result and further analysis
  13.15 Exercises
  13.16 Bibliographic notes
14 Label free quantification by MS/MS spectra
  14.1 Abundance measurements
  14.2 Normalization
  14.3 Proposed methods
  14.4 Methods for single abundance calculation
    14.4.1 emPAI
    14.4.2 PMSS
    14.4.3 NSAF
    14.4.4 SI
  14.5 Methods for relative abundance calculation
    14.5.1 PASC
    14.5.2 RIBAR
    14.5.3 xRIBAR
  14.6 Comparing methods
    14.6.1 An analysis by Griffin
    14.6.2 An analysis by Colaert
  14.7 Improving the reliability of spectral count quantification
  14.8 Handling shared peptides
  14.9 Statistical analysis
  14.10 Exercises
  14.11 Bibliographic notes
15 Targeted quantification: Selected Reaction Monitoring
  15.1 Selected Reaction Monitoring: the concept
  15.2 A suitable instrument
  15.3 The LC-MS/MS run
    15.3.1 Sensitivity and accuracy
  15.4 Label free and label based quantification
    15.4.1 Label free SRM based quantification
    15.4.2 Label based SRM based quantification
  15.5 Requirements for SRM transitions
    15.5.1 Requirements for the peptides
    15.5.2 Requirements for the fragment ions
  15.6 Finding optimal transitions
  15.7 Validating transitions
    15.7.1 Testing linearity
    15.7.2 Determining retention time
    15.7.3 Limit of detection/quantification
    15.7.4 Dealing with low abundant proteins
    15.7.5 Checking for interference
  15.8 Assay development
  15.9 Exercises
  15.10 Bibliographic notes
16 Absolute quantification
  16.1 Performing absolute quantification
    16.1.1 Linear dependency between the calculated and the real abundances
  16.2 Label based absolute quantification
    16.2.1 Stable isotope-labeled peptide standards
    16.2.2 Stable isotope-labeled concatenated peptide standards
    16.2.3 Stable isotope-labeled intact protein standards
  16.3 Label free absolute quantification
    16.3.1 Quantification by MS spectra
    16.3.2 Quantification by the number of MS/MS spectra
  16.4 Exercises
  16.5 Bibliographic notes
17 Quantification of post-translational modifications
  17.1 PTM and mass spectrometry
  17.2 Modification degree
  17.3 Absolute modification degree
    17.3.1 Reversing the modification
    17.3.2 Use of two standards
    17.3.3 Label free modification degree analysis
  17.4 Relative modification degree
  17.5 Discovery based modification stoichiometry
    17.5.1 Separate LC-MS/MS experiments for modified and unmodified peptides
    17.5.2 Common LC-MS/MS experiment for modified and unmodified peptides
    17.5.3 Reliable results and significant differences
  17.6 Exercises
  17.7 Bibliographic notes
18 Biomarkers
  18.1 Evaluation of potential biomarkers
    18.1.1 Taking disease prevalence into account
  18.2 Evaluating threshold values for biomarkers
  18.3 Exercises
  18.4 Bibliographic notes
19 Standards and databases
  19.1 Standard data formats for (quantitative) proteomics
    19.1.1 Controlled vocabularies (CVs)
    19.1.2 Benefits of using CV terms to annotate metadata
    19.1.3 A standard for quantitative proteomics data
    19.1.4 HUPO PSI
  19.2 Databases for proteomics data
  19.3 Bibliographic notes
20 Appendix A: Statistics
  20.1 Samples, populations, and statistics
  20.2 Population parameter estimation
    20.2.1 Estimating the mean of a population
  20.3 Hypothesis testing
    20.3.1 Two types of errors
  20.4 Performing the test: test statistics and p-values
    20.4.1 Parametric test statistics
    20.4.2 Nonparametric test statistics
    20.4.3 Confidence intervals and hypothesis testing
  20.5 Comparing means of populations
    20.5.1 Analyzing the mean of a single population
    20.5.2 Comparing the means from two populations
    20.5.3 Comparing means of paired populations
    20.5.4 Multiple populations
    20.5.5 Multiple testing
  20.6 Comparing variances
    20.6.1 Testing the variance of a single population
    20.6.2 Testing the variances of two populations
  20.7 Percentiles and quantiles
    20.7.1 A straightforward method for estimating the percentiles
    20.7.2 Quantiles
    20.7.3 Box plots
  20.8 Correlation
    20.8.1 Pearson's product-moment correlation coefficient
    20.8.2 Spearman's rank correlation coefficient
    20.8.3 Correlation line
  20.9 Regression analysis
    20.9.1 Regression line
    20.9.2 Relation between Pearson's correlation coefficient and the regression parameters
  20.10 Types of values and variables
21 Appendix B: Clustering and discriminant analysis
  21.1 Clustering
    21.1.1 Distances and similarities
    21.1.2 Distance measures
    21.1.3 Similarity measures
    21.1.4 Distances between an object and a class
    21.1.5 Distances between two classes
    21.1.6 Missing data
    21.1.7 Clustering approaches
    21.1.8 Sequential clustering
    21.1.9 Hierarchical clustering
  21.2 Discriminant analysis
    21.2.1 Step-wise feature selection
    21.2.2 Linear discriminant analysis using original features
    21.2.3 Canonical discriminant analysis
  21.3 Bibliographic notes
Bibliography
Index

Product Details

  • Publication date: 04/01/2013
  • ISBN13: 9781119964001
  • Format: Hardback
  • Number of pages: 354
  • ID: 9781119964001
  • Weight: 586
  • ISBN10: 1119964008

Delivery Information

  • Saver Delivery: Yes
  • 1st Class Delivery: Yes
  • Courier Delivery: Yes
  • Store Delivery: Yes

Prices are for internet purchases only. Prices and availability in WHSmith stores may vary significantly.
