Exploratory data analysis
The initial chapters of this e-book describe statistical methods to explore and
summarise a data set. Appropriate methods depend on the structure of the data set — the number
and types of its variables.
- One numerical variable
- Chapter 2 describes graphical and numerical summaries of a single numerical
variable.
- Two or more numerical variables
- Chapter 3 examines the relationship between two or more numerical variables.
- One time-ordered numerical variable
- In Chapter 4, we examine statistical methods for ordered data (time series).
- Categorical variables
- In Chapter 5, graphical and numerical methods are presented for categorical
variables.
- Many variables
- Chapter 6 briefly presents a few methods for data sets with
three or more variables.
Data collection
Statisticians should be involved before any data are collected.
Applying statistical principles to the data collection process helps to
ensure that the resulting data can be meaningfully analysed. Chapters 7 and
8 explore the idea of random sampling and describe some principles that should be followed in data collection.
Inference
To fully understand the information that is contained in most data sets, we must take account of randomness — if we collected the data again by repeating an experiment or collecting data from different people, the values would often be different. The relevant statistical methods are collectively called inference.
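The idea that repeated data collection gives different values can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population (10,000 normally distributed values with mean 50 and standard deviation 10), the sample size of 25 and the helper name `sample_mean` are all hypothetical choices, not taken from this e-book.

```python
import random
import statistics

# Hypothetical population: 10,000 values from a normal distribution
# with mean 50 and standard deviation 10 (illustrative numbers only).
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(10_000)]


def sample_mean(values, n, rng):
    """Return the mean of a simple random sample of size n (without replacement)."""
    return statistics.mean(rng.sample(values, n))


# "Collect the data again" five times and compare the resulting summaries.
means = [sample_mean(population, 25, rng) for _ in range(5)]
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Each run of the loop mimics repeating the study with different individuals. The five sample means differ from one another even though the population is unchanged, and it is this sample-to-sample variability that inference must take into account.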
Again, the details of the statistical analysis depend mostly on the structure of the data set — the number and types of its variables.
- One variable
- Chapters 9 and 10 develop the methodology to answer questions about a
single numerical or categorical variable (confidence intervals and hypothesis
tests).
- Two or more groups
- Chapter 11 presents confidence intervals and hypothesis tests to compare
two or more groups.
- Two numerical variables
- Chapter 12 models the relationship between two numerical variables and gives
confidence intervals and hypothesis tests about these models.
- Two categorical variables
- Chapter 13 describes methodology to compare groups of categorical data
or to examine the relationship between two categorical variables (contingency
tables).