The focus of statistics is to answer questions that are expressed in the language of some application area. Statistical methods for analysis of data are a core part of statistics, but the context of the data is most important.
Statistical analysis is a process that involves identifying the questions of interest, collecting data, analysing them, and producing a report. In real-life problems, the data collection and analysis steps may be repeated more than once.
In many applications, the cycle of data collection and analysis is a central part of the quest for improvement to systems and processes.
Most data sets contain one or more measurements from each of a collection of 'individuals' (also called 'cases' or 'units').
Variables are classified into numerical and categorical variables. A finer classification is also sketched.
A categorical variable can be used to split the 'individuals' into groups. Equivalently, grouped data can be represented in a data matrix with a categorical variable.
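The equivalence between grouped data and a data matrix with a categorical variable can be sketched in a few lines of Python. This is only an illustration: the rows, the column names `group` and `weight`, and the helper functions `split_by` and `to_data_matrix` are all hypothetical and are not part of CAST.

```python
from collections import defaultdict

# Hypothetical data matrix: one row per individual (made-up values).
rows = [
    {"id": 1, "group": "A", "weight": 61.0},
    {"id": 2, "group": "B", "weight": 72.5},
    {"id": 3, "group": "A", "weight": 58.5},
    {"id": 4, "group": "B", "weight": 80.0},
]


def split_by(rows, key, value):
    """Split the values of one variable into groups defined by a categorical variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


def to_data_matrix(groups, key, value):
    """Go the other way: turn grouped values back into rows with a categorical column."""
    return [
        {key: label, value: v}
        for label, values in groups.items()
        for v in values
    ]


groups = split_by(rows, "group", "weight")
matrix = to_data_matrix(groups, "group", "weight")
```

Here `groups` holds one list of weights per category, and `matrix` holds the same four measurements again as rows, each tagged with its category.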
Sometimes a ratio or difference of two variables in a data matrix is easier to interpret than the original variables.
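As a small illustration of a derived variable, the sketch below adds a ratio and a difference as new columns of a data matrix. The districts, the figures and the helper names `add_ratio` and `add_difference` are invented for this example.

```python
# Hypothetical data matrix (made-up values): one row per district.
rows = [
    {"district": "North", "population": 50000, "area_km2": 250.0,
     "births": 700, "deaths": 450},
    {"district": "South", "population": 30000, "area_km2": 60.0,
     "births": 380, "deaths": 410},
]


def add_ratio(rows, numerator, denominator, name):
    """Return new rows with an extra column holding numerator / denominator."""
    return [{**row, name: row[numerator] / row[denominator]} for row in rows]


def add_difference(rows, first, second, name):
    """Return new rows with an extra column holding first - second."""
    return [{**row, name: row[first] - row[second]} for row in rows]


# Population density is easier to compare across districts than raw population.
derived = add_ratio(rows, "population", "area_km2", "density")
# Natural increase summarises births and deaths in a single number.
derived = add_difference(derived, "births", "deaths", "natural_increase")
```

In this made-up example the district with the smaller population turns out to have the higher density, which the original columns do not show at a glance.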
In some data matrices, the rows are time-ordered.
Sometimes information is available at both group and individual level (multi-level data). Such data are most naturally stored in two data matrices.
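A minimal sketch of the two-matrix layout for multi-level data follows, assuming a hypothetical example of pupils (individuals) within schools (groups). The variable names, the values and the helper `attach_group_info` are invented for illustration.

```python
# Group-level data matrix (made-up values): one entry per school.
schools = {
    "S1": {"school_type": "urban", "class_size": 28},
    "S2": {"school_type": "rural", "class_size": 19},
}

# Individual-level data matrix: one row per pupil, with a key naming the school.
pupils = [
    {"pupil": 1, "school": "S1", "score": 64},
    {"pupil": 2, "school": "S1", "score": 71},
    {"pupil": 3, "school": "S2", "score": 58},
]


def attach_group_info(individuals, groups, key):
    """Copy each individual's group-level variables onto its row."""
    return [{**ind, **groups[ind[key]]} for ind in individuals]


combined = attach_group_info(pupils, schools, "school")
```

Keeping the two matrices separate avoids repeating each school's details on every pupil row; the combined form can be built when an analysis needs both levels together.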
Statistical analysis is specific to the structure of the data (i.e. the types of variable in the data matrix). CAST starts with descriptive methods to explore data; it then moves on to inferential methods that take account of randomness in the data.
In many situations, information (signal) can be obscured by random variation (noise).
When data are collected from 'individuals', they often vary considerably.
Intentional differences in experimental conditions may also cause systematic differences in variables. Natural variability makes it harder to interpret experimental results.
The natural variability of individuals also makes it harder to interpret information from surveys.
Some variation in a variable can be explained in terms of other recorded variables. Other variation is a result of natural variability in the individuals.
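One common way to make this split concrete, offered here as an illustration rather than as the method CAST itself uses, is to divide the total sum of squares of a numerical variable into a part between groups (explained by a recorded categorical variable) and a part within groups (natural variability). The data and names below are made up.

```python
from statistics import mean

# Hypothetical measurements, grouped by a recorded categorical variable.
groups = {"A": [4.0, 6.0], "B": [9.0, 11.0]}


def sum_of_squares(values, centre):
    """Sum of squared deviations of the values from a given centre."""
    return sum((v - centre) ** 2 for v in values)


all_values = [v for values in groups.values() for v in values]
overall_mean = mean(all_values)

# Total variation around the overall mean.
total_ss = sum_of_squares(all_values, overall_mean)

# Variation left over inside each group: not explained by the grouping.
within_ss = sum(sum_of_squares(values, mean(values)) for values in groups.values())

# Variation of the group means around the overall mean: explained by the grouping.
between_ss = sum(
    len(values) * (mean(values) - overall_mean) ** 2
    for values in groups.values()
)

explained_share = between_ss / total_ss
```

The between-group and within-group parts always add up to the total, so `explained_share` gives the proportion of the variation accounted for by the grouping variable in this sketch.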
Variation in a data set can help us to predict the values that might occur if further data of the same kind are collected in the future.