1. Home
  2. Probability Distribution Plot

Probability Distribution Plot

Select menu: Stats | Distributions | Probability Plots

To assess how well empirical data approximates a particular theoretical distribution, the sorted values (order statistics, X(i)) are plotted against the expected values of the order statistics E(i) from the given distribution. However, usually the particular parameters of the distribution are not known and these have to be estimated first to obtain the expected values.

  1. After you have imported your data, from the menu select 
    Stats | Distributions | Probability Plots.
  2. Fill in the fields as required then click Run.

You can set additional Options before running and store the results by clicking Store.

If the distribution has a cumulative density function of F(x), and the inverse of this function is G(x) (i.e. G(F(x)) = x), then the expected values of the order statistics, are approximately G((i-0.5)/n), where i = 1…n, and n is the number of values in the sample. A plot of X(i) vs E(i) is known as a Quantile-Quantile (or Q-Q) plot. The data can also be plotted on the probability scale by plotting the cumulative probabilities of the data under the assumed distribution against their expected probabilities, i.e. F(X(i)) vs (i-0.5)/n. This is known as a Probability-Probability (or P-P) plot.

A third plot called the stabilized probability (SP) plot (Michael, 1983), was introduced, which rescales the probabilities using the transformation sp = (2/pi)*arcsin(sqrt(p)), so that the variance of the plotted points are approximately equal over the range of probability values. In the SP plot the scaled values sp are plotted rather than the unscaled p values.

The following graph shows a Normal Q-Q plot with 95% simultaneous confidence bands and a 1-1 reference line.

Available data

This lists variates that are available for analysis. Double-click a name to copy it to the Data values field or type the name.

Data values

This specifies the name of the variate that will be used in the probability distribution plot.

Distribution

This provides a dropdown list of the range of continues distributions that the observed data can be plotted against.

Degrees of freedom

Some of the distributions (Chi-square, t and F) cannot have the parameters estimated by the usual distribution fitting facilities, so these fields provide the degrees to specify the parameters of these distributions.

Box Cox transform

Select this to perform a Box Cox transform on the data before plotting it. The Box Cox transform for a variate X is defined as:

Y = (X**lambda - 1)/lambda	if lambda is not equal to 0, and
Y = LOG(X) 			if lambda = 0.

The power lambda is specified in the field provided.

If X does not have a normal distribution, a value of lambda can often be found such that Y is normally distributed.

For a Normal distribution, the Estimate button will use the YTRANSFORM command to calculate the optimal value of lambda (to the nearest 0.1 between -4 and 4) to transform the X values to a Normal distribution. The optimal value of lambda should be placed in the field above, unless the server is busy with other calculations, in which case you will need to cut and paste the value of lambda from the Output window when the server has completed the calculation.

Plotting scale

The graph can be plotted on three scales:

  • Quantile – This plots a Q-Q plot of the observed data values plotted
    against their expected quantiles
  • Probability – This plots a P-P plot of the observed data values
    transformed to a probability via the cumulative distribution function of the
    theoretical distribution plotted against their expected probabilities
  • Stabilized probability – This plots the stabilized probability plot of
    Michael (1983) described above

Confidence bands

This dropdown list allows two forms of confidence intervals to be displayed in the graph.

  • Pointwise simulates distributions of the same size as the data
    from the theoretical distribution and plots the range of values at each value
    of the order statistics that contain the proportion alpha (specified as a % in
    the edit box Conf. Level) of simulated values. Thus a sample drawn from
    the assumed distribution has approximately a probability alpha of lying within
    the limits at each point. However, overall there will be a probability of less
    than alpha that a sample will completely lie within the confidence bands.
  • Simultaneous uses a statistic given by Michael, 1983, for which the
    overall probability of plotted data lying completely within the confidence bands
    is approximately the specified value of alpha, under the null hypothesis that
    the data are a random sample from the specified distribution. This form of
    confidence limits has the advantage that it is much faster to calculate and that
    probability of the data points falling outside the limits is approximately
    constant over the range of the data.
  • None specifies no confidence bands to be drawn on the graph.

Action buttons

Run Run the analysis.
Cancel Close the dialog without further changes.
Options Opens a dialog where you can specify additional options and settings for the analysis.
Defaults Reset to the default settings. Clicking the right mouse on this button produces a shortcut menu where you can choose to reset using your currently stored defaults or the Genstat default settings.
Store Opens a dialog to specify names of structures to store the results from the analysis.

Action Icons

Pin Controls whether to keep the dialog open when you click Run. When the pin is down  the dialog will remain open, otherwise when the pin is up  the dialog will close.
Restore Restore names into edit fields and default settings.
Clear Clear all fields and list boxes.
Help Open the Help topic for this dialog.

See also

Updated on January 21, 2022

Was this article helpful?