Forms summaries for a Markov model from rainfall data (J.O. Ong’ala & D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Controls printed output (`counts`, `amounts`, `probabilities`); default `*` |
| `PLOT` = string token | What plots to display (`probabilities`); default `prob` |
| `DAY` = variate or factor | Day as a date or a day number within the year |
| `LIMITS` = scalar or variate | Values to define the daily rainfall states; default 0.85 |
| `ORDER` = scalar | Defines the order of the Markov chain (0…5); default 1 |
| `HIGHORDER` = string token | Whether to use a high-order Markov chain (`no`, `yes`); default `no` |
| `INITIAL` = scalar or variate | The amounts of rainfall prior to the first day; default `*` |
| `SPREADSHEET` = string tokens | What to save in a spreadsheet (`counts`, `amounts`, `probabilities`); default `*` |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = variates | The daily rainfall amounts |
| `WINDOW` = scalars | Window to plot the graph; default 3 for `ORDER=0` and 1 otherwise |
| `TITLE` = texts | The title for the plot; default uses an automatic description |
| `COUNTS` = tables | Saves the counts by Markov state and day |
| `AMOUNTS` = tables | Saves the mean rainfall by Markov wet states and day |
| `PROBABILITIES` = pointers | Saves a pointer to variates of probabilities of a wet day by class |
| `CATEGORIES` = factors | Saves the Markov class for each day |
| `STATECOUNTS` = pointers | Saves a pointer to tables of counts for each state |
| `OUTFILE` = texts | File (with extension `.gwb` or `.xlsx`) to save selected spreadsheet components |

### Description

`RFSUMMARY` creates summaries from rainfall data for a Markov chain model analysis. The Markov model splits the days into different classes based on the history of the preceding days. This allows for different probabilities and amounts of rainfall on a day according to what happened previously: for example, in most climates, it is more likely to rain on a day following previous rain.

The daily states, order and type of Markov model are specified by the `LIMITS`, `ORDER` and `HIGHORDER` options, respectively. If the `LIMITS` option is set to a scalar or a variate of length one, this defines the breakpoint between dry and wet days. A small positive value treats days with less than this amount of rainfall as dry days (these are also removed from the rainfall for wet days). If `LIMITS` is set to a variate of length two or more, the rainfall states are defined as the days with rainfall less than or equal to these limits, with an extra group for rainfall greater than the top limit. The `ORDER` option specifies the number of previous days to use when forming the Markov classes.
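As an illustration only, the `LIMITS` rules described above can be sketched in Python. The function name `rain_state` and the zero-based state numbering are assumptions made for this sketch; it is not the code used inside `RFSUMMARY`, and it does not handle missing values.

```python
from bisect import bisect_left


def rain_state(amount, limits):
    """Return the daily rainfall state for one amount (0 = driest state).

    Hypothetical helper that mirrors the LIMITS rules described in the text.
    """
    limits = sorted(limits)
    if len(limits) == 1:
        # Single breakpoint: less than the limit is dry (0), otherwise wet (1).
        return 0 if amount < limits[0] else 1
    # Several limits: state i holds amounts <= limits[i]; anything above the
    # top limit falls into the extra final state.
    return bisect_left(limits, amount)


# One breakpoint (the default 0.85) and a two-limit example.
two_state = [rain_state(x, [0.85]) for x in (0.0, 0.5, 0.85, 12.0)]
three_state = [rain_state(x, [0.85, 10]) for x in (0.0, 0.85, 5.0, 10.0, 12.0)]
print(two_state)
print(three_state)
```

Note the different treatment of a value exactly equal to a limit: with a single breakpoint it counts as wet, whereas with several limits it falls in the lower group, following the wording above.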

The classes are the combination of the daily states over the history length defined by `ORDER`. (So there will be `(NVALUES(LIMITS)+1)**(ORDER+1)` classes.) If there are two rainfall states, these are labelled w and d for wet and dry on each day. Otherwise they are labelled by the integers from 0 upwards. When there are two states, the default `HIGHORDER=no` gives all the unique combinations of wet and dry days over these days. Setting `HIGHORDER=yes` collapses the states to just the number of dry days preceding a wet day. For example, with `ORDER=2` and `HIGHORDER=no`, the 8 states are ddd, ddw, dwd, dww, wdd, wdw, wwd and www (where d = dry day and w = wet day); with `ORDER=2` and `HIGHORDER=yes`, the 6 states are ddd, ddw, dw, wd, wdd, and ww, as dwd and dww are combined into dw and wwd and www are combined into ww. `ORDER` must be between 0 and 3 for `HIGHORDER=no` and between 2 and 5 for `HIGHORDER=yes`.
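The class count for `HIGHORDER=no` can be checked with a short Python sketch that enumerates every combination of daily states. The helper `markov_class_labels` is hypothetical and covers only the uncollapsed case; the collapsing done by `HIGHORDER=yes` is not reproduced here.

```python
from itertools import product


def markov_class_labels(n_limits, order):
    """List the class labels for HIGHORDER=no (hypothetical helper)."""
    n_states = n_limits + 1
    # Two states are labelled d and w; otherwise integers from 0 upwards.
    symbols = "dw" if n_states == 2 else [str(i) for i in range(n_states)]
    return ["".join(combo) for combo in product(symbols, repeat=order + 1)]


labels = markov_class_labels(n_limits=1, order=2)
print(len(labels), labels)

# Two limits give three daily states, so ORDER=1 yields 3**2 classes.
print(len(markov_class_labels(n_limits=2, order=1)))
```

The number of labels produced always equals `(n_limits + 1) ** (order + 1)`, which is the same expression as `(NVALUES(LIMITS)+1)**(ORDER+1)` in the text.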

The `DAY` option gives the dates or the day number within a year (1…366), and the `DATA` parameter gives the amount of rainfall on these dates. The data should be sorted into chronological order with no missing days. (Missing values should be entered for any days with no observations.) The `INITIAL` option can specify the amount of rain on the days preceding the first day in `DATA`; this should have `ORDER` values. If `INITIAL` is not set, the first `ORDER` days will not contribute to the counts and amounts.
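The effect of `INITIAL` can be pictured with a small Python sketch that assigns a class to each day from its own state and the states of the preceding days. The helper `classify_days` is hypothetical. It assumes that each label lists the current day first, followed by the preceding days with the most recent first, and that the initial states are supplied in chronological order; neither convention is confirmed by the text. Missing values are not handled.

```python
def classify_days(states, order, initial=None):
    """Return one class label per day, or None where history is incomplete.

    Hypothetical helper: labels give the current day first, then the
    preceding days, most recent first (an assumed convention).
    """
    history = list(initial) if initial is not None else []
    if initial is not None and len(history) != order:
        raise ValueError("initial must supply exactly `order` states")
    padded = history + list(states)
    offset = len(history)
    classes = []
    for i in range(len(states)):
        j = i + offset  # position of day i in the padded sequence
        if j < order:
            classes.append(None)  # not enough preceding days
        else:
            window = padded[j - order:j + 1]
            classes.append("".join(reversed(window)))
    return classes


states = ["d", "w", "w", "d"]
without_initial = classify_days(states, order=1)
with_initial = classify_days(states, order=1, initial=["w"])
print(without_initial)
print(with_initial)
```

Without initial states, the first `order` days get no class and so would drop out of any counts; supplying them lets every day in the series contribute.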

You can save the summaries with the `COUNTS`, `AMOUNTS`, `PROBABILITIES`, `CATEGORIES` and `STATECOUNTS` parameters:

`COUNTS` saves a table of counts classified by day number within the year (1…366) and Markov class (e.g. dd, wd, dw and ww);

`AMOUNTS` saves a table of the sum of rainfall amounts classified by day and Markov wet classes (e.g. wd and ww);

`PROBABILITIES` saves a pointer to a set of variates for each wet class giving probability of a wet day vs. a dry day for the days;

`CATEGORIES` saves a factor giving the Markov class for each date; and

`STATECOUNTS` saves a pointer to tables for each state defined by `LIMITS`, giving the counts by Markov class and day.
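To make the relationship between the counts and the probabilities concrete, here is a minimal Python sketch for a first-order, two-state model. The helper `wet_probabilities` is hypothetical, and the estimate used (wet count divided by wet plus dry count for the same previous-day state) is one reading of "probability of a wet day vs. a dry day", not a statement of the procedure's exact calculation. Labels are assumed to give the current day first, so `wd` is a wet day following a dry day.

```python
from collections import Counter


def wet_probabilities(days, classes):
    """Tabulate counts by (day, class) and estimate P(wet) for ORDER=1.

    Hypothetical helper. Assumes two-letter labels with the current day
    first, so 'wd' means a wet day that follows a dry day.
    """
    counts = Counter(
        (day, cls) for day, cls in zip(days, classes) if cls is not None
    )
    probs = {}
    for day in sorted({day for day, _ in counts}):
        for prev in "dw":
            wet = counts[(day, "w" + prev)]
            dry = counts[(day, "d" + prev)]
            if wet + dry > 0:
                # Proportion of wet days among days with this history.
                probs[(day, "w" + prev)] = wet / (wet + dry)
    return counts, probs


# Day numbers within the year and the class of each observation.
days = [1, 1, 1, 1, 2, 2]
classes = ["wd", "dd", "dd", "ww", "wd", None]
counts, probs = wet_probabilities(days, classes)
print(dict(counts))
print(probs)
```

Days without a class (here `None`) are skipped, and no probability is reported for a day and history with no observations.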

Printed output is controlled by the `PRINT` option, with settings:

`counts` counts by day and Markov class;

`amounts` amounts by day and wet Markov class; and

`probabilities` probabilities by day and wet Markov class.

The summaries can be displayed in a spreadsheet by setting the `SPREADSHEET` option to the following settings:

`counts` creates a sheet containing the counts for each day by the Markov classes;

`amounts` shows the amounts of rainfall in the wet classes; and

`probabilities` shows the probability of rainfall in the wet classes.

The spreadsheet can be saved to a file by setting the `OUTFILE` parameter to a Genstat or Excel spreadsheet filename (`.gwb` or `.xlsx`).

You can set option `PLOT=probabilities` to plot the probabilities. The `TITLE` parameter can supply a title for the graph; if this is not set, a descriptive title will be created from the Markov chain options. The `WINDOW` parameter specifies the window to use for the graph.

Options: `PRINT`, `PLOT`, `DAY`, `LIMITS`, `ORDER`, `HIGHORDER`, `INITIAL`, `SPREADSHEET`.

Parameters: `DATA`, `WINDOW`, `TITLE`, `COUNTS`, `AMOUNTS`, `PROBABILITIES`, `CATEGORIES`, `STATECOUNTS`, `OUTFILE`.

### Method

The procedure calculates the class of each day, and then tabulates these to create summaries. If dates are provided in `DAY`, these are converted to days in the year by the `NDAYINYEAR` function. Note: 29 February (which is only present in leap years) is day 60, and 1 March is always day 61.
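The day-numbering convention in the note can be mimicked in Python by mapping every date onto a leap year before taking the day of the year. This is a sketch of the convention only; `day_in_year` is a hypothetical helper and is not the `NDAYINYEAR` function itself.

```python
from datetime import date


def day_in_year(d):
    """Day number 1..366 with 29 February fixed at 60 (hypothetical helper)."""
    # Map every date onto a leap year so that 1 March is day 61 in all years.
    return d.replace(year=2000).timetuple().tm_yday


numbers = [
    day_in_year(date(2001, 2, 28)),
    day_in_year(date(2000, 2, 29)),
    day_in_year(date(2001, 3, 1)),
    day_in_year(date(2001, 12, 31)),
]
print(numbers)
```

With this convention, day 60 has observations only in leap years, so its counts are based on roughly a quarter as many years as those of other days.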

### Action with `RESTRICT`

The `DATA` or `DAY` variates can be restricted to analyse a subset of the data. If both `DATA` and `DAY` are restricted, the restrictions must be consistent.

### Reference

Ong’ala, J.O. (2011). Simplifying the Markov chain analysis of rainfall data using Genstat. *MSc Thesis*, Maseno University.

### See also

Procedures: `RFFAMOUNT`, `RFFPROBABILITY`.

Commands for: Basic and nonparametric statistics.

### Example

    CAPTION   'RFSUMMARY example','41 years rainfall for Katumani, Kenya'; \
              STYLE=meta,minor
    IMPORT    [PRINT=summary] '%Data%/Rainfall Katumani 1961-2001.gsh'
    RFSUMMARY [PRINT=counts,amounts,probabilities; PLOT=probabilities; \
              DAY=Date; ORDER=1; SPREADSHEET=counts,amounts,probabilities] Rainfall; \
              COUNTS=RFCounts; AMOUNTS=RFAmounts; TITLE='Katumani rainfall 1961-2001'
    RFFPROBAB [PLOT=results] COUNTS=RFCounts; \
              TITLE='Katumani rainfall probabilities 1961-2001'
    RFFAMOUNT [PLOT=results] COUNTS=RFCounts; \
              AMOUNTS=RFAmounts; TITLE='Katumani rainfall amounts 1961-2001'
    RFSUMMARY [PRINT=*; PLOT=*; DAY=Date; ORDER=3; HIGHORDER=yes] Rainfall; \
              COUNTS=RFCounts; AMOUNTS=RFAmounts
    RFFPROBAB [PLOT=results; SPREADSHEET=] COUNTS=RFCounts; \
              TITLE='Katumani rainfall probabilities 1961-2001 (high order 3)'
    RFFAMOUNT [PLOT=results; SPREADSHEET=] COUNTS=RFCounts; AMOUNTS=RFAmounts; \
              TITLE='Katumani rainfall amounts 1961-2001 (high order 3)'