Calculates numerical values for data structures.

### Options

Option | Description |
---|---|
`PRINT` = string token | Printed output required (`summary`); default `*`, i.e. no printing |
`ZDZ` = string token | Value to be given to zero divided by zero (`missing`, `zero`); default `missing` |
`TOLERANCE` = scalar | If the scalar is non-missing, this defines the smallest non-zero number; otherwise the default value is used, which is defined automatically for the computer concerned |
`SEED` = scalar | Seed to use for any random number generation during the calculation; default 0 |
`INDEX` = scalar | If the calculation has a list of structures before the assignment operator (`=`), the scalar indicates the position within the list of the structure currently being evaluated |
`RESTRICTEDUNITS` = variate | Defines a “restriction” on the vectors in the expression; if this is set, the calculations on those vectors will take place only on the units listed in the variate (and any restrictions of their own will be ignored) |

### Parameter

Parameter | Description |
---|---|
expression | Expression defining the calculations to be performed |

### Description

The `CALCULATE` directive allows you to perform transformations and other calculations. It has the form:

`CALCULATE` *expression*

The *expression* specifies what calculation is to be done, and where the results are to be stored. For example, the command

`CALCULATE Area = Length * Breadth`

specifies that the structure `Area` is to store the results of `Length` multiplied by `Breadth`. All the usual arithmetic operators are available:

Operator | Meaning |
---|---|
`+` | addition |
`-` | subtraction |
`*` | multiplication |
`/` | division |
`**` | exponentiation (for example, `X**2` stands for X²) |

`CALCULATE` can operate on any numerical data structure, and it will automatically declare the structure to hold the results if you have not declared it already. So, if `Area` has not yet been defined and `Length` and `Breadth` are scalars, `Area` will become a scalar too.

Generally the structures involved in the calculation must have the same “shape” (for example, variates must have the same length) and the operators operate element-by-element over all their values. So, if `Length` and `Breadth` were variates, `Area` would become a variate, each of whose units contained the product of the corresponding units of `Length` and `Breadth`. However, scalars and ordinary numbers can be included with calculations on any type of data structure. So

`CALCULATE Kilo = Pound / 2.2`

would be valid whatever the type of the structures `Kilo` and `Pound`.

If any of the values involved in a numerical expression is missing, the result will be missing too.
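As a minimal sketch of these rules, the following job multiplies two variates unit by unit, divides the result by a scalar, and lets a missing value propagate through both calculations. The data values and the identifiers `Conv` and `Scaled` are invented for illustration.

```genstat
" Illustrative data only; * denotes a missing value "
VARIATE [VALUES=2,3,*,5] Length
VARIATE [VALUES=4,1,6,2] Breadth
SCALAR [VALUE=2] Conv
" Element-by-element product: Area is declared automatically as a variate "
CALCULATE Area = Length * Breadth
" A scalar can be combined with a structure of any type "
CALCULATE Scaled = Area / Conv
PRINT Length,Breadth,Area,Scaled
```

Under the rules above, `Area` should hold 8, 3, a missing value and 10, and `Scaled` should hold half of each of those values, with the third unit again missing.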

Genstat has operators for relational tests:

Operator | Meaning |
---|---|
`==` or `.EQ.` | equality of numerical values |
`.EQS.` | equality of textual strings |
`>=` or `.GE.` | greater than or equal to |
`>` or `.GT.` | greater than |
`<=` or `.LE.` | less than or equal to |
`<` or `.LT.` | less than |
`/=` or `<>` or `.NE.` | not equal to |
`.NES.` | inequality of textual strings |
`.IS.` | identifier equivalence (to test whether a dummy contains a particular identifier) |
`.ISNT.` | identifier non-equivalence |
`.IN.` | inclusion: `X.IN.Vals` gives result true for each value of `X` that is equal to any one of the values of `Vals` |
`.NI.` | non-inclusion: the opposite of `.IN.` |

These generate a result of zero if the test is false, and one if it is true. (In fact any non-zero value is taken to represent a true value.) With most of these operators, a missing value in either operand (or in both) will generate a missing result. The exceptions are `.EQ.` and `.NE.` (and their synonyms), and `.EQS.` and `.NES.`: when both operands are missing, `.EQ.` and `.EQS.` give a true result, while `.NE.` and `.NES.` give a false result.
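The treatment of missing values can be seen in a small sketch such as the one below. The variates and their values are made up for illustration; unit 4 is missing in both operands.

```genstat
" Illustrative data only; * denotes a missing value "
VARIATE [VALUES=1,*,3,*] X
VARIATE [VALUES=1,2,4,*] Y
" Equality test: unit 4 has both operands missing "
CALCULATE Same = X == Y
" Ordinary comparison: a missing operand gives a missing result "
CALCULATE Bigger = X > Y
PRINT X,Y,Same,Bigger
```

According to the description above, `Bigger` should be missing in units 2 and 4, whereas `Same` should be 1 (true) in unit 4, where both operands are missing.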

There are also logical operators that can be used to combine the results of expressions involving relational operators.

Operator | Meaning |
---|---|
`.AND.` | and: `a.AND.b` is true if both `a` and `b` are true |
`.EOR.` | either or: `a.EOR.b` is true if either `a` or `b`, but not both, is true |
`.OR.` | or: `a.OR.b` is true if either `a` or `b` is true |
`.NOT.` | not: `.NOT.a` is true if `a` is false |

The precedence rules of the operators are very similar (but possibly not identical) to those in computer languages like C or Fortran. The list below shows the order in which the operators are evaluated when they are used in expressions, if brackets are not used to make the order explicit:

1) `.NOT.`, monadic `-`

2) `.IS. .ISNT. .IN. .NI. *+`

3) `**`

4) `* /`

5) `+`, dyadic `-`

6) `< > <= >= == /= <> .LT. .GT. .EQ. .LE. .GE. .NE. .EQS. .NES.`

7) `.AND. .OR. .EOR.`

8) `=`

(Monadic minus means the use of the minus sign in a negative number: for example, -1.) Within each class, operations are done from left to right within an expression, unless brackets are used to indicate some other order. So

`A > B+C/D*E`

is the same as

`A > ( B + ( (C/D) * E ) )`
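One way to check this equivalence is to evaluate both forms with the same scalars, as in the sketch below. The scalar values and the identifiers `Implicit` and `Explicit` are chosen purely for illustration.

```genstat
" Illustrative scalar values only "
SCALAR A,B,C,D,E; VALUE=10,1,8,4,3
" Without brackets: division and multiplication first, then addition, then > "
CALCULATE Implicit = A > B+C/D*E
" The same calculation with the order made explicit "
CALCULATE Explicit = A > ( B + ( (C/D) * E ) )
PRINT Implicit,Explicit
```

With these values the right-hand operand is 1 + (8/4)*3 = 7, so both results should be 1 (true), since 10 is greater than 7.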

Expressions can contain lists, to specify that the same calculation is to be done for several sets of structures. For example

`CALCULATE Pay1,Pay2 = Hours1,Hours2 * Rate + Bonus`

This has the same effect as the two commands

`CALCULATE Pay1 = Hours1 * Rate + Bonus`

`CALCULATE Pay2 = Hours2 * Rate + Bonus`

Notice that, if any of the lists on the right-hand side of the expression is shorter than the list on the left-hand side, the list is re-used. So the value of `Bonus` is used for both calculations. To take a more complicated example

`CALCULATE X,Y,Z = A,B,C + 1,2`

is the same as the three calculations

`CALCULATE X = A + 1`

`CALCULATE Y = B + 2`

`CALCULATE Z = C + 1`

However, the lists on the right-hand side must not be longer than the list on the left-hand side.
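The recycling of the shorter list can be tried out with a sketch like the following, in which the variate values are invented for illustration.

```genstat
" Illustrative data only "
VARIATE [VALUES=10,20,30] A
VARIATE [VALUES=1,2,3] B
VARIATE [VALUES=5,5,5] C
" The list 1,2 is shorter than X,Y,Z, so it is re-used: 1, 2, then 1 again "
CALCULATE X,Y,Z = A,B,C + 1,2
PRINT X,Y,Z
```

Following the expansion shown above, `X` should hold 11, 21, 31 (`A + 1`), `Y` should hold 3, 4, 5 (`B + 2`) and `Z` should hold 6, 6, 6 (`C + 1`).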

When the calculation contains lists, you can set the `INDEX` option to a scalar which will contain the index of the current calculation. For example

`CALCULATE [INDEX=i] X,Y,Z = i * A,B,C`

is the same as the three calculations

`CALCULATE X = 1 * A`

`CALCULATE Y = 2 * B`

`CALCULATE Z = 3 * C`

as `X` and `A` are the first items of their lists, `Y` and `B` are the second, and `Z` and `C` are the third.

Genstat provides a wide range of functions for use in expressions. Many of these, known as transformations, produce a result that is the same type of structure as the *argument* of the function. For example,

`CALCULATE Logsulph = LOG(Sulphur)`

uses the `LOG` function to take natural logarithms of the values in the data structure `Sulphur`. If `Sulphur` is a variate, `Logsulph` will also be a variate with the same number of values.

Scalar functions produce a scalar summary of all the values in a structure. For example, we can use the `SUM` function to calculate the total of the `Sulphur` values:

`CALCULATE Totsulph = SUM(Sulphur)`

There are also vector functions that produce summaries across the values of a set of variates (or of scalars). The set of variates must be put into a pointer. So, we could form a variate `M`, each of whose units contains the mean of the values in the corresponding units of the variates `A`, `B` and `C`, by

`POINTER [VALUES=A,B,C] Vars`

`CALCULATE M = VMEAN(Vars)`

This can be done more succinctly using an unnamed pointer:

`CALCULATE M = VMEAN(!p(A,B,C))`

When a function has more than one argument, each is separated from the next by a semicolon. For example

`CALCULATE Corr = CORRELATION(X; Y)`

calculates the correlation between the values in `X` and `Y`.

Function arguments can also be lists, running in parallel with the other lists in the expression. For example, to calculate `Corr1` as the correlation between `X1` and `Y1`, and `Corr2` as the correlation between `X2` and `Y2`:

`CALCULATE Corr1,Corr2 = CORRELATION(X1,X2; Y1,Y2)`

When a factor occurs in an expression on the right-hand side, Genstat usually works with its levels. The exception is when the factor occurs as the first operand of the operators `.IN.` or `.NI.` and the second operand is a text; the factor labels are then used instead. A factor can also occur on the left-hand side of an expression and receive the results of a calculation; an error is reported if any of the resulting values is not one of the levels of the factor. Two functions are provided especially for factors: `NLEVELS(F)` gives the number of levels of the factor `F`, and `NEWLEVELS(F; V)` forms a variate from the factor `F`, using variate `V` to define values for the levels.
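A short sketch of the two factor functions is given below. The factor `Dose`, the variate `Amount` and their values are hypothetical, and the sketch assumes that the values of `Amount` correspond to the levels of `Dose` in order.

```genstat
" Hypothetical factor with levels 1, 2 and 3 "
FACTOR [LEVELS=3; VALUES=1,3,2,3,1] Dose
" One value for each level of Dose, in level order (assumed) "
VARIATE [VALUES=0,2.5,10] Amount
" Number of levels of the factor "
CALCULATE Nlev = NLEVELS(Dose)
" Variate holding the Amount value for the level in each unit "
CALCULATE Given = NEWLEVELS(Dose; Amount)
PRINT Nlev
PRINT Dose,Given
```

Under that assumption, `Nlev` should be 3 and `Given` should hold 0, 10, 2.5, 10 and 0.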

Text structures are allowed only with the relational operators `.EQS.`, `.NES.`, `.IN.` and `.NI.` or in the string functions. The result of any expression is a number, so you cannot create a text with `CALCULATE`, even if the structures on which the operations are being done are texts.

All the arithmetic, relational and logical operators and transformation functions can also be used with matrix structures, symmetric matrices and diagonal matrices. The basic rule when using these with different types of matrix is that their dimensions must conform. This means that, for each pair of matrices, row dimension must match row dimension, and column dimension must match column dimension. So, for example, you can add a diagonal matrix to a matrix structure provided the number of rows and columns of the matrix equals the number of rows (and columns) of the diagonal matrix. The multiplication operator (`*`) performs element-by-element multiplication of two matrices; for matrix multiplication, there is the compound operator `*+` or the function `PRODUCT`, which is one of the many specialised matrix functions.
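The difference between `*` and `*+` can be illustrated with two small matrices, as in the sketch below. The matrices and the result identifiers are invented for illustration, and the sketch assumes that the `VALUES` of a matrix are supplied row by row.

```genstat
" Illustrative 2 x 2 matrices; values assumed to be given row by row "
MATRIX [ROWS=2; COLUMNS=2; VALUES=1,2,3,4] A
MATRIX [ROWS=2; COLUMNS=2; VALUES=5,6,7,8] B
" Element-by-element product "
CALCULATE Elem = A * B
" Matrix product, written in two equivalent ways "
CALCULATE Prod = A *+ B
CALCULATE Prod2 = PRODUCT(A; B)
PRINT Elem,Prod,Prod2
```

With row-by-row storage, `Elem` should have rows (5, 12) and (21, 32), while `Prod` and `Prod2` should both have rows (19, 22) and (43, 50).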

You can use tables in expressions in much the same way as you would any other numerical structure. Tables in expressions must be either all without margins or all with margins. If you try to mix tables with and without margins, Genstat will report an error. Calculations with tables are very straightforward when they have the same factors in their classifying sets. The tables then have identical “shapes”, and the arithmetic, relational, and logical operators and the transformation functions act element-by-element, in the usual way. When tables have different classifying sets, there are two cases to consider. The first case is when the table on the left-hand side has a factor in its classifying set that is not in the classifying set of the table on the right-hand side. In this case, the right-hand table is expanded to include that factor, by duplicating its values across the levels of the factor and any margin. The second case is when the table on the right-hand side has a factor in its classifying set that is not in the classifying set of the table on the left-hand side. Now the values in the margin over that factor are taken for the left-hand table. If the table has no margins, they must be calculated first. By default Genstat forms marginal totals, but you can use the special table functions to form other types of margin.

Dummies can be used with the relational operators `.IS.` and `.ISNT.`, which test whether or not a dummy points to a particular identifier. For example, to store in `Sca` the result of a test to check whether dummy `D` points to `Va`, you would put

`CALCULATE Sca = D.IS.Va`

while to test that `D` does not point to `Vb`, you would put

`CALCULATE Sca = D.ISNT.Vb`

There are also the functions `SET` and `UNSET` to test if a dummy has or has not been set to any value. Other specialised functions include subset functions, statistical functions and random number generation functions.

`CALCULATE` has six options: `PRINT`, `ZDZ`, `TOLERANCE`, `SEED`, `INDEX` and `RESTRICTEDUNITS`. If you set the `PRINT` option to `summary`, Genstat will print some summary information every time that values are assigned to a structure. The information has the same form as in the `READ` directive: identifier, minimum value, mean value, maximum value, number of values, number of missing values, and whether or not the set of values is skew.

If you try to use `CALCULATE` to do something invalid, such as the logarithm or the square root of a negative number, Genstat generates a warning diagnostic and inserts a missing value in the offending unit. The one exception is the division of zero by zero, which is regarded as deliberate. Genstat thus does not print a diagnostic, but uses option `ZDZ` to determine whether the result should be a missing value (`ZDZ=missing`) or zero (`ZDZ=zero`); the default is `missing`.
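The effect of `ZDZ` can be seen by dividing two variates whose first units are both zero, as in this sketch. The identifiers and values are invented for illustration.

```genstat
" Illustrative data: unit 1 is 0/0, unit 2 is 4/2, unit 3 is 0/5 "
VARIATE [VALUES=0,4,0] Num
VARIATE [VALUES=0,2,5] Den
" Default setting (ZDZ=missing): 0/0 gives a missing value "
CALCULATE Rmiss = Num / Den
" ZDZ=zero: 0/0 gives zero instead "
CALCULATE [ZDZ=zero] Rzero = Num / Den
PRINT Num,Den,Rmiss,Rzero
```

Following the description above, `Rmiss` should hold a missing value, 2 and 0, while `Rzero` should hold 0, 2 and 0.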

The `SEED` option provides the seed to generate random numbers for the functions `GRBETA`, `GRBINOMIAL`, `GRCHISQUARE`, `GRF`, `GRGAMMA`, `GRHYPERGEOMETRIC`, `GRLOGNORMAL`, `GRNORMAL`, `GRPOISSON`, `GRT` and `GRUNIFORM` if these occur in the expression. The seed can be any non-negative integer, but only the last six digits of its integer part are used. Thus the seeds 2144556 and 7144556.3 are both equivalent to the seed 144556. The default value of zero continues an existing sequence of random numbers, if either these functions or the function `URAND` (which has its own argument to set the seed) has already been used in the current Genstat run. If, however, this is the first time that these functions have been used, Genstat picks a random seed.
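The following sketch shows one way the seed rules might be checked. It assumes that the arguments of `GRNORMAL` are, in order, the number of values, the mean and the variance; the identifiers `R1`, `R2` and `R3` are invented for illustration.

```genstat
" Assumed argument order for GRNORMAL: number of values; mean; variance "
CALCULATE [SEED=144556] R1 = GRNORMAL(5; 0; 1)
" Only the last six digits are used, so this seed should be equivalent "
CALCULATE [SEED=2144556] R2 = GRNORMAL(5; 0; 1)
" SEED left at its default of zero: continues the existing sequence "
CALCULATE R3 = GRNORMAL(5; 0; 1)
PRINT R1,R2,R3
```

If the seeds behave as described, `R1` and `R2` should contain the same values, while `R3` continues the sequence and so would be expected to differ.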

The `RESTRICTEDUNITS` option allows you to apply a “restriction” to the vectors in the expression. Its setting is a variate containing a list of the unit numbers on which you want the calculation to be done (the other units are then ignored). This works in the same way as if you had applied a restriction on one of these vectors explicitly, using the `RESTRICT` directive (see below). However, if `RESTRICTEDUNITS` is set, restrictions on the vectors themselves are ignored. By default, when `RESTRICTEDUNITS` is unset, `CALCULATE` will look for restrictions in the vectors, as usual. Note, though, that you can set `RESTRICTEDUNITS=*` to make the calculation work on all the units, regardless of whether any of the vectors is restricted.
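As a minimal sketch, the calculation below is confined to units 2, 4 and 6 by supplying an unnamed variate of unit numbers. The variates and their values are invented for illustration.

```genstat
" Illustrative data; Y starts with zeros so that untouched units are visible "
VARIATE [VALUES=1,2,3,4,5,6] X
VARIATE [VALUES=0,0,0,0,0,0] Y
" Calculate only in units 2, 4 and 6 "
CALCULATE [RESTRICTEDUNITS=!(2,4,6)] Y = X * 10
PRINT X,Y
```

If the other units are left unchanged, as they are with a restriction set by `RESTRICT`, `Y` should hold 0, 20, 0, 40, 0 and 60.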

Options: `PRINT`, `ZDZ`, `TOLERANCE`, `SEED`, `INDEX`, `RESTRICTEDUNITS`.

Parameter: unnamed.

### Action with `RESTRICT`

If you are calculating values for a variate or factor, you can restrict the operation to only a subset of the units by applying a restriction to any of the variates, factors or texts involved in that calculation. The values in the other units are left unchanged. If more than one of these vectors is restricted, they must all be restricted in the same way. Note, though, that restrictions on a variate within a scalar function (for example `MEAN`), or within the `RESTRICTION` function, operate independently from the main calculation outside. Also, restrictions in the main calculation are ignored if it contains qualified identifiers or the `ELEMENTS` function.
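A simple sketch of a restricted calculation is shown below. The variates and their values are invented for illustration; the restriction is removed again before printing so that all the units are displayed.

```genstat
" Illustrative data; Y starts with zeros so that untouched units are visible "
VARIATE [VALUES=1,2,3,4,5,6] X
VARIATE [VALUES=0,0,0,0,0,0] Y
" Restrict X to the units where it exceeds 3 "
RESTRICT X; CONDITION=X > 3
" Only the units in the restricted set are calculated "
CALCULATE Y = X * 10
" Remove the restriction before printing "
RESTRICT X
PRINT X,Y
```

Since the values in the excluded units are left unchanged, `Y` should hold 0, 0, 0, 40, 50 and 60.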

### See also

Directives: `EXPRESSION`, `SETCALCULATE`, `NAG`, `FLRV`, `QRD`, `SVD`.

Commands for: Calculations and manipulation.

### Example

`" Example 1:4.1.1a "`

`VARIATE [VALUES=10,12,14,16,*,20] X`

`VARIATE [VALUES=4,3,2,1,0,-1] Y`

`CALCULATE Vadd = X + Y`

`& Vsub = X - Y`

`& Vmult = X * Y`

`& Vdiv = X / Y`

`& Vexp = X ** Y`

`PRINT X,Y,Vadd,Vsub,Vmult,Vdiv,Vexp; FIELDWIDTH=9; DECIMALS=2`

`ENDJOB`