Calculates numerical values for data structures.
|PRINT||Printed output required (summary)|
|ZDZ||Value to be given to zero divided by zero (missing, zero)|
|TOLERANCE||If the scalar is non missing, this defines the smallest non-zero number; otherwise it accesses the default value, which is defined automatically for the computer concerned|
|SEED||Seed to use for any random number generation during the calculation; default 0|
|INDEX||If the calculation has a list of structures before the assignment operator (=), this scalar is set to the index of the calculation currently being performed|
|RESTRICTEDUNITS||Defines a “restriction” on the vectors in the expression; if this is set the calculations on those vectors will take place only on the units listed in the variate (and any restrictions of their own will be ignored)|
|expression||Expression defining the calculations to be performed|
The CALCULATE directive allows you to perform transformations and other calculations. It has the form:
CALCULATE expression
The expression specifies what calculation is to be done, and where the results are to be stored. For example, the command
CALCULATE Area = Length * Breadth
specifies that the structure Area is to store the results of Length multiplied by Breadth. All the usual arithmetic operators are available:
|+||addition|
|-||subtraction|
|*||multiplication|
|/||division|
|**||exponentiation (for example, X**2)|
CALCULATE can operate on any numerical data structure, and it will automatically declare the structure to hold the results if you have not declared it already. So, if Area has not yet been defined and Length and Breadth are scalars, Area will become a scalar too.
Generally the structures involved in the calculation must have the same “shape” (for example, variates must have the same length) and the operators operate element-by-element over all their values. So, if Length and Breadth were variates, Area would become a variate each of whose units contained the product of the corresponding units of Length and Breadth. However, scalars and ordinary numbers can be included with calculations on any type of data structure. So
CALCULATE Kilo = Pound / 2.2
would be valid whatever the type of the structures Kilo and Pound.
If any of the values involved in a numerical expression is missing, the result will be missing too.
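As a minimal sketch of these rules, the commands below use made-up values; the data and the identifier Half are assumptions introduced only for illustration. Length contains a missing value (*), so the corresponding units of Area and Half should be missing too, while the number 2 is applied to every unit.

```genstat
" Hypothetical data: * marks a missing value "
VARIATE [VALUES=2,3,4,*,6] Length
VARIATE [VALUES=1,2,3,4,5] Breadth
" Element-by-element product; Area is declared automatically "
CALCULATE Area = Length * Breadth
" An ordinary number can be combined with a variate "
CALCULATE Half = Area / 2
PRINT Length,Breadth,Area,Half
```

With these values, Area should contain 2, 6, 12, *, 30 and Half should contain 1, 3, 6, *, 15.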
Genstat has operators for relational tests:
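|== or .EQ.||equality of numerical values|
|.EQS.||equality of textual strings|
|< or .LT.||less than|
|> or .GT.||greater than|
|>= or .GE.||greater than or equal to|
|<= or .LE.||less than or equal to|
|/= or .NE.||not equal to|
|.NES.||inequality of textual strings|
|.IS.||identifier equivalence (to test whether a dummy contains a particular identifier)|
|.ISNT.||identifier non-equivalence|
|.IN.||inclusion (to test whether a value is one of a set of values)|
|.NI.||non-inclusion: the opposite of .IN.|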
These generate a result of zero if the test is false, and one if it is true. (In fact any non-zero value is taken to represent a true value.) With most of these operators, a missing value in either operand (or in both) will generate a missing result. The exceptions are .EQ. and .NE. (and their synonyms), and .EQS. and .NES.: when both operands are missing, .EQ. and .EQS. give a true result, while .NE. and .NES. give a false result.
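The following sketch illustrates the zero/one results and the treatment of missing values; the data and the identifiers Bigger and Same are hypothetical. The last unit is missing in both variates, so it shows the difference between an ordinary relational operator and the equality test.

```genstat
" Hypothetical data: the last unit is missing in both variates "
VARIATE [VALUES=1,5,3,*] X
VARIATE [VALUES=2,5,1,*] Y
" 1 where X exceeds Y, 0 where it does not, missing where a value is missing "
CALCULATE Bigger = X .GT. Y
" 1 where the values are equal, including the unit where both are missing "
CALCULATE Same = X .EQ. Y
PRINT X,Y,Bigger,Same
```

Under the rules above, Bigger should contain 0, 0, 1, * and Same should contain 0, 1, 0, 1.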
There are also logical operators that can be used to combine the results of expressions involving relational operators.
The precedence rules of the operators are very similar (but possibly not identical) to those in computer languages like C or Fortran. The list below shows the order in which the operators are evaluated when they are used in expressions, if brackets are not used to make the order explicit:
.NOT. Monadic -
.IS. .ISNT. .IN. .NI. *+
**
* /
+ Dyadic -
< > == <= >= /= .LT. .GT. .EQ. .LE. .GE. .NE. .NES. .EQS.
.AND. .OR. .EOR.
=
(Monadic minus means the use of the minus sign in a negative number: for example, -1.) Within each class, operations are done from left to right within an expression, unless brackets are used to indicate some other order. So
A > B+C/D*E
is the same as
A > ( B + ( (C/D) * E ) )
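One way to check this equivalence is to evaluate both forms with scalars. The values below and the identifiers R1 and R2 are assumptions chosen so that a different grouping, such as C/(D*E), would give a different answer.

```genstat
" Hypothetical scalar values "
SCALAR [VALUE=10] A
SCALAR [VALUE=2] B
SCALAR [VALUE=12] C
SCALAR [VALUE=3] D
SCALAR [VALUE=2] E
" Without brackets: C/D*E is (12/3)*2 = 8, then 2+8 = 10, then 10 > 10 "
CALCULATE R1 = A > B+C/D*E
" With the order made explicit "
CALCULATE R2 = A > (B + ((C/D) * E))
PRINT R1,R2
```

Both R1 and R2 should be 0 (false), since 10 is not greater than 10. Grouping the division as C/(D*E) would instead give 2 + 2 = 4 and a true result.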
Expressions can contain lists, to specify that the same calculation is to be done for several sets of structures. For example
CALCULATE Pay1,Pay2 = Hours1,Hours2 * Rate + Bonus
This has the same effect as the two commands
CALCULATE Pay1 = Hours1 * Rate + Bonus
CALCULATE Pay2 = Hours2 * Rate + Bonus
Notice that, if any of the lists on the right-hand side of the expression is shorter than the list on the left-hand side, the list is re-used. So the single values Rate and Bonus are used for both calculations. To take a more complicated example
CALCULATE X,Y,Z = A,B,C + 1,2
is the same as the three calculations
CALCULATE X = A + 1
CALCULATE Y = B + 2
CALCULATE Z = C + 1
However, the lists on the right-hand side must not be longer than the list on the left-hand side.
When the calculation contains lists, you can set the INDEX option to a scalar which will contain the index of the current calculation. For example
CALCULATE [INDEX=i] X,Y,Z = i * A,B,C
is the same as the three calculations
CALCULATE X = 1 * A
CALCULATE Y = 2 * B
CALCULATE Z = 3 * C
as X and A are the first items of their lists, Y and B are the second, and Z and C are the third.
Genstat provides a wide range of functions for use in expressions. Many of these, known as transformations, produce a result that is the same type of structure as the argument of the function. For example,
CALCULATE Logsulph = LOG(Sulphur)
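uses the LOG function to take natural logarithms of the values in the data structure Sulphur. If Sulphur is a variate, Logsulph will also be a variate with the same number of values. Other functions summarize the values of a structure in a single number. For example,
CALCULATE Totsulph = SUM(Sulphur)
stores the sum of the values of Sulphur in Totsulph.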
There are also vector functions that produce summaries across the values of a set of variates (or of scalars). The set of variates must be put into a pointer. So, we could form a variate M, each of whose units contains the mean of the values in the corresponding units of the variates A, B and C, by
POINTER [VALUES=A,B,C] Vars
CALCULATE M = VMEAN(Vars)
This can be done more succinctly using an unnamed pointer:
CALCULATE M = VMEAN(!p(A,B,C))
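To contrast the kinds of function described here, the sketch below applies a transformation, a summary over one variate, and a vector function to the same made-up data. The values and the identifiers Loga and Total are assumptions for illustration only.

```genstat
" Hypothetical data "
VARIATE [VALUES=1,2,3] A
VARIATE [VALUES=4,5,6] B
VARIATE [VALUES=7,8,9] C
" Transformation: one result for each unit of A "
CALCULATE Loga = LOG(A)
" Summary over the units of A: a single number "
CALCULATE Total = SUM(A)
" Vector function: one mean per unit, taken across A, B and C "
CALCULATE M = VMEAN(!p(A,B,C))
PRINT A,B,C,Loga,M
PRINT Total
```

With these values, Total should be 6 and M should contain 4, 5, 6.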
When a function has more than one argument, each is separated from the next by a semicolon. For example
CALCULATE Corr = CORRELATION(X; Y)
calculates the correlation between the values in X and Y.
Function arguments can also be lists, running in parallel with the other lists in the expression. For example, to calculate Corr1 as the correlation between X1 and Y1, and Corr2 as the correlation between X2 and Y2, you could put
CALCULATE Corr1,Corr2 = CORRELATION(X1,X2; Y1,Y2)
When a factor occurs in an expression on the right-hand side, Genstat usually works with its levels. The exception is when the factor occurs as the first operand of the operators .IN. or .NI. and the second operand is a text; the factor labels are then used instead. A factor can also occur on the left-hand side of an expression and receive the results of a calculation; an error is reported if any of the resulting values is not one of the levels of the factor. Two functions are provided especially for factors: NLEVELS(F) gives the number of levels of the factor F, and NEWLEVELS(F; V) forms a variate from the factor F, using variate V to define values for the levels.
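A short sketch of the two factor functions, with invented data: the factor Dose, the variate Amount and the result identifiers Nl and Dosage are all hypothetical names.

```genstat
" Hypothetical factor with three levels (1, 2 and 3) "
FACTOR [LEVELS=3; VALUES=1,2,3,3,2,1] Dose
" One value for each level of Dose, in level order "
VARIATE [VALUES=0,2.5,5] Amount
" Number of levels of the factor "
CALCULATE Nl = NLEVELS(Dose)
" Variate holding the Amount value that corresponds to the level in each unit "
CALCULATE Dosage = NEWLEVELS(Dose; Amount)
PRINT Nl
PRINT Dose,Dosage
```

Here Nl should be 3, and Dosage should contain 0, 2.5, 5, 5, 2.5, 0.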
Text structures are allowed only with the relational operators .EQS., .NES., .IN. and .NI. or in the string functions. The result of any expression is a number, so you cannot create a text with CALCULATE, even if the structures on which the operations are being done are texts.
All the arithmetic, relational and logical operators and transformation functions can also be used with matrix structures, symmetric matrices and diagonal matrices. The basic rule when using these with different types of matrix is that their dimensions must conform. This means that, for each pair of matrices, row dimension must match row dimension, and column dimension must match column dimension. So, for example, you can add a diagonal matrix to a matrix structure provided the number of rows and columns of the matrix equals the number of rows (and columns) of the diagonal matrix. The multiplication operator (*) performs element-by-element multiplication of two matrices: for matrix multiplication, there is the compound operator *+ or the function PRODUCT, which is one of the many specialised matrix functions.
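The difference between the two kinds of multiplication can be seen with two small matrices. The values and the identifiers Elem, Prod1 and Prod2 below are assumptions made for this sketch.

```genstat
" Two hypothetical 2-by-2 matrices "
MATRIX [ROWS=2; COLUMNS=2; VALUES=1,2,3,4] A
MATRIX [ROWS=2; COLUMNS=2; VALUES=5,6,7,8] B
" Element-by-element multiplication "
CALCULATE Elem = A * B
" Matrix multiplication, written in two equivalent ways "
CALCULATE Prod1 = A *+ B
CALCULATE Prod2 = PRODUCT(A; B)
PRINT Elem,Prod1,Prod2
```

Elem should hold the element-wise products 5, 12, 21 and 32, while Prod1 and Prod2 should both hold the same matrix product, which differs from Elem.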
You can use tables in expressions in much the same way as you would any other numerical structure. Tables in expressions must be either all without margins or all with margins. If you try to mix tables with and without margins, Genstat will report an error. Calculations with tables are very straightforward when they have the same factors in their classifying sets. The tables then have identical “shapes”, and the arithmetic, relational, and logical operators and the transformation functions act element-by-element, in the usual way. When tables have different classifying sets, there are two cases to consider. The first case is when the table on the left-hand side has a factor in its classifying set that is not in the classifying set of the table on the right-hand side. In this case, the right-hand table is expanded to include that factor, by duplicating its values across the levels of the factor and any margin. The second case is when the table on the right-hand side has a factor in its classifying set that is not in the classifying set of the table on the left-hand side. Now the values in the margin over that factor are taken for the left-hand table. If the table has no margins, they must be calculated first. By default Genstat forms marginal totals, but you can use the special table functions to form other types of margin.
Dummies can be used with the relational operators .IS. and .ISNT., which test whether or not a dummy points to a particular identifier. For example, to store in Sca the result of a test to check whether dummy D points to Va, you would put
CALCULATE Sca = D.IS.Va
while to test that D does not point to Vb, you would put
CALCULATE Sca = D.ISNT.Vb
There are also functions, including UNSET, to test if a dummy has or has not been set to any value. Other specialised functions include subset functions, statistical functions and random number generation functions.
As well as INDEX, CALCULATE has five options: PRINT, ZDZ, TOLERANCE, SEED and RESTRICTEDUNITS. If you set the PRINT option to summary, Genstat will print some summary information every time that values are assigned to a structure. The information has the same form as in the READ directive: identifier, minimum value, mean value, maximum value, number of values, number of missing values, and whether or not the set of values is skew.
If you try to use CALCULATE to do something invalid, such as the logarithm or the square root of a negative number, Genstat generates a warning diagnostic and inserts a missing value in the offending unit. The one exception is the division of zero by zero, which is regarded as deliberate. Genstat thus does not print a diagnostic, but uses option ZDZ to determine whether the result should be a missing value (ZDZ=missing) or zero (ZDZ=zero); the default is ZDZ=missing.
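The effect of the two ZDZ settings can be sketched with a division whose first unit is zero divided by zero. The data and the identifiers Num, Den, Rmiss and Rzero are hypothetical, and both settings are given explicitly so that the example does not rely on the default.

```genstat
" Hypothetical data: the first unit gives 0/0 "
VARIATE [VALUES=0,4,0] Num
VARIATE [VALUES=0,2,5] Den
" Zero divided by zero stored as a missing value "
CALCULATE [ZDZ=missing] Rmiss = Num / Den
" Zero divided by zero stored as zero "
CALCULATE [ZDZ=zero] Rzero = Num / Den
PRINT Num,Den,Rmiss,Rzero
```

Rmiss should contain *, 2, 0 and Rzero should contain 0, 2, 0; no diagnostic is expected in either case.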
The SEED option provides the seed to generate random numbers for random-number functions such as GRUNIFORM, if these occur in the expression. The seed can be any non-negative integer, but only the last six digits of its integer part are used. Thus the seeds 2144556 and 7144556.3 are both equivalent to the seed 144556. The default value of zero continues an existing sequence of random numbers, if either these functions or the function URAND (which has its own argument to set the seed) has already been used in the current Genstat run. If, however, this is the first time that these functions have been used, Genstat picks a random seed.
The RESTRICTEDUNITS option allows you to apply a “restriction” to the vectors in the expression. Its setting is a variate containing a list of the unit numbers on which you want the calculation to be done (the other units are then ignored). This works in the same way as if you had applied a restriction on one of these vectors explicitly, using the RESTRICT directive (see below). However, if RESTRICTEDUNITS is set, restrictions on the vectors themselves are ignored. By default, when RESTRICTEDUNITS is unset, CALCULATE will look for restrictions in the vectors, as usual. Note, though, that you can set RESTRICTEDUNITS=* to make the calculation work on all the units, regardless of whether any of the vectors is restricted.
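As a rough sketch of RESTRICTEDUNITS, the commands below calculate square roots for units 2 and 4 only. The data and the identifiers X and Root are assumptions, and the unit numbers are supplied as an unnamed variate.

```genstat
" Hypothetical data; Root is given starting values so that untouched units can be seen "
VARIATE [VALUES=1,4,9,16,25] X
VARIATE [VALUES=0,0,0,0,0] Root
" Calculate only in units 2 and 4 "
CALCULATE [RESTRICTEDUNITS=!(2,4)] Root = SQRT(X)
PRINT X,Root
```

If the unlisted units are left unchanged, as they are with an ordinary restriction, Root should contain 0, 2, 0, 4, 0.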
If you are calculating values for a variate or factor, you can restrict the operation to only a subset of the units by applying a restriction to any of the variates, factors or texts involved in that calculation. The values in the other units are left unchanged. If more than one of these vectors is restricted, they must all be restricted in the same way. Note, though, that restrictions on a variate within a scalar function (for example MEAN), or within the RESTRICTION function, operate independently from the main calculation outside. Also, restrictions in the main calculation are ignored if it contains qualified identifiers or the
Commands for: Calculations and manipulation.
" Example 1:4.1.1a "
VARIATE [VALUES=10,12,14,16,*,20] X
VARIATE [VALUES=4,3,2,1,0,-1] Y
CALCULATE Vadd = X + Y
& Vsub = X - Y
& Vmult = X * Y
& Vdiv = X / Y
& Vexp = X ** Y
PRINT X,Y,Vadd,Vsub,Vmult,Vdiv,Vexp; FIELDWIDTH=9; DECIMALS=2
ENDJOB