READ directive

Reads data from an input file, an unformatted file or a text.

Options

PRINT = string tokens What to print (data, errors, summary); default erro, summ
CHANNEL = identifier Channel number of file, or text structure from which to read data; default current file
SERIAL = string token Whether structures are in serial order, i.e. all values of the first structure, then all of the second, and so on (yes, no); default no, i.e. values in parallel
SETNVALUES = string token Whether to set number of values of vectors from the number of values read (yes, no); default no causes the number of values to be set only for structures whose lengths are not defined already (e.g. by declaration or by UNITS)
LAYOUT = string token How values are presented (separated, fixedfield); default sepa
END = text What string terminates data (* means there is no terminator); default ‘:’
SEQUENTIAL = scalar To store the number of units read (negative if terminator is met); default *
ADD = string token Whether to add values to existing values (yes, no); default no (available only in serial read)
MISSING = text What character represents missing values; default '*'
SKIP = scalar Number of characters (LAYOUT=fixe) or values (LAYOUT=sepa) to be skipped between units (* means skip to next record); default 0 (available only in parallel read)
BLANK = string token Interpretation of blank fields with LAYOUT=fixe (missing, zero, error); default miss
JUSTIFIED = string tokens How values are to be assumed justified with LAYOUT=fixe (left, right); default righ
ERRORS = scalar How many errors to allow in the data before reporting a fault rather than a warning, a negative setting, –n, causes reading of data to stop after the nth error; default 0
FORMAT = variate Allows a format to be specified for situations where the layout varies for different units, option SKIP and parameters FIELDWIDTH and SKIP are then ignored (in the variate: 0 switches to fixed format; 0.1, 0.2, 0.3 or 0.4 to free format with space, comma, colon or semi-colon respectively as separators; * skips to the beginning of the next line; in fixed format, a positive integer n indicates an item in a field width of n, –n skips n characters; in free format, n indicates n items, –n skips n items); default *
QUIT = scalar Channel number of file to return to after a fatal error; default * i.e. current input file
UNFORMATTED = string token Whether file is unformatted (yes, no); default no
REWIND = string token Whether to rewind the file before reading (yes, no); default no
SEPARATOR = text Text containing the (single) character to be used in free format; default ' '
SETLEVELS = string token Whether to define factor levels or labels (according to the setting of FREPRESENTATION) automatically from those that occur in the data (yes, no); default no causes them to be set only when they are not defined already
TRUNCATE = string tokens Truncation of leading or trailing spaces of strings read in fixed format (leading, trailing); default * i.e. none
CASE = string token Whether the case of letters (small and capital) should be regarded as significant or ignored when forming factor labels automatically (significant, ignored); default sign
LDIRECTION = string token How to define the ordering of levels or labels when these are formed automatically (ascending, given); default asce

Parameters

STRUCTURE = identifiers Structures into which to read the data
FIELDWIDTH = scalars Field width from which to read values of each structure (LAYOUT=fixe only)
DECIMALS = scalars Number of decimal places for numerical data containing no decimal points
SKIP = scalars Number of values (LAYOUT=sepa) or characters (LAYOUT=fixe) to skip before reading a value
FREPRESENTATION = string tokens How factor values are represented (labels, levels, ordinals); default leve

Description

Data values can be read into any Genstat data structure using the READ directive. In its simplest form, you merely list the structure whose values are to be read: for example

READ Weight

The data values for Weight are then assumed to come on the following line or lines. They are assumed to be in free format, separated one from another by one or more spaces or tabs or new lines, and to be terminated by a colon.

READ has a PRINT option with settings:

    summary to print a summary of the data
    data to print a copy of the input lines
    errors to print a detailed report on any errors in the data

By default PRINT=summary,errors.

The CHANNEL option allows you to read data from another file; this must already have been opened (see the OPEN directive). You can also read data from a Genstat text structure. Each line of input is then treated as if it had been read from a file. Note: you should use CHANNEL if you want to use READ in an IF or CASE structure, a FOR loop or a procedure.
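
As a minimal sketch of the two kinds of CHANNEL setting described above: the file name weights.dat, the channel number 3 and the identifiers Weight, Weight2 and Source are hypothetical, chosen only for illustration.

```genstat
" Read from a file that has already been opened on channel 3. "
OPEN 'weights.dat'; CHANNEL=3; FILETYPE=input
READ [CHANNEL=3] Weight
CLOSE 3

" Read from a text structure: each string is treated as one line of input. "
TEXT [VALUES='12.5 13.1 11.8','14.0 12.2 :'] Source
READ [CHANNEL=Source] Weight2
```

Because neither READ takes its data from the lines that follow it, statements of this form are the ones that can safely be placed inside a loop or a procedure.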

You can read values for more than one structure in a single READ statement. The values can be taken either serially or in parallel. The default is to take the values in parallel: the first element of each structure is read, then the second element of each, until all the data are read. For example:

a1 b1 c1

a2 b2 c2

a3 b3 c3

a4 b4 c4 :

or

a1 b1 c1 a2

b2 c2

a3 b3 c3 a4 b4 c4 :

Here A, B and C are in parallel, each with four values. The complete set of values for all three structures is given, followed by one terminating colon. The term parallel merely indicates the order in which READ is to read the values: that is, the first element of each structure, then the second element of each, and so on. It is not necessary for the data to be laid out in neat columns, although this may make a data file easier to work with. Different types of structures can be read in parallel and they may have different kinds of values (numerical or text).

Alternatively, you can set option SERIAL=yes to read the structures in series. Then all the values of the first structure are read, followed by all the values for the second structure, and so on, until all the data structures have been read. For example

x1 x2 x3 :

y1 y2 :

z1 z2 z3 z4 z5 z6 :

Here all the values of X are given first, followed by all the values for Y, and then all the values for Z. Unlike the parallel layout, each set of values must end with the terminating colon, so that READ can tell when to move on to the next structure; this means that the structures can be of different lengths.
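
A serial read of this kind might be written as follows. This is only a sketch: the numerical values are invented, and X, Y and Z are assumed not to have been declared beforehand, so that each becomes a variate whose length is taken from its own set of values.

```genstat
" Each structure is read completely before the next one starts,
  so every set of values ends with its own colon. "
READ [SERIAL=yes] X,Y,Z
1.2 3.4 5.6 :
7 8 :
10 20 30 40 50 60 :
```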

When you are working interactively, Genstat produces a prompt indicating the name of the data structure and the unit number of the next value it expects to read. If Genstat knows how many values to expect, it will terminate the input automatically, without asking for the terminating colon, if the last value is at the end of a line. However, it is quite correct to include the colon at the end of that line of data if you want. If you type too many values by mistake you will get a warning message telling you that the extra data has been ignored.

If a structure whose values are to be read has not already been declared, Genstat will define it automatically as a variate. Likewise, if the length of a vector is undefined, this too will be set automatically. READ first checks whether the vector is being read in parallel with other vectors whose lengths have been defined, then it looks to see if a default length has been defined for vectors using the UNITS directive. If neither of these is available to define the length, it is set to the number of data values that are provided in the input. Lengths of vectors can also be redefined according to the number of data values that are read, by setting option SETNVALUES=yes. The END option allows you to define another string of characters to be used instead of a colon to mark the end of the data, or you can set END=* to indicate that there is no terminating string.
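
The following sketch illustrates the END and SETNVALUES options; the identifiers, the data values and the terminator string EOD are invented for the example, and it assumes that no default length has been set by UNITS.

```genstat
" Weight is undeclared, so it becomes a variate with one value
  for each number found before the terminator EOD. "
READ [END='EOD'] Weight
61.2 58.7 70.3
64.9 59.4 EOD

" Height is declared with 10 values, but SETNVALUES=yes redefines
  its length from the number of values actually read. "
VARIATE [NVALUES=10] Height
READ [SETNVALUES=yes] Height
1.62 1.75 1.80 :
```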

The values of numerical structures (scalars, variates, matrices, symmetric and diagonal matrices and tables) can be entered in any of the standard forms: for example

1.20 -.2 3e1 -1.25E-2 27

are all valid.

Textual values (strings) in free format must be enclosed within single quotes if they contain any characters that have special meaning to READ (space, tab, comma, colon, asterisk, backslash, single or double quote). The quotes can be omitted for other strings. For example:

TEXT [NVALUES=5] Country

READ Country

Australia Canada 'Great Britain' U.S.A. 'New Zealand' :

The rules for strings in READ are thus slightly different to those for lists of strings, where quotes are required for any string that does not start with a letter or contains any character other than letters or digits. Thus Newcastle-on-Tyne and 500Km are both valid when read in as data, but not in a TEXT declaration. Rules for strings in fixed format are described later.

The values of factors are usually represented by their levels. You can change this by setting the FREPRESENTATION parameter. If you set it to labels, READ will accept as values the labels of the factor, using the same rules as for reading textual strings. The strings given as data values must match exactly the labels of the factor if they have been declared. The setting FREPRESENTATION=ordinals causes READ to expect an integer in the range 1 up to n, the number of levels declared for the factor. As FREPRESENTATION is a parameter it can be set to a list of values which are cycled in parallel with the structures to be read. Thus, you are allowed to read several factors in one READ statement, possibly using a different method for reading each one. The setting of this parameter is ignored for any structures that are not factors, but remember that the list will still be cycled in parallel with these other structures.

If you set option SETLEVELS=yes, READ will set up the factor levels or labels according to the values that it finds when reading the data. By default it distinguishes between capital and small letters when forming factor labels, but you can set option CASE=ignored to ignore the case of letters. Also, by default the levels or labels are sorted into ascending order, but you can set option LDIRECTION=given to leave them in the order in which they are found in the data file.
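
For instance, a factor and a variate could be read in parallel, with the factor labels formed from the data. In this sketch the identifiers Sex and Age and the data values are hypothetical.

```genstat
" Sex has no labels yet; they are formed from the data, in the
  order in which they are first met (male, then female). "
FACTOR Sex
READ [SETLEVELS=yes; LDIRECTION=given] Sex,Age; FREPRESENTATION=labels
male 34
female 29
female 41
male 52 :
```

The single FREPRESENTATION setting is recycled for Age, where it is ignored because Age is not a factor.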

The values of pointers are identifiers, that is, names of other data structures. When reading a pointer only simple identifiers are allowed: suffixes cannot be used. For example, Winston is allowed but Orwell[1984] is not.

You cannot read formulae or expressions directly. The easiest way to do this is to read the required value into a text which can then be used in an appropriate declaration using either the macro-substitution symbols ## or the EXECUTE directive. You cannot read values into compound data structures; these should be formed using the appropriate directives or by reading their components individually.


By default, a missing value should be indicated by an asterisk (*); this means that any data item that begins with * is treated as missing. For example, any of the three strings

* *** *789

will be treated as missing. You can use the MISSING option to change this to any other single character; for example, if you set MISSING='-' then any negative numbers will be read as missing values.

In free format, values are usually separated by spaces or tabs. The SEPARATOR option can be used to specify another character to use as a separator. For example you can use a comma:

READ [SEPARATOR=','] Weights

24.3, 25.6, 57.3, 43.8, 45.3,

46.5, 47.9, 97.0, 77.5, 64.3 :

You can use spaces and tabs in addition to the specified separator, so long as the separator is present between each pair of values (except at the end of line, when it may be omitted).

The SEPARATOR, END and MISSING strings are all case-sensitive; for example, END=enddata is different from END=EndData. The missing-value and separator characters must be distinct and neither may be part of the END string.

In free format, the SKIP option can be used to skip values between complete units of data. For example, with a file in channel 2 containing five columns of data, the statement

READ [CHANNEL=2; SKIP=3] X,Y

would read X and Y from the first two columns, and then skip the final three columns. Genstat reads the first values of X and Y, then skips the next three values before reading the second value of X, so READ moves on to the next line of the file, and so on. You can also set SKIP=* to skip directly to the next line of data; you could use this if there were varying numbers of additional columns in the file. By default, SKIP is zero, so no values are skipped. The SKIP parameter, by contrast, is interpreted in parallel with the structures whose values are to be read, and indicates how many values should be skipped before reading the value for the corresponding structure.
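
The SKIP parameter can be sketched in the same way. The layout assumed here is hypothetical: a file already opened on channel 2, with five values per line consisting of an identifier, the value of X, two unwanted values and then the value of Y.

```genstat
" Skip one value before each value of X, and two before each value of Y. "
READ [CHANNEL=2] X,Y; SKIP=1,2
```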

In fixed format, data values are arranged in specific fields on each line of the file. Each field consists of a fixed number of characters. There is no need for separating spaces; the tab character is not permitted, nor are comments. So, depending on how the fields are defined, the sequence of digits 123456 could be interpreted for example as the single number 123456, or two numbers 123 and 456, or three numbers 123, 4 and 56. Data like this are usually produced by special-purpose programs or equipment; for example, automatic data recorders.

To read data in fixed format you set the LAYOUT option to fixed, and then specify the format to be used. If the values for a structure always occupy the same number of character positions, you can do this with the FIELDWIDTH parameter. For example,

READ [CHANNEL=2; LAYOUT=fixed] Weight,Height; FIELDWIDTH=3,5

takes data from channel 2 in fixed format. The data are in parallel: that is, reading across lines of the file, values for Weight and Height appear alternately. The FIELDWIDTH parameter is processed in parallel with the structures to be read, so each item of Weight data takes up three characters, and each item of Height data takes up five. If the fieldwidth for a structure is not constant, that is if different layouts are used for different units of the data, then you need to use the FORMAT option, described later.

Suppose there are 80 characters per line in the file; each pair of Weight and Height values takes up 8, and so you have 10 pairs per line. The first line looks like:

Weight1Height1Weight2Height2 ... Weight10Height10

Suppose that the first two values for Weight were 1 and 200, and that the first two for Height were 10 and 1200. Then, using ⊔ to represent a space, the first four items on this line would be:

⊔⊔1⊔⊔⊔10200⊔1200

Genstat is able to identify the separate values 10 and 200 because it is reading a fixed number of characters for each structure.

Genstat input files have a nominal width, set by default to 80. This can be altered by an OPEN statement to a different value if necessary. When reading in fixed format, each line of input is taken to be exactly this width; shorter lines are extended with spaces (blanks). It is important to make sure that you account for this when setting the options for READ, otherwise you may read some values from these blank fields (the BLANK option, described below, explains how the blank fields would be interpreted). In the example above, if the values for Height occupied four characters instead of five there would be 11 pairs of values per line of 77 characters. Using the default settings, the final three characters on the first line would be read as the 12th value of Weight, and READ would then be out of step as the 12th value of Height would be read in from the beginning of the next line. The simplest solution is to set the file width to 77 in the OPEN statement, but you can also use the SKIP option and parameter (see below) or the FORMAT option to avoid this sort of problem.

When you are using fixed format, the data terminator must begin within the first field to be read after the final data value: so you must ensure that you set the field widths and position the terminator appropriately. If you are using either the SKIP option or parameter, you must take care not to skip accidentally over the terminator, as READ will continue to take input – and probably generate many error messages.

Normally Genstat treats a blank field in fixed-format data as a missing value, and the only indication will be in the count of missing values in the printed summary. You can request warning messages for blank fields by setting the option BLANK=error. Alternatively, you can cause blanks to be interpreted as zeroes, by setting BLANK=zero.

Data in fixed format are normally taken to be right-justified: that is, their right-hand ends are flush with the right-hand end of the field; you can have either blanks or leading zeroes (for numbers) in the redundant spaces at the left of the field. You can change this default by setting the JUSTIFIED option. For example the value 123 can appear in a field of width 5 as:

    ⊔⊔123 JUSTIFIED=right there may be leading blanks (the default),
    123⊔⊔ JUSTIFIED=left there may be trailing blanks
    00123 JUSTIFIED=left,right there must be no blanks, or
    ⊔123⊔ JUSTIFIED=* there may be leading or trailing blanks.

In this way, JUSTIFIED allows you to check the blanks in each field. If a data field contains any blanks that are not allowed by the current setting, an error will be reported. Note that when reading numerical data embedded blanks are never permitted. So a field containing, for example, 12⊔3 will always produce an error message.

As an example, we can read the values of five scalars using a fixed format with values left-justified in their fields by the following:

SCALAR V,W,X,Y,Z

READ [LAYOUT=fixed;JUSTIFIED=left] V,W,X,Y,Z;\

                        FIELDWIDTH=4,5,7,4,5

1.235.62⊔678.9⊔⊔3.7810.31:

This reads the values 1.23, 5.62, 678.9, 3.78 and 10.31 into V, W, X, Y and Z respectively.

The general principles of the SKIP option and parameter are discussed in the context of a free format read in the previous section. When reading in fixed format the same ideas apply, but the SKIP settings now specify numbers of characters to be ignored, instead of numbers of values. Thus, you can obtain exactly the same effect as in the example above by putting

READ [LAYOUT=fixed] V,W,X,Y,Z; FIELDWIDTH=4,4,5,4,5;\

                    SKIP=0,0,1,2,0

Sometimes fixed format data can be further compressed by omitting the decimal point. The DECIMALS parameter allows you to re-scale data automatically when they are read (in either fixed or free format).
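
As a sketch, reusing the Weight and Height layout from the earlier fixed-format example and assuming the file on channel 2 records both without decimal points:

```genstat
" Weight occupies 3 characters with 1 implied decimal place,
  Height occupies 5 characters with 2 implied decimal places. "
READ [CHANNEL=2; LAYOUT=fixed] Weight,Height; FIELDWIDTH=3,5; DECIMALS=1,2
```

On this reading of the DECIMALS parameter, a Weight field containing 123 would be stored as 12.3, and a Height field containing 01234 as 12.34; a value that already contains a decimal point would not be re-scaled.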

When reading textual data in fixed format, the contents of each field are taken exactly as they appear in the input file. There is no need to enclose values in quotes; in fact if you do so, the quotes are treated as part of the data. For example,

TEXT [NVALUES=1] T1,T2,T3,T4

READ [LAYOUT=fixed; SKIP=*] T1,T2,T3,T4; FIELDWIDTH=6,3,4,7

'What'sitallabout?':

gives text T1 the value 'What's, text T2 the value it, text T3 the value all, and text T4 the value about?'. Consequently, the only way to represent a missing string in fixed format is by a blank field, as '' or * would both be treated literally and stored as data values.

The TRUNCATE option has settings leading and trailing, allowing you to remove initial or trailing spaces in strings that are read in fixed format. For example, if we set TRUNCATE=leading above, T2 would just contain the two letters it. By default no truncation takes place.

The rules for reading textual data in fixed format also affect the reading of factors. If you set FREPRESENTATION=labels and do not request any truncation, the width of the field must equal the number of characters in the label since, for example, no⊔ is not the same as no.

The FORMAT option allows you to use a variable format. By this we mean that the layout of the values may vary from unit to unit of the data, and may also vary within each unit. For example, suppose you have some meteorological data that were measured daily and that the file also contains some additional summary values at the end of each week. The first eleven lines are reproduced to illustrate the structure of the file:

Monday 5.5 -0.4 0.0 1.9 10.0

Tuesday -1.1 -2.1 0.0 0.0 34.0

Wednesday 0.6 -8.3 1.3 5.4 142.0

Thursday 6.8 -5.7 1.1 0.0 158.0

Friday 10.6 0.5 8.1 0.0 141.0

Saturday 10.7 6.4 8.3 0.0 152.0

Sunday 10.0 1.9 1.0 0.1 237.0

Summary week 1> 10.7 -8.3 4 19.8 7.4 10.0 124.8 237.0

Monday 9.9 2.5 0.0 4.4 229.0

Tuesday 11.4 2.1 8.5 0.3 237.0

Wednesday 11.9 6.3 18.7 0.0 520.0

Suppose the file contains data for 28 days. If you try to read a text and five variates of length 28 then the summaries found after the 7th, 14th, 21st and 28th days would cause an error in READ. You need to read seven lines, skip one, read seven more, and so on. This can be done by setting the option FORMAT=!( (6)7,*,* ). This means “read six values, do this seven times, skip to the next line, skip again, then return to the beginning of the format and repeat, until enough data has been read”. The format is made clear by using (6)7 which corresponds to the physical layout of the data, but 42 could have been specified instead, meaning read the next 42 values.
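
Put together, the complete read might look like the sketch below. The identifiers Day and V[1...5] are hypothetical, and the file is assumed to have been opened already on channel 2 and to end with the usual terminator.

```genstat
" One text and five variates, one unit for each of the 28 days. "
TEXT [NVALUES=28] Day
VARIATE [NVALUES=28] V[1...5]

" Read six values on each of seven lines, then skip the weekly summary line. "
READ [CHANNEL=2; FORMAT=!((6)7,*,*)] Day,V[1...5]
```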

You can use FORMAT when reading in either free format or fixed format, and can also switch between the two during the READ. When you have set FORMAT, Genstat ignores the SKIP option and the FIELDWIDTH and SKIP parameters, and READ is controlled entirely by the values of the FORMAT. These values are not in parallel with the list of structures: they apply to data values in turn, recycling from the beginning when necessary. You set FORMAT to a variate, which may be declared in advance or can be an unnamed structure as shown above. Each value of this variate is interpreted as follows (where n is a positive integer):

+n  read n values (in free format) or one value from a field of n characters (in fixed format);

-n   skip the next n values (in free format) or n characters (in fixed format)

*     skip to the beginning of the next line

0.0 switch to fixed format

0.1 switch to free format using space as a separator

0.2 switch to free format using comma as a separator

0.3 switch to free format using colon as a separator

0.4 switch to free format using semicolon as a separator

0.5 switch to free format using the setting of the SEPARATOR option

When a FORMAT variate is used, READ starts in either free format or fixed format, according to the setting of LAYOUT (by default, LAYOUT=separated; that is, free format). You can switch between these at any time by specifying one of the values 0, 0.1, 0.2, 0.3, 0.4 or 0.5. Remember that if you use free format, spaces and tabs can also be used in addition to the specified separator, and you must use a separator that is distinct from the END and MISSING indicators.

You can read from unformatted files by setting option UNFORMATTED=yes. The only options that are then relevant are CHANNEL, REWIND and SERIAL. Details of how to create the unformatted files are given in the description of the PRINT directive.

If you have more data to read than can be stored in the space available within Genstat, you can use the SEQUENTIAL option of READ to process the data in smaller batches. This works by reading in some of the data, partially processing it to form an intermediate result, and then overwriting the original data with a new batch that is used to update the intermediate results. This can be repeated until all the data has been read and the final summary is obtained. There are two directives that include facilities specifically designed to work with sequential data input: TABULATE which forms tabular summaries, and FSSPM which forms SSPM data structures for use in linear regression. You can also use other directives, such as CALCULATE, to process data sequentially, but you will have to program the sequential aspects yourself.

You should first declare the structures to be of some convenient size, such that you will not use up all the work space. You then use READ as normal, but with the SEQUENTIAL option set to the identifier of a scalar, which will be used to keep track of how the input is progressing. For example, to read in 10 variates of length 272500:

VARIATE [NVALUES=10000] X[1...10]

READ [CHANNEL=2; SEQUENTIAL=N] X[1...10]

The number of values declared for X[1...10] defines the size of batch to read (10000 in this example). So, READ will read the first 10000 units of data (100,000 values), and set N to 10000 to indicate that is the number of units read. This should be followed by the statements to process the first batch of data, then the READ can be repeated. Once again N is set to 10000, indicating that another 10000 units have been read. This can be continued until READ finds the data terminator, when it sets the sequential indicator to minus the number of values found in the last batch. If this is less than the declared size of the data structures they will be filled out with missing values. In the example given above, after the 28th READ the variates will each contain 2500 values followed by 7500 missing values, and N will be set to -2500, indicating that all the data has been read and that the final batch contains only 2500 values. Usually you will use the SEQUENTIAL facility in conjunction with FSSPM or TABULATE which are designed to recognize the different settings of the scalar N.

The SEQUENTIAL option is best used within a FOR loop. You should set the NTIMES option to a value large enough to ensure that sufficient batches of data are read. The loop should contain the READ statement and any other statements required to process the data. For example

VARIATE [NVALUES=10000] X[1...10]

SSPM [TERMS=X[]] S

FOR [NTIMES=9999]

  READ [PRINT=*;CHANNEL=2;SEQUENTIAL=N] X[]

  FSSPM [SEQUENTIAL=N] S

  EXIT N.LE.0

ENDFOR

The EXIT directive is used to jump out of the loop once all the data has been read and processed; this is safer than trying to program an exact number of iterations for the loop. The exit condition includes the case when N is equal to zero, as this will arise when the batch size exactly divides the total number of units. In the above example, if there were 280000 units of data altogether, the 28th READ would terminate with N set to 10000. This is because READ is unable to look ahead for the terminator, as there may be other statements in the loop, such as SKIP, which affect how the file is read. The next READ would immediately find the data terminator, so would exit with N set to zero. This special case is treated appropriately by FSSPM and TABULATE, but you should remember to allow for it if you are programming the sequential processing explicitly.

You can use the SEQUENTIAL option to read data from more than one input channel, perhaps when a large data set is split into two or more files, but you are not allowed to read data from the current input channel (that is, the channel containing the READ statement). If you want to process several structures sequentially from the same file, you must read them in parallel. You must also be careful not to modify the value of the scalar, N, within the loop when using sequential data input with FSSPM or TABULATE, as that could interfere with the sequential processing.

Another means of handling large amounts of data is provided by the ADD option. This allows you to add values to those already stored in a structure, thus forming cumulative totals without having to store all the individual data values. You must set SERIAL=yes with ADD=yes; and it is allowed only for variates. For example:

VARIATE [NVALUES=6] A

READ [ADD=yes; SERIAL=yes] 3(A)

5 12 9 * * 9 :

8 1 3 * 2 10 :

3 4 0 * 11 * :

This starts by assigning the values 5, 12, 9, *, *, and 9 to A. Then A is read again, and its values become 13, 13, 12, *, 2, 19: with ADD=yes (and only then) missing values are interpreted as zeroes when being added to non-missing values. Finally A contains the values 16, 17, 12, *, 13, 19.

If you have used the UNITS directive to specify a variate or text containing unit labels, READ will respect the order of these values when reading other structures in parallel with the units structure; in other words the data are re-ordered to match the order of the unit labels. If the units structure does not already have values, READ will define the order of the units as the order in which it finds them in the data. This means that if you are reading several sets of data, each having a column for the unit number (or label), the first use of READ will define the unit order and subsequent READ statements will ensure that this order is maintained consistently in the remaining data. If a value is specified more than once when defining the units structure, READ will only ever locate the first occurrence of that unit label. If a unit label is repeated in the data then only the final set of values corresponding to that unit will be stored; earlier occurrences are overwritten by subsequent ones. If you try to read a value that is not present in the units structure this is regarded as a fault. Also, if the units structure contains missing values it cannot be used to re-order the data and will instead be overwritten by the new values: a warning message is printed out to tell you if this occurs. If you use the option SETNVALUES=yes when reading structures in parallel with the units vector, the other structures will all be set to the current unit length.

When you are working interactively and typing data from the keyboard, READ will halt immediately it finds an invalid value. You should type the correct value and then continue with the rest of the data. If you had typed several items of data then all those before the erroneous value will have been read and stored, but any remaining values will have been discarded, and so will need to be retyped. When you are reading data in batch, it is not possible to recover from errors in this way. Instead, READ will continue processing the data, substituting missing values for any data that it cannot read, and printing out a message for every error that is found.

If errors occur when running in batch, a fault will be generated when READ terminates, thus terminating the job. This is to avoid spurious output being produced from analyses based on incorrect data. You can override this by using the options ERRORS and QUIT. If you set ERRORS=n, where n is a positive integer, then up to n errors are allowed in the data before READ generates a fault. You might want to do this if you knew certain items of data were going to generate errors, but were prepared to accept them as missing values so that you could analyse the rest of the data. Obviously, you need to be very careful when doing this, as there may be other unexpected errors in the data. Usually you would have to try reading the data once without setting ERRORS, so you could check all the messages, and find what value of n is appropriate. Then the READ statement would have to be repeated, setting ERRORS and REWIND in order to read the data. For example, if missing values of a factor had been typed in as the letter X, you would not want to define X as an extra level of the factor, but if you set MISSING='X' any numerical data that used * for missing value could not be read either.
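
The second pass described above might be sketched as follows. The channel number, the identifiers Treatment and Yield, and the count of three tolerated errors are all hypothetical; the count would come from checking the messages produced by an earlier run without ERRORS.

```genstat
" Re-read the file from the start, accepting up to 3 known errors,
  which are stored as missing values and reported as warnings. "
READ [CHANNEL=2; REWIND=yes; ERRORS=3] Treatment,Yield
```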

READ produces a message for every data value that contains an error. This can be very useful, as you then have the opportunity to correct all the errors at once, before trying to read the data again. However, the error messages may not be due to errors in the data, but may be caused by an incorrectly specified READ statement. For example, if you are reading many structures in parallel and specify texts and variates in the wrong order in the list of structures to be read, you will get an error message every time Genstat finds a piece of text rather than a number in the position specified for a variate. This is not likely to be a problem, unless you are reading large amounts of data, when you might end up with thousands of lines of needless error messages. A sensible precaution then is to request Genstat to abort the READ if more than a specified number of errors occur. You can do this by setting ERRORS to a negative integer, -n. This means that up to n errors are allowed in the data, but READ will abort if any more occur, switching control to the channel specified by QUIT (that is, starting or continuing to read Genstat statements from that channel). If you are working in batch a fault will be generated that inhibits execution of further statements, but interactively you have the opportunity to examine the data that have been read in so far, which may help identify any problems in the original READ statement or declarations of your data.
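
As a precaution of this kind, a large read could be capped as in the sketch below, where the identifiers Name and Score and the limit of 20 are invented for illustration and QUIT is left at its default (the current input file).

```genstat
" Abandon the read once the limit of 20 errors has been reached,
  rather than printing a message for every bad value in the file. "
READ [CHANNEL=2; ERRORS=-20] Name,Score
```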

Options: PRINT, CHANNEL, SERIAL, SETNVALUES, LAYOUT, END, SEQUENTIAL, ADD, MISSING, SKIP, BLANK, JUSTIFIED, ERRORS, FORMAT, QUIT, UNFORMATTED, REWIND, SEPARATOR, SETLEVELS, TRUNCATE, CASE, LDIRECTION.

Parameters: STRUCTURE, FIELDWIDTH, DECIMALS, SKIP, FREPRESENTATION.

Action with RESTRICT

READ ignores any restrictions.

See also

Directives: OPEN, COPY, RETRIEVE, SKIP, SPLOAD.
Procedures: FILEREAD, IMPORT, DBIMPORT, TX2VARIATE.
Commands for: Input and output.

Example

" Example READ-1: Reading parallel free-format data"

" Open a data file on the second channel for input."
OPEN '%gendir%/examples/READ-1.DAT'; CHANNEL=2; FILETYPE=input

" Ignore the first three lines, and then read values of six variates,
  recorded in parallel (the default) in free format  -  allowing one error,
  (known to be harmless), which READ reports in a warning."
SKIP [CHANNEL=2] 3
READ [CHANNEL=2; PRINT=data,error; ERROR=1] QQ[1...6]
PRINT QQ[]

" If you continue reading from the same file attached to a channel,
  you can read further data recorded after the first end-of-data marker."
READ [CHANNEL=2; PRINT=data,error] ZZ
PRINT ZZ

" It is good practice to close files after you have finished with them."
CLOSE 2

" The data can be recorded in the file with the commands:
  in this case, it must follow the READ command, or the end
  of a FOR loop if READ is in a loop, or the invocation of
  a procedure if the READ command is in a procedure."

TEXT Text
READ [PRINT=data,errors] Numbers,Text
23 Apples
22 Pears
31 Oranges
10 Bananas
4 Peaches :
PRINT Numbers,Text
Updated on September 2, 2019
