Breaks up a text structure into individual words.
Option
SEPARATOR = text | Defines the characters separating the words in the original text; default ' ,;:.' |
---|---|
Parameters
TEXT = texts | Text to break into words |
---|---|
WORDS = texts | Saves the words contained in each text (in the order in which they occur) |
COLUMNS = variates | Saves the number of the column in the TEXT where each word began |
LINES = variates | Saves the number of the line where each word was found |
PLACESINLINES = variates | Saves the place of each word (first, second &c) within the line where it was found |
Description
The TXBREAK directive forms a text containing all the words (including duplicates) found in a text. The original text to break up is supplied by the TEXT parameter, and the WORDS parameter saves a text storing the words that it contains. The words are stored in the order in which they occur in the original text (but, for example, you could use the SORT directive to sort them into alphabetic order). The LINES parameter can save a variate recording the line in the original text where each one was found. The COLUMNS parameter can save a variate recording the column where each word began, and the PLACESINLINES parameter can save a variate giving the place of each word (first, second &c) within the line where it was found.
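As a minimal sketch of the saving parameters used together, the statements below break a two-line text and print each word alongside its line, column and place. The text and the identifiers Verse, Words, Line, Col, Place and Sorted are illustrative assumptions, not part of the directive.

```
" Hypothetical two-line text, used only for illustration "
TEXT [VALUES='the cat sat, then slept.','the dog barked'] Verse
" Save the words together with where each one was found "
TXBREAK Verse; WORDS=Words; COLUMNS=Col; LINES=Line; PLACESINLINES=Place
PRINT Words,Line,Col,Place
" Optionally store a copy of the words sorted into alphabetic order "
SORT Words; NEWVECTOR=Sorted
PRINT Sorted
```

Sorting into a new text rather than in place keeps Words aligned, unit by unit, with the Line, Col and Place variates.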
By default, the words are assumed to be separated from one another by spaces or by any of the standard punctuation characters (comma, semi-colon, colon, full stop). However, you can use the SEPARATOR option to specify some other characters. For example, you could put SEPARATOR=' ,;:.?' to allow question marks as well. These characters are all removed from the words when they are stored.
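The following sketch compares the default separators with the extended setting mentioned above. The text and the identifiers Start, Wdefault and Wquery are hypothetical. The setting repeats the default characters (including the space) before adding the question mark, on the assumption that SEPARATOR replaces the default set rather than extending it.

```
" Hypothetical one-line text containing question marks "
TEXT [VALUES='Ready? Steady? Go: a clean start.'] Start
" Default separators: ? is not among them, so it should stay in the words "
TXBREAK Start; WORDS=Wdefault
" Supply the default characters plus ? so that it is removed as well "
TXBREAK [SEPARATOR=' ,;:.?'] Start; WORDS=Wquery
PRINT Wdefault
PRINT Wquery
```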
Option: SEPARATOR.
Parameters: TEXT, WORDS, COLUMNS, LINES, PLACESINLINES.
Action with RESTRICT
TXBREAK takes account of any restrictions on the original text, and omits the words in the restricted lines.
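A small sketch of this behaviour, assuming that the omitted words are those in the lines excluded by the restriction. The text and the identifiers Greek, Gwords and Gline are hypothetical.

```
" Hypothetical three-line text "
TEXT [VALUES='alpha beta','gamma delta','epsilon zeta'] Greek
" Restrict the text to its first and third lines "
RESTRICT Greek; CONDITION=!(1,0,1)
" Break up the restricted text, saving the line of each word "
TXBREAK Greek; WORDS=Gwords; LINES=Gline
PRINT Gwords,Gline
" Remove the restriction afterwards "
RESTRICT Greek
```

Saving LINES here makes it easy to check which lines of the original text contributed words.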
See also
Directives: TEXT, CONCATENATE, EDIT, TXCONSTRUCT, TXFIND, TXPOSITION, TXREPLACE.
Procedure: TXSPLIT.
Functions: CHARACTERS, GETFIRST, GETLAST, GETPOSITION, POSITION.
Commands for: Calculations and manipulation.
Example
"Example 1:4.7.3, 1:4.7.4 and 1:4.7.6" TEXT Intro6; VALUES=!t(\ 'Genstat has very comprehensive facilities for Analysis of Variance.',\ 'Almost all of these can be accessed using custom menus. In this',\ 'chapter, we start with the simplest design, a one-way completely',\ 'randomized experiment, before introducing factorial experiments,',\ 'which have more than one treatment or fixed effect. We use an',\ 'experiment with a randomized block design to show how to deal with',\ 'blocks, which involve more than one stratum or source of error in',\ 'the analysis, and extend this idea by analysing a split-plot design.',\ 'Many other types of design can also be analysed by Genstat, and',\ 'details are available in Chapter 4 of Part 2 of the Guide to',\ 'Genstat. We also introduce some of Genstat''s extensive facilities',\ 'for creating designed experiments, available from the Design option',\ 'of the Stats menu.') TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Where TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Next; SKIP=Where PRINT Where,Next; DECIMALS=0 TXFIND [DISTINCT=left,right] Intro6; SUBTEXT='the';\ COLUMN=column; LINE=line PRINT [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column FOR [NTIMES=999] TXFIND [DISTINCT=left,right] Intro6; SUBTEXT='the';\ COLUMN=column; LINE=line; ICOLUMN=column+1; ILINE=line EXIT line .EQ. 0 PRINT [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column ENDFOR TXBREAK Intro6; WORDS=Words GROUP [CASE=ignored; REDEFINE=yes] Words TABULATE [PRINT=count; classification=Words]