TXFIND directive

Finds a subtext within a text structure.

Options

`CASE` = string token	Whether to treat the case of letters (small or capital) as significant when searching for the `SUBTEXT` within the `TEXT` (`significant`, `ignored`); default `sign`
`REVERSE` = string token	Whether to reverse the search to work from the end of the `TEXT` (`yes`, `no`); default `no`
`MULTISPACES` = string token	Whether to treat differences between multiple spaces and single spaces as significant, or to treat them all like a single space (`significant`, `ignored`); default `sign`
`DISTINCT` = string tokens	Whether to require the `SUBTEXT` to have one or more separators to its left or right within the `TEXT` (`left`, `right`); default `*`
`SEPARATOR` = string	Characters to use as separators; default `' ,;:.'`
`SAMELINE` = string token	Whether to ignore matches in the `TEXT` where the `SUBTEXT` is not all on the same line (`yes`, `no`); default `no`

Parameters

`TEXT` = texts	Texts to be searched
`SUBTEXT` = texts	Text to look for in each `TEXT`
`COLUMN` = scalars	Position of the column within `TEXT` where the first character of `SUBTEXT` has been found
`LINE` = scalars	Number of the line within `TEXT` where the first character of `SUBTEXT` has been found
`ICOLUMN` = scalars	Column within `TEXT` at which to start the search
`ILINE` = scalars	Line within `TEXT` at which to start the search
`ENDCOLUMN` = scalars	Position of the column within `TEXT` where the last character of `SUBTEXT` has been found
`ENDLINE` = scalars	Number of the line within `TEXT` where the last character of `SUBTEXT` has been found

Description

The TXFIND directive looks for a Genstat text structure within another text structure. The text to search is specified by the TEXT parameter, and the SUBTEXT parameter specifies the text to be found. The search treats the two texts as if they were paragraphs of characters: that is, it takes no account of the line breaks within the two text structures, replacing each one with a space.

The COLUMN parameter saves the column within the TEXT where the first character of the SUBTEXT is found, and the LINE parameter saves its line within the TEXT. Both are set to zero if the SUBTEXT is not found. Similarly, the ENDCOLUMN and ENDLINE parameters save the position of the last character of the SUBTEXT.

You can use the ICOLUMN and ILINE parameters to specify a starting column and line for the search. So you can search for the next occurrence of the SUBTEXT by setting ILINE to the saved value of LINE, and ICOLUMN to the saved value of COLUMN plus one. By default the search works forwards through the TEXT; setting option REVERSE=yes makes it work from the end of the TEXT instead.
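As a minimal sketch of saving the start and end positions and then looking for the next occurrence, the following uses a small two-line text. The identifiers Lines, c1, l1, ec1, el1, start, c2 and l2 are hypothetical names chosen for illustration, and the data are made up.

```genstat
" A small two-line text to search (illustrative data) "
TEXT      [VALUES='one two three','four two five'] Lines
" First occurrence: save the positions of its first and last characters "
TXFIND    Lines; SUBTEXT='two'; COLUMN=c1; LINE=l1;\
          ENDCOLUMN=ec1; ENDLINE=el1
PRINT     l1,c1,el1,ec1; DECIMALS=0
" Next occurrence: restart one column beyond the previous match "
CALCULATE start = c1 + 1
TXFIND    Lines; SUBTEXT='two'; COLUMN=c2; LINE=l2;\
          ICOLUMN=start; ILINE=l1
PRINT     l2,c2; DECIMALS=0
```

If no further occurrence exists, c2 and l2 should both come back as zero, which is the condition a search loop can test in order to stop.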

By default, TXFIND takes account of the case of letters (small or capital) when looking for the SUBTEXT within the TEXT. So, for example, 'Genstat' would not match 'genstat'. However, you can set option CASE=ignored to ignore differences in case. Multiple spaces are also treated as significant by default, but you can set option MULTISPACES=ignored to treat them all like a single space.
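The effect of these two options can be sketched by running the same search twice on a text that differs from the SUBTEXT in both case and spacing. The identifiers Title, cdef, ldef, cign and lign are hypothetical, and the text itself is made up for illustration.

```genstat
" Text with capital letters and three spaces between the words "
TEXT   [VALUES='Analysis with   GENSTAT'] Title
" Default settings: case and multiple spaces are both significant "
TXFIND Title; SUBTEXT='with Genstat'; COLUMN=cdef; LINE=ldef
" Ignore case, and treat runs of spaces like a single space "
TXFIND [CASE=ignored; MULTISPACES=ignored] Title;\
       SUBTEXT='with Genstat'; COLUMN=cign; LINE=lign
PRINT  ldef,cdef,lign,cign; DECIMALS=0
```

Under the rules described above, the first search should fail (leaving cdef and ldef at zero), while the second should report a match.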

Option DISTINCT is useful if you are looking for distinct words or phrases. The left setting requires the SUBTEXT either to begin at the start of the TEXT, or to be preceded in the TEXT by a separator (such as a space or comma). Similarly, the right setting requires the SUBTEXT either to finish at the end of the TEXT, or to be followed in the TEXT by a separator. The separators are specified by the SEPARATOR option.
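A minimal sketch of a whole-word search with a custom separator follows. The text and the identifiers Hyph, cany, lany, cword and lword are hypothetical, and the hyphen is simply an assumed choice of separator for this illustration.

```genstat
" The letters the occur inside other and also as a word of their own "
TEXT   [VALUES='other-the-rest'] Hyph
" Plain search: any occurrence is accepted, including one inside a word "
TXFIND Hyph; SUBTEXT='the'; COLUMN=cany; LINE=lany
" Require a separator (here only the hyphen) on both sides "
TXFIND [DISTINCT=left,right; SEPARATOR='-'] Hyph; SUBTEXT='the';\
       COLUMN=cword; LINE=lword
PRINT  lany,cany,lword,cword; DECIMALS=0
```

The plain search is expected to stop at the occurrence inside the first word, whereas the DISTINCT search should skip it and report the free-standing word.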

By default, the SUBTEXT can be split over several lines of the TEXT, but you can set option SAMELINE=yes to ensure that it will be recognised only if it is all on a single line.
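The difference can be sketched with a phrase that straddles a line break. The identifiers Two, csplit, lsplit, csame and lsame are hypothetical names, and the text is made up for illustration.

```genstat
" The phrase split plot is broken across the two lines "
TEXT   [VALUES='analysing a split','plot design'] Two
" Default: the line break is treated as a space, so the phrase can match "
TXFIND Two; SUBTEXT='split plot'; COLUMN=csplit; LINE=lsplit
" SAMELINE=yes: matches that span more than one line are ignored "
TXFIND [SAMELINE=yes] Two; SUBTEXT='split plot'; COLUMN=csame; LINE=lsame
PRINT  lsplit,csplit,lsame,csame; DECIMALS=0
```

With SAMELINE=yes the search should report zeros here, because the only occurrence of the phrase runs from the first line onto the second.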

Options: CASE, REVERSE, MULTISPACES, DISTINCT, SEPARATOR, SAMELINE.

Parameters: TEXT, SUBTEXT, COLUMN, LINE, ICOLUMN, ILINE, ENDCOLUMN, ENDLINE.

Action with `RESTRICT`

Any restrictions are ignored.

Example

" Example 1:4.7.3, 1:4.7.4 and 1:4.7.6"
TEXT Intro6; VALUES=!t(\
'Genstat has very comprehensive facilities for Analysis of Variance.',\
'Almost all of these can be accessed using custom menus. In this',\
'chapter, we start with the simplest design, a one-way completely',\
'randomized experiment, before introducing factorial experiments,',\
'which have more than one treatment or fixed effect. We use an',\
'experiment with a randomized block design to show how to deal with',\
'blocks, which involve more than one stratum or source of error in',\
'the analysis, and extend this idea by analysing a split-plot design.',\
'Many other types of design can also be analysed by Genstat, and',\
'details are available in Chapter 4 of Part 2 of the Guide to',\
'Genstat. We also introduce some of Genstat''s extensive facilities',\
'for creating designed experiments, available from the Design option',\
'of the Stats menu.')
TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Where
TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Next; SKIP=Where
PRINT      Where,Next; DECIMALS=0
TXFIND     [DISTINCT=left,right] Intro6; SUBTEXT='the';\
           COLUMN=column; LINE=line
PRINT      [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column
FOR [NTIMES=999]
  TXFIND   [DISTINCT=left,right] Intro6; SUBTEXT='the';\
           COLUMN=column; LINE=line; ICOLUMN=column+1; ILINE=line
  EXIT     line .EQ. 0
  PRINT    [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column
ENDFOR
TXBREAK  Intro6; WORDS=Words
GROUP    [CASE=ignored; REDEFINE=yes] Words
TABULATE [PRINT=count; CLASSIFICATION=Words]

Updated on June 17, 2019
