
TXINTEGERCODES directive

Converts textual characters to and from their corresponding integer codes.

Options

CONVERTTO = string token Whether to convert text characters to integer codes, or integer codes to text characters (codes, text); default codes
REPRESENT = string token How to represent code values 128-255 (extendedascii, utf8); default extendedascii if CODES contains no codes that can be represented only in UTF-8, otherwise utf8

Parameters

TEXT = texts Text structures (each with a single line only)
CODES = variates or scalars Integer codes corresponding to the characters in each text

Description

Textual characters all have corresponding integer code values (see http://unicode.org/charts/). For example, the characters in the basic ASCII character set have codes running from 0 to 127. The letters a-z have codes 97-122, the capital letters A-Z have codes 65-90, and the digits 0-9 have codes 48-57. These characters can all be represented by a single “byte” of computer storage, which consists of eight “bits”, each able to store either one or zero. Genstat stores other characters, such as those in the Chinese, Korean or Thai languages, in the UTF-8 format, which uses up to four bytes per character.
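The short Python sketch below illustrates the relationship between characters and codes that this paragraph describes. It is not Genstat code. Python's built-in `ord` returns a character's Unicode code point, and `str.encode("utf-8")` shows how many bytes UTF-8 needs for that character. The helper name `utf8_info` is hypothetical.

```python
# Python illustration (not Genstat): each character has an integer code,
# and UTF-8 stores a character in one to four bytes.

def utf8_info(ch: str) -> tuple:
    """Return (code point, number of bytes used in UTF-8) for one character."""
    return ord(ch), len(ch.encode("utf-8"))


samples = [
    "a", "z", "A", "Z", "0", "9",   # basic ASCII: codes 0-127, one byte each
    "\u00e9",                        # e with acute accent, code 233: two bytes
    "\u4e2d",                        # a Chinese character, code 20013: three bytes
    "\U0001F600",                    # an emoji, code 128512: four bytes
]

for ch in samples:
    code, nbytes = utf8_info(ch)
    # !a prints an ASCII-safe form of the character, so any console can show it
    print(f"{ch!a}: code {code}, {nbytes} byte(s) in UTF-8")
```

Characters in the basic ASCII range use one byte in UTF-8. Characters outside that range need two, three or four bytes.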

By default, TXINTEGERCODES takes as input a text supplied by the TEXT parameter, which must contain only one line. The codes corresponding to the characters in that line are saved in the variate supplied by the CODES parameter. Alternatively, if you set option CONVERTTO=text, the codes in CODES are taken as input, and the corresponding line of characters is saved in the TEXT structure. Missing or zero codes are ignored, and invalid codes (for example, negative numbers) are faulted.
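To make the two directions and the treatment of missing, zero and invalid codes concrete, here is a minimal Python analogue. It assumes that codes are Unicode code points and uses Python's `ord` and `chr` for the conversions. The function names are hypothetical. Faulting non-integer codes and codes above 0x10FFFF, the largest Unicode code point, is an extra assumption of this sketch and is not stated for Genstat.

```python
import math
from typing import List, Optional, Sequence


def text_to_codes(line: str) -> List[int]:
    """Hypothetical analogue of the default direction (CONVERTTO=codes)."""
    if "\n" in line:
        raise ValueError("the text must contain a single line")
    return [ord(ch) for ch in line]


def codes_to_text(codes: Sequence[Optional[float]]) -> str:
    """Hypothetical analogue of CONVERTTO=text.

    Missing values (None or NaN) and zeros are skipped and negative codes
    are rejected. Rejecting non-integers and codes above 0x10FFFF is an
    extra assumption of this sketch.
    """
    chars = []
    for code in codes:
        if code is None or (isinstance(code, float) and math.isnan(code)):
            continue  # missing value: ignored
        if code == 0:
            continue  # zero code: ignored
        # Range checks come first so that int() is never applied to infinity.
        if code < 0 or code > 0x10FFFF or code != int(code):
            raise ValueError(f"invalid code: {code}")
        chars.append(chr(int(code)))
    return "".join(chars)


print(text_to_codes("Hi!"))                      # [72, 105, 33]
print(codes_to_text([72, None, 0, 105, 33.0]))   # Hi!
```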

Codes 128-255 can be represented either by characters in the extended ASCII character set, or by 2-byte UTF-8 characters. Both forms represent the same characters, but one may be more convenient than the other, depending on how you intend to use output involving the text later. If you have a preference, you can set the REPRESENT option accordingly. Otherwise, TXINTEGERCODES uses extended ASCII characters unless CODES contains codes above 255, which can be represented only in UTF-8.
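The Python sketch below contrasts the two representations and applies the default rule that the REPRESENT option describes. It is not Genstat code, and the function name `encode_codes` is hypothetical. "Extended ASCII" exists in several variants, so the sketch assumes Latin-1 (ISO 8859-1). In Latin-1, codes 128-255 are the same as the Unicode code points and each is stored as a single byte.

```python
from typing import Optional, Sequence, Tuple


def encode_codes(codes: Sequence[int],
                 represent: Optional[str] = None) -> Tuple[str, bytes]:
    """Hypothetical sketch: store integer codes as bytes in one of two forms.

    represent is "extendedascii", "utf8", or None to apply the default
    rule described for the REPRESENT option.
    """
    if represent is None:
        # Default: extended ASCII, unless a code exceeds 255 and so can
        # be represented only in UTF-8.
        represent = "utf8" if any(c > 255 for c in codes) else "extendedascii"
    text = "".join(chr(c) for c in codes)
    if represent == "extendedascii":
        # Latin-1 is assumed as the extended ASCII set: each code 0-255
        # becomes one byte of the same value. Codes above 255 raise an error.
        return represent, text.encode("latin-1")
    if represent == "utf8":
        return represent, text.encode("utf-8")
    raise ValueError(f"unknown representation: {represent}")


print(encode_codes([233]))            # ('extendedascii', b'\xe9')
print(encode_codes([233], "utf8"))    # ('utf8', b'\xc3\xa9')
print(encode_codes([67, 1082]))       # ('utf8', b'C\xd0\xba')
```

Code 233 is stored as one byte in the extended ASCII form and as two bytes in UTF-8. Both decode to the same character.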

Options: CONVERTTO, REPRESENT.
Parameters: TEXT, CODES.

Action with RESTRICT

TXINTEGERCODES ignores any restrictions on the parameters.

See also

Directive: TEXT.
Commands for: Calculations and manipulation.

Example

" Example 1:4.7.8 "
TEXT    [VALUES='Ο Παρθενώνας'] Parthenon
&       [VALUES='Красный квадрат'] RedSquare
&       [VALUES='Château de Versailles'] Versailles
TXINTEGERCODES Parthenon,RedSquare,Versailles; CODES=Pcodes,Rcodes,Vcodes
PRINT   Pcodes,Rcodes,Vcodes; DECIMALS=0
VARIATE [VALUES=84,111,117,116,32,101,115,116,32,\
        116,101,114,109,105,110,233] Fcodes
TXINTEGERCODES [CONVERTTO=text; REPRESENT=utf8] Finished; CODES=Fcodes
PRINT   Finished
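As a cross-check on the Genstat example, the Python sketch below decodes Fcodes and lists the codes of the three texts. It assumes that TXINTEGERCODES works with Unicode code points, as the link to unicode.org suggests. The Python variable names mirror the Genstat identifiers but are otherwise illustrative.

```python
# Python cross-check of the Genstat example, assuming Unicode code points.
fcodes = [84, 111, 117, 116, 32, 101, 115, 116, 32,
          116, 101, 114, 109, 105, 110, 233]
finished = "".join(chr(c) for c in fcodes)
print(finished)  # Tout est terminé

texts = {
    "Parthenon": "Ο Παρθενώνας",
    "RedSquare": "Красный квадрат",
    "Versailles": "Château de Versailles",
}
for name, value in texts.items():
    codes = [ord(ch) for ch in value]
    needs_utf8 = max(codes) > 255  # codes above 255 exist only in UTF-8
    print(name, codes, "needs UTF-8:", needs_utf8)
```

The largest code in Fcodes is 233, so the default REPRESENT setting would use extended ASCII for Finished. The example sets REPRESENT=utf8 to request the UTF-8 form instead.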
Updated on September 11, 2019
