Converts textual characters to and from their corresponding integer codes.
Options
CONVERTTO = string token | Whether to convert from text characters to integer codes or from integer codes to text characters (codes, text); default code |
---|---|
REPRESENT = string token | How to treat code values 128-255 (extendedascii, utf8); default exte if CODES defines no characters that can be represented only in UTF-8, otherwise utf8 |
Parameters
TEXT = texts | Text structures (each with a single line only) |
---|---|
CODES = variates or scalars | Integer codes corresponding to the characters in each text |
Description
Textual characters all have corresponding integer code values (see http://unicode.org/charts/). For example, the characters in the basic ASCII character set have codes running from 0 to 127. The letters a-z have codes 97-122, the capital letters have codes 65-90, and the digits 0-9 have codes 48-57. These characters can all be represented by a single “byte” of computer storage, consisting of eight “bits” each able to store either one or zero. Genstat stores other characters, such as those in the Chinese, Korean or Thai languages, in the UTF-8 format which uses up to four bytes per character.
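The relationship between characters, their integer codes and their storage size can be checked outside Genstat. The following is a minimal Python sketch, not Genstat code; the helper name `describe` is hypothetical and is used only for illustration.

```python
# Illustration in Python (not Genstat): integer codes and UTF-8 byte counts.

def describe(ch):
    """Return (integer code, number of UTF-8 bytes) for a single character."""
    return ord(ch), len(ch.encode("utf-8"))

samples = [
    "a",        # lower-case letter, basic ASCII
    "Z",        # capital letter, basic ASCII
    "7",        # digit, basic ASCII
    "\u00e9",   # e with acute accent, code 233
    "\u4e2d",   # a Chinese character
]

for ch in samples:
    code, nbytes = describe(ch)
    print(ch, code, nbytes)
```

Characters with codes up to 127 occupy one byte in UTF-8, while characters with larger codes occupy two or more.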
By default, TXINTEGERCODES takes as input a text supplied by the TEXT parameter, which must contain only one line. The codes corresponding to the characters in the line are saved in a variate, supplied by the CODES parameter. Alternatively, if you set option CONVERTTO = text, the codes are taken as input, and TEXT saves the corresponding line of characters. Missing or zero codes are ignored, and invalid codes (for example, negative numbers) are faulted.
Codes 128-255 can be represented either by characters in the extended ASCII character set, or by 2-byte UTF-8 characters. These represent the same actual characters, but you may find one representation more convenient than the other, depending on how you want to use any output involving the text in future. If you have a preference, you can control this by setting the REPRESENT option. Otherwise, TXINTEGERCODES uses extended ASCII characters, unless the variate contains codes that can be represented only in UTF-8.
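The difference between the two representations can be seen at the byte level. The Python sketch below is illustrative only. It assumes that ISO-8859-1 (Latin-1) can stand in for the extended ASCII character set, which the text above does not specify, and the function names `represent` and `default_mode` are hypothetical.

```python
def represent(code, mode):
    """Return the bytes that store one character code in the given mode."""
    ch = chr(code)
    if mode == "extendedascii":
        # Assumption: Latin-1 stands in for the extended ASCII set.
        return ch.encode("latin-1")
    if mode == "utf8":
        return ch.encode("utf-8")
    raise ValueError(f"unknown mode: {mode!r}")


def default_mode(codes):
    """Choose utf8 only when some code cannot fit in a single byte."""
    return "utf8" if any(c > 255 for c in codes) else "extendedascii"


print(represent(233, "extendedascii"))  # one byte
print(represent(233, "utf8"))           # two bytes
print(default_mode([84, 111, 233]))
print(default_mode([84, 937]))
```

Code 233 is stored in one byte in the single-byte representation and in two bytes in UTF-8, although both denote the same character.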
Options: CONVERTTO, REPRESENT.
Parameters: TEXT, CODES.
Action with RESTRICT
TXINTEGERCODES ignores any restrictions on the parameters.
See also
Directive: TEXT.
Commands for: Calculations and manipulation.
Example
" Example 1:4.7.8 "
TEXT [VALUES='Ο Παρθενώνας'] Parthenon
& [VALUES='Красный квадрат'] RedSquare
& [VALUES='Château de Versailles'] Versailles
TXINTEGERCODES Parthenon,RedSquare,Versailles; CODES=Pcodes,Rcodes,Vcodes
PRINT Pcodes,Rcodes,Vcodes; DECIMALS=0
VARIATE [VALUES=84,111,117,116,32,101,115,116,32,\
  116,101,114,109,105,110,233] Fcodes
TXINTEGERCODES [CONVERTTO=text; REPRESENT=utf8] Finished; CODES=Fcodes
PRINT Finished