$ZPATNumeric

$ZPATN[UMERIC] is a read-only intrinsic special variable that determines how GT.M interprets the patcode "N" used in the pattern match operator.

With $ZPATNUMERIC="UTF-8", the patcode "N" matches any decimal digit character defined by Unicode (general category Nd). With $ZPATNUMERIC="M", GT.M restricts the patcode "N" to match only the ASCII digits 0-9 (that is, ASCII 48-57).

When a process starts in UTF-8 mode, GT.M initializes $ZPATNUMERIC from the environment variable gtm_patnumeric: to "UTF-8" if gtm_patnumeric is defined as "UTF-8", and to "M" if gtm_patnumeric is undefined or set to any other value.

GT.M populates $ZPATNUMERIC at process initialization from the environment variable gtm_patnumeric and does not allow the process to change the value.
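Because the value is fixed at process startup, it can help to confirm what a running process actually picked up. The following is a minimal sketch, not part of the GT.M distribution: the routine name patnum is hypothetical, and it assumes the $ZCHSET special variable and the $ZTRNLNM() function are available in the GT.M version in use.

```mumps
patnum ; report the settings that govern patcode N in this process
 ; $ZCHSET shows whether the process is running in M or UTF-8 mode
 write "$ZCHSET        = ",$zchset,!
 ; $ZTRNLNM() returns the value of an environment variable (empty if undefined)
 write "gtm_patnumeric = ",$ztrnlnm("gtm_patnumeric"),!
 ; the value GT.M derived from the environment at process initialization
 write "$ZPATNUMERIC   = ",$zpatnumeric,!
 quit
```

Running it with do ^patnum before and after changing gtm_patnumeric in the shell should show that only a newly started process reflects the change.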

For characters in Unicode, GT.M assigns patcodes based on the default classification of the Unicode character set by the ICU library with three adjustments:

  1. If $ZPATNUMERIC is not "UTF-8", non-ASCII decimal digits are classified as A.

  2. Non-decimal numerics (Nl and No) are classified as A.

  3. The remaining characters (those not classified by the ICU functions u_isalpha, u_isdigit, u_ispunct, or u_iscntrl, or by adjustments 1 and 2 above) are classified as either patcode P or patcode C. GT.M uses the ICU function u_isprint for this because it returns "TRUE" for non-control characters.

The following table contains the resulting Unicode general category to M patcode mapping:

Unicode General Category      GT.M Patcode Class
------------------------      ------------------
L* (all letters)              A
M* (all marks)                P
Nd (decimal numbers)          N if the decimal digit is ASCII or $ZPATNUMERIC is "UTF-8"; otherwise A
Nl (letter numbers)           A (examples of Nl are Roman numerals)
No (other numbers)            A (examples of No are fractions)
P* (all punctuation)          P
S* (all symbols)              P
Zs (spaces)                   P
Zl (line separators)          C
Zp (paragraph separators)     C
C* (all control code points)  C

For a description of the Unicode general categories, refer to http://unicode.org/charts/.
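One way to check the mapping for a particular character is to test it against each patcode in turn. The following is a minimal sketch, assuming a hypothetical routine file patclass.m; the label and routine names are illustrative and not part of GT.M.

```mumps
patclass(ch) ; return the patcode class (N, A, P or C) of the single character ch
 ; each test matches exactly one character of the given patcode
 quit:ch?1N "N"
 quit:ch?1A "A"
 quit:ch?1P "P"
 quit:ch?1C "C"
 ; reached when ch is empty, longer than one character, or matches none of the above
 quit ""
```

For instance, write $$patclass^patclass($char($$FUNC^%HD("D67"))) queries the Malayalam decimal digit 1. According to the table, the result should be N when $ZPATNUMERIC is "UTF-8" and A when it is "M".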

Example:

GTM>write $zpatnumeric
UTF-8
GTM>Write $Char($$FUNC^%HD("D67"))?.N ; This is the Malayalam decimal digit 1
1
GTM>Write 1+$Char($$FUNC^%HD("D67"))
1
GTM>Write 1+$Char($$FUNC^%HD("31")) ; This is the ASCII digit 1
2