This module implements a Perl pragma that enables or disables UTF-8 (or UTF-EBCDIC, on EBCDIC platforms) in source code. It tells Perl to recognize characters written in these Unicode encodings in the script text and to treat them as characters rather than as raw bytes.
The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text within the current lexical scope. Conversely, the no utf8 pragma switches back to treating the source text as literal bytes within the current lexical scope.
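A minimal sketch of the lexical scoping, assuming the script is saved as UTF-8 and run with Perl 5.8 or later. The same two source bytes for "é" are read as one character while use utf8 is in effect and as two separate bytes inside a block that says no utf8. The variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;    # from here on, the source text is parsed as UTF-8

my $decoded = "é";       # parsed as a single character, U+00E9
my $raw;
{
    no utf8;             # literal bytes until the end of this block
    $raw = "é";          # the same two source bytes, now two characters
}
# the block has ended, so use utf8 is in effect again here

print "with use utf8: ", length($decoded), " character(s)\n";
print "under no utf8: ", length($raw), " character(s)\n";
```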
Do not use utf8 for anything other than telling Perl that a script is written in UTF-8. The utility functions in the utf8:: namespace, such as utf8::encode and utf8::decode, are directly usable without use utf8.
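A minimal sketch of calling two of these functions from a script that never loads the pragma, assuming a reasonably modern Perl 5. The source is kept ASCII-only by writing é as the escape \x{e9}; both functions modify their argument in place, and utf8::decode returns false if the input is not valid UTF-8.

```perl
use strict;
use warnings;
# Note: no "use utf8" here; the utf8:: functions are always available.

my $text = "caf\x{e9}";        # 4 characters, written with an ASCII-only escape

my $bytes = $text;
utf8::encode($bytes);          # in place: characters -> UTF-8 octets
print "encoded length: ", length($bytes), "\n";

my $back = $bytes;
utf8::decode($back) or die "input is not valid UTF-8";   # octets -> characters
print "decoded length: ", length($back), "\n";
print $back eq $text ? "round trip ok\n" : "round trip mismatch\n";
```

The e-acute becomes two octets after encoding, so the encoded string is one byte longer than the original text.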
Because Perl cannot reliably distinguish UTF-8 from native 8-bit encodings, a script must either begin with a Byte Order Mark or contain use utf8 for Perl to read it as UTF-8. Once UTF-8 becomes the standard source format, this pragma will effectively become a no-op.
Enabling the pragma changes how the parser reads the source. Bytes in the source text that have their high bit set are treated as part of a literal UTF-X sequence; this applies to most literals, including identifiers, string constants, and constant regular expression patterns. On EBCDIC platforms, characters in the Latin-1 character set are treated as part of a literal UTF-EBCDIC character.
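A minimal sketch of the three kinds of literal mentioned above, assuming an ASCII platform, a source file saved as UTF-8, and Perl 5.8 or later. The identifier $café and the sample word are hypothetical; the point is that each non-ASCII letter counts as one character, so the regex dot matches ï as a whole.

```perl
use strict;
use warnings;
use utf8;    # needed for the non-ASCII identifier and literals below

my $café = "naïve";                            # non-ASCII identifier and string constant
my $chars = length $café;                      # counted in characters, not bytes
my $dot_match = $café =~ /^na.ve$/ ? 1 : 0;    # '.' consumes the single character ï
my $has_letter = $café =~ /ï/ ? 1 : 0;         # constant pattern with a non-ASCII char

print "length: $chars\n";
print "dot matches one character: $dot_match\n";
print "pattern found: $has_letter\n";
```

Without use utf8, the same file would see "naïve" as six bytes, and the /^na.ve$/ pattern would no longer match.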
Note that if a script contains bytes with the eighth bit set that are not meant as UTF-8 (for example, Latin-1 text embedded in string literals), use utf8 will complain, because those bytes are most likely not well-formed UTF-X. To keep such bytes, disable the pragma with no utf8; it then stays off until the end of the enclosing block (or file, if at top level).
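A rough sketch of mixing both kinds of bytes in one file, assuming Perl 5.8 or later and the core File::Temp module. Because a raw Latin-1 byte cannot be typed into a UTF-8 document, the sketch assembles a hypothetical demo script byte by byte, writes it to a temporary file, and loads it with do. If the pragma behaves as described, both literals should come out as the same four-character string ending in U+00E9.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical demo script: UTF-8 text under "use utf8", plus one
# Latin-1 literal fenced off by "no utf8" inside a block.
my $src = join '',
    "use utf8;\n",
    "our \$utf8_word = \"caf\xC3\xA9\";\n",     # e-acute as UTF-8 (2 bytes)
    "{\n",
    "    no utf8;\n",                           # raw bytes until the closing brace
    "    our \$latin1_word = \"caf\xE9\";\n",   # e-acute as Latin-1 (1 byte)
    "}\n",
    "1;\n";

my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
binmode $fh;                  # write the bytes exactly as assembled
print {$fh} $src;
close $fh or die "close failed: $!";

# Compile and run the generated file; do() returns undef on failure.
our ($utf8_word, $latin1_word);
defined(do $path) or die "could not load $path: ", ($@ || $!);

printf "utf8 literal:   %d chars, last is U+%04X\n",
    length $utf8_word, ord substr($utf8_word, -1);
printf "latin1 literal: %d chars, last is U+%04X\n",
    length $latin1_word, ord substr($latin1_word, -1);
print $utf8_word eq $latin1_word ? "same string\n" : "different strings\n";
```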
In short, the utf8 pragma controls whether Perl parses source text as UTF-8 (UTF-EBCDIC on EBCDIC platforms) or as literal bytes.
Version 1.14: N/A