July 13, 2009

This software works around the Perl 5 Unicode bug, so that string operations behave predictably regardless of how Perl stores a string internally.

Version 1.02

License Perl Artistic License

Platform Linux

Supported Languages English

Homepage search.cpan.org

Developed by Juerd Waalboer

Unicode::Semantics is a Perl module that works around the Unicode bug in Perl 5. The bug is that the internal encoding of a string is hidden from the programmer, yet it affects the semantics of the string. When the internal encoding is ISO-8859-1, Perl uses ASCII semantics; when it is UTF8, Perl uses Unicode semantics. The module removes this unpredictability through the Unicode::Semantics::us() function.

The Unicode::Semantics::us() function gives predictable results for strings. Normally, if the internal encoding of a string is ISO-8859-1, Perl ignores the string's non-ASCII characters in several operations: uc, lc, ucfirst, lcfirst, \U, \L, \u, \l, \d, \s, \w, \D, \S, \W, /.../i, (?i:...) and /[[:posix:]]/. Calling us upgrades the string to UTF-8 internally and gives you the upgraded string back, so these operations use Unicode semantics. The module also exports an alias called up by default.
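The difference described above can be reproduced with core Perl alone. The following is a minimal sketch, not taken from the module's documentation: it uses the built-in utf8::upgrade in place of us, and it assumes a script that does not enable the unicode_strings feature (for example via "use v5.12" or later) and does not use the /u regex flag. The variable names are illustrative.

```perl
use strict;
use warnings;

# "\xE9" is e-acute (U+00E9); Perl stores this literal in its native 8-bit form.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # same character, now stored as UTF8 internally

# The two strings compare equal, yet string operations treat them differently.
my $same          = $native eq $upgraded        ? 1 : 0;
my $native_word   = $native   =~ /\w/           ? 1 : 0;
my $upgraded_word = $upgraded =~ /\w/           ? 1 : 0;
my $native_uc     = uc($native)   eq "\xC9"     ? 1 : 0;
my $upgraded_uc   = uc($upgraded) eq "\xC9"     ? 1 : 0;

print "strings equal:        $same\n";           # 1
print "native matches \\w:    $native_word\n";    # 0 (ASCII semantics)
print "upgraded matches \\w:  $upgraded_word\n";  # 1 (Unicode semantics)
print "native uc gives C9:   $native_uc\n";      # 0
print "upgraded uc gives C9: $upgraded_uc\n";    # 1
```

Only the internal storage differs between the two variables, which is why the behaviour is hard to predict without an explicit upgrade.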

When the module was first released the function was called us, but the developer later came to prefer up. You can also use the built-in function utf8::upgrade, which upgrades the string and returns the number of octets used for the internal UTF8 buffer. If you upgrade a non-string value, such as a number, reference, object, or undef, it is stringified by the upgrade. us, up, and utf8::upgrade all modify the variable itself. If you only need an upgraded copy of a string, make the copy first and upgrade that.
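A short sketch of the copy-first pattern, using only the built-in utf8::upgrade and utf8::is_utf8 so that it runs without the module installed. The variable names are illustrative.

```perl
use strict;
use warnings;

my $original = "caf\xE9";   # 4 characters, native 8-bit storage

# Copy first: the upgrade below changes $copy in place, not $original.
my $copy   = $original;
my $octets = utf8::upgrade($copy);   # returns a count, not the string

my $original_flag = utf8::is_utf8($original) ? 1 : 0;
my $copy_flag     = utf8::is_utf8($copy)     ? 1 : 0;
my $still_equal   = $original eq $copy       ? 1 : 0;

print "octets in UTF8 buffer: $octets\n";            # 5 (c, a, f, plus 2 for e-acute)
print "characters in copy:    ", length($copy), "\n"; # 4
print "original upgraded:     $original_flag\n";     # 0
print "copy upgraded:         $copy_flag\n";         # 1
print "still equal:           $still_equal\n";       # 1
```

The character count and the string's value are unchanged by the upgrade; only the internal representation of the copy differs.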

Upgrading a variable that is already upgraded does nothing further, so it is safe to do. The module's SYNOPSIS section shows how up is used: up $foo forces Unicode semantics on the string, while up($foo) =~ s/\W/_/g upgrades the string and uses it immediately in a regular expression substitution.
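The effect on a substitution such as s/\W/_/g can be sketched as follows. This example again substitutes the built-in utf8::upgrade for up, so the upgrade and the substitution are two separate statements rather than the single expression from the SYNOPSIS. It assumes the unicode_strings feature is not enabled; the sample text and variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "na\xEFve caf\xE9!";   # contains i-diaeresis and e-acute

# Without an upgrade, \W also matches the two non-ASCII letters.
(my $native_result = $text) =~ s/\W/_/g;

# Upgrading twice is harmless: the second call leaves the string as it is.
my $first  = utf8::upgrade($text);
my $second = utf8::upgrade($text);

# With Unicode semantics, only the space and "!" count as non-word characters.
(my $upgraded_result = $text) =~ s/\W/_/g;

my $native_ok   = $native_result   eq "na_ve_caf__"        ? 1 : 0;
my $upgraded_ok = $upgraded_result eq "na\xEFve_caf\xE9_"  ? 1 : 0;

print "same octet count on both upgrades: ", ($first == $second ? 1 : 0), "\n"; # 1
print "native result is na_ve_caf__:      $native_ok\n";                        # 1
print "upgraded result keeps the letters: $upgraded_ok\n";                      # 1
```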

Overall, Unicode::Semantics lets you force Unicode semantics on a string explicitly, so results no longer depend on its hidden internal encoding.

What's New

Version 1.02: N/A
