%EDITFORMFIELD{"FIELD" topic="TOPIC"}%
garbles some Unicode characters.
The cause is the CGI::* functions that are used under the hood to generate form input fields.
Those functions call HTML::Entities::encode_entities(), which converts an "unsafe" character into an entity representation (&something;).
This may happen with any multi-byte character encoding, but let's take UTF-8 as an example.
To date, TWiki handles strings as raw bytes, without setting Perl's UTF-8 flag on them.
For example, the character る (U+308B) is encoded as e3 82 8b in UTF-8.
When EDITFORMFIELD handles such a character, the CGI::* functions convert the 8b byte into ‹. This happens partly because the UTF-8 flag is not set, so the three-byte sequence is treated as three separate characters rather than as a single character.
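The byte-level effect can be reproduced outside of TWiki. The following is a minimal sketch in Python, not TWiki or CGI.pm code: `escape_bytewise` is a hypothetical stand-in for an escaper that sees each byte as a separate character, and the choice of the numeric entity `&#8249;` (U+2039, ‹) is an assumption made for illustration. It also shows that byte 8b is ‹ in Windows-1252, which is consistent with the substitution described above.

```python
# Illustration only: a Python stand-in for the behaviour described above.
raw = bytes([0xE3, 0x82, 0x8B])  # UTF-8 encoding of U+308B

# Without the UTF-8 flag, the data is seen as three one-byte characters.
# Decoding as Latin-1 models that view: one code point per byte.
bytewise = raw.decode("latin-1")


def escape_bytewise(s):
    """Hypothetical escaper: replaces the 0x8B character with an entity."""
    return s.replace("\x8b", "&#8249;")


garbled = escape_bytewise(bytewise)

# Byte 0x8B corresponds to U+2039 (the '‹' character) in Windows-1252.
lsaquo = bytes([0x8B]).decode("cp1252")

print(len(bytewise))   # number of "characters" seen by the escaper
print(repr(garbled))   # the third byte has been replaced by an entity
```

After the substitution, the first two bytes are left dangling and no longer form a valid UTF-8 sequence, so the original character cannot be recovered.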
To avoid this, a Unicode string needs to be passed to the CGI::* functions with the UTF-8 flag turned on.
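The same idea, sketched in Python as an analogy rather than as the actual TWiki fix: decoding the bytes into a character string before escaping plays the role of turning the UTF-8 flag on. Here `html.escape` from the Python standard library stands in for the CGI::* escaping step; this pairing is an assumption made for illustration.

```python
import html

raw = bytes([0xE3, 0x82, 0x8B])  # UTF-8 encoding of U+308B

# Decode first, so the escaper sees one character instead of three bytes.
text = raw.decode("utf-8")

# html.escape only touches &, <, > and quotes, so the character is untouched.
escaped = html.escape(text)

# Encoding back to UTF-8 yields the original three bytes, unharmed.
round_tripped = escaped.encode("utf-8")

print(len(text))
print(round_tripped == raw)
```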
-- TWiki:Main/HideyoImazu - 2016-07-08