White space preservation in fixed font fragments (done by
) may clobber printable characters depending on the site charset.
The bug was spotted with the UTF-8 character set, on words ending with Cyrillic small letter ha (х,
). The sequence of octets
D1 85 20
is changed to
D1 26 6E 62 73 70 3B 20
, i.e. "\xD1&nbsp;\x20".
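To make the octet change easier to reproduce, here is a minimal sketch. It assumes the regex engine classifies the code point 0x85 (NEL) as white space, which is forced here with the `/u` flag (Perl 5.14 or later). How TWiki reaches that state is not established in this report, and the substitution below is a hypothetical stand-in, not the actual TWiki code.

```perl
use strict;
use warnings;

# Octets of UTF-8 "х" (U+0445) followed by a space, as in the report.
my $octets = "\xD1\x85\x20";

# Hypothetical stand-in for the white space preservation step: each pair of
# white space characters becomes "&nbsp; ". With /u the engine applies Unicode
# rules, under which 0x85 (NEL) counts as white space.
(my $unicode_rules = $octets) =~ s/\s\s/&nbsp; /gu;

# With /a, \s only matches ASCII white space, so 0x85 is left alone.
(my $ascii_rules = $octets) =~ s/\s\s/&nbsp; /ga;

# Render a string as space-separated hex octets.
sub hex_dump {
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $_[0]);
}

print hex_dump($unicode_rules), "\n";   # prints: D1 26 6E 62 73 70 3B 20
print hex_dump($ascii_rules),   "\n";   # prints: D1 85 20
```

Under that assumption, the second octet of х is consumed as if it were a space, which matches the sequence quoted above.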
- 15 May 2007
I'm surprised this is happening at all, given that UTF-8 is ASCII-safe. Note that you shouldn't be using UTF-8 with TWiki unless you are using a non-alphabetic character set (e.g. those for Chinese or Japanese), as Unicode is not supported for general use with TWiki. Use KOI8-R instead; it works much better than UTF-8. See TWiki:Codev.CyrillicSupport
for Russian usage.
Having looked at the code, it might be because CGI.pm is doing something odd. Q for developers: why are we using CPAN:CGI
for things like applying bold markup? I know it's convenient but it seems like a performance overhead compared to simply putting in the relevant HTML with an
For more details on why Unicode is best avoided in most cases, see the documentation at TWiki:Codev.InstallationWithI18N
- 22 Jun 2007
The bug is still present in TWiki release 4.3.0.
And there is a bug of the same origin in heading rendering (
$text =~ s/^\s*(.*?)\s*$/$1/
cripples the text of section headings ending with Cyrillic small letter ha encoded in UTF-8.
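The effect of that trim on such a heading can be sketched as follows. This again assumes the engine treats 0x85 as white space, forced here with the `/u` flag (Perl 5.14 or later). The sample heading "ух" and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample heading "ух" as UTF-8 octets: D1 83 D1 85.
my $heading = "\xD1\x83\xD1\x85";

# The trim from the report, with /u added so that Unicode rules apply:
# the trailing 0x85 is taken for white space and stripped.
(my $trim_unicode = $heading) =~ s/^\s*(.*?)\s*$/$1/u;

# The same trim with /a: \s is limited to ASCII white space.
(my $trim_ascii = $heading) =~ s/^\s*(.*?)\s*$/$1/a;

# Render a string as space-separated hex octets.
sub hex_dump {
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $_[0]);
}

print hex_dump($trim_unicode), "\n";   # prints: D1 83 D1
print hex_dump($trim_ascii),   "\n";   # prints: D1 83 D1 85
```

In the first case the heading ends in a lone D1 lead byte, which is no longer valid UTF-8.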
I work around both problems by calling
- 05 May 2009