In lib/TWiki/Render.pm, when a heading uses a multibyte string such as UTF-8 or EUC-JP and is longer than 32 bytes, the HTML A tag isn't closed correctly, which breaks the layout. The patch below fixes this and works fine for me.

--- Render.pm.orig      2006-02-08 00:08:45.000000000 +0900
+++ Render.pm   2006-03-10 14:27:13.312500000 +0900
@@ -395,7 +395,9 @@
     if ( !$compatibilityMode ) {
         $anchorName =~ s/^[\s\#\_]*//;  # no leading space nor '#', '_'
     }
-    $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+    $anchorName =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;
+    $anchorName =~ tr/ /+/;
+    $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars
     if ( !$compatibilityMode ) {
         $anchorName =~ s/[\s\_]*$//;    # no trailing space, nor '_'
     }
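
For readers who want to see the failure the patch addresses, here is a minimal sketch, not TWiki code. It assumes the heading arrives as a plain byte string in UTF-8 and applies the same 32-byte regex truncation as the original Render.pm line. The sample heading and the helper name `is_valid_utf8` are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical heading: 12 copies of U+3042, which is 3 bytes in UTF-8,
# held as a byte string (36 bytes in total).
my $heading = "\xE3\x81\x82" x 12;

# Same truncation as the original Render.pm line: keep the first 32 bytes.
( my $truncated = $heading ) =~ s/^(.{32})(.*)$/$1/;

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # local copy, so decode() may consume it freely
    return eval { Encode::decode( 'UTF-8', $bytes, Encode::FB_CROAK() ); 1 } ? 1 : 0;
}

# 32 is not a multiple of 3, so the last character keeps only 2 of its 3 bytes.
printf "length=%d valid=%d\n", length($truncated), is_valid_utf8($truncated);
# prints "length=32 valid=0"
```

The dangling lead bytes at the end are what can swallow the markup that follows the anchor name.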

Thanks Fujii-san, that inspired me to develop a full fix, though I don't understand why you tr spaces to + signs. My fix is as follows:

sub truncateString {
    my( $str, $lim ) = @_;
    if( $TWiki::cfg{UseLocale} ) {
        # Convert whole string to unicode to avoid
        # truncating halfway through a character.
        $str = pack( 'U0C*', unpack( 'C*', $str ));
    }
    $str = substr($str, 0, $lim) if length($str) > $lim;
    return $str;
}
This method has to be used wherever a string is truncated.

SVN 9194, 9195

CC

Sorry, this doesn't work. I get internal errors in, e.g., the Jump box when entering a non-existent web ("FFF." for instance).

-- SP


Thank you for the quick reply. The TWiki community is really active, isn't it?

>>Though I don't understand why you tr spaces to + signs.

I think I have read something about that: white space must be replaced with a '+' sign, and the '+' sign itself should be encoded as %2B, because '+' represents a delimiter between multiple arguments for CGI. This is based on RFC2396, I think. But I can't find a document describing this, so it might be my misunderstanding.

I assumed a combination of Japanese and US-ASCII strings, including white space, such as "JJJJJSuSE LinuxJJJJJ" where J is a Japanese character. In that case I thought the white space should be encoded, but I don't know which is suitable in the TWiki context, %20 or '+'. Maybe %20 is OK.

TWiki:Main.KatsuhikoFujii


Actually, neither in that context. URL and HTML encodings are handled in the urlEncode and entityEncode methods respectively, so they aren't needed here.

CC

I still get UTF-8 character failures sometimes. Another simple one, reproducible here at develop with the latest update, is:

The local log message is (BlackListPlugin enabled, but same error with it disabled):

[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution iterator at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1112.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution iterator at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1114.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution iterator at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1127.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in pattern match (m//) at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1416.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in split at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1423.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in pattern match (m//) at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1424.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in pattern match (m//) at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1424.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution iterator at /home/httpd/twiki/<domain>/lib/TWiki/Render.pm line 1510.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution (s///) at /home/httpd/twiki/<domain>/lib/TWiki/Plugins/BlackListPlugin.pm line 246.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution (s///) at /home/httpd/twiki/<domain>/lib/TWiki/Plugins/BlackListPlugin.pm line 246.
[Mon Mar 13 20:54:54 2006] view: Malformed UTF-8 character (unexpected non-continuation byte 0x67, immediately after start byte 0xf8) in substitution iterator at /home/httpd/twiki/<domain>/lib/TWiki/Plugins/BlackListPlugin.pm line 246.
********************************
Malformed UTF-8 character (fatal) at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 322.
 at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 322
   TWiki::UI::View::_prepare('%INCLUDE{"%TWIKIWEB%.WebSearch"}%', 'TWiki=HASH(0x81518a8)', 'Main', 'WebSearch', 'TWiki::Meta=HASH(0x8da5dec)', 0) called at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 304
   TWiki::UI::View::view('TWiki=HASH(0x81518a8)') called at /home/httpd/twiki/<domain>/lib/TWiki/UI.pm line 97
   TWiki::UI::__ANON__() called at /home/httpd/twiki/<domain>/lib/CPAN/lib///Error.pm line 387
   eval {...} called at /home/httpd/twiki/<domain>/lib/CPAN/lib///Error.pm line 379
   Error::subs::try('CODE(0x8d885b0)', 'HASH(0x8d65f48)') called at /home/httpd/twiki/<domain>/lib/TWiki/UI.pm line 146
   TWiki::UI::run('CODE(0x819f8a8)') called at /home/httpd/twiki/<domain>/bin/view line 31

- Did I mention that on my installation this error is only triggered when using IE to browse the link? Firefox displays the search all right.

-- SP

Seems there is some performance loss connected with this Item (at least in my setup).

-- SP

I'm pretty sure the error on ~develop is due to the perl version. I can't reproduce it on an identical setup, where the only difference is the perl version (I run 5.8.4, develop runs 5.6). As such the fix is "upgrade your perl version".

CC

Sounds fair, I'm on 5.8.8 now, didn't see it there (yet :-)).

Are you able to reproduce the performance issue? It's a ~30% setback in my installation - can I set an option to not make use of this patch?

-- SP

Had this show up on 5.8.8, going directly to a non-existent topic, ../Sandbox/MeetingMinutes.

Looks reproducible - maybe MAKETEXT-related?

Again, IE only.

********************************
Malformed UTF-8 character (fatal) at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 322.
 at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 322
   TWiki::UI::View::_prepare('---++ %MAKETEXT{"NOTE: This Wiki topic does not exist yet"}%\x{a}...', 'TWiki=HASH(0xb2e5dcc)', 'Sandbox', 'MeetingMinutes', 'TWiki::Meta=HASH(0xb3ae2d0)', 0) called at /home/httpd/twiki/<domain>/lib/TWiki/UI/View.pm line 304
   TWiki::UI::View::view('TWiki=HASH(0xb2e5dcc)') called at /home/httpd/twiki/<domain>/lib/TWiki/UI.pm line 97
   TWiki::UI::__ANON__() called at /home/httpd/twiki/<domain>/lib/CPAN/lib///Error.pm line 387
   eval {...} called at /home/httpd/twiki/<domain>/lib/CPAN/lib///Error.pm line 379
   Error::subs::try('CODE(0xb3ae888)', 'HASH(0xb3bdc90)') called at /home/httpd/twiki/<domain>/lib/TWiki/UI.pm line 146
   TWiki::UI::run('CODE(0x8082680)') called 

-- SP

As the original author of that FIXME comment, I have to say that using Unicode is the right way to go ultimately - however, full Unicode support is a lot more than just this patch, and the fix shown here won't work in sites not using UTF-8 as site character set anyway.

The fundamental issue here is that we have just tried to add TWiki:Codev.UnicodeSupport in an incomplete way. Any time you are getting 'Malformed UTF-8 characters' in Perl, it's using Unicode (internal UTF-8) characters. This may well be clear to everyone here, but the reason I'm highlighting this is that adding Unicode support is not trivial and can't be done with a simple bugfix. It requires significant effort to ensure that Unicode character mode is used for all inputs and output, including CGI parameters, mod_perl, SpeedyCGI, file reads and writes, etc. If you just try and do a quick fix it won't work, and will in fact cause problems.

In fact, until we have a plan and some committed developers to add Unicode support, we should actively be treating any use of Perl internal Unicode characters as a bug - some packages such as CPAN:CGI have created such characters, which has caused bugs during the Dakar development process.
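
As a rough illustration of how such internal Unicode strings might be spotted, here is a minimal sketch, not TWiki code. It assumes Perl 5.8.1 or later, where `utf8::is_utf8` is available, and the helper name `has_internal_unicode` is made up for this example. Note that the flag only says how Perl stores the string, so treat it as a debugging aid rather than a complete test.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical check: report whether a string carries Perl's internal
# UTF-8 flag, i.e. holds characters rather than raw bytes.
sub has_internal_unicode {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 1 : 0;
}

my $bytes = "\xE3\x81\x82";                      # raw UTF-8 bytes for U+3042
my $chars = Encode::decode( 'UTF-8', $bytes );   # same text as one Perl character

printf "bytes: flag=%d length=%d\n", has_internal_unicode($bytes), length($bytes);
printf "chars: flag=%d length=%d\n", has_internal_unicode($chars), length($chars);
# prints "bytes: flag=0 length=3" then "chars: flag=1 length=1"
```

Mixing the two kinds of string in one page is the sort of situation that produces the 'Malformed UTF-8 character' messages quoted above.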

Once we do get Unicode support working, there will be quite a lot of effort to get decent performance (something like a 2-3 times performance hit in my limited tests) and stomp or work around Perl bugs (of which there were quite a few in 5.8.3, though 5.8.8 may be better).

Here are some suggestions for a simple fix:

  • Always append a space character after truncating the heading to 32 ASCII bytes - this ensures that if we do break a two-byte character, it at least doesn't cause the HTML '<' immediately afterwards to get gobbled up as part of a two-byte character. This is simple and low overhead.
  • Stay with the current fix approach, but convert back to the site character set afterwards (e.g. EUC-JP) - currently you would end up with a mix of UTF-8 and EUC-JP if that's the site character set. If the site character set is UTF-8, convert from characters (internal encoding) to bytes (a string of bytes that just happen to equate to UTF-8, i.e. 2 to 4 Perl characters for one Unicode character). This would be higher overhead, as you'd need to convert every such heading's character set using something like CPAN:Encode (similar code already exists in TWiki). However, it would avoid the possibility of garbling the last character of headings that use multi-byte encodings such as EUC-JP and UTF-8.
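
The two suggestions above could be sketched roughly as follows. This is not TWiki code: the function names are hypothetical, the input is assumed to be a byte string in the site character set, and the second helper assumes the limit is counted in characters rather than bytes, which differs from the original 32-byte limit.

```perl
use strict;
use warnings;
use Encode ();

# Suggestion 1 (hypothetical helper): plain byte truncation, then always
# append a space so a broken trailing byte cannot absorb the following '<'.
sub truncate_bytes_padded {
    my ( $bytes, $lim ) = @_;
    $bytes = substr( $bytes, 0, $lim ) if length($bytes) > $lim;
    return $bytes . ' ';
}

# Suggestion 2 (hypothetical helper): decode from the site charset,
# truncate on character boundaries, then encode back to site-charset bytes.
sub truncate_in_charset {
    my ( $bytes, $lim, $charset ) = @_;
    my $chars = Encode::decode( $charset, $bytes );    # bytes -> characters
    $chars = substr( $chars, 0, $lim ) if length($chars) > $lim;
    return Encode::encode( $charset, $chars );         # characters -> bytes
}

my $heading = "\xE3\x81\x82" x 40;    # 40 x U+3042, 120 bytes in UTF-8
my $padded  = truncate_bytes_padded( $heading, 32 );
my $clean   = truncate_in_charset( $heading, 32, 'UTF-8' );

printf "padded=%d bytes, clean=%d bytes\n", length($padded), length($clean);
# prints "padded=33 bytes, clean=96 bytes"
```

The first helper still leaves a garbled final character but costs almost nothing. The second keeps every character whole at the price of a decode and encode per heading, which matches the overhead trade-off described in the list.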

Just noticed this issue - would be really helpful if people could flag any Unicode/I18N bugs in Codev somehow, perhaps via a comment in TWiki:Codev.InternationalisationIssues. Also, I will try to keep track of this issue via RSS, but I don't really have any significant time to contribute on this - all I can do is carp from the sidelines and encourage dedicating enough resource to tame the Perl Unicode beast :-)

On an unrelated topic - how long does it take to sync my password from TWiki.org? Changed it yesterday afternoon, would have thought it was synced by now.

-- RD

Reverting this in favor of speed and 4.0.1 compatibility. If re-opened, please at least see that Item1913 is satisfied with the new solution.

Setting this Normal / n/a.

SVN 9388 (TWiki4). Develop is left untouched, experiments can continue there.

-- SP

Thanks. I just tried this link posted above that shows a similar bug (possibly unrelated though) - I'm using Firefox not IE, and generally I think this bug is browser-independent. Since 'develop.twiki.org' runs Perl 5.6, that might be down to the old 'use utf8' issue in CPAN:CGI, which breaks 5.6.

-- RD

Anchors are now US-ASCII only, closing this.

-- TWiki:Main.SteffenPoulsen - 17 Sep 2007

ItemTemplate
Summary: Rendering bug of headings when using MBCS as a heading string
ReportedBy: TWiki:Main.KatsuhikoFujii
Codebase:
SVN Range: Fri, 03 Mar 2006 build 9056
AppliesTo: Engine
Component: I18N
Priority: Normal
CurrentState: Closed
WaitingFor:
Checkins: 9192 9193 9194 9195 9388
TargetRelease: minor
ReleasedIn: 4.2.0
Topic revision: r19 - 2008-01-22 - KennethLavrsen