
Item5144: TWiki segfaults on utf8 strings

Item Form Data

AppliesTo: Engine
Component:
Priority: Urgent
CurrentState: Closed
WaitingFor:
TargetRelease: minor
ReleasedIn: 4.2.0


This behavior was introduced by TWikirev:16003. The code that was added, and that now causes TWiki to segfault under perl 5.8.8, is this:
    # Item4946: support chars in %u format
    $text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;
    # chr($unicode_codepoint) works w/o a pragma in Perl 5.8 and 5.6
    unless( $TWiki::cfg{Site}{CharSet} =~ /^utf-?8$/i ) {
        my $t = UTF82SiteCharSet( $text );
        $text = $t if ( $t );
    }

Commenting it out cures the problem. I don't know whether perl versions other than 5.8.8 behave differently.

The input that triggers the segfault is a utf8 string with a %[a-z] sequence in it, as in %ACTION{}%. This branch of the code is not executed otherwise.

I am not sure why these lines were introduced, nor why perl segfaults.

-- TWiki:Main/MichaelDaum - 18 Dec 2007

I reverted the above patch for now and lowered the priority again. Please try to come up with a better fix for Item4946.

-- TWiki:Main/MichaelDaum - 18 Dec 2007

Further investigation of this bug shows that URL-decoding a string that is not URL-encoded produces random segfaults in perl, depending on the size of the string you feed to it.

-- TWiki:Main.MichaelDaum - 18 Dec 2007

Michael, can you please post the raw text of the page causing the segfault so that others can reproduce the problem?

-- TWiki:Main.HideyoImazu - 20 Dec 2007

I do have a string that causes a segfault: a log file of an IRC conversation in a verbatim section, with some more TWiki variables in it. Unfortunately, I can't give it to you as is, because this IRC conversation was confidential. Any time I tried to change the content or length of this string, the behavior changed. I now have a version in a test case where perl core dumps sporadically. This is very bad news. I emailed this test case to a partner of mine (covered by the same conditions of confidentiality) and he is able to reproduce the segfaults.

What I found out is that a plugin was calling TWiki::urlDecode() on a string that was not URL-encoded. This string contained TWiki variables, e.g. %ACTION{}% and such, which were then converted because of the %[a-z] sequence, which may insert some non-ASCII chars into the perl string. That is what perl doesn't like, and it produces segfaults every now and then.
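To illustrate the mangling described above, here is a minimal sketch. The sub naive_url_decode is a hypothetical stand-in, not the actual TWiki::urlDecode(); it assumes the decoder handles plain %XX escapes (as URL decoders generally do) in addition to the %uXXXX step quoted earlier. It only shows how a TWiki variable gets altered; it does not attempt to reproduce the segfault.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a URL decoder; NOT the real TWiki::urlDecode().
sub naive_url_decode {
    my ($text) = @_;
    # Assumption: plain %XX escapes are decoded first.
    $text =~ s/%([\da-f]{2})/chr(hex($1))/gei;
    # The %uXXXX step quoted in this report.
    $text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;
    return $text;
}

# Text that was never URL-encoded, but contains a TWiki variable.
my $decoded = naive_url_decode('Status: %ACTION{}%');

# Print the code point of every character as hex, to make the
# non-ASCII character visible without depending on the terminal charset.
print join( ' ', map { sprintf( '%02X', ord($_) ) } split( //, $decoded ) ), "\n";
```

Because "AC" happens to be two hex digits, the leading %AC of %ACTION{}% is replaced by the single character U+00AC, so the variable is destroyed and a non-ASCII character appears in the string.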

Bottom line: it is dangerous to call TWiki::urlDecode() (and a subsequent UTF82SiteCharSet()) on a string that is not URL-encoded, and I am afraid we can't do much about it.

I hope to be able to reproduce the same segfaults on a neutral string.

-- TWiki:Main.MichaelDaum - 20 Dec 2007

Most likely the following line is causing the segfault.

$text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;

%uXXXX (X is a hexadecimal digit) matches the pattern. What I'd like to know is the exact %uXXXX you have in your topic.
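One way to answer that question without sharing a confidential topic would be to list the substrings that the pattern matches. The following is a small sketch under that assumption; find_u_escapes is a hypothetical helper name, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every %uXXXX escape (X = hex digit) in the text.
# The /i flag mirrors the original substitution, so %UXXXX matches as well.
sub find_u_escapes {
    my ($text) = @_;
    my @hits = $text =~ /(%u[\da-f]{4})/gi;
    return @hits;
}

# Illustrative inputs only; none of these come from the confidential topic.
my @samples = ( 'caf%u00e9', '%ACTION{}%', '%URLPARAM{"x"}%' );
for my $s (@samples) {
    my @hits = find_u_escapes($s);
    print "$s => ", ( @hits ? join( ', ', @hits ) : 'no match' ), "\n";
}
```

Running such a check over the raw topic text would show exactly which sequences, if any, the %u substitution acts on.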

-- TWiki:Main.HideyoImazu - 26 Dec 2007

It was an %ACTION{}% that caused the segfault. The rest of the topic had an influence on the segfault as well. Random changes to non-utf8 text resulted in different behaviours.

-- TWiki:Main.MichaelDaum - 27 Dec 2007

Some observations.

Someone has written a plugin that calls an internal non-API function in TWiki. He should be spanked for that. It is a 5-line function. Why not write a local function that works and ensures the plugin also works in 4.2.1, 5.0, and so on?
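A plugin-local function of the kind suggested here might look like the sketch below. The name _plugin_url_decode is hypothetical and not part of any TWiki API; the sketch decodes only plain %XX escapes and leaves any charset conversion to the caller. It makes the plugin independent of core internals, but it would still alter text that was never URL-encoded, so the caller has to know that its input really is encoded.

```perl
use strict;
use warnings;

# Hypothetical plugin-local helper; illustrative only, not a TWiki API.
# Decodes %XX escapes and nothing else (no %uXXXX, no charset conversion).
sub _plugin_url_decode {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print _plugin_url_decode('Main%2FWebHome'), "\n";
```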

How can calling an internal non-API function be an urgent bug? Only if TWiki::urlDecode has a potential problem that can also cause a segfault in normal internal use.

-- TWiki:Main.KennethLavrsen - 08 Jan 2008

The question is not how to call TWiki::urlDecode() so that perl segfaults. The issue is that calling TWiki::urlDecode() with certain parameters can segfault perl. I simply don't feel safe with this code in the core.

-- TWiki:Main.MichaelDaum - 08 Jan 2008

I just did a global search for urlDecode and the only plugin that calls this function is WysiwygPlugin.

So this is not caused by a silly call to an internal API, and it is fair to keep it urgent, at least until we understand the nature of the issue. We have to rely on you, Michael, to do that, because the error vector you use to reproduce it is a secret.

-- TWiki:Main.KennethLavrsen - 08 Jan 2008

There is a testcase to show the segfault. Did you try it?

-- TWiki:Main.MichaelDaum - 08 Jan 2008


-- TWiki:Main.KennethLavrsen - 08 Jan 2008

It's a unit testcase. One of the UTF8Tests.

-- TWiki:Main.CrawfordCurrie - 11 Jan 2008

I reverted the fix, which closes this one.

The segfaults were serious, in real life as well and not just in unit test cases.

-- TWiki:Main.KennethLavrsen - 16 Jan 2008

Summary TWiki segfaults on utf8 strings
ReportedBy TWiki:Main.MichaelDaum

SVN Range TWiki-4.3.0, Sat, 15 Dec 2007, build 16003
AppliesTo Engine

Priority Urgent
CurrentState Closed

Checkins TWikirev:16029 TWikirev:16030 TWikirev:16121
TargetRelease minor
ReleasedIn 4.2.0
Topic revision: r17 - 2008-01-16 - KennethLavrsen