This behavior was introduced by TWikirev:16003. The code that was added, and that now causes TWiki on perl-5.8.8 to segfault, is this:
# Item4946: support chars in %u format
$text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;
# chr($unicode_codepoint) works w/o a pragma in Perl 5.8 and 5.6
unless( $TWiki::cfg{Site}{CharSet} =~ /^utf-?8$/i ) {
my $t = UTF82SiteCharSet( $text );
$text = $t if ( $t );
}
Commenting it out cures the problem. I don't know whether perl versions other than 5.8.8 behave differently.
The input that triggers the segfault is a utf8 string with a %[a-z] sequence in it, like in %ACTION{}%. This branch of the code is not executed otherwise.
I am not sure why these lines have been introduced, nor why perl segfaults.
--
TWiki:Main/MichaelDaum
- 18 Dec 2007
I reverted the above patch for now and dropped the priority again. Please try to come up with a better fix for Item4946.
--
TWiki:Main/MichaelDaum
- 18 Dec 2007
Further investigation of this bug shows that url-decoding a string which is not url-encoded will produce random segfaults in perl, depending on the size of the string you feed to it.
--
TWiki:Main.MichaelDaum
- 18 Dec 2007
Michael, can you please post the raw text of the page that causes the segfault so that others can reproduce the problem?
--
TWiki:Main.HideyoImazu
- 20 Dec 2007
I do have a string that causes a segfault: a log file of an IRC conversation in a verbatim section, with some more TWiki variables in it. Unfortunately, I can't give it to you as is, because this IRC conversation was confidential. Any time I tried to change this string in content or length, the behavior changed: I now have a version in a test case where perl core dumps sporadically. This is very bad news. I emailed this test case to a partner of mine (covered under the same conditions of confidentiality) and he is able to reproduce the segfaults.
What I found out is that a plugin was calling TWiki::urlDecode() on a string that was not url-encoded. This string contained TWiki variables, e.g. %ACTION{}% and such, which were then converted because of the %[a-z] sequence, and this may insert some non-ASCII chars into the perl string.
That's what perl doesn't like, and it produces segfaults every now and then.
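The conversion described above can be reproduced in isolation. This is a minimal sketch, not the TWiki source: it assumes the decoder also rewrites plain %XX escapes with a substitution of the same shape as the %u line quoted earlier. That second substitution and the name naive_url_decode are assumptions made for illustration only. The sketch shows the unwanted conversion, not the segfault itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a URL decoder; NOT the actual TWiki::urlDecode().
# Assumption: plain %XX escapes are decoded the same way as the %uXXXX form.
sub naive_url_decode {
    my ($text) = @_;
    $text =~ s/%([\da-f]{2})/chr(hex($1))/gei;     # assumed %XX handling
    $text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;    # %uXXXX form from the patch
    return $text;
}

my $input   = '%ACTION{}%';    # a TWiki variable, not url-encoded
my $decoded = naive_url_decode($input);

# "%AC" is read as an escape, so three characters collapse into chr(0xAC),
# a non-ASCII character that was never present in the original topic text.
printf "length before: %d, after: %d\n", length($input), length($decoded);
printf "first char code: 0x%X\n", ord($decoded);
```

Under these assumptions the string shrinks from 10 to 8 characters and starts with character 0xAC, which is the kind of stray non-ASCII char that would then be handed to the charset conversion.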
Bottom line: it is dangerous to call TWiki::urlDecode() (and a subsequent UTF82SiteCharSet()) on a string which is not url-encoded, and I am afraid we can't do much about it.
I hope to be able to reproduce the same segfaults on a neutral string.
--
TWiki:Main.MichaelDaum
- 20 Dec 2007
Most likely the following line is causing the segfault.
$text =~ s/%u([\da-f]{4})/chr(hex($1))/gei;
%uXXXX (X is a hexadecimal digit) matches the pattern.
What I'd like to know is the exact %uXXXX you have in your topic.
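To check which strings that pattern actually captures, something like the following sketch may help. The helper name find_unicode_escapes and the sample strings (including the made-up variable %UFACE{}%) are hypothetical and not taken from the topic in question.

```perl
use strict;
use warnings;

# The pattern from the quoted substitution, compiled case-insensitively.
my $unicode_escape = qr/%u([\da-f]{4})/i;

# Returns the list of 4-digit hex codes the pattern would capture in $text.
sub find_unicode_escapes {
    my ($text) = @_;
    my @codes = $text =~ /$unicode_escape/g;
    return @codes;
}

# Sample inputs are illustrative only; %UFACE{}% is an invented variable name.
for my $sample ('caf%u00E9', '%ACTION{}%', '%UFACE{}%') {
    my @codes = find_unicode_escapes($sample);
    printf "%-12s => %s\n", $sample, @codes ? "matches @codes" : 'no match';
}
```

With this pattern alone, %ACTION{}% does not match, while any variable whose name starts with U followed by four hex digits would.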
--
TWiki:Main.HideyoImazu
- 26 Dec 2007
It was an %ACTION{}% that caused the segfault. The rest of the topic had an influence on segfaulting as well. Random changes to non-utf text resulted in different behaviours.
--
TWiki:Main.MichaelDaum
- 27 Dec 2007
Some observations.
Someone has written a plugin that calls an internal, non-API function in TWiki. He should be spanked for that. It is a 5 line function. Why not write a local function that works and ensures the plugin also works in 4.2.1, 5.0 and so on?
How can calling an internal, non-API function be an urgent bug?
Only if TWiki::urlDecode has a potential problem that can also cause a segfault in normal internal use.
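A plugin-local function of the kind suggested above might look roughly like this. It is only a sketch: the name _localUrlDecode and the "does this look url-encoded" check are assumptions, not part of any TWiki API, and %uXXXX handling is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical plugin-local helper; the name and the strictness check are
# assumptions, not part of any TWiki API.
sub _localUrlDecode {
    my ($text) = @_;
    return $text unless defined $text;

    # Leave the string untouched if any '%' is not followed by two hex digits,
    # i.e. the input does not look url-encoded (for example '%ACTION{}%',
    # whose trailing '%' is not an escape).
    return $text if $text =~ /%(?![\da-f]{2})/i;

    $text =~ s/%([\da-f]{2})/chr(hex($1))/gei;
    return $text;
}

print _localUrlDecode('a%20b'), "\n";         # decoded to 'a b'
print _localUrlDecode('%ACTION{}%'), "\n";    # returned unchanged
```

The check is a heuristic: text in which every '%' happens to be followed by two hex digits would still be decoded, so it reduces the risk rather than removing it.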
--
TWiki:Main.KennethLavrsen
- 08 Jan 2008
The question is not how to call TWiki::urlDecode() so that perl segfaults. The issue is that calling TWiki::urlDecode() with certain parameters can segfault perl. I simply don't feel safe with this code in the core.
--
TWiki:Main.MichaelDaum
- 08 Jan 2008
I just did a global search for urlDecode and the only plugin that calls this function is WysiwygPlugin.
So it is not a silly call to an internal API that causes this, and it is fair to keep this urgent, at least until we understand the nature of the issue. We have to rely on you, Michael, to do that because the error vector you use to reproduce it is a secret.
--
TWiki:Main.KennethLavrsen
- 08 Jan 2008
There is a testcase to show the segfault. Did you try it?
--
TWiki:Main.MichaelDaum
- 08 Jan 2008
Reference?
--
TWiki:Main.KennethLavrsen
- 08 Jan 2008
It's a unit testcase. One of the UTF8Tests.
--
TWiki:Main.CrawfordCurrie
- 11 Jan 2008
I reverted the fix, which closes this one.
The segfaults were serious, in real life and not just in unit test cases.
--
TWiki:Main.KennethLavrsen
- 16 Jan 2008