
Item5314: TinyMCE breaks Chinese, Japanese, Korean Unicode Encoding upon entering "Edit Twiki Markup" from WYSIWYG

Item Form Data

AppliesTo: Extension
Component: TinyMCEPlugin
Priority: Normal
CurrentState: Closed
WaitingFor:
TargetRelease: patch
ReleasedIn: 4.2.1


Detail

(ThYang - I've updated the summary of your bug to reflect a better description of a specific bug developers should be able to repro)

Broken Chinese problem.

This one may relate to Bugs.Item5248

You can input Chinese and save it successfully (left and middle pictures). But if you edit the topic again, the characters turn into other codes (right picture).

chinese-utf8.jpg

My settings in LocalSite.cfg:

$TWiki::cfg{Site}{Locale} = 'zh_TW.UTF-8';
$TWiki::cfg{Site}{CharSet} = 'UTF-8';
$TWiki::cfg{Site}{Lang} = 'zh';
$TWiki::cfg{Site}{FullLang} = 'zh-tw';

Related issues

-- TWiki:Main/ThYang - 02 Feb 2008

I have seen this problem too. It affects CJK text (Chinese, Japanese, Korean).

Installation: TWiki 4.2.0

Steps to Reproduce:

1) Install a default TWiki 4.2.0.

2) Edit a page in WYSIWYG.

3) Add a CJK string, for example: 中國字 (this says "Chinese characters") or 비 (a Korean character).

(I'm using Firefox; on Windows XP you may need East Asian font support to see these.)

4) Save the page. This step makes the problem easy to reproduce.

5) Re-edit the page in WYSIWYG. The Chinese characters are still there, PROPERLY ENCODED.

6) In WYSIWYG, hit "Edit TWiki Markup".

Result: The Chinese characters are mangled into single-byte characters and show up in the TWiki markup editor as gibberish.

Expected Result: In TWiki markup, the UTF-8 encoded Chinese characters are preserved.

Workaround: Never Ever Ever EVER hit "TWiki Markup Editor" in WYSIWYG. Use the raw HTML editor instead.

-- TWiki:Main.TimothyChen - 15 Feb 2008
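
A minimal sketch of the kind of corruption described in the report above, written in Python for illustration rather than in TWiki's Perl. It assumes the damage comes from UTF-8 octets being reinterpreted as a single-byte charset. ISO-8859-1 is chosen here as an assumption (the report does not say which charset is involved), and the function name is hypothetical.

```python
def misread_as_single_byte(text: str, charset: str = "latin-1") -> str:
    """Encode text as UTF-8, then wrongly decode the octets one byte at a time.

    The default charset is an assumption made for this illustration.
    """
    return text.encode("utf-8").decode(charset)


original = "中國字"
garbled = misread_as_single_byte(original)

# Each of these CJK characters occupies three octets in UTF-8, so the three
# characters come back as nine unrelated single-byte characters.
print(len(original), len(garbled))

# As long as the octets themselves are intact, the misreading can be reversed.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == original)
```

If the garbled form is saved and re-encoded again, the original text can no longer be recovered this simply, which is one reason to avoid the markup editor until the bug is fixed.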

Tim,

The patch in Bugs.Item4946 solved the problem.

-- TWiki:Main.ThYang - 04 Mar 2008

According to the report in Item4946 the last patch still has open issues.

Can a Chinese speaker explain to me how you enter Chinese characters? One of the major reasons why I cannot attack this one is that I have no clue how to write Chinese on a Danish or English keyboard. Do you type a percent u and 4 characters?

Can you supply someone like me with some simple ways to enter Chinese words, including a picture of the text so I can check that it remains correct?

What we really need here is a Chinese language programmer to give a hand. That would be the best.

-- TWiki:Main.KennethLavrsen - 06 Mar 2008

See also Item5457 for the same problems in Cyrillic, so a Russian language programmer would do just as well. Whichever it is, we desperately need a programmer who uses these character sets on a daily basis (or an expert on UTF-8 like TWiki:Main.RichardDonkin) to help resolve this!

Note that the more I think about it, the more I think the open issue against the patch in Item4946 is a red herring. The problem only occurs if invalid UTF-8 is fed to it.

CC

I don't read/write Chinese, so the way I tested Chinese text and other languages was to simply find some Chinese text on the web and copy/paste it (as Unicode which is default on Windows) into a TWiki edit form. The browser should convert it from any source character set (GBK and GB2312 are common for Chinese, as well as UTF-8) into the target character set based on whatever encoding is used for the TWiki page (subject to what TMCE does of course). Sites such as Yahoo China should also be a good source of Chinese text. You could also copy/paste from the HTML rendered text above e.g. 中國字.

Any corruption is very likely to just wreck the Chinese text visibly rather than subtly corrupt it into another Chinese character, so in practice it's quite easy to check whether something is broken.

Do check TWiki:Codev.JapaneseAndChineseSupport for some restrictions on character sets that will work on server side - basically only EUC variants and UTF-8 will work.

-- TWiki:Main.RichardDonkin - 27 Mar 2008

Another thought: if the issue is with invalid UTF-8, there's already a handy regex in the TWiki code that checks for this - used in the TWiki:Codev.EncodeURLsWithUTF8 code. So it should not be too hard to re-use this regex whether on server side or TMCE, if that's the problem. Having said that, it's quite hard to get invalid UTF-8 i.e. something that conforms to the basic UTF-8 encoding approach yet is either overlong or using illegal codepoints.

-- RichardDonkin - 28 Mar 2008
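
A minimal sketch of a UTF-8 validity check, in Python for illustration. It does not reproduce the regex from the TWiki code mentioned above. It relies on Python's strict UTF-8 decoder instead, which rejects overlong forms and surrogate code points. The function name is hypothetical.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True only if the octets form well-formed UTF-8."""
    try:
        octets.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "well-formed CJK": "中國字".encode("utf-8"),
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
}

for label, octets in samples.items():
    print(label, is_valid_utf8(octets))
```

The last two samples follow the basic UTF-8 byte pattern but are still rejected, which matches the "overlong or illegal codepoints" cases described above.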

Actually, it's not all that hard. The specific case used in testing Item4946 occurred when someone fed text containing %ACTION into the decoder. %AC turned into an illegal character. If the encoding is handled correctly, it should never happen.

-- TWiki:Main.CrawfordCurrie - 28 Mar 2008
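
A minimal sketch of how text such as %ACTION can yield an illegal character, in Python for illustration. It assumes the decoder treats "%" followed by two hex digits as a URL-style escape. The actual decoder involved in Item4946 is not shown in this report, so `urllib.parse.unquote_to_bytes` is only a stand-in.

```python
from urllib.parse import unquote_to_bytes

# "%AC" looks like a percent escape, so it is turned into the single octet 0xAC
# and the rest of the word is left as-is.
raw = unquote_to_bytes("%ACTION")
print(raw)

# 0xAC is a UTF-8 continuation byte. On its own, with no lead byte before it,
# it is not valid UTF-8, so a strict decoder refuses it.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)
```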

I believe I finally found the solution. I ended up having to convert octets to UTF-8 to stop the HTML::Parser falling over, then converting UTF-8 wide chars to HTML entities to stop the print falling over (STDOUT is not opened :utf8). I was able to engineer the fix without needing to touch the core code.

Many thanks to TWiki:Main/ThYang, TWiki:Main.TimothyChen, TWiki:Main/OlegButovich and TWiki:Main.RichardDonkin for exploring around the problem and proposing investigative procedures.

-- CrawfordCurrie - 31 Mar 2008
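
A minimal sketch of the two conversions described in the fix above, in Python for illustration. This is not the plugin's actual Perl code. It only shows the general idea of decoding octets as UTF-8 and then replacing non-ASCII characters with numeric HTML entities so that an output stream that is not UTF-8 aware can carry them. The function name is hypothetical.

```python
import html


def octets_to_entity_text(octets: bytes) -> str:
    """Decode UTF-8 octets, then replace non-ASCII characters with entities."""
    text = octets.decode("utf-8")
    # "xmlcharrefreplace" turns each unencodable character into &#NNNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


page_octets = "CJK: 中國字".encode("utf-8")
safe = octets_to_entity_text(page_octets)
print(safe)  # CJK: &#20013;&#22283;&#23383;

# A browser (or html.unescape here) turns the entities back into characters.
print(html.unescape(safe) == "CJK: 中國字")
```

The result is pure ASCII, so it survives any single-byte output path unchanged.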

Almost, but not quite; TWiki:Main.KwangErnLieuw discovered that entering Chinese characters in pickaxe mode didn't work, so I had to fix that too.

-- CrawfordCurrie - 31 Mar 2008

ItemTemplate
Summary TinyMCE breaks Chinese, Japanese, Korean Unicode Encoding upon entering "Edit Twiki Markup" from WYSIWYG
ReportedBy TWiki:Main.ThYang
Codebase 4.2.0
SVN Range TWiki-5.0.0, Wed, 23 Jan 2008, build 16283
AppliesTo Extension
Component TinyMCEPlugin
Priority Normal
CurrentState Closed
WaitingFor

Checkins TWikirev:16420 TWikirev:16421 TWikirev:16598 TWikirev:16599 TWikirev:16600 TWikirev:16601
TargetRelease patch
ReleasedIn 4.2.1
Topic revision: r20 - 2008-08-04 - KennethLavrsen
 