Help - Corruption of Non-ASCII Characters

From dis-Emi-A


I'm having a problem with my installation regarding the use of accented characters (actually it appears with any non-ASCII character). I suspect a DB setup error, though I cannot determine what it is or how I would fix it.

If I use a non-ASCII character on a page, it displays fine so long as it is not followed by further text (that is, if it is followed by a blank or a newline). As soon as the character is followed by another character, it becomes corrupted.

The page I am testing with is http://wiki.disemia.com/Sandbox. It contains the corrupted data: the pretext and posttext should both be a ü.

I took the same source (before corruption) and tried it in the MediaWiki sandbox and had no problems, which likely rules out a browser error.

Using wget I fetched the contents of that page and saw that the first ü's are encoded correctly as UTF-8 (C3 BC), while the corrupted ü is encoded as EF BF BD.
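The two byte sequences can be checked with a short Python sketch (Python is used here only for illustration; it is not part of the wiki setup). EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, which decoders commonly substitute for bytes they cannot interpret. The last part assumes, as one plausible explanation, that the ü lost its second byte somewhere before being re-encoded.

```python
# Byte sequences reported by wget in the post above.
good = bytes.fromhex("C3BC")    # the correctly encoded character
bad = bytes.fromhex("EFBFBD")   # the corrupted character

good_char = good.decode("utf-8")
bad_char = bad.decode("utf-8")

print(repr(good_char))  # 'ü'
print(repr(bad_char))   # '\ufffd' (U+FFFD REPLACEMENT CHARACTER)

# Assumption for illustration: the two-byte sequence for ü was cut after
# its first byte. A lenient decoder then substitutes U+FFFD, and
# re-encoding that yields exactly the bytes seen in the corrupted page.
truncated = "ü".encode("utf-8")[:1]                  # b'\xc3'
replaced = truncated.decode("utf-8", errors="replace")
reencoded = replaced.encode("utf-8")

print(reencoded.hex().upper())  # EFBFBD
```

This only shows that the observed bytes are consistent with a multi-byte character being split; it does not show where in the stack the split happened.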

According to RELEASE-NOTES I am using MediaWiki 1.11.0

MySQL: Server version: 5.0.45 FreeBSD port: mysql-server-5.0.45

PHP 5.2.4 with Suhosin-Patch 0.9.6.2 (cli) (built: Oct 4 2007 18:22:07)

Progress

  • I checked the MySQL character sets and collations and they are all set to UTF-8. The PHP files in MediaWiki, where relevant, are also marked as UTF-8.
  • I made some manual changes to the DB "text" table, writing the correct characters directly into the table. With these changes the page renders correctly on loading, so I assume it must be a saving error (MediaWiki writing incorrectly to the DB).
  • Refer to http://wiki.disemia.com/index.php?title=Sandbox&oldid=1949
  • SOLVED: Traced to the func_overload setting (PHP's mbstring.func_overload) in a global .htaccess file. MediaWiki works only when it is set to 0.
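Assuming the setting in question is PHP's mbstring.func_overload, a non-zero value replaces byte-oriented string functions with multibyte-aware ones, so lengths and offsets start being counted in characters instead of bytes. The Python sketch below is only an analogy for that kind of mismatch, not MediaWiki's actual code path: the helper `truncate_bytes` and the sample text are hypothetical. It shows how applying a character count as if it were a byte count can split a multi-byte character.

```python
def truncate_bytes(text: str, length: int) -> str:
    """Keep the first `length` bytes of the UTF-8 form of `text`,
    then decode leniently (hypothetical helper for illustration)."""
    return text.encode("utf-8")[:length].decode("utf-8", errors="replace")


text = "üüx"                              # sample text, not from the wiki
char_count = len(text)                    # 3 characters
byte_count = len(text.encode("utf-8"))    # 5 bytes (ü takes 2 bytes each)

# Using the byte count keeps the text intact.
intact = truncate_bytes(text, byte_count)

# Using the character count where a byte count is expected cuts the
# second ü in half, so the decoder emits U+FFFD in its place.
damaged = truncate_bytes(text, char_count)

print(repr(intact))   # 'üüx'
print(repr(damaged))  # 'ü\ufffd'
```

Pure ASCII text is unaffected by such a mismatch because every character is one byte, which is consistent with the problem appearing only for non-ASCII characters.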