Ticket #820 (closed defect: fixed)

Opened 16 months ago

Last modified 16 months ago

Multibyte characters become garbled when imported via a YAML fixture file

Reported by: malte
Owned by: lsmith
Priority: major
Milestone: 0.10.3
Component: Import/Export
Version: 0.9.0
Severity:
Keywords:
Cc:
Has Test:
Status:
Has Patch:

Description

If you put a few multibyte (MB) characters such as "åäö£" (Swedish characters and the pound sign) in a row in a table and then dump the data to a fixture file, all is well: the characters are written cleanly to the YAML fixture file, and they occupy 2 bytes each as expected. But if you import this data back into the db, the MB characters become garbled. "åäö£" becomes "Ã¥Ã¤Ã¶Â£". The latter string occupies 16 bytes, since it is now 8 MB characters instead of 4.
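The sizes described above are consistent with the UTF-8 bytes of the fixture being misread as Latin-1 and then re-encoded as UTF-8. That cause is an assumption, not something this ticket confirms. A minimal Python sketch that reproduces the symptom outside Doctrine:

```python
# Sketch of the suspected double-encoding; not Doctrine code.
original = "åäö£"

# What the dumped fixture file contains: 2 bytes per character in UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumed fault: the bytes are interpreted as Latin-1, so every byte
# turns into its own character.
misread = utf8_bytes.decode("latin-1")

# Storing the misread text as UTF-8 doubles the byte count again.
garbled_bytes = misread.encode("utf-8")

print(len(utf8_bytes), len(misread), len(garbled_bytes))
```

Under that assumption the 4 original characters become 8 characters, each taking 2 bytes.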

I have tested and verified the breakage on Linux (Debian Lenny) and Mac OS X (Leopard).

I think this is a major problem with Doctrine, since it corrupts your data if you are working with languages and symbols other than US English. Not even British English is safe (the pound sign is broken).

The DB used is MySQL, and the collation is MySQL's default: latin1_swedish_ci. That collation can definitely handle the characters above, so the error is not with the db or the OS (tested on multiple platforms). The PHP version is 5.2.5.

Attachments

Parser.php.patch (439 bytes) - added by malte 16 months ago.
This patch fixes the problem

Change History

Changed 16 months ago by malte

This patch fixes the problem

Changed 16 months ago by malte

It seems to me that the problem is with PHP, not with Doctrine. PHP does not appear to handle UTF-8 encoded files properly by itself (when they are loaded with include(), as is done in Parser.php), but the condition can be fixed with the iconv() function.
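The attached patch is not reproduced in this ticket, so the exact iconv() call and the encodings it uses are unknown. As a rough illustration of what an iconv-style conversion does, here is a Python sketch. The helper name `convert_encoding` is hypothetical, and the UTF-8 to Latin-1 direction is an assumption based on the latin1 collation mentioned in the description:

```python
def convert_encoding(data: bytes, from_encoding: str, to_encoding: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one charset to another.

    Decodes with the source encoding, then encodes with the target one.
    Raises UnicodeError if a character cannot be represented in the target.
    """
    return data.decode(from_encoding).encode(to_encoding)


# UTF-8 bytes of "åäö£", as they would appear in the fixture file.
fixture_bytes = "åäö£".encode("utf-8")

# Assumed direction: convert to Latin-1 before handing the data to a latin1 database.
latin1_bytes = convert_encoding(fixture_bytes, "utf-8", "latin-1")

print(len(fixture_bytes), len(latin1_bytes))
```

After the conversion each of the four characters occupies a single byte, which a latin1 column can store without further interpretation.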

Changed 16 months ago by jwage

  • status changed from new to closed
  • resolution set to fixed

(In [3929]) fixes #820
