A PHP mbstring bug?
As you might already know, the PHP mbstring extension supports a feature called Function Overloading. When turned on, this feature replaces the standard string functions with their multi-byte counterparts. That means that whenever you call a standard string function (like strlen()), its multi-byte counterpart (mb_strlen()) is called automatically behind the scenes. This is indeed a very useful feature for UTF-8 sites, if it works properly.
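To make the difference concrete, here is a minimal sketch of what the two functions return for the same UTF-8 string. The sample word and variable names are mine, chosen only for illustration, and the byte count in the comment assumes overloading is off.

```php
<?php
// "č" takes two bytes in UTF-8 (0xC4 0x8D), so the Slovenian word
// "česen" is 5 characters long but occupies 6 bytes.
$word = "\xC4\x8Desen";

// Without overloading, strlen() counts bytes.
$bytes = strlen($word);

// mb_strlen() counts characters in the given encoding.
$chars = mb_strlen($word, 'UTF-8');

echo "bytes: $bytes, characters: $chars\n";
```

With overloading enabled for string functions and UTF-8 as the internal encoding, the plain strlen() call would report the character count instead, which is exactly what you want for text and exactly what you do not want for binary data.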
Yesterday I was doing a face-lift on my mother’s site. As I was installing WordPress, a weird thing happened when I installed the Slovenian translation. After enabling the translation in the config file, the site stopped working completely. It just hung there for 30 seconds, then it printed out the following errors:
Warning: unpack() [function.unpack]: Type V: not enough input, need 4, have 0 in /****/wp-includes/gettext.php on line 85
Warning: unpack() [function.unpack]: Type V: not enough input, need 4, have 0 in /****/wp-includes/gettext.php on line 85
Fatal error: Maximum execution time of 30 seconds exceeded in /****/wp-includes/streams.php on line 60
I then spent several hours trying to figure out the source of this problem. After some heavy googling, I found this PHP bug in their bug-tracking system. By sheer chance, the version of PHP installed on our production server was indeed 5.2.1, and since I was not completely sure whether the server was 32-bit or 64-bit, I assumed that this was the culprit. I immediately upgraded PHP to the latest stable version, which at the time was 5.2.5. To my dismay, this solved nothing. The exact same errors still appeared whenever the translation was enabled.
Since I knew that the translation file worked (I had tested it on another server), I dug further into the code. After a lot of debugging, I finally tracked the problem to the StringReader class in the wp-includes/streams.php file. This class is responsible for reading the translation file in chunks, and for this purpose it uses the PHP functions substr() and strlen(). Although the translation file is actually binary, this is normally fine, because these functions are binary-safe and simply count bytes. However, these were not normal circumstances. I was quite baffled to find that strlen() in this class was returning the wrong size for many of the chunks, as well as for the entire file. The only logical explanation I could think of was that the functions were being overloaded by their multi-byte counterparts, which count characters rather than bytes. Yet when I checked the mbstring.func_overload configuration option, it showed 0. Since I didn't really believe that value, I tried explicitly setting the option to 0, both in the .htaccess file and with the ini_set() function. It didn't help. The string functions still behaved as bizarrely as before.
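A quick way to check whether strlen() can still be trusted with binary data, regardless of what the configuration claims, is to feed it a few bytes that are not valid UTF-8 and see what comes back. The sketch below is my own illustration, not code from WordPress; the function name strlen_counts_bytes() is hypothetical, and the probe bytes are simply four non-ASCII bytes of the kind found at the start of a little-endian binary file.

```php
<?php
// Hypothetical helper: returns true when strlen() reports raw byte counts.
function strlen_counts_bytes()
{
    // Four bytes that do not form valid UTF-8 text.
    $probe = "\xDE\x12\x04\x95";

    // A byte-counting strlen() must return exactly 4 here.
    return strlen($probe) === 4;
}

// What the configuration claims (0 means "no overloading").
// The cast also turns a missing setting into 0.
$configured = (int) ini_get('mbstring.func_overload');

echo "mbstring.func_overload reports: $configured\n";
echo "strlen() counts bytes: " . (strlen_counts_bytes() ? "yes" : "no") . "\n";
```

If the first line prints 0 while the second prints "no", you are looking at the same contradiction I ran into.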
Still being quite sure that the source of the problem was mbstring overloading, I searched the web some more. I finally found the solution (a workaround, really) in the PHP manual, in a comment under the entry for strlen(). I replaced all the strlen($string) calls with mb_strlen($string, 'latin1') and changed all the substr() calls in a similar manner. Because latin1 is a single-byte encoding, every byte counts as exactly one character, so the lengths and offsets are byte-accurate again. This actually solved the problem, albeit not very elegantly.
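Rather than editing every call by hand, the same workaround can be wrapped in two small helpers. This is only a sketch under my own assumptions: the names bin_strlen() and bin_substr() are hypothetical and do not exist in WordPress or PHP, and the fallback to the plain functions assumes that nothing can be overloaded when mbstring is not loaded at all.

```php
<?php
// Hypothetical byte-safe replacement for strlen().
function bin_strlen($data)
{
    if (function_exists('mb_strlen')) {
        // latin1 is single-byte, so characters and bytes coincide.
        return mb_strlen($data, 'latin1');
    }
    return strlen($data);
}

// Hypothetical byte-safe replacement for substr().
function bin_substr($data, $start, $length = null)
{
    if ($length === null) {
        // No length given: read through to the end of the data.
        $length = bin_strlen($data);
    }
    if (function_exists('mb_substr')) {
        return mb_substr($data, $start, $length, 'latin1');
    }
    return substr($data, $start, $length);
}

// Usage: read the first four bytes of a binary chunk.
$chunk = "\xDE\x12\x04\x95rest";
$header = bin_substr($chunk, 0, 4);
echo bin_strlen($header), "\n";
```

The mbstring extension also accepts '8bit' as an encoding name for this purpose; I kept 'latin1' here because that is what the manual comment suggested.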
I do wonder, though, whether it is an actual bug in PHP or in the mbstring extension that turns the overloading feature on and prevents it from being turned off manually. I had come across this problem before on our development server with an earlier version of PHP, and since it also occurs in versions 5.2.1 and 5.2.5, there must be at least 3 different versions of PHP affected by it. I have since checked PHP's bug-tracking system and found no entry about it.
So, at the end of the day, I still don’t know the real cause of this problem for sure, but now at least I know of a workaround.
Update: I have now found two bug reports for this issue, #27421 and #39361. It seems this bug has been around for a very long time but has never been fixed.