PHP and CSV files

If you have ever tried to use PHP to process CSV files that contain non-ASCII characters, you know there is a problem out there with lots of solutions that may or may not work. Mostly the latter from what I can tell. Herewith some comments from a PHP command line program that I use to populate a PostgreSQL table with data derived from a spreadsheet.

This is all command-line stuff under OS X.


* All CSV files that I have examined had invalid character codes for non-ASCII characters. They
* are not UTF anything, not ISO-8859-1, not anything that I can decipher. Example: the non-ASCII
* character eacute (é) is 0x00E9 in UTF-16, 0xC3A9 in UTF-8, or 0xE9 in ISO-8859-1.
*
* But the CSV files generated by Excel and/or Numbers use 0x8e for this character. Bummer.
*
* But examining an XLS spreadsheet in Excel or Numbers shows the proper characters before
* saving the file in CSV format, so the problem is not bad data coming in. What happens during the
* xls-to-csv conversion remains a mystery to me.
*
* Excel can export a spreadsheet in TXT format, which uses tabs instead of commas to separate
* the values. In this case, non-ASCII characters are generated correctly, but PHP cannot read
* the file correctly. (There might be a way to use the setlocale() function to get it to work,
* but I was unable to do so.)
*
* A dump of TXT files exported by Excel revealed that the file is in UTF-16LE format (on this
* Intel-based computer; mabye UTF-16BE on another type of processor). I used the command line,
* iconv -f UTF-16LE file.txt > file.tsv
* to generate the TSV file that is read by the program. I used the second argument to
* fgetcsv() to make the separators be tabs instead of commas.
*
* <hyperbole>After little more than a week of full-time effort</hyperbole>, it seems to work.