February 01, 2011
Mark Wielaard: New GPG key.
Finally created a new GPG key using gnupg. The old one was a DSA/1024 bits one and 8 years old. The new one is a RSA/2048 bits one. I will use the new one in the future to sign any release tarballs I might create. pub 2048R/57816A6A 2011-01-29 Key f...
More »
February 01, 2011
Andrew Hughes: [SECURITY] IcedTea6 1.7.8, 1.8.5, 1.9.5 Released!.
We are pleased to announce a new set of security releases, IcedTea6 1.7.8, IcedTea6 1.8.5 and IcedTea6 1.9.5.
This update contains the following security updates:
The IcedTea project provides a harness to build the source code from OpenJDK6 u...
More »
December/2024
Sun | Mon | Tue | Wed | Thu | Fri | Sat |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
28 | 29 | 30 | 31 | | | | |
|
|
Writing CSV files as UTF-8 for Excel
Yesterday a coworker complained that Excel wasn't displaying a CSV (comma separated values) file correctly. Our application allows the user to send a report via email. The application provides the report as a CSV file. Because the report can contain multilingual text, we've decided to encode it in UTF-8. Unfortunately, when users click on the file to display it, usually in Excel, all of the multi-byte UTF-8 characters display incorrectly.
The problem was immediately clear to me...Excel was opening the UTF-8 encoded files, but it was incorrectly identifying them as Latin-1 encoded files. In the absence of any charset identification, Excel must guess about a file's content encoding. In our environment, many host PCs use en_US locales with Latin-1 as the typical charset. Excel uses that default to read and display CSV files.
My solution to the problem was to use the byte-order marker (BOM) to identify the CSV file as a Unicode file. I instructed my colleague to prepend the FEFF character to the file. The Java application that writes the file uses a FileWriter that encodes to UTF-8 to create the CSV file. It was simple to just output the BOM as the first character in the file.
Now when our customers double-click on these files, Excel opens the file, notices the BOM, and automatically selects UTF-8 as the file's charset encoding. Now Excel displays the previously mangled characters correctly. And I was able to hel...
Date: March, 24 2010
Url: http://www.java.net/blog/joconner/archive/2010/03/24/writing-csv-files-utf-8-excel
Others News
|