February 01, 2011
Mark Wielaard: New GPG key.
Finally created a new GPG key using GnuPG. The old one was an 8-year-old 1024-bit DSA key; the new one is a 2048-bit RSA key. I will use the new key to sign any release tarballs I create in the future. pub 2048R/57816A6A 2011-01-29 Key f...
February 01, 2011
Andrew Hughes: [SECURITY] IcedTea6 1.7.8, 1.8.5, 1.9.5 Released!
We are pleased to announce a new set of security releases, IcedTea6 1.7.8, IcedTea6 1.8.5 and IcedTea6 1.9.5.
This update contains the following security updates:
The IcedTea project provides a harness to build the source code from OpenJDK6 u...
XML Processing with Scala
A few months ago, I had one of those unpleasant format conversion jobs. I had about 1,000 multiple choice questions in RTF format and needed to import them into Moodle.
RTF is, as file formats go, somewhere between the good and the evil. It looks like one should be able to write a parser for it, but that seems like a dreary task. The miracle of open source came through for me, though, in the rtf2xml project. Paul Tremblay authored a converter that faithfully converts RTF to XML, where you can process it with your usual XML tool chain. I just love it when someone else's labor saves me many hours of drudgery. Thanks, Paul—if we ever meet, I will gladly buy you a beer :-)
My first inclination was to use XSLT to transform the result into Moodle XML format. But I quickly realized that I would have gone insane in the process.
The XML was a festering mess, because it truthfully reflected the festering mess in the RTF files. The RTF files were, of course, produced from a Microsoft Word document. Apparently, few people know how to use Microsoft Word in an intelligent way, with character and paragraph styles. The authors of my files were no exception—they treated Word as a glorified IBM Selectric typewriter.
Monospace text was expressed in four different ways, spaces inside code were styled as Times New Roman, and sequences of code lines were never grouped into anything resembling a “preformatted” entity.
I rem...
Date: May 16, 2010
Url: http://www.java.net/blog/cayhorstmann/archive/2010/05/16/xml-processing-scala