I've tried to phrase this question simply, but the topic can get a little complicated, so if you don't deal with translations very much you'll probably want to skip this thread.
(Before you comment on the following paragraphs: Yes, I know it's really called "UTF-8" but I'm typing "UTF8" because it's easier and I think it looks better.)
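The file is UTF8 encoded and starts with a byte order mark (ef bb bf), but it uses \u2028 as the end-of-line character. Here's a hex dump of the start of the file; the bytes e2 80 a8 at offset 0x41 are the UTF8 encoding of \u2028: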
0000000: efbb bf23 4765 726d 616e 2076 6572 7369 ...#German versi
0000010: 6f6e 2030 2e36 3620 2830 372e 3035 2e32 on 0.66 (07.05.2
0000020: 3031 3029 2062 7920 6150 6f77 6e2c 2070 010) by aPown, p
0000030: 726f 6f66 2d72 6561 6420 6279 2049 6d70 roof-read by Imp
0000040: 2ee2 80a8 2357 656e 6e20 6574 7761 7320 ....#Wenn etwas
0000050: 5369 6e6e 6765 6dc3 a4c3 9f20 6661 6c73 Sinngem.... fals
0000060: 6368 20c3 bc62 6572 7365 747a 7420 7775 ch ..bersetzt wu
I need some way to convert this to Java's \u#### format. If I run Java's native2ascii -encoding UTF-8 on it, each line ending comes out as the escape \u2028. That is Unicode's LINE SEPARATOR character, but Java apparently only accepts \u000a or \u000d (i.e. the ASCII LF and CR characters) as the end of line.
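To illustrate the line-ending problem, here is a minimal sketch. It assumes the translation file ends up being read by java.util.Properties, which this post doesn't actually state; LineSeparatorDemo and countEntries are names made up for the example. It loads the same two entries once separated by \u2028 and once by LF.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of MapTool.
class LineSeparatorDemo {
    // Loads the text as a properties file and returns how many entries were found.
    static int countEntries(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.size();
    }

    public static void main(String[] args) {
        // U+2028 is not a line terminator for Properties, so this is one long entry.
        System.out.println(countEntries("first=eins\u2028second=zwei")); // prints 1
        // LF is a line terminator, so this is two entries.
        System.out.println(countEntries("first=eins\nsecond=zwei"));     // prints 2
    }
}
```

If that assumption holds, everything after the first \u2028 would be swallowed into a single value, which matches the behaviour described above.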
My plan right now is to run native2ascii on the file, then do a global search and replace to change \u2028 into LF. But I was hoping for a cleaner solution that I could incorporate into the MapTool build process and automate.
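One way to fold both steps into a single pass might look like the sketch below. It is not native2ascii itself, just a small stand-in that does the same kind of escaping; TranslationEscaper and toAsciiEscapes are hypothetical names. It also drops the leading byte order mark seen in the hex dump, on the assumption that it is unwanted in the output (Java's UTF-8 decoder keeps it as \ufeff).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of MapTool or the JDK.
class TranslationEscaper {
    // Turns decoded text into ASCII-only text: U+2028 becomes LF,
    // every other non-ASCII char becomes a backslash-u escape.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (i == 0 && c == '\uFEFF') {
                continue; // assumption: the byte order mark should be dropped
            }
            if (c == '\u2028') {
                out.append('\n'); // replace LINE SEPARATOR with LF
            } else if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Usage: java TranslationEscaper <input file> <output file>
    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(raw, StandardCharsets.UTF_8);
        byte[] ascii = toAsciiEscapes(text).getBytes(StandardCharsets.US_ASCII);
        Files.write(Paths.get(args[1]), ascii);
    }
}
```

Characters outside the Basic Multilingual Plane would come out as two escapes (one per surrogate), which is fine for the \u#### format.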
For example, if I take a translation file and run it through native2ascii and it comes out the same, then no conversion is needed. If there is a difference, then I can save the output as the new translation file. I'm looking for something along those lines.
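A rough sketch of that check, under one assumption: for a valid UTF8 file, the converted output can only differ from the input if the input contains a byte outside 7-bit ASCII, so scanning for such a byte answers the same question without running the converter first. ConversionCheck and needsConversion are made-up names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical build-time check, not part of MapTool.
class ConversionCheck {
    // Returns true if the raw bytes contain anything outside 7-bit ASCII.
    static boolean needsConversion(byte[] raw) {
        for (byte b : raw) {
            if (b < 0) { // bytes 0x80..0xFF are negative as signed Java bytes
                return true;
            }
        }
        return false;
    }

    // Usage: java ConversionCheck <file> [<file> ...]
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] raw = Files.readAllBytes(Paths.get(name));
            String verdict = needsConversion(raw) ? "needs conversion" : "already ASCII";
            System.out.println(name + ": " + verdict);
        }
    }
}
```

The byte order mark and the \u2028 line endings both consist of bytes above 0x7f, so a file like the one in the hex dump would be flagged.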
Side question: I've opened the file in both OSX's TextEdit and Vim and I can't get either one to write the file correctly! If I tell TextEdit to write it as ASCII (or anything other than UTF8) I get an error that it can't convert. And if I try to paste the text into Vim it seems to ignore the line endings completely.
I'm going to use my search-and-replace technique for right now because I want to put out RC5 tonight, but I'd like any input on this that folks might have.