RPTools.net

Discussion and Support

Site Admin
 
Joined: Mon Jun 12, 2006 12:20 pm
Posts: 12102
Location: Tampa, FL
 Post subject: I need some file encoding help...
Posted: Sat Jan 22, 2011 12:14 am
I have a German translation file (hello aPown!) whose file encoding I can't seem to get right for Java.

I've tried to phrase this question simply, but the topic can get a little complicated so if you don't deal with translations very much you probably want to just avoid this thread. ;)

(Before you comment on the following paragraphs: Yes, I know it's really called "UTF-8" but I'm typing "UTF8" because it's easier and I think it looks better. ;))

The file is UTF8 encoded, but it uses \u2028 as the end of line character. Here's a hex dump of the file contents:

Code:
0000000: efbb bf23 4765 726d 616e 2076 6572 7369  ...#German versi
0000010: 6f6e 2030 2e36 3620 2830 372e 3035 2e32  on 0.66 (07.05.2
0000020: 3031 3029 2062 7920 6150 6f77 6e2c 2070  010) by aPown, p
0000030: 726f 6f66 2d72 6561 6420 6279 2049 6d70  roof-read by Imp
0000040: 2ee2 80a8 2357 656e 6e20 6574 7761 7320  ....#Wenn etwas
0000050: 5369 6e6e 6765 6dc3 a4c3 9f20 6661 6c73  Sinngem.... fals
0000060: 6368 20c3 bc62 6572 7365 747a 7420 7775  ch ..bersetzt wu

Every copy of the # character you see on the right denotes a comment in the translation file and it should appear in column zero of the file. Apart from the first "#", which follows the ef bb bf byte-order mark at the start of the file, the hex bytes immediately preceding each "#" are always e2 80 a8, which is the UTF8 encoding of U+2028, the Unicode line separator.
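To double-check what those byte sequences mean, here is a minimal Python sketch that decodes the two sequences copied from the hex dump and prints their Unicode names. The helper name `describe` is made up for this illustration.

```python
import unicodedata

# Byte sequences copied from the hex dump: the first three bytes of the
# file, and the three bytes that precede each later "#".
bom = bytes.fromhex("efbbbf")
separator = bytes.fromhex("e280a8")


def describe(raw: bytes) -> str:
    # Decode a UTF-8 sequence holding exactly one character and name it.
    ch = raw.decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(bom))        # U+FEFF ZERO WIDTH NO-BREAK SPACE
print(describe(separator))  # U+2028 LINE SEPARATOR
```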

I need some way to convert this to Java's \u#### format. If I run Java's native2ascii -encoding UTF-8 on it, each line ending is converted to \u2028, which is the Unicode line separator, but Java apparently only accepts LF or CR (i.e. the ASCII characters \u000a and \u000d) as the end of line.

My plan right now is to use native2ascii and convert the file, then do a global search and replace and change \u2028 into LF. But I was hoping for a cleaner solution that I could incorporate into the MapTool build process and automate.
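As a rough illustration of that plan in a single step, the Python sketch below decodes the UTF8 bytes, turns U+2028 into LF, and writes every remaining non-ASCII character as a \u#### escape. It is only a stand-in for native2ascii plus the search and replace, not a description of what native2ascii itself does; the function name `to_java_escapes` and the sample bytes (modelled on the hex dump) are assumptions made for the example.

```python
def to_java_escapes(raw: bytes) -> str:
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    text = raw.decode("utf-8-sig")
    # Turn the Unicode line separator into a plain LF.
    text = text.replace("\u2028", "\n")
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            pieces.append(ch)                 # ASCII passes through unchanged
        elif code < 0x10000:
            pieces.append(f"\\u{code:04x}")   # fits in one UTF-16 code unit
        else:
            code -= 0x10000                   # beyond the BMP: surrogate pair
            pieces.append(f"\\u{0xD800 + (code >> 10):04x}")
            pieces.append(f"\\u{0xDC00 + (code & 0x3FF):04x}")
    return "".join(pieces)


# Hypothetical sample modelled on the hex dump: BOM, text, U+2028, a comment.
sample = b"\xef\xbb\xbfby Imp.\xe2\x80\xa8#Wenn etwas Sinngem\xc3\xa4\xc3\x9f falsch"
print(to_java_escapes(sample))
```

The sketch leaves existing backslashes alone, so text that already contains \u#### escapes passes through unchanged.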

For example, if I take a translation file and run it through native2ascii and it comes out the same, then no conversion is needed. If there is a difference, then I can save the output as the new translation file. I'm looking for something along those lines.
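The "convert, compare, save only if different" idea could look something like this in Python. Everything here is a sketch under assumptions: `convert_if_changed` and `demo_convert` are made-up names, and `demo_convert` is a trivial stand-in for whatever real converter the build would call.

```python
import tempfile
from pathlib import Path
from typing import Callable


def convert_if_changed(path: Path, convert: Callable[[bytes], bytes]) -> bool:
    """Rewrite `path` with convert(contents) only when the result differs."""
    original = path.read_bytes()
    converted = convert(original)
    if converted == original:
        return False          # already in the right form, leave the file alone
    path.write_bytes(converted)
    return True


def demo_convert(raw: bytes) -> bytes:
    # Hypothetical stand-in converter: swaps UTF-8 encoded U+2028 for LF.
    return raw.replace(b"\xe2\x80\xa8", b"\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "i18n_xx.properties"
    target.write_bytes(b"#line one\xe2\x80\xa8#line two")
    print(convert_if_changed(target, demo_convert))  # True: file rewritten
    print(convert_if_changed(target, demo_convert))  # False: nothing left to do
```

Passing the converter in as a parameter keeps the comparison logic separate from the conversion itself, so the same check works whichever tool ends up doing the conversion.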

Side question: I've opened the file in both OSX's TextEdit and Vim and I can't get either one to write the file correctly! If I tell TextEdit to write it as ASCII (or anything other than UTF8) I get an error that it can't convert. And if I try to paste the text into Vim it seems to ignore the line endings completely. :(

I'm going to use my search-and-replace technique for right now because I want to put out RC5 tonight, but I'd like any input on this that folks might have.


Great Wyrm
 
Joined: Sun Jun 22, 2008 6:53 pm
Posts: 2102
Location: Melbourne, Australia
 Post subject: Re: I need some file encoding help...
Posted: Sat Jan 22, 2011 12:38 am
You should be able to edit the file with vim using
gvim "+set encoding=utf-8" filename
or mvim if using MacVim. Use the GUI version; that way you don't need to make sure that your terminal is set up correctly to display utf-8.

If this still doesn't work then you can open it as a binary (on a non-Windows machine):
vim -b filename
Then
:%!xxd
The xxd command comes with most installations of vim.
Edit the resulting hex (just a search and replace).
Then
:%!xxd -r
to turn it back into binary, and save.
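One thing to watch with the hex route: in the dump from the first post the three bytes straddle xxd's two-byte groups ("2ee2 80a8"), so a plain text search for "e280a8" will not match without allowing for the space. A byte-level replace sidesteps that. The Python sketch below does the same search and replace directly on the bytes; `patch_hex` is a made-up helper name.

```python
def patch_hex(raw: bytes, old_hex: str, new_hex: str) -> bytes:
    """Replace every occurrence of the bytes written as old_hex with new_hex."""
    return raw.replace(bytes.fromhex(old_hex), bytes.fromhex(new_hex))


# Bytes from offset 0x40 of the hex dump in the first post: ".", U+2028, "#Wenn"
chunk = bytes.fromhex("2ee280a82357656e6e")
print(patch_hex(chunk, "e280a8", "0a"))  # b'.\n#Wenn'
```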


Site Admin
 
Joined: Mon Jun 12, 2006 12:20 pm
Posts: 12102
Location: Tampa, FL
 Post subject: Re: I need some file encoding help...
Posted: Sat Jan 22, 2011 2:33 am
Craig wrote:
You should be able to edit the file with vim using
gvim "+set encoding=utf-8" filename
or mvim if using MacVim. Use the GUI version; that way you don't need to make sure that your terminal is set up correctly to display utf-8.

Using Mvim under the GUI, the encoding already shows up as UTF8. It appears that the "fileencoding" (shortcut "fenc") is used for actually converting the file when reading/writing, but it didn't seem to be doing anything when I tried it. Maybe a bug in vim? :?

Quote:
If this still doesn't work then you can open it as a binary (on a non-Windows machine):
vim -b filename
Then
:%!xxd
The xxd command comes with most installations of vim.
Edit the resulting hex (just a search and replace).
Then
:%!xxd -r
to turn it back into binary, and save.

Good point. I wrote a one-line script to do the same thing, but I'd like to have a "platform solution" so that I can tell submitters, "do such-and-such and it'll be alright" if I get one that's gobbledygook. ;)

Code:
native2ascii -encoding UTF-8 i18n_xx.properties | perl -pe 's/\\u2028/\n/g;' > i18n.txt

And now i18n.txt has the right contents. That's just sort of an ugly kludge to add to my build though. 8)

Thanks for the ideas. I have a working translation file now and it's part of b82 (just released) so the immediate crush is off.

