I've been working on the locale support in MapTool for loading/saving data. I'm confusing myself a bit when it comes to the zip file output, so this is my stream-of-consciousness rambling about what data I have and what I'm trying to produce. Maybe it'll help me clarify what I need to do, or maybe it'll help someone else in the future.
I have an InputStreamReader (which I treat as a Reader) which is connected to a text file (via FileInputStream) and the UTF-8 character set (passed to the InputStreamReader constructor). So any time I read from the Reader, the text will be interpreted as UTF-8 and life is good.
Now I want to copy that data stream into an entry (a java.util.zip.ZipEntry) inside the campaign file. I do this by creating a ZipOutputStream and connecting it to a FileOutputStream. But how should I copy the data?
If I read() from the Reader I can store data into a char[]. But I can only write a byte[] to the OutputStream. What is the proper way to convert characters into bytes? Ideally I could just grab the bottom 8 bits, but that would screw up Unicode characters stored in the char[]. So should I be using a UTF-8 encoder as the data is written to the OutputStream? That seems counter-intuitive since I'm trying to write "raw" data. Although the rule is that characters must be decoded when reading and encoded when writing; but should that apply to the ZipOutputStream as well?
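To make the "bottom 8 bits" worry concrete, here is a minimal sketch comparing truncation with real UTF-8 encoding. The class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of MapTool.
class CharToByteDemo {
    // Naive conversion: keep only the low 8 bits of each char (lossy).
    static byte[] truncate(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    // Proper conversion: encode the chars as UTF-8.
    static byte[] encodeUtf8(char[] chars) {
        return new String(chars).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] text = {'a', '\u20AC'}; // 'a' and the euro sign
        System.out.println(Arrays.toString(truncate(text)));   // [97, -84]
        System.out.println(Arrays.toString(encodeUtf8(text))); // [97, -30, -126, -84]
    }
}
```

The truncated form keeps only 0xAC of the euro sign, so the character cannot be recovered when the file is read back; the encoded form spends three bytes on it and round-trips cleanly.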
Now that I've typed all that out (!) I think I should be encoding on the way out again... I'll continue from there.
Converting characters to bytes...
Re: Converting characters to bytes...
Sorry if you've already figured this stuff out - I realize it's been a month since this post was written...
It sounds like you are looking for OutputStreamWriter to write characters and get them encoded properly on the wrapped OutputStream.
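Something along these lines might work: a rough sketch that wraps the ZipOutputStream in an OutputStreamWriter and copies from a Reader. The class name, method name, and entry name are invented for the example (this is not MapTool's actual code), and it assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration only.
class ZipTextCopy {
    // Copies every character from the reader into a new zip entry, encoded as UTF-8.
    static void copyText(Reader in, ZipOutputStream zip, String entryName) throws IOException {
        zip.putNextEntry(new ZipEntry(entryName));
        Writer out = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        // Flush rather than close: closing the writer would close the whole zip stream.
        out.flush();
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            copyText(new StringReader("caf\u00E9"), zip, "hello.txt");
        }
        System.out.println("zip size: " + bytes.size() + " bytes");
    }
}
```

The one thing to watch is the flush before closeEntry(): the writer buffers encoded bytes internally, and anything still sitting in that buffer would otherwise end up in the wrong entry or be lost.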
Some other random UTF-8 encoding things would be:
If you have a String you need UTF-8 bytes for, you can use String.getBytes("UTF-8").
I see the MT project is using Jakarta Commons (now Apache Commons) - the Commons Codec library has StringUtils.getBytesUtf8(String), which appears to be a convenience method for the same getBytes() call.
Commons IO has a FileWriterWithEncoding class, but that writes directly to a file, so you can't wrap your ZipOutputStream with it.
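For the plain String case, a small sketch using only the standard library (no Commons dependency); the class and method names are hypothetical. The charset-name overload forces you to handle a checked exception, while the StandardCharsets overload (Java 7 or later) does not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class Utf8Bytes {
    // Charset looked up by name; the checked exception cannot occur for UTF-8 in practice.
    static byte[] byName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required to be supported", e);
        }
    }

    // Charset passed as a constant; no checked exception to deal with.
    static byte[] byConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00E9"; // four chars, the last one is e-acute
        System.out.println(byName(s).length);     // 5
        System.out.println(byConstant(s).length); // 5
    }
}
```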
Hope that helps!
Re: Converting characters to bytes...
Yep, I am familiar with the idea of retrieving bytes in a particular encoding when starting from a String. The issue was more about storing data into a ZIP file using ZipOutputStream. When I'm writing text, it should be converted, and when I'm writing binary data it shouldn't. But internally MT never considered the need for localization, so the code used to read/write the ZIP files always used {Input|Output}Stream and never {Reader|Writer}.
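Roughly, the split looks like the sketch below: text entries go through a Writer/Reader with an explicit charset, binary entries go straight to the stream. This is only an illustration, not the actual MT code; the class, method, and entry names are invented, and it assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch class, not MapTool's real packed-file code.
class ZipTextVsBinary {
    // Text entry: characters are encoded to UTF-8 on the way out.
    static void putText(ZipOutputStream zip, String name, String text) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        Writer w = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
        w.write(text);
        w.flush(); // flush only; closing the writer would close the zip
        zip.closeEntry();
    }

    // Binary entry: bytes are written untouched, no charset involved.
    static void putBinary(ZipOutputStream zip, String name, byte[] data) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    // Reads the current entry as UTF-8 text; the zip stream reports EOF at the entry's end.
    static String readText(ZipInputStream zin) throws IOException {
        Reader r = new InputStreamReader(zin, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00EFve \u20AC5";
        byte[] image = {(byte) 0x89, 'P', 'N', 'G'}; // stand-in for real binary data

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            putText(zip, "content.txt", text);   // hypothetical entry name
            putBinary(zip, "image.bin", image);  // hypothetical entry name
        }

        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            zin.getNextEntry();
            System.out.println("text round-trips: " + text.equals(readText(zin)));
        }
    }
}
```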
In any case, that's all been fixed now and b74 will work properly.
I had hoped to hear from another forum member about a patch of his, but it doesn't seem like it's coming, so b74 will either be out tonight or late Wednesday night (I've got a game tomorrow night so I'm busy).