Discussion:
My mp3 tagging code can't handle Unicode tags
Audio Ranger Development
2011-02-12 21:57:47 UTC
Peter, Louis,

when using a programming language that supports Unicode strings (like
Java or .NET), you can simply determine whether any given string can be
encoded with ISO-8859-1 or not as follows (pseudo code):

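static boolean canBeISOEncoded(String s)
{
    // ISO-8859-1 covers exactly U+0000..U+00FF, so any char above 0xFF
    // (including UTF-16 surrogate halves) means the string does not fit.
    for (int i = 0; i < s.length(); i++)
    {
        if (s.charAt(i) > 0xFF)
            return false;
    }
    return true;
}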

The trick is simply to check whether the string contains any character
above U+00FF (decimal 255). If it does, the string needs a Unicode
encoding. If not, every character fits into ISO-8859-1, whose 256 code
points map one-to-one onto U+0000 to U+00FF.

If you do need Unicode, the common encodings are UTF-8, UTF-16 and
UTF-32, each with its own advantages and disadvantages. For ID3 tags,
use UTF-16 together with ID3 version 2.3 tags to maximize compatibility
with other implementations.
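To make the size trade-off concrete, this sketch (the class EncodingSizes and method sizes are hypothetical names) counts how many bytes one short title takes in each encoding. UTF-8 is only available as a text encoding from ID3v2.4 onward, so for ID3v2.3 tags the UTF-16 figure is the one that matters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Returns the byte counts {UTF-8, UTF-16 with BOM, UTF-32} for text.
    static int[] sizes(String text) {
        int utf8 = text.getBytes(StandardCharsets.UTF_8).length;
        // Java's UTF_16 charset writes a big-endian BOM (FE FF) first,
        // the same layout an ID3v2.3 encoding-1 string uses.
        int utf16 = text.getBytes(StandardCharsets.UTF_16).length;
        // UTF-32 uses exactly 4 bytes per code point (no BOM counted here).
        int utf32 = text.codePointCount(0, text.length()) * 4;
        return new int[] {utf8, utf16, utf32};
    }

    public static void main(String[] args) {
        // "Dvorak" spelled with r-caron (U+0159) and a-acute (U+00E1)
        int[] s = sizes("Dvo\u0159\u00E1k");
        System.out.println("UTF-8: " + s[0] + ", UTF-16: " + s[1] + ", UTF-32: " + s[2]);
        // prints "UTF-8: 8, UTF-16: 14, UTF-32: 24"
    }
}
```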

Hope this helps.

Kind regards,
Mathias Kunter
Developer of Magic MP3 Tagger
Here is my code that does this in Java. Note that in ID3v2.3 only the
text encoding bytes 0 and 1 are valid (ISO-8859-1 and UTF-16). The other
two values, 2 (UTF-16BE without BOM) and 3 (UTF-8), were added in
ID3v2.4, which is not widely supported, so I do not recommend using them.
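import java.io.UnsupportedEncodingException;

// Charset names indexed by the ID3 text-encoding byte (0-3).
// Only 0 and 1 are valid in ID3v2.3; 2 and 3 are ID3v2.4 additions.
static final String[] ENC_TYPES = {"ISO-8859-1", "UTF-16", "UTF-16BE", "UTF-8"};

// Encodes source with the encoding selected by encodingB[0]. If that is 0
// (ISO-8859-1) and the text does not survive a round trip (in practice the
// JDK substitutes '?' for unmappable characters), fall back to encoding 1
// (UTF-16 with BOM) and update encodingB[0] so the caller writes the
// matching encoding byte into the frame.
public static byte[] encodeString(String source, byte[] encodingB)
        throws UnsupportedEncodingException {
    byte[] result = source.getBytes(ENC_TYPES[encodingB[0]]);
    if (encodingB[0] == 0) {
        String checkResult = new String(result, ENC_TYPES[encodingB[0]]);
        if (!source.equals(checkResult)) {
            encodingB[0] = 1;
            result = source.getBytes(ENC_TYPES[encodingB[0]]);
        }
    }
    return result;
}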
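For context, in an ID3v2.3 text frame the encoding byte is written immediately before the encoded string. The sketch below (the class Id3v23TextBody and method textFrameBody are hypothetical names) builds only that frame body, without the 10-byte frame header and without the optional terminator. Unlike the code above, it decides with CharsetEncoder.canEncode, whose behavior is specified, whereas String.getBytes leaves the handling of unmappable characters unspecified; the round-trip check relies on the JDK's usual '?' substitution.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Id3v23TextBody {
    // Builds: [encoding byte][encoded text], as in an ID3v2.3 text frame body.
    static byte[] textFrameBody(String text) {
        boolean latin1 = StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
        // 0 = ISO-8859-1, 1 = UTF-16 with BOM (Java's UTF_16 writes FE FF first)
        byte encoding = latin1 ? (byte) 0 : (byte) 1;
        Charset cs = latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16;
        byte[] textBytes = text.getBytes(cs);

        byte[] body = new byte[textBytes.length + 1];
        body[0] = encoding;
        System.arraycopy(textBytes, 0, body, 1, textBytes.length);
        return body;
    }

    public static void main(String[] args) {
        byte[] a = textFrameBody("Bj\u00F6rk");
        System.out.println("encoding " + a[0] + ", " + a.length + " bytes"); // prints "encoding 0, 6 bytes"
        byte[] b = textFrameBody("Dvo\u0159\u00E1k");
        System.out.println("encoding " + b[0] + ", " + b.length + " bytes"); // prints "encoding 1, 15 bytes"
    }
}
```

A real tagger would still need to prepend the frame ID, size and flags before writing the frame into the tag.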