Dealing with Special Chars in VLC Mozilla Plugin
Posted: 15 Oct 2010 21:43
Hi!
There is an inconsistency in the vlc-mozilla-plugin with target names that contain special characters, e.g. ö, ä, ü. This thread describes how to use the vlc-mozilla-plugin with such characters in general, on any platform (tested at least under Linux and Windows).
First, to point this out: it is not possible to embed an item by using special characters (non-ASCII, e.g. the German umlauts öäüÖÄÜ) directly. For example, the following

```html
<embed type="application/x-vlc-plugin" target="Frühling.mpg"/>
```

will not work. This holds regardless of the encoding declared for the embedding HTML page (via the META tag) and regardless of the character encoding of the underlying filesystem (e.g. UTF-8 on ext3 or ISO-8859-1 on FAT).
However, this is fine once you accept that the target must be given as a URL-encoded string based on UTF-8. Every special character has to be masked with %XX escapes, one per byte of its UTF-8 representation, where XX is the hexadecimal value of that byte. An umlaut takes two bytes and therefore becomes %XX%YY. (It does not work to use the common single-byte ISO-8859-1 value, e.g. %FC for 'ü'. You must use %C3%BC instead, i.e. the escaped UTF-8 byte representation of the character. I suppose this is due to the --enable-utf8 compile flag.)
Thus, the following will work:

```html
<embed type="application/x-vlc-plugin" target="Fr%C3%BChling.mpg"/>
```

...and it works on both tested filesystems! (Of course, UTF-8 encoding must be used where requested, but it even works with an ISO-8859-1-encoded file named 'Frühling.mpg' on FAT!)
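To see why %FC fails while %C3%BC works, it helps to look at the bytes. The following Java sketch (the class and method names are made up for this illustration) percent-encodes only the non-ASCII bytes of a name, once using ISO-8859-1 and once using UTF-8. It deliberately leaves ASCII characters such as spaces or '+' untouched, which a real URL encoder would also have to handle.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UmlautBytes {
    // Illustration only: copies ASCII bytes (0x00-0x7F) unchanged and writes
    // every other byte as %XX, using the byte representation of the given charset.
    static String percentEncodeNonAscii(String name, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(charset)) {
            if (b >= 0) {
                sb.append((char) b);                          // ASCII byte, keep as-is
            } else {
                sb.append(String.format("%%%02X", b & 0xFF)); // non-ASCII byte -> %XX
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Frühling.mpg", written with an escape so the source file encoding does not matter
        String name = "Fr\u00fchling.mpg";
        System.out.println(percentEncodeNonAscii(name, StandardCharsets.ISO_8859_1)); // Fr%FChling.mpg (one byte, does not work)
        System.out.println(percentEncodeNonAscii(name, StandardCharsets.UTF_8));      // Fr%C3%BChling.mpg (two bytes, works)
    }
}
```

In UTF-8 every byte of a multi-byte character is at least 0x80, so testing the sign of the (signed) Java byte is enough to separate ASCII from the parts of an umlaut.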
Now, all could be fine, but there is one problem: if we rely on a URL encoder from some library, it will (following the application/x-www-form-urlencoded convention) translate every '+' into %2B and every space ' ' into '+'. So the file "New Film.mpg" becomes "New+Film.mpg". The VLC plugin does NOT decode this back to 'New Film.mpg'. It keeps all '+' characters as they are and fails to play 'New+Film.mpg' because that file cannot be found. In contrast, both 'Harry+Sally.mpg' and 'Harry%2BSally.mpg' work to play a file named 'Harry+Sally.mpg'. This might be a bug or intended, but it does not match the form-encoding convention such encoders follow.
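The following sketch reproduces the problem. pluginStyleDecode() is a hypothetical model of the behaviour described above (percent escapes decoded as UTF-8 bytes, '+' kept literally); it is not the plugin's actual code. formEncode() simply wraps java.net.URLEncoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PlusProblem {
    // Form-style encoding as done by java.net.URLEncoder: ' ' -> '+', '+' -> "%2B"
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // HYPOTHETICAL model of the observed plugin behaviour: %XX escapes are
    // decoded as UTF-8 bytes, every other character (including '+') is kept.
    static String pluginStyleDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                out.write(c); // input is URL-encoded, so remaining characters are ASCII
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String spaced = formEncode("New Film.mpg");         // "New+Film.mpg"
        System.out.println(pluginStyleDecode(spaced));      // New+Film.mpg (the space is lost)
        String plus = formEncode("Harry+Sally.mpg");        // "Harry%2BSally.mpg"
        System.out.println(pluginStyleDecode(plus));        // Harry+Sally.mpg (correct)
    }
}
```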
To sum up: you must encode such target names as URL-encoded strings based on UTF-8, but leave spaces as plain spaces!
You are on the safe side (including '+' characters) by applying the following two steps:
1.) URL-encode the complete target name (including '+' and any other special characters).
2.) Replace every '+' in the result (all of which derive from spaces) with a space again.
Since the result contains no '+' at all, while original '+' characters stay safely encoded as %2B, this method will keep working even if a later version of the vlc-mozilla-plugin starts decoding '+' as a space, as form decoders do.
Here is a final example in Java (the surrounding try/catch block for the UnsupportedEncodingException is omitted):

```java
import java.net.URLEncoder;

String target = "Harry+Sally im Frühling.mpg";
target = URLEncoder.encode(target, "UTF-8"); // "Harry%2BSally+im+Fr%C3%BChling.mpg"
target = target.replace('+', ' ');           // "Harry%2BSally im Fr%C3%BChling.mpg"
// browser is the application's own HTML-output object, not a library class
browser.addHtmlLine("<embed type=\"application/x-vlc-plugin\" target=\"" + target + "\"/>");
// The plugin finds the file with this target; neither the raw name nor the unmodified URLEncoder output would work.
```
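To check the claim that the two-step method is safe under either decoding behaviour, this sketch encodes a few names and decodes each result with two models: decodeKeepingPlus(), a hypothetical model of the current plugin as described in this post, and decodePlusAsSpace(), which uses java.net.URLDecoder to stand in for a possible future plugin that treats '+' as a space. All class and method names are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class VlcTargetCheck {
    // The two steps from this post: URL-encode as UTF-8, then turn every '+'
    // (which can only come from a space) back into a space.
    static String toVlcTarget(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8").replace('+', ' ');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // HYPOTHETICAL model of the current plugin: decodes %XX as UTF-8, keeps '+' literally.
    static String decodeKeepingPlus(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // only ASCII remains after URL-encoding
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Model of a form-decoding plugin: java.net.URLDecoder turns '+' into ' '.
    static String decodePlusAsSpace(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"New Film.mpg", "Harry+Sally.mpg",
                          "Harry+Sally im Fr\u00fchling.mpg", "\u00d6l & Wasser.avi"};
        for (String name : names) {
            String target = toVlcTarget(name);
            boolean ok = name.equals(decodeKeepingPlus(target))
                      && name.equals(decodePlusAsSpace(target));
            System.out.println(target + " -> " + (ok ? "round-trips under both models" : "MISMATCH"));
        }
    }
}
```

Because the encoded target never contains a '+', the two models cannot disagree about it: they differ only in how they treat that one character.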
Hopefully, this saves someone a headache...
P.S.: For anyone interested, 'im Frühling' means 'in the springtime', so nothing offensive.