vlc 1.1.0 subtitles auto mode regression

Posted: 10 Jun 2010 00:20
by temp4746
In VLC 1.0.5, VLC seems to correctly auto-detect a Hebrew windows-1255 encoded .srt file and shows it correctly.
In VLC 1.1.0-rc1, VLC seems to incorrectly auto-detect a Hebrew windows-1255 encoded .srt file and displays gibberish; I have to manually set the subtitles encoding in the settings dialog to make it work.

EDIT: This happens with 1.1.0 final too.

OS: Windows 7 Ultimate 32-bit.

Re: vlc 1.1.0-rc1 subtitles auto mode regression

Posted: 10 Jun 2010 17:28
by VLC_help
Have you tried RC2 ?

Re: vlc 1.1.0-rc1 subtitles auto mode regression

Posted: 10 Jun 2010 22:51
by temp4746
I downloaded from here: http://www.videolan.org/vlc/releases/1.1.0-RC.html
The link from the news post in the main site seems to indicate that this is RC1.
Guess trying RC2 will require compiling it myself which is quite a nasty thing to do under Windows. :-|

Re: vlc 1.1.0-rc1 subtitles auto mode regression

Posted: 11 Jun 2010 23:38
by VLC_help

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 22 Jun 2010 15:47
by temp4746
I tested this issue with 1.1.0 final, and I'm seeing exactly the same behaviour.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 22 Jun 2010 17:45
by Lotesdelere
Reset Preferences and Cache.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 22 Jun 2010 17:48
by temp4746
Reset Preferences and Cache.
I already did that :-|

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 22 Jun 2010 20:53
by Damien
The same here.

viewtopic.php?f=34&t=76048&start=15#p251748

People (who don't know what to do) will use another player, that simple. :-|

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 22 Jun 2010 23:45
by temp4746
Sad that it did work correctly for me in 1.0.5
Someone should really look at the Auto encoding detection code...

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 23 Jun 2010 17:11
by secarica
The same happens here (i.e. autodetect is broken). The system is set to Romanian, so subtitles should display in CP-1250 (Central European), but they display wrongly in CP-1252 (Western). Windows Vista 64-bit.

With 1.0.5 it works OK, even with a not-installed 1.0.5 version (just unzipped).

Cristi

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 23 Jun 2010 18:17
by Jean-Baptiste Kempf
If someone could do a proper bug report, that would be amazing...

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 23 Jun 2010 20:09
by VLC_help
I can if someone provides me sample subtitle file =)

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 23 Jun 2010 20:43
by secarica
I can if someone provides me sample subtitle file =)
A sample may not be enough, in that a particular sample should match a particular codepage setting at system level.
For example, take this one. It is 8 bit, CP-1250.

With all auto, row #2 in VLC 1.0.5 displays "Traducerea şi adaptarea *** etc." (which is correct).
Same row in VLC 1.1.0 displays "Traducerea ºi adaptarea *** etc." (which is wrong).

Row #4 in VLC 1.0.5 displays "Te simţi bine, dragule ?" (which is correct).
Same row in VLC 1.1.0 displays "Te simþi bine, dragule ?" (which is wrong).

Cristi
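
The two wrong rows above are what you would expect if CP-1250 bytes were decoded as CP-1252. A minimal Python sketch of that effect (the function name `misdecode` is made up for illustration and has nothing to do with VLC's code):

```python
def misdecode(text, actual="cp1250", assumed="cp1252"):
    """Encode text with its real code page, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)

# \u015f is s-cedilla and \u0163 is t-cedilla, the letters used in rows #2 and #4.
row2 = "Traducerea \u015fi adaptarea"
row4 = "Te sim\u0163i bine, dragule ?"

print(misdecode(row2))  # s-cedilla (byte 0xBA) shows up as the ordinal sign
print(misdecode(row4))  # t-cedilla (byte 0xFE) shows up as the letter thorn
```

The bytes in the file are the same in both VLC versions; only the code page used to interpret them differs.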

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 24 Jun 2010 21:04
by VLC_help

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 19:46
by Rémi Denis-Courmont
In VLC 1.0.5, VLC seems to correctly auto-detect a Hebrew windows-1255 encoded .srt file and shows it correctly.
In VLC 1.1.0-rc1, VLC seems to incorrectly auto-detect a Hebrew windows-1255 encoded .srt file and displays gibberish; I have to manually set the subtitles encoding in the settings dialog to make it work.
The Hebrew translation does not define a default encoding currently. Looking at the changes, it has not been maintained for several years. We can add CP1255. But if the translation is totally outdated anyway, you might prefer to use English and set the subtitle encoding manually :-| .

I will fix it manually in VLC 1.1.1 but there is only so much the developers can do without active translators.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 19:56
by Rémi Denis-Courmont
Sad that it did work correctly for me in 1.0.5
Someone should really look at the Auto encoding detection code...
The "code" just tries to decode the subtitle as UTF-8 (unless you've disabled UTF-8 autodetection) then falls back to a locale-defined character encoding. If you use VLC in English, then the default is CP1252. Microsoft uses that as character encoding for English and other western European languages.

The autodetection logic is basically the same since VLC 0.8.5. In earlier versions, the code would fall back to the local system character encoding. This used to work mostly well in the last century. But most systems have switched to Unicode by default nowadays, so that trick would not work anymore.

From VLC 0.8.5 through 1.0.6, the default values were hard-coded in the VLC source code. It turned out to be a bad idea as the number of supported languages exploded. VLC has almost 70 translations nowadays. From VLC 1.1.0 onward, the default character encodings are specified in the message translation files. Unfortunately, some VLC translations are currently unmaintained (Hebrew is one example). There you go...
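
The logic described in this post can be sketched roughly as follows. This is Python for illustration only, not VLC's actual C code; the function name, the parameters and the `LEGACY_DEFAULTS` table are assumptions, with the three code pages taken from this thread.

```python
# Hypothetical per-language defaults. Per the post, the real values were
# hard-coded in the source (0.8.5 to 1.0.6) or come from translation files (1.1.0+).
LEGACY_DEFAULTS = {"en": "cp1252", "ro": "cp1250", "he": "cp1255"}

def decode_subtitle(raw, ui_language, override=None, utf8_autodetect=True):
    """Decode subtitle bytes: manual setting, then UTF-8, then language default."""
    if override is not None:
        # A manually selected encoding always wins over auto mode.
        return raw.decode(override, errors="replace")
    if utf8_autodetect:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not valid UTF-8, fall through to the legacy code page
    fallback = LEGACY_DEFAULTS.get(ui_language, "cp1252")
    return raw.decode(fallback, errors="replace")

raw = "Te sim\u0163i bine".encode("cp1250")
print(decode_subtitle(raw, "ro"))                     # decoded as CP1250
print(decode_subtitle(raw, "en"))                     # decoded as CP1252, wrong letter
print(decode_subtitle(raw, "en", override="cp1250"))  # manual setting fixes it
```

With a table like this, a language that has no entry silently ends up on CP1252, which is consistent with the gibberish reported for Hebrew.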

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 20:08
by temp4746
Sad that it did work correctly for me in 1.0.5
Someone should really look at the Auto encoding detection code...
The "code" just tries to decode the subtitle as UTF-8 (unless you've disabled UTF-8 autodetection) then falls back to a locale-defined character encoding. If you use VLC in English, then the default is CP1252. Microsoft uses that as character encoding for English and other western European languages.

The autodetection logic is basically the same since VLC 0.8.5. In earlier versions, the code would fall back to the local system character encoding. This used to work mostly well in the last century. But most systems have switched to Unicode by default nowadays, so that trick would not work anymore.

From VLC 0.8.5 through 1.0.6, the default values were hard-coded in the VLC source code. It turned out to be a bad idea as the number of supported languages exploded. VLC has almost 70 translations nowadays. From VLC 1.1.0 onward, the default character encodings are specified in the message translation files. Unfortunately, some VLC translations are currently unmaintained (Hebrew is one example), and still many bugs are not reported in due time (during test and release candidate cycles). There you go...
There is one thing strange though...

I used VLC 1.0.5 set to English and subtitle encoding on auto
And the exact same srt was correctly displayed with the proper encoding.

With the exact same circumstances in VLC 1.1.0, I get gibberish.

It's still logical to use the system defined encoding, as even though many systems are unicode they still have an encoding defined for use for non-unicode programs and files.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 20:21
by secarica
The "code" just tries to decode the subtitle as UTF-8 (unless you've disabled UTF-8 autodetection) then falls back to a locale-defined character encoding. If you use VLC in English, then the default is CP1252. Microsoft uses that as character encoding for English and other western European languages. [...] From VLC 1.1.0 onward, the default character encodings are specified in the message translation files. Unfortunately, some VLC translations are currently unmaintained (Hebrew is one example).
This is probably the expected behaviour, but unfortunately it is contradicted by reality.

I use my system fully in Romanian (locale & UI language), VLC interface is set to auto and displays correctly in Romanian language, gettext is translated with msgctxt "GetACP" / msgid "CP1252" -> msgstr "CP1250". Auto for subtitles does not work, simple as that.

VLC 1.0.5 displays characters in CP1250 (correct).
VLC 1.1.0 displays characters in CP1252 (wrong).

Cristi

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 20:30
by Rémi Denis-Courmont
It's still logical to use the system defined encoding, as even though many systems are unicode they still have an encoding defined for use for non-unicode programs and files.
So you would check for UTF-8 and then fallback to, err, UTF-8 which is the default character encoding on most operating systems. The whole point of the 0.8.5 change was to solve this idiocy. It makes much more sense to default to a legacy character set for the user language.

If you're using VLC in English and watching non-Unicode subs in another language, you're calling for trouble. There will always be a failure scenario where the user plays a sub in a different character set than what the auto mode expects, no matter what the logic is. That's why we have the manual settings. And we even have user-friendly categories for the choices these days.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 20:46
by secarica
So you would check for UTF-8 and then fallback to, err, UTF-8 which is the default character encoding on most operating systems.
All Windows versions newer than NT 4.0 have an 8-bit setting that matches the chosen locale (the so-called Language for non-Unicode programs). A program that knows it is reading an 8-bit text file should check that setting in the first place. Usually when a user sees the wrong character encoding in text subtitles, that place is the first to be checked (and changed if necessary). This is an approach at operating system level, not at application level.

Cristi
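
For reference, a sketch of how a program could ask the OS for that setting. On Windows it calls the Win32 `GetACP()` function through `ctypes`; elsewhere it uses the locale encoding, which is typically UTF-8 (the objection raised in the earlier reply). The function name `system_legacy_encoding` is made up for illustration.

```python
import locale
import sys

def system_legacy_encoding():
    """Return the encoding the OS reports for non-Unicode text."""
    if sys.platform == "win32":
        import ctypes
        # GetACP() returns the ANSI code page number, e.g. 1250 or 1255.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # On most other systems the locale encoding is already UTF-8.
    return locale.getpreferredencoding(False)

print(system_legacy_encoding())
```

On a Windows machine whose "Language for non-Unicode programs" is Romanian, this should report cp1250 regardless of the interface language chosen in the application.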

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 21:33
by Rémi Denis-Courmont
All Windows versions newer than NT 4.0 have an 8-bit setting that matches the chosen locale (the so-called Language for non-Unicode programs).
If you configure VLC 1.1.1 to use the same language as your system, then you will get a code page that matches.

No matter what we do there will always be a problem if you configure VLC in one language on a system in another language. By definition, there are two conflicting choices here. You can't expect VLC to fix it for you in 100% cases. VLC selects audio and subtitle tracks in the configured language (when possible) so it should, and does, follow the same practice for the default character encoding.

Besides, that is the non-Windows-specific policy; as an open-source developer, I am not going to write code that can only work correctly on a retarded proprietary expensive operating system. You've decided to use the only OS that still thinks we are in the eighties as far as character sets are concerned; you deal with it. The current code is doing you a favor in the most likely case that Windows, VLC and subtitles all use the same language.
Usually when a user reads wrong character encoding in text subtitles, that place is the first to be checked (and changed if necessary). This is an approach at operating system level, not at application level.
That's not true. First, only knowledgeable users would ever know of this, and most Windows users aren't knowledgeable. Second, the most logical place would be the subtitle area in the open dialog, if only it had a widget to select the encoding.

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 26 Jun 2010 22:36
by secarica
If you configure VLC 1.1.1 to use the same language as your system, then you will get a code page that matches.
Have you read my message above ? Subtitles encoding in VLC 1.1.0 does not work properly in all auto mode.
No matter what we do there will always be a problem if you configure VLC in one language on a system in another language.
Not the case on my system. I cannot speak for others' configurations.
You've decided to use the only OS that still thinks we are in the eighties as far as character sets are concerned; you deal with it.
I have nothing to decide; almost 100% of the subtitles I download for my language are in 8-bit encoding. If I used a Mac or Linux to view those subtitles, the program there would still have to know how to handle my 8-bit encoded subtitle files.

Things are more complex or obscure; for example, one of the programs used for translating and creating subtitles is Subtitles Translator, which cannot handle Unicode at all. The same goes for several other subtitle-specific programs (on Windows). The OS has nothing to do with this, except perhaps that it still allows 8-bit-only applications to run.
Usually when a user reads wrong character encoding in text subtitles, that place is the first to be checked (and changed if necessary). This is an approach at operating system level, not at application level.
That's not true. First, only knowledgeable users would ever know of this, and most Windows users aren't knowledgeable. Second, the most logical place would be the subtitle area in the open dialog, if only it had a widget to select the encoding.
Generally true (I suppose), but not completely true here in my country (Romania), where questions and prompt answers on this matter are common on our large forums.

Cristi

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 27 Jun 2010 00:11
by Jean-Baptiste Kempf
The real question is: what has changed between 1.0.5 and 1.1.0 and when was the first regression?

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 27 Jun 2010 00:51
by secarica
The real question is: what has changed between 1.0.5 and 1.1.0 and when was the first regression?
VLC 1.0.5 does not have this portion of gettext:

Code:

#. xgettext:
#. The Windows ANSI code page most commonly used for this language.
#. VLC uses this as a guess of the subtitle files character set
#. (if UTF-8 and UTF-16 autodetection fails).
#. Western European languages normally use "CP1252", which is a
#. Microsoft-variant of ISO 8859-1. That suits the Latin alphabet.
#. Other scripts use other code pages.
#.
#. This MUST be a valid iconv character set. If unsure, please refer
#. the VideoLAN translators mailing list.
#: modules/codec/subtitles/subsdec.c:296
msgctxt "GetACP"
msgid "CP1252"
msgstr "CP1250"

Perhaps for some reason VLC considers the msgid part instead of the msgstr part?

Cristi
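
To make the msgid/msgstr question concrete: gettext stores an entry that has a msgctxt under the key "context" + EOT (0x04) + "msgid", and returns the msgid unchanged when no translation is found. The Python sketch below imitates that lookup with plain dictionaries. The catalogs and the `pgettext` helper are hypothetical stand-ins for compiled .mo files, not VLC code.

```python
# gettext joins msgctxt and msgid with an EOT character to form the lookup key.
CONTEXT_SEPARATOR = "\x04"

def pgettext(catalog, context, msgid):
    """Return the msgstr for (context, msgid), or msgid itself if untranslated."""
    return catalog.get(context + CONTEXT_SEPARATOR + msgid, msgid)

# Hypothetical in-memory catalogs for illustration.
romanian = {"GetACP" + CONTEXT_SEPARATOR + "CP1252": "CP1250"}
untranslated = {}  # e.g. a translation that never defined this entry

print(pgettext(romanian, "GetACP", "CP1252"))      # CP1250
print(pgettext(untranslated, "GetACP", "CP1252"))  # CP1252
```

So any failure to find or load the entry (missing translation, wrong catalog, lookup without the context) would look the same from the outside: the CP1252 default.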

Re: vlc 1.1.0 subtitles auto mode regression

Posted: 27 Jun 2010 02:54
by Rémi Denis-Courmont
If you configure VLC 1.1.1 to use the same language as your system, then you will get a code page that matches.
Have you read my message above ? Subtitles encoding in VLC 1.1.0 does not work properly in all auto mode.
Emphasis modified. I wonder who should blame the other one for not reading.