SSA subtitle encoding issue
Posted: 10 Aug 2020 09:30
In an MKV file, subtitle lines containing music notes (♪) are displayed as (â™ª).
I have tried setting the "default encoding" option under Settings to "default (windows-1252)" and to all the UTF variants.
Using mkvextract --fullraw, I see the bytes "C3 A2 E2 84 A2 C2 AA", which are being interpreted as UTF-8 and turned into â™ª.
But when I extract the track with
mkvextract <file> tracks -c windows-1252 2:subs.srt
and load subs.srt, the notes display correctly (but not if I use -c utf8).
EDIT: Why does VLC's reading of this line not match mkvextract / iconv (see 2nd post)?
---
Debug info
The Encoding field in the Default style (the last value on the Style: line) is 0, which according to http://moodub.free.fr/video/ass-specs.doc means:
"Encoding. This specifies the font character set or encoding and on multi-lingual Windows installations it provides access to characters used in multiple than one languages. It is usually 0 (zero) for English (Western, ANSI) Windows."
If I run
mkvextract <file> tracks -c windows-1252 2:subs.srt
gedit shows the correct characters (and VLC displays them correctly):
"Dialogue: 0,0:03:48.91,0:03:50.50,Default,,0,0,0,,♪ ♪"
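One way to see why `-c windows-1252` produces a readable file: the track appears to store the three characters â, ™ and ª (UTF-8 bytes C3 A2 E2 84 A2 C2 AA) instead of ♪. Converting those three characters to Windows-1252 writes one byte each, E2 99 AA, and those bytes happen to be the UTF-8 encoding of ♪, so a UTF-8 reader such as gedit shows the note. The Python sketch below models only that character-set conversion, not mkvextract itself; `extract_as` is a hypothetical helper, and it assumes mkvextract's `-c` re-encodes the stored UTF-8 text into the named character set.

```python
# Sketch only: models the character-set conversion, not mkvextract itself.
# Assumption: the track stores UTF-8 text and -c re-encodes it to the named charset.

STORED = bytes.fromhex("C3 A2 E2 84 A2 C2 AA")  # bytes seen with --fullraw


def extract_as(raw: bytes, charset: str) -> bytes:
    """Hypothetical helper: decode stored UTF-8 text, re-encode it in `charset`."""
    return raw.decode("utf-8").encode(charset)


# -c windows-1252: one byte per character (â -> E2, ™ -> 99, ª -> AA)
cp1252_out = extract_as(STORED, "cp1252")
# -c utf8: the stored bytes come back unchanged
utf8_out = extract_as(STORED, "utf-8")

print(cp1252_out.hex(" "))         # e2 99 aa
print(cp1252_out.decode("utf-8"))  # ♪ (what a UTF-8 reader such as gedit shows)
print(utf8_out.decode("utf-8"))    # â™ª
```

With `-c utf8` the conversion is a no-op, which fits the observation that the notes stay broken in that case. (`bytes.hex` with a separator needs Python 3.8 or later.)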
If I run
mkvextract <file> tracks --fullraw 2:test.txt
ghex shows this text:
"113,0,Default,,0,0,0,,......."
and these bytes (hand-transcribed, so expect roughly a 5% error rate):
"31 31 33 2C 30 2C 44 65 66 61 75 6C 74 2C 2C 30 2C 30 2C 30 2C 2C C3 A2 E2 84 A2 C2 AA 20 C3 A2 E2 84 A2 C2 AA"
The text portion is "C3 A2 E2 84 A2 C2 AA", which, interpreted as UTF-8[1], turns into â™ª.
[1]https://utf8-chartable.de/unicode-utf8- ... 28&names=-
https://utf8-chartable.de/unicode-utf8- ... 28&names=-
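The ghex dump can be checked the same way. This sketch assumes the Matroska storage layout for ASS blocks (`ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text`, with start and end times kept in the block timestamps rather than in the text), which fits the `113,0,Default,,0,0,0,,` prefix shown by ghex. It reuses the hand-transcribed bytes, so any transcription error carries over; `parse_block` is a hypothetical helper.

```python
import unicodedata

# Sketch: split one raw ASS block, as dumped with --fullraw, into fields.
# Assumption: Matroska ASS block layout (see FIELDS); timestamps are not in the text.
FIELDS = ["ReadOrder", "Layer", "Style", "Name",
          "MarginL", "MarginR", "MarginV", "Effect", "Text"]

# Hand-transcribed bytes from the ghex dump (may contain transcription errors).
RAW = bytes.fromhex(
    "31 31 33 2C 30 2C 44 65 66 61 75 6C 74 2C 2C 30 2C 30 2C 30 2C 2C "
    "C3 A2 E2 84 A2 C2 AA 20 C3 A2 E2 84 A2 C2 AA"
)


def parse_block(raw: bytes) -> dict:
    """Hypothetical helper: map a raw block onto FIELDS."""
    # Split on the first 8 commas only, so commas inside Text survive.
    parts = raw.split(b",", len(FIELDS) - 1)
    return dict(zip(FIELDS, parts))


block = parse_block(RAW)
text = block["Text"].decode("utf-8")

print(block["ReadOrder"].decode("ascii"), block["Style"].decode("ascii"))  # 113 Default
print(text)  # â™ª â™ª
# Each "note" decodes to three separate characters:
for ch in text.split(" ")[0]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
# U+2122 TRADE MARK SIGN
# U+00AA FEMININE ORDINAL INDICATOR
```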
vlc --version
VLC media player 3.0.8 Vetinari (revision 3.0.8-0-gf350b6b5a7)
mkvextract <file> tracks -c windows-1252 2:subs.srt
[Script Info]
; Script generated by FFmpeg/Lavc58.55.100
ScriptType: v4.00+
PlayResX: 384
PlayResY: 288
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,100,100,0,0,1,1,0,2,10,10,10,0
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.24,0:00:03.91,Default,,0,0,0,,[bright tone]
...
Dialogue: 0,0:03:48.91,0:03:50.50,Default,,0,0,0,,♪ ♪
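In ASS, each section's `Format:` line names the comma-separated fields of the lines that follow it, which is how the trailing 0 on the Style line maps to Encoding. A minimal sketch of that mapping, using the Style and Dialogue lines from the dump; `parse_line` is a hypothetical helper, and it splits on at most N-1 commas so that any commas inside the final Text field are kept.

```python
# Sketch: pair an ASS data line with its section's "Format:" line.
# parse_line is a hypothetical helper, not part of any subtitle library.

STYLE_FORMAT = ("Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
                "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
                "ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
                "Alignment, MarginL, MarginR, MarginV, Encoding")
STYLE_LINE = ("Style: Default,Arial,16,&Hffffff,&Hffffff,&H0,&H0,0,0,0,0,"
              "100,100,0,0,1,1,0,2,10,10,10,0")
EVENT_FORMAT = ("Format: Layer, Start, End, Style, Name, MarginL, MarginR, "
                "MarginV, Effect, Text")
DIALOGUE_LINE = "Dialogue: 0,0:03:48.91,0:03:50.50,Default,,0,0,0,,♪ ♪"


def parse_line(format_line: str, data_line: str) -> dict:
    names = [n.strip() for n in format_line.split(":", 1)[1].split(",")]
    # Drop the "Style:"/"Dialogue:" prefix and the space after it.
    body = data_line.split(":", 1)[1].lstrip()
    # At most len(names) - 1 splits: the last field keeps any commas it contains.
    values = body.split(",", len(names) - 1)
    return dict(zip(names, values))


style = parse_line(STYLE_FORMAT, STYLE_LINE)
event = parse_line(EVENT_FORMAT, DIALOGUE_LINE)

print(style["Name"], style["Encoding"])             # Default 0
print(event["Start"], event["End"], event["Text"])  # 0:03:48.91 0:03:50.50 ♪ ♪
```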
mediainfo <file>
Text
ID : 3
Format : ASS
Codec ID : S_TEXT/ASS
Codec ID/Info : Advanced Sub Station Alpha
Compression mode : Lossless
Writing library : Lavc58.55.100 ssa
Language : English
Default : Yes
Forced : No
Statistics Tags Issue : mkvmerge v24.0.0 ('Beyond The Pale') 64-bit 2019-08-17 01:03:24 / Lavf58.30.100
FromStats_BitRate : 103
FromStats_Duration : 00:27:52.958000000
FromStats_FrameCount : 796
FromStats_StreamSize : 21706