Performance problems with simultaneous transcoding for livehttp
Posted: 05 Jun 2018 13:06
I will eventually have a live video feed coming in that I'd like to publish via HLS at a few different bitrates. During initial development, I've been trying to figure out transcoding with VLC, working from a single local file for now, under the assumption that I can easily replace the input later. (I mention this for context, in case transcode-from-file is somehow fundamentally different from transcode-from-stream.)
I've gotten as far as this:
Code:
"c:\program files\videolan\vlc\vlc-gpu.exe" -I dummy "c:\users\James\Downloads\bbb_sunflower_2160p_60fps_normal.mp4" --sout=#duplicate{dst={transcode{height=540,fps=15,vcodec=h264,vb=800,venc=x264{aud,profile=baseline,keyint=30},acodec=aac,ab=96,threads=4}:std{access=livehttp{seglen=4,delsegs=false,numsegs=0,index=c:\workspace\vidtest\bbb540.m3u8,index-url=bbb540-#####.ts},mux=ts{use-key-frames},dst=c:\workspace\vidtest\bbb540-#####.ts}},dst={transcode{height=720,fps=30,vcodec=h264,vb=2200,venc=x264{aud,profile=baseline,keyint=60},acodec=aac,ab=128,threads=4}:std{access=livehttp{seglen=4,delsegs=false,numsegs=0,index=c:\workspace\vidtest\bbb720.m3u8,index-url=bbb720-#####.ts},mux=ts{use-key-frames},dst=c:\workspace\vidtest\bbb720-#####.ts}},dst={transcode{height=1080,fps=60,vcodec=h264,vb=3200,venc=x264{aud,profile=baseline,keyint=120},acodec=aac,ab=192,threads=4}:std{access=livehttp{seglen=4,delsegs=false,numsegs=0,index=c:\workspace\vidtest\bbb1080.m3u8,index-url=bbb1080-#####.ts},mux=ts{use-key-frames},dst=c:\workspace\vidtest\bbb1080-#####.ts}}}
It seems to work: all three sets of files are generated at the same time, but performance is pretty bad. The output is written at half real-time playback speed or less, with 4-second segments showing up only every 10+ seconds. That would be fine if the hardware just couldn't keep up, but in Task Manager my CPU usage hovers around 30% and GPU usage around 50%. As you can see in the script, I'm telling each transcode to use 4 threads, and I've forced VLC to use my GeForce 1060 instead of the Intel iGPU (though according to Task Manager it doesn't make much difference in terms of percentage load).
Is there something I'm missing here? I'd like to use "all the hardware" if possible. I'm trying to get a feel for how much I can transcode in real time on a given set of hardware, and if half the cores are always sitting idle, it seems like they could be put to better use.
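In case it helps anyone reproduce this, here is a rough way to put a number on the slowdown instead of eyeballing the output folder. It is only a sketch: the function names `realtime_factor` and `segment_mtimes` are made up for illustration, and it assumes every segment is exactly `seglen` seconds long, which may not hold exactly in practice.

```python
from pathlib import Path


def realtime_factor(mtimes, seglen):
    """Return media seconds produced per wall-clock second.

    mtimes: modification times (in seconds) of the segment files.
    seglen: nominal segment duration in seconds (4 in the command above).
    """
    times = sorted(mtimes)
    if len(times) < 2:
        raise ValueError("need at least two segments to measure a rate")
    elapsed = times[-1] - times[0]
    if elapsed <= 0:
        raise ValueError("segments must span a positive amount of wall-clock time")
    # Each gap between consecutive segments corresponds to one finished segment.
    return seglen * (len(times) - 1) / elapsed


def segment_mtimes(directory, pattern):
    """Collect modification times of the segment files matching a glob pattern."""
    return [p.stat().st_mtime for p in Path(directory).glob(pattern)]
```

For example, `realtime_factor(segment_mtimes(r"c:\workspace\vidtest", "bbb540-*.ts"), 4)` would come out near 0.4 if one 4-second segment appears every 10 seconds, and a value of 1.0 or more would mean the transcode is keeping up with real time.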