
Quadro performance significantly worse than integrated

Posted: 05 Dec 2018 01:58
by LuckyT
I am working on a video-intensive application and have chosen to use libvlc for playback. Technically I'm interacting with libvlc through the Vlc.DotNet wrapper, but I don't think my issues are wrapper-related. The app can show up to eight 1080p videos at once, and the content being played is 30 fps. A user can either play the videos or use a slider or the mouse wheel to skip around in them. The videos are relatively small, so I am able to load them completely into memory before passing them to VLC. Everything is synced up, so all eight play or seek at once.
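To make the synchronization concrete: every transport action is fanned out to all eight players, so a single slider or mouse-wheel move becomes eight seeks issued together. The Python sketch below models that with a hypothetical `Player` interface (`play`, `pause`, `set_time` in milliseconds). The method names are assumptions standing in for the wrapped player objects, not the Vlc.DotNet API, and this is not the poster's code. `FakePlayer` only records the calls it receives, so the fan-out can be checked without VLC installed.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class Player(Protocol):
    """Hypothetical stand-in for one wrapped libvlc media player."""

    def play(self) -> None: ...
    def pause(self) -> None: ...
    def set_time(self, ms: int) -> None: ...


@dataclass
class FakePlayer:
    """Records every call so the fan-out can be verified without libvlc."""

    name: str
    calls: List[str] = field(default_factory=list)

    def play(self) -> None:
        self.calls.append("play")

    def pause(self) -> None:
        self.calls.append("pause")

    def set_time(self, ms: int) -> None:
        self.calls.append(f"set_time({ms})")


class SyncGroup:
    """Forwards each transport command to all players so they stay in step."""

    def __init__(self, players: List[Player], duration_ms: int) -> None:
        self.players = players
        self.duration_ms = duration_ms

    def play(self) -> None:
        for p in self.players:
            p.play()

    def pause(self) -> None:
        for p in self.players:
            p.pause()

    def seek(self, ms: int) -> int:
        # Clamp the slider/mouse-wheel target once, then send the same
        # absolute time to every player so none of them drifts.
        target = max(0, min(ms, self.duration_ms))
        for p in self.players:
            p.set_time(target)
        return target
```

Clamping the target once in the group, rather than in each player, keeps all eight players on the same timestamp even when the mouse wheel overshoots the start or end of the clip.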

This all works surprisingly smoothly on my laptop when running on the Intel HD 530 integrated chip. Playback is not perfectly smooth, but it is completely acceptable for my use, and seeking is basically real time. With default settings, playback generally shows around 50% GPU load, and fast seeking can hit 80% or so. I don't fully understand the skips during playback at only 50% GPU load, but as I said, it definitely performs well enough.

Things are massively different when testing with a discrete GPU. I've tried two, both Quadros. On my laptop, I force the app to use the Quadro, and performance is at least an order of magnitude worse. GPU usage goes to 100% right away for any video operation. Playback is terrible and seeking is basically unusable. I was hoping it was just an issue with my laptop, but testing on a workstation with a Xeon E5, 40 GB of RAM and a Quadro M2000 yielded basically identical results.

I've fiddled with various command-line arguments for performance and actually seen noticeable improvements in the GPU usage numbers, but the resulting playback is still very similar to passing no arguments at all.

I'm on the latest version from the official NuGet package (3.0.4), with video drivers and Windows up to date.

Is there some obvious setting I'm missing that would cause this?

Re: Quadro performance significantly worse than integrated

Posted: 05 Dec 2018 04:35
by mfkl
Can you try with the official VLC desktop app and share the logs?

Re: Quadro performance significantly worse than integrated

Posted: 05 Dec 2018 08:41
by mfkl
Also, share your Vlc.DotNet versions (all packages) and say whether you're using WPF or WinForms.

Re: Quadro performance significantly worse than integrated

Posted: 06 Dec 2018 17:28
by LuckyT
At the moment Core, Core.Interops, and Forms (WinForms app) are all on 3.0.0-develop322. I will get some logs today.

Re: Quadro performance significantly worse than integrated

Posted: 10 Dec 2018 15:22
by LuckyT
Logs are pretty quiet, at least at verbosity 1. I reset all preferences to default before testing.

Task Manager GPU usage during playback of a single video:
Quadro M1000M: 16%
Intel HD 530: 10%

Quadro logs:

```
mp4 warning: unknown box type cTIM (incompletely loaded)
mp4 warning: unknown box type cTSC (incompletely loaded)
mp4 warning: unknown box type cTSZ (incompletely loaded)
mp4 warning: Unknown uuid type box
mp4 warning: elst box found
mp4 warning: STTS table of 1 entries
mp4 warning: CTTS table of 393 entries
mp4 warning: elst box found
mp4 warning: STTS table of 1 entries
faad warning: decoded zero sample
d3d11va warning: not enough decoding slices in the texture (6/24)
avcodec info: Using D3D11VA (NVIDIA Quadro M1000M, vendor 4318(NVIDIA), device 5041, revision 162) for hardware decoding
direct3d11 error: SetThumbNailClip failed: 0x800706f4
```

Intel logs:

```
mp4 warning: unknown box type cTIM (incompletely loaded)
mp4 warning: unknown box type cTSC (incompletely loaded)
mp4 warning: unknown box type cTSZ (incompletely loaded)
mp4 warning: Unknown uuid type box
mp4 warning: elst box found
mp4 warning: STTS table of 1 entries
mp4 warning: CTTS table of 393 entries
mp4 warning: elst box found
mp4 warning: STTS table of 1 entries
faad warning: decoded zero sample
avcodec info: Using D3D11VA (Intel(R) HD Graphics 530, vendor 32902(Intel), device 6427, revision 6) for hardware decoding
main warning: picture is too late to be displayed (missing 154 ms)
main warning: picture is too late to be displayed (missing 120 ms)
main warning: picture is too late to be displayed (missing 87 ms)
main warning: picture is too late to be displayed (missing 54 ms)
main warning: picture is too late to be displayed (missing 20 ms)
direct3d11 error: SetThumbNailClip failed: 0x800706f4
```
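When comparing runs on the two GPUs, it can help to pull the few informative lines out of each log instead of reading them by eye. The following is a minimal Python sketch. Its regular expressions are written only against the messages quoted in this thread, and other VLC versions may word them differently. The function name `summarize_vlc_log` is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Patterns based solely on the log lines quoted in this thread.
DECODER_RE = re.compile(r"Using (\w+) \((.+?), vendor")
LATE_RE = re.compile(r"picture is too late to be displayed \(missing (\d+) ms\)")
SLICES_RE = re.compile(r"not enough decoding slices in the texture \((\d+)/(\d+)\)")


def summarize_vlc_log(text: str) -> Dict[str, object]:
    """Extract hardware decoder, adapter, slice warning and late-picture stats."""
    decoder = DECODER_RE.search(text)
    slices = SLICES_RE.search(text)
    late_ms = [int(ms) for ms in LATE_RE.findall(text)]
    slice_info: Optional[Tuple[int, int]] = (
        (int(slices.group(1)), int(slices.group(2))) if slices else None
    )
    return {
        "hw_api": decoder.group(1) if decoder else None,
        "adapter": decoder.group(2) if decoder else None,
        "decoding_slices": slice_info,
        "late_pictures": len(late_ms),
        "worst_late_ms": max(late_ms, default=0),
    }
```

Applied to the two logs above, it would report D3D11VA on both adapters, the (6/24) slice warning only for the Quadro, and five late pictures (worst: 154 ms) only for the Intel run.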

Re: Quadro performance significantly worse than integrated

Posted: 10 Dec 2018 17:42
by Rémi Denis-Courmont
I would actually expect better performance with iGPU than dGPU as far as video decoding is concerned. The dGPU video DSP is not necessarily much better than the one in the iGPU. And the iGPU will not incur the penalty of memory transfer.
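The memory-transfer point can be put in rough numbers. The sketch below rests on three assumptions, none of them established in the thread: "1080p" means 1920×1080, decoded frames are 8-bit NV12 (1.5 bytes per pixel), and each decoded frame crosses the bus once. Whether VLC actually moves frames that way may depend on the decode and output path. The function name is hypothetical.

```python
def decoded_bytes_per_second(width: int, height: int, fps: float,
                             streams: int, bytes_per_pixel: float = 1.5) -> float:
    """Raw size of the decoded video produced per second, in bytes.

    bytes_per_pixel defaults to 1.5, i.e. 8-bit NV12 (assumed): a
    full-resolution luma plane plus a half-size interleaved chroma plane.
    """
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * streams


# Eight synchronized 1080p streams at 30 fps, as in the original post.
rate = decoded_bytes_per_second(1920, 1080, 30, 8)
print(f"{rate / 1e6:.0f} MB/s")  # prints "746 MB/s"
```

That is roughly 0.75 GB/s of decoded output alone, before any extra copies for display. On an integrated GPU that shares system memory, this traffic need not cross a bus at all, which is consistent with Rémi's point.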

Re: Quadro performance significantly worse than integrated

Posted: 10 Dec 2018 19:54
by LuckyT

I was thinking the same regarding the memory penalty. My mistake was assuming a discrete GPU would be much better at decoding; if that isn't the case, then there's probably nothing unusual going on here.