VLC3 Hardware Acceleration

This forum is about all development around libVLC.
Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

VLC3 Hardware Acceleration

Postby Remi73 » 29 Mar 2017 14:56

Hi all,

Today I'm updating my app from VLC 2.2 to VLC 3. Most things look OK, except for the libvlc_video_format_cb callback.
I usually use this callback to get the chroma and force it to BGRA if I don't support the proposed chroma.

Since VLC 3, on macOS, libvlc_video_format_cb is called a first time with the CVPY chroma, then libvlc_video_cleanup_cb is called, and finally libvlc_video_format_cb is called a second time with the J420 chroma (which is the expected chroma). Only after these calls do the libvlc_video_lock_cb/libvlc_video_unlock_cb calls start.
With VLC 2, I had libvlc_video_format_cb (J420) > lock/unlock... > libvlc_video_cleanup_cb.
Has anything changed here?
Moreover, I get some green frames when video playback begins; I don't know if that is related.
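
For context, my callback does roughly this (simplified sketch; the handler name and the list of supported chromas are just illustrative):

Code:

/* Sketch of my libvlc_video_format_cb: if the proposed chroma is not one the
 * app supports, overwrite it with BGRA and let VLC convert. Plane setup is
 * shown for BGRA and planar 4:2:0 only. */
#include <string.h>
#include <vlc/vlc.h>

static unsigned format_cb(void **opaque, char *chroma,
                          unsigned *width, unsigned *height,
                          unsigned *pitches, unsigned *lines)
{
    (void)opaque;
    if (memcmp(chroma, "J420", 4) != 0 && memcmp(chroma, "I420", 4) != 0)
        memcpy(chroma, "BGRA", 4);              /* force a supported chroma */

    if (memcmp(chroma, "BGRA", 4) == 0) {
        pitches[0] = *width * 4;                /* one packed plane */
        lines[0]   = *height;
    } else {
        pitches[0] = *width;                    /* Y plane */
        lines[0]   = *height;
        pitches[1] = pitches[2] = *width / 2;   /* U and V planes */
        lines[1]   = lines[2]   = *height / 2;
    }
    return 1;                                   /* number of picture buffers */
}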

Thanks!

Edit:
On Windows, I see the same kind of behavior: DXA9 > cleanup > DX11 > cleanup > J420 > lock/unlock.
No green frames.
Last edited by Remi73 on 30 Mar 2017 14:33, edited 1 time in total.

Rémi Denis-Courmont
Developer
Posts: 15267
Joined: 07 Jun 2004 16:01
VLC version: master
Operating System: Linux

Re: About CVPixelBuffer opaque buffer type

Postby Rémi Denis-Courmont » 29 Mar 2017 17:35

Your callback implementation requests a format that requires conversion from the source format. The attempt to build a conversion pipeline fails, so the decoder falls back to another source format.
Rémi Denis-Courmont
https://www.remlab.net/
Private messages soliciting support will be systematically discarded

Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

Re: About CVPixelBuffer opaque buffer type

Postby Remi73 » 30 Mar 2017 10:44

OK, so is there any way to know why building the conversion pipeline failed?
I used to get the NV12 chroma when using hardware acceleration, which worked pretty well.
Now it looks like I don't have hardware acceleration anymore. Is there a new way to activate it, or anything to provide to make it work again?

Thank you

Cleaned log, in case it helps:

Code:

avcodec decoder debug: available hardware decoder output format 61 (dxva2_vld)
avcodec decoder debug: available hardware decoder output format 128 (d3d11va_vld)
avcodec decoder debug: available software decoder output format 12 (yuvj420p)
core generic debug: trying to reuse free vout
core spu text debug: removing module "freetype"
core input debug: Buffering 83%
core input debug: Stream buffering done (375 ms in 1 ms)
core spu text debug: looking for text renderer module matching "any": 2 candidates
freetype spu text debug: Using DWrite backend
freetype spu text debug: DWrite_GetFamily(): family name: Arial
core spu text debug: using text renderer module "freetype"
core filter debug: removing module "swscale"
core vout display debug: Filter 000001f6ac928a88 removed from chain
core vout display debug: removing module "vmem"
core video output debug: Opening vout display wrapper
core vout display debug: looking for vout display module matching "vmem": 12 candidates
core vout display debug: VoutDisplayEvent 'resize' 1920x960
core vout display debug: using vout display module "vmem"
core video output debug: original format sz 1920x960, of (0,0), vsz 1920x960, 4cc DXA9, sar 1:1, msk r0x0 g0x0 b0x0
core generic debug: reusing provided vout
core vout display debug: VoutDisplayEvent 'window state' 0
core vout display debug: VoutDisplayEvent 'window state' 0
core generic debug: looking for hw decoder module matching "dxva2": 2 candidates
directx_va generic debug: DLLs loaded
directx_va generic debug: CreateDevice succeed
dxva2 generic debug: OurDirect3DCreateDeviceManager9 Success!
dxva2 generic: obtained IDirect3DDeviceManager9
dxva2 generic: DXVA2CreateVideoService Success!
directx_va generic debug: - 'MPEG-2 variable-length decoder' is supported by hardware
directx_va generic debug: - 'MPEG-2 inverse discrete cosine transform' is supported by hardware
directx_va generic debug: - 'H.264 variable-length decoder, no film grain technology' is supported by hardware
directx_va generic debug: - 'H.264 variable-length decoder, no film grain technology, Flash' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x6719b6fb-0x5cad-0x4acb-0xb00af3bfdec38727' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x9901ccd3-0xca12-0x4b7e-0x867ae2223d9255c3' is supported by hardware
directx_va generic debug: - 'VC-1 variable-length decoder' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0xca15d19a-0x2b48-0x43d6-0x979e7a6e9c802ff8' is supported by hardware
directx_va generic debug: - 'MPEG-4 Part 2 variable-length decoder, Simple&Advanced Profile, Avivo' is supported by hardware
directx_va generic debug: - 'H.264 stereo high profile, mbs flag set' is supported by hardware
directx_va generic debug: - 'H.264 stereo high profile' is supported by hardware
directx_va generic debug: - 'MPEG-4 Part 2 variable-length decoder, Simple&Advanced Profile, no GMC' is supported by hardware
directx_va generic debug: - 'VC-1 inverse discrete cosine transform' is supported by hardware
directx_va generic debug: - 'Windows Media Video 9 IDCT' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x103473e4-0x10ea-0x11df-0x9a922ba055d89593' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x84ad67f6-0x4c21-0x419a-0x9f0b24f0578906c1' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x725ad240-0x786c-0x471e-0xad3c38f739936517' is supported by hardware
directx_va generic debug: - 'Unknown decoder 0x95664ff5-0x9e03-0x4c74-0xbb4f9178d6035e58' is supported by hardware
directx_va generic debug: Trying to use 'H.264 variable-length decoder, no film grain technology' as input
dxva2 generic debug: NV12 is supported for output
dxva2 generic debug: Using decoder output 'NV12'
core generic debug: looking for video converter module matching "any": 20 candidates
core generic debug: using video converter module "dxa9"
directx_va generic debug: directx_va_Setup id 28 1920x960
dxva2 generic debug: IDirectXVideoAccelerationService_CreateSurface succeed with 22 surfaces (1920x960)
dxva2 generic debug: we got 3 decoder configurations
dxva2 generic debug: configuration[0] ConfigBitstreamRaw 2
dxva2 generic debug: configuration[1] ConfigBitstreamRaw 2
dxva2 generic debug: configuration[2] ConfigBitstreamRaw 2
dxva2 generic debug: IDirectXVideoDecoderService_CreateVideoDecoder succeed
core generic debug: using hw decoder module "dxva2"
avcodec decoder debug: chroma mismatch YV12 expected DXA9
core generic debug: removing module "dxa9"
core vout display error: Failed to change zoom
core vout display error: Failed to set on top
core vout display error: Failed to change source AR
core vout display error: Failed to change Viewpoint
core vout display debug: removing module "vmem"
core video output debug: Opening vout display wrapper
core vout display debug: looking for vout display module matching "vmem": 12 candidates
core vout display debug: VoutDisplayEvent 'resize' 1920x960
core vout display debug: using vout display module "vmem"
core video output debug: original format sz 1920x960, of (0,0), vsz 1920x960, 4cc DX11, sar 1:1, msk r0x0 g0x0 b0x0
core generic debug: reusing provided vout
core vout display debug: VoutDisplayEvent 'window state' 0
core vout display debug: VoutDisplayEvent 'window state' 0
core generic debug: looking for hw decoder module matching "dxva2": 2 candidates
core generic debug: no hw decoder modules matched
core vout display debug: removing module "vmem"
core video output debug: Opening vout display wrapper
core vout display debug: looking for vout display module matching "vmem": 12 candidates
core vout display debug: VoutDisplayEvent 'resize' 1920x960
core vout display debug: using vout display module "vmem"
core vout display debug: A filter to adapt decoder J420 to display J420 is needed
core filter debug: looking for video converter module matching "any": 20 candidates
swscale filter debug: 1920x960 (1920x962) chroma: J420 -> 1920x960 (1920x960) chroma: J420 with scaling using Bicubic (good quality)
core filter debug: using video converter module "swscale"
core vout display debug: Filter 'Swscale' (000001f69fdd6768) appended to chain
core video output debug: original format sz 1920x962, of (0,0), vsz 1920x960, 4cc J420, sar 1:1, msk r0x0 g0x0 b0x0
core generic debug: reusing provided vout

Rémi Denis-Courmont
Developer
Posts: 15267
Joined: 07 Jun 2004 16:01
VLC version: master
Operating System: Linux

Re: VLC3 Hardware Acceleration

Postby Rémi Denis-Courmont » 30 Mar 2017 19:32

Hardware acceleration decodes the picture into GPU memory. You can't use the callbacks with that. You need to provide a window handle and let VLC render instead.
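
For reference, embedding looks roughly like this (sketch; the native handle comes from whatever UI toolkit the application uses):

Code:

/* Sketch: hand VLC a native window and let it render there itself,
 * instead of using the memory callbacks. */
#include <stdint.h>
#include <vlc/vlc.h>

static void attach_to_window(libvlc_media_player_t *mp, void *native_handle)
{
#if defined(_WIN32)
    libvlc_media_player_set_hwnd(mp, native_handle);        /* HWND */
#elif defined(__APPLE__)
    libvlc_media_player_set_nsobject(mp, native_handle);    /* NSView * */
#else
    libvlc_media_player_set_xwindow(mp, (uint32_t)(uintptr_t)native_handle); /* X11 window ID */
#endif
}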
Rémi Denis-Courmont
https://www.remlab.net/
Private messages soliciting support will be systematically discarded

Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

Re: VLC3 Hardware Acceleration

Postby Remi73 » 31 Mar 2017 09:47

OK, so the only way to get hardware-decoded frames is to use a custom LibVLC video output display plugin, or to render into a window handle and grab the frames from there? Even with a copy from video memory to main memory?

(as stated in the callback description)

Code:

/**
 * Set callbacks and private data to render decoded video to a custom area
 * in memory.
 * Use libvlc_video_set_format() or libvlc_video_set_format_callbacks()
 * to configure the decoded format.
 *
 * \warning Rendering video into custom memory buffers is considerably less
 * efficient than rendering in a custom window as normal.
 *
 * For optimal perfomances, VLC media player renders into a custom window, and
 * does not use this function and associated callbacks. It is <b>highly
 * recommended</b> that other LibVLC-based application do likewise.
 * To embed video in a window, use libvlc_media_player_set_xid() or equivalent
 * depending on the operating system.
 *
 * If window embedding does not fit the application use case, then a custom
 * LibVLC video output display plugin is required to maintain optimal video
 * rendering performances.
 *
 * The following limitations affect performance:
 * - Hardware video decoding acceleration will either be disabled completely,
 *   or require (relatively slow) copy from video/DSP memory to main memory.
 * - Sub-pictures (subtitles, on-screen display, etc.) must be blent into the
 *   main picture by the CPU instead of the GPU.
 * - Depending on the video format, pixel format conversion, picture scaling,
 *   cropping and/or picture re-orientation, must be performed by the CPU
 *   instead of the GPU.
 * - Memory copying is required between LibVLC reference picture buffers and
 *   application buffers (between lock and unlock callbacks).
 *
 * \param mp the media player
 * \param lock callback to lock video memory (must not be NULL)
 * \param unlock callback to unlock video memory (or NULL if not needed)
 * \param display callback to display video (or NULL if not needed)
 * \param opaque private pointer for the three callbacks (as first parameter)
 * \version LibVLC 1.1.1 or later
 */
LIBVLC_API
void libvlc_video_set_callbacks( libvlc_media_player_t *mp,
                                 libvlc_video_lock_cb lock,
                                 libvlc_video_unlock_cb unlock,
                                 libvlc_video_display_cb display,
                                 void *opaque );
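
For reference, this is roughly how I wire these callbacks up today (simplified sketch; the context struct, buffer sizing and the function names are illustrative, not my real code):

Code:

/* Sketch of the memory-rendering path described above: VLC converts each
 * frame to RV32 and copies it into an application-owned buffer. */
#include <stdlib.h>
#include <vlc/vlc.h>

struct ctx { void *pixels; unsigned width, height; };

static void *lock_cb(void *opaque, void **planes)
{
    struct ctx *c = opaque;
    planes[0] = c->pixels;      /* VLC writes the decoded/converted frame here */
    return NULL;                /* picture identifier passed to unlock/display */
}

static void unlock_cb(void *opaque, void *picture, void *const *planes)
{
    (void)opaque; (void)picture; (void)planes;
}

static void display_cb(void *opaque, void *picture)
{
    struct ctx *c = opaque;
    (void)picture;
    /* hand c->pixels to the application's renderer here */
    (void)c;
}

static void setup_memory_rendering(libvlc_media_player_t *mp, struct ctx *c)
{
    c->pixels = malloc(c->width * c->height * 4);
    libvlc_video_set_callbacks(mp, lock_cb, unlock_cb, display_cb, c);
    /* RV32 = 32-bit RGB; the pitch is in bytes per line */
    libvlc_video_set_format(mp, "RV32", c->width, c->height, c->width * 4);
}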

oviano
Cone that earned his stripes
Posts: 120
Joined: 12 Jan 2012 11:12

Re: VLC3 Hardware Acceleration

Postby oviano » 02 Apr 2017 08:00

On iOS (and I assume macOS) it's possible to use vmem with full hardware decoding if you are prepared to modify the vmem code.

I have created a vmem.m that supports operating on a CVPixelBuffer. It passes the buffer through unchanged via the callbacks (which I have also changed), and the app can then do what it wants with it. In my case I pass these buffers directly to SDL2 to create a texture out of them, using the efficient OpenGL ES texture caching.
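
Roughly, the GL ES side of that looks like this (sketch only, not the actual VLCKit code; shown for a BGRA buffer, and the texture cache is assumed to have been created once from the EAGL context):

Code:

/* Sketch: turn a CVPixelBuffer (as handed out by the modified vmem callbacks)
 * into a GL ES texture via the CoreVideo texture cache. BGRA buffer assumed;
 * an NV12 buffer needs one call per plane. */
#include <CoreVideo/CoreVideo.h>
#include <CoreVideo/CVOpenGLESTextureCache.h>
#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

static CVOpenGLESTextureRef texture_from_buffer(CVOpenGLESTextureCacheRef cache,
                                                CVPixelBufferRef pixel_buffer)
{
    CVOpenGLESTextureRef texture = NULL;
    GLsizei width  = (GLsizei)CVPixelBufferGetWidth(pixel_buffer);
    GLsizei height = (GLsizei)CVPixelBufferGetHeight(pixel_buffer);

    CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixel_buffer, NULL,
        GL_TEXTURE_2D, GL_RGBA, width, height,
        GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
    if (err != kCVReturnSuccess)
        return NULL;

    /* Bind it like any other texture before drawing. */
    glBindTexture(CVOpenGLESTextureGetTarget(texture),
                  CVOpenGLESTextureGetName(texture));
    return texture;
}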

I believe the same approach could be used on macOS, as it too uses VideoToolbox, but I haven't tried it. I did notice, however, that on macOS it seems happy to use VideoToolbox and then copy and convert the buffer into memory for regular use by vmem.c. It seemed to work fine, although I haven't investigated much, and there was a crash later on as it shut down the video somewhere inside the VLC code.

Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

Re: VLC3 Hardware Acceleration

Postby Remi73 » 04 Apr 2017 09:26

Hi oviano, this sounds very interesting.
If I understood correctly, you are able to access the GPU texture stored in a CVPixelBuffer through the callback system, by modifying vmem?
I will investigate this approach, thank you.

Rémi Denis-Courmont
Developer
Posts: 15267
Joined: 07 Jun 2004 16:01
VLC version: master
Operating System: Linux

Re: VLC3 Hardware Acceleration

Postby Rémi Denis-Courmont » 04 Apr 2017 19:32

You can copy pixels back to CPU memory. But that's typically so slow as to negate the benefits of hardware decoding.

It's meant for occasional snapshots, not systematic use.
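
If an occasional still is all that is needed, the dedicated snapshot call avoids the callback path entirely. A minimal sketch:

Code:

/* Sketch: ask the first video output to write one snapshot to a file.
 * Width and height of 0 keep the original picture size. */
#include <vlc/vlc.h>

static int save_snapshot(libvlc_media_player_t *mp, const char *path)
{
    return libvlc_video_take_snapshot(mp, 0, path, 0, 0);
}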
Rémi Denis-Courmont
https://www.remlab.net/
Private messages soliciting support will be systematically discarded

Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

Re: VLC3 Hardware Acceleration

Postby Remi73 » 04 Apr 2017 21:00

OK, so what would be the approach to get access to this decoded texture, in order to apply shaders to those pixels?

oviano
Cone that earned his stripes
Posts: 120
Joined: 12 Jan 2012 11:12

Re: VLC3 Hardware Acceleration

Postby oviano » 05 Apr 2017 17:18

Hi oviano, this sounds very interesting.
If I understood correctly, you are able to access the GPU texture stored in a CVPixelBuffer through the callback system, by modifying vmem?
I will investigate this approach, thank you.
Yes, that's right.

In case it helps, here is a link to my custom VLCKit source that implements this.

https://emustream.tv/downloads/VLCKit.tar.gz

The main changes are the introduction of vmem.m to replace vmem.c and I changed the callback signatures to better suit my usage.

You will see two folders original and custom in case you want to run a diff to see all the changes I have made (some won't be important for you).

Remi73
New Cone
Posts: 6
Joined: 29 Mar 2017 14:21

Re: VLC3 Hardware Acceleration

Postby Remi73 » 20 Apr 2017 09:23

In case it helps, here is a link to my custom VLCKit source that implements this.

https://emustream.tv/downloads/VLCKit.tar.gz

The main changes are the introduction of vmem.m to replace vmem.c and I changed the callback signatures to better suit my usage.

You will see two folders original and custom in case you want to run a diff to see all the changes I have made (some won't be important for you).
I took a look at your custom VLCKit; I'm not sure I understand everything.
What I understood, in broad lines:
- remove the lock/unlock system
- use picture_pool_t to allocate a native pixel buffer
- use the prepare callback to share pointers
- let VLC do the HW decoding into the allocated buffer
- use the display callback to be notified of each new decoded frame and access the native buffer
Am I roughly right?
And do you set the zero_copy boolean?

By the way, do you have a sample based on your VLCKit that you can share? That would be very nice.

Thanks

oviano
Cone that earned his stripes
Posts: 120
Joined: 12 Jan 2012 11:12

Re: VLC3 Hardware Acceleration

Postby oviano » 08 May 2017 16:35

I will share a sample as soon as I can put one together.

Actually, I have also come across the same problem as in your original post.

I've traced it through and I think I can see what is going on.

In va.c there is this code:

vlc_fourcc_t chroma;
vlc_fourcc_t expected = vlc_va_GetChroma( pix_fmt, avctx->sw_pix_fmt );
va->setup(va, &chroma);
if (chroma != expected)
{   /* Mismatch, cannot work, fail */
    msg_Dbg( obj, "chroma mismatch %4.4s expected %4.4s",
             (const char*)&chroma, (const char*)&expected );
    vlc_va_Delete(va, avctx);

I think this will always produce a mismatch when using libVLC and trying to use DXVA, because inside va->setup (dxva2.c) is this code...

static void Setup(vlc_va_t *va, vlc_fourcc_t *chroma)
{
    vlc_va_sys_t *sys = va->sys;

    *chroma = sys->filter == NULL ? sys->i_chroma : VLC_CODEC_YV12;
}

...and in this situation, sys->filter will have been created due to this code...

if (p_sys == NULL)
{
    msg_Dbg(va, "DXVA2 : p_sys is NULL, so creating filter");
    sys->filter = CreateFilter( VLC_OBJECT(va), fmt, sys->i_chroma);
    if (sys->filter == NULL) {
        msg_Dbg(va, "DXVA2 : couldn't create filter");
        goto error;
    }
}

...i.e. p_sys is NULL.

I need to investigate some more but I think the chroma mismatch check should be wrapped in a check for p_sys, like this:

if (p_sys) {
    vlc_fourcc_t chroma;
    vlc_fourcc_t expected = vlc_va_GetChroma( pix_fmt, avctx->sw_pix_fmt );
    va->setup(va, &chroma);
    if (chroma != expected)
    {   /* Mismatch, cannot work, fail */
        msg_Dbg( obj, "chroma mismatch %4.4s expected %4.4s",
                 (const char*)&chroma, (const char*)&expected );
        vlc_va_Delete(va, avctx);
        va = NULL;
    }
}

However, if we actually wanted the opaque DXVA-decoded surface to be passed to the vmem callback, in a similar manner to what I am doing on iOS with the CVPN chroma, then we don't want to create the filter anyway, because we don't want the picture converted.

Even if we do want it converted, I would have expected a filter to be created automatically anyway when we ask for a chroma different from DXA9.

So I wonder why DXVA2 and D3D11VA create these filters automatically; there must be a reason somewhere else?

thecaptain0220
New Cone
Posts: 9
Joined: 05 Dec 2014 16:09

Re: VLC3 Hardware Acceleration

Postby thecaptain0220 » 06 Aug 2018 23:59

Hi all,

Today I'm updating my app from VLC 2.2 to VLC 3. Most things look OK, except for the libvlc_video_format_cb callback.
I usually use this callback to get the chroma and force it to BGRA if I don't support the proposed chroma.

Since VLC 3, on macOS, libvlc_video_format_cb is called a first time with the CVPY chroma, then libvlc_video_cleanup_cb is called, and finally libvlc_video_format_cb is called a second time with the J420 chroma (which is the expected chroma). Only after these calls do the libvlc_video_lock_cb/libvlc_video_unlock_cb calls start.
With VLC 2, I had libvlc_video_format_cb (J420) > lock/unlock... > libvlc_video_cleanup_cb.
Has anything changed here?
Moreover, I get some green frames when video playback begins; I don't know if that is related.

Thanks!

Edit:
On Windows, I see the same kind of behavior: DXA9 > cleanup > DX11 > cleanup > J420 > lock/unlock.
No green frames.
Have you come up with any solution for this? I am having the same issue. In VLC 2 I get one call to libvlc_video_format_cb with I420, which is the expected chroma. After updating I get three calls to libvlc_video_format_cb: DX11, DXA9, then I420. It tries a bunch of filters to adapt DX11 -> I420, etc. This process takes enough time to be noticeable.

It also seems that the green frames are caused by libvlc_video_display_cb being called with no data to display. I looked at the video data and it's all 0xCD. I haven't worked out exactly how that happens or whether it's related, but that is definitely what is being displayed.

Jean-Baptiste Kempf
Site Administrator
Posts: 37523
Joined: 22 Jul 2005 15:29
VLC version: 4.0.0-git
Operating System: Linux, Windows, Mac
Location: Cone, France

Re: VLC3 Hardware Acceleration

Postby Jean-Baptiste Kempf » 24 Aug 2018 13:51

Force-disable hardware decoding.
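
With libVLC that can be done at instance creation, for example (sketch; this assumes the avcodec-hw option is available in your build, and the same thing can be set per media with libvlc_media_add_option):

Code:

/* Sketch: create a libVLC instance with hardware decoding forced off. */
#include <vlc/vlc.h>

static libvlc_instance_t *new_instance_without_hw_decoding(void)
{
    const char *args[] = { "--avcodec-hw=none" };
    return libvlc_new(1, args);
}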
Jean-Baptiste Kempf
http://www.jbkempf.com/ - http://www.jbkempf.com/blog/category/Videolan
VLC media player developer, VideoLAN President and Sites administrator
If you want an answer to your question, just be specific and precise. Don't use Private Messages.

