OK, an update on this on the off chance anyone is interested: I got it working. It needed "videotoolbox-zero-copy" to be disabled so that the decoded frames actually reach the vmem output.
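For anyone wanting to reproduce that, here's a minimal sketch of how the option can be passed when creating the libvlc instance. This is my own illustration rather than part of the patch, and it assumes the usual "--no-" spelling for negating a libvlc boolean option:

```c
/* Minimal sketch (not from the patch): disable videotoolbox-zero-copy
 * when creating the libvlc instance so decoded frames end up in
 * CPU-accessible memory and reach the vmem callbacks. The "--no-"
 * spelling assumes the usual libvlc boolean-option convention. */
#include <vlc/vlc.h>

int main(void)
{
    const char *args[] = { "--no-videotoolbox-zero-copy" };
    libvlc_instance_t *vlc = libvlc_new(1, args);
    if (vlc == NULL)
        return 1;

    /* ... set up vmem callbacks, open media, play ... */

    libvlc_release(vlc);
    return 0;
}
```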
I also noticed something a little unsatisfactory: the native decoding format on iOS is NV12, yet there is a rather cumbersome function in videotoolbox.m that unconditionally converts to I420, and not very efficiently. I profiled it in Xcode and roughly 60% of the execution time was spent in splitPlanes, called via copy420YpCbCr8Planar (and it's not hard to see why that function is slow).
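To illustrate what that conversion boils down to (my paraphrase, not the actual videotoolbox.m code), it's essentially a per-byte de-interleave of NV12's combined CbCr plane into I420's separate U and V planes:

```c
/* Illustrative paraphrase only, NOT the actual VLC source: the kind of
 * per-byte de-interleave needed to turn NV12's interleaved CbCr plane
 * into I420's separate U and V planes. Doing this for every chroma
 * sample of every frame is why it dominates the profile. */
#include <stddef.h>
#include <stdint.h>

static void split_chroma(const uint8_t *cbcr, uint8_t *u, uint8_t *v,
                         size_t chroma_samples)
{
    for (size_t i = 0; i < chroma_samples; i++) {
        u[i] = cbcr[2 * i];      /* Cb */
        v[i] = cbcr[2 * i + 1];  /* Cr */
    }
}
```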
So my suggestion is to skip the I420 conversion and leave any format conversion to whatever is consuming the decoder's output. For example, turning this into a straightforward NV12-to-NV12 copy and letting swscale handle the chroma conversion to I420 brought the conversion cost down to roughly 3% of execution time spent in swscale, versus the 60% spent in the function mentioned above. Since then I've adapted my video player to work directly in NV12 anyway (SDL2 supports that format), so I can now eliminate CPU-side conversion entirely.
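If a consumer really does want I420, the repack can be done with libswscale on its side instead. A rough sketch of what I mean (the plane pointers, pitches and dimensions are assumed to come from the vmem callbacks; the function name is mine):

```c
/* Rough sketch: let libswscale repack an NV12 frame into I420 on the
 * consumer side rather than inside the decoder module. Assumes the
 * destination planes are already allocated with the given pitches. */
#include <stdint.h>
#include <libswscale/swscale.h>
#include <libavutil/pixfmt.h>

int nv12_to_i420(const uint8_t *y, int y_pitch,
                 const uint8_t *cbcr, int cbcr_pitch,
                 uint8_t *dst[3], int dst_pitch[3],
                 int width, int height)
{
    struct SwsContext *sws = sws_getContext(width, height, AV_PIX_FMT_NV12,
                                            width, height, AV_PIX_FMT_YUV420P,
                                            SWS_POINT, NULL, NULL, NULL);
    if (sws == NULL)
        return -1;

    const uint8_t *src[2] = { y, cbcr };
    int src_pitch[2] = { y_pitch, cbcr_pitch };
    sws_scale(sws, src, src_pitch, 0, height, dst, dst_pitch);
    sws_freeContext(sws);
    return 0;
}
```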
To put this change in perspective for my project: it now lets me decode a 1080p60, 4 Mbit/s H.264 transport stream smoothly on an iPad Air 2 without stutter, whereas before my changes it would play a couple of seconds, freeze, play a few more seconds, freeze, and so on. Bear in mind I'm using vmem (I know, I know) and copying back into a texture to blend with my UI.
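For completeness, the texture side of that looks roughly like this on my end (a sketch, assuming SDL 2.0.16 or newer for SDL_UpdateNVTexture, with the renderer and plane pointers coming from elsewhere in the player):

```c
/* Sketch: upload an NV12 frame (as handed over by the vmem callbacks)
 * straight into an SDL2 texture, with no CPU colour conversion at all.
 * Requires SDL 2.0.16+ for SDL_UpdateNVTexture. */
#include <SDL.h>

SDL_Texture *upload_nv12(SDL_Renderer *renderer, int width, int height,
                         const Uint8 *y, int y_pitch,
                         const Uint8 *cbcr, int cbcr_pitch)
{
    SDL_Texture *tex = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_NV12,
                                         SDL_TEXTUREACCESS_STREAMING,
                                         width, height);
    if (tex != NULL)
        SDL_UpdateNVTexture(tex, NULL, y, y_pitch, cbcr, cbcr_pitch);
    return tex;
}
```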
Here is my patch anyway.
https://www.dropbox.com/s/e0dqj6kirvh29 ... patch?dl=0