What I meant was that while the Yadif2X and IVTC filters are GREAT for certain films and media, they don't seem very intelligent at all for media with very unstable progressive frames, like old movies or certain very-high-quality CGI films. If there was a way to mix the Yadif2X and IVTC filters together for better prediction of progressive B- and P-frames, that would work completely for those films. If that ever works out, I would simply recommend calling it Yadif+IVTC.
As the author of VLC's IVTC, I agree it's not very intelligent :P
It should be slightly more accurate than Transcode's and Xine's IVTC filters (since it selectively combines both approaches, and on top of that supports soft telecine), but it still works on a full-frame basis. It was designed to work primarily with anime, which has more difficult motion characteristics than live-action film, due to the common use of 8fps and 12fps animation in the telecined stream. This may limit its effectiveness for some other kinds of material.
The technical reason is that the possibility of 8fps and 12fps animation limits how the filter can make its predictions about the telecine sequence. In such material it is a common occurrence that nothing moves between two film frames, so we cannot assume that motion always occurs when the film frame changes. Of course, the placement of the film frames within the input stream is a priori unknown, so they must be detected somehow (this is called locking on to the cadence). But the only thing the filter sees on its input side is the telecined stream, which has no useful flags to help the filter... hence, realtime analysis of the actual picture.
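To make the cadence problem concrete, here is a toy sketch (single letters standing in for whole fields; this is not VLC's code, and real telecine also alternates field dominance) of how 2:3 pulldown spreads four film frames over five video frames, and why only the combed frames betray where the cadence sits:

```python
# Toy 2:3 hard telecine on labelled film frames. A real detector has to
# find the combed frames by comparing pixels; here the labels make the
# structure visible directly.

def telecine_23(film_frames):
    """Telecine film frames (labels) into (top_field, bottom_field) pairs."""
    fields = []
    for i, f in enumerate(film_frames):
        # Alternate 2 and 3 field repeats per film frame: A:2, B:3, C:2, D:3
        fields.extend([f] * (2 if i % 2 == 0 else 3))
    # Pair consecutive fields into video frames
    return [(fields[j], fields[j + 1]) for j in range(0, len(fields) - 1, 2)]

frames = telecine_23(list("ABCD"))
# Five video frames from four film frames; the two mixed-field ("combed")
# frames are the only evidence of where we are in the cadence:
combed = [top != bot for top, bot in frames]
```

Note how, if nothing moves between film frames B and C (the 8fps/12fps case), the combed frames become indistinguishable from clean ones, and the lock-on evidence disappears.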
Over 99% of anime is hard-telecined, based on a full-frame telecine. Thus, the filter works pretty well for the kind of material it was designed for. However, I've seen one case of anime ending credits (Claymore) where the left half of the picture (having the actual credits scroll) was pure interlaced, while the right half of the picture was made of telecined animation. The full-frame approach obviously fails with such material. Also, according to 100fps.com, some NTSC music videos liberally mix telecined and interlaced material in different parts of the same frame.
The thing with IVTC is, (a part of a frame in) NTSC video is either telecined or it is not. If it is telecined, then IVTC can help. The best-case scenario is a 100% correct reconstruction of the original progressive stream. In practice this is never reached, but over 99% is possible (with the occasional missed frame or slow vertical camera pan). If the video is not telecined, then IVTC will in the best case do nothing (if it correctly detects the lack of telecine), and in the worst case it will damage the picture (thinking it is telecined and running the reconstruction process).
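As an illustration of that best case, here is a toy reconstruction over labelled fields (real IVTC must match fields by comparing pixels and must survive cadence breaks; this assumes a perfect, already-locked 2:3 stream):

```python
# Best-case inverse telecine on a toy hard-telecined label stream:
# 5 video frames collapse back into the 4 original film frames (30fps -> 24fps).
telecined = [("A", "A"), ("B", "B"), ("B", "C"), ("C", "D"), ("D", "D")]

def ivtc(frames):
    out, i = [], 0
    while i < len(frames):
        top, bot = frames[i]
        if top == bot:          # fields agree: clean progressive frame
            out.append(top)
            i += 1
        else:                   # combed frame: its bottom field starts the
            out.append(bot)     # next film frame, and the following video
            i += 2              # frame carries that frame's other field
    return out
```

Run on the stream above, `ivtc(telecined)` yields the four film frames A, B, C, D. On non-telecined input the same merging logic is exactly what damages the picture.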
Yadif is a completely different kind of beast, called an interpolator. Interpolating filters will make a pure interlaced stream look better, but they will damage telecined material (they do not always detect correctly, nor do they perform the framerate conversion that IVTC needs). For 60-fields-per-second input, the basic Yadif version will output 30fps, recreating the missing field out of thin air and potentially discarding 50% of the input data. The Yadif2x version will keep the output at 60fps, using all input data and recreating the missing fields out of thin air. Both versions are area-based, i.e., they work locally instead of on the full frame.
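For a rough idea of what "recreating the missing field out of thin air" means, here is a naive line-doubler that fills a field's missing scan lines by averaging vertical neighbours (Yadif itself also uses temporal neighbours and an edge-directed check, so this is only the crudest spatial part of the idea):

```python
# Toy spatial interpolation of one field into a full frame. Each entry of
# field_lines is one scan line (a list of pixel values); the field holds
# every other line of the frame, and we synthesize the lines in between.
def interpolate_field(field_lines):
    out = []
    for i, line in enumerate(field_lines):
        out.append(line)
        if i + 1 < len(field_lines):
            nxt = field_lines[i + 1]
            # Missing line: average of the lines above and below
            out.append([(a + b) // 2 for a, b in zip(line, nxt)])
        else:
            out.append(line[:])     # bottom edge: just repeat the last line
    return out

frame = interpolate_field([[10, 10], [30, 30]])
# -> [[10, 10], [20, 20], [30, 30], [30, 30]]
```

Applied per output frame, this gives the 1x (30fps) variant; doing it once per input field instead of once per field pair gives the 2x (60fps) variant.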
See http://wiki.videolan.org/Deinterlacing for some more info.
So - disregarding the shortcomings of the full-frame approach for now - the problem of combining IVTC and Yadif becomes that of detecting whether a given frame is telecined, interlaced, or progressive, and applying a different filter (or not applying any filter) based on that. While developing the IVTC filter last December, I tested some technical ideas regarding automatic detection of the material type. It didn't work very well. Either I need better ideas, or this is a hard problem :)
I also tested the possibility of having a backup deinterlacer when IVTC fails. The problem is that it is really hard to detect whether a picture is interlaced or not. Most of the time, this experiment ended up detecting incorrectly and damaging progressive frames (that had been correctly reconstructed by IVTC).
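To show why this is hard, here is a naive combing metric of the kind such a detector might threshold (a hypothetical sketch, not the code I tested): it rewards the sawtooth pattern where a scan line differs from both vertical neighbours in the same direction. The catch is that fine vertical detail in a genuinely progressive frame produces exactly the same signature.

```python
# Naive combing score: sum, over interior pixels, the strength of the
# "sawtooth" where the current line sticks out from both the line above
# and the line below in the same direction.
def comb_score(lines):
    score = 0
    for y in range(1, len(lines) - 1):
        for x in range(len(lines[y])):
            above, cur, below = lines[y - 1][x], lines[y][x], lines[y + 1][x]
            d1, d2 = cur - above, cur - below
            if d1 * d2 > 0:                 # differs from both, same direction
                score += min(abs(d1), abs(d2))
    return score

# Strongly interlaced-looking column vs. a smooth progressive gradient:
comb_score([[0], [100], [0], [100]])   # high score
comb_score([[0], [10], [20], [30]])    # zero score
```

A single-pixel-high horizontal line in a progressive frame would score just as high as real combing, which is essentially the failure mode I kept hitting.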
Then, there is one further complication - the framerates do not match. IVTC will output at 24fps when locked on to the cadence, and 30fps otherwise. Interlaced video in NTSC is always 60fps. The lowest common multiple of 24 and 60 is 120, i.e. one would need 120fps output to be able to accommodate both exactly. However, it is possible to use the 2,3,2,3,... trick (like in soft telecine) for the 24fps material, and output at 60fps. The current filter was not designed for this - it's doable, but it would need some nontrivial changes.
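The rate arithmetic, plus the 2,3 repeat trick, in a few lines:

```python
# 24fps and 60fps only align exactly on a 120fps timebase, but repeating
# each 24fps frame 2 or 3 times alternately (the same pattern soft
# telecine uses) fits the material into a 60fps output stream.
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def repeat_23(frames_24):
    """Emit each frame 2 or 3 times alternately: 24 frames -> 60 frames."""
    out = []
    for i, f in enumerate(frames_24):
        out.extend([f] * (2 if i % 2 == 0 else 3))
    return out

lcm(24, 60)                        # 120: the exact common timebase
len(repeat_23(list(range(24))))    # 60: one second of 24fps as 60fps
```

With this trick the IVTC path and the interlaced path could share a single 60fps output rate, which is what the "nontrivial changes" would have to implement.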
Finally, I have a question. Can you provide some examples of the kinds of instabilities you mentioned? I'd like to understand the picture and motion characteristics for those, to see what could be done about them.