The second camera (the Orthographic camera) only draws the video background, while the AR Camera draws the models.
What you need is to use the same concept (i.e. rendering the camera texture to a plane), but with a slightly different approach that does not use the Orthographic camera, i.e.:
- have 1 ARCamera, with the Depth parameter set to 0
- disable video background rendering, as shown in the sample and as explained here too:
https://developer.vuforia.com/forum/faq/unity-how-can-i-disable-camera-video-background-rendering
- add a second perspective Camera (a standard Unity camera) as a child of the AR Camera, with Clear Flags set to Solid Color
- the second camera must be oriented exactly like the parent one (the AR Camera) and must be placed at the local origin of the parent
- add a script to the child perspective camera that sets this camera's field of view to match that of the parent camera (i.e. the ARCamera)
- render a plane located at some distance in front of the camera and apply the camera texture to it; note that the plane must be sized to fit the camera view, which may require some computation
This will require a little bit of coding, but we tested this approach and it is technically possible to achieve the above (although we don't have a step-by-step tutorial at the moment).
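For the plane-fitting computation mentioned in the last step, here is a minimal sketch of the arithmetic only, not a tested Unity script. It is written in Python to keep it self-contained, and the function name plane_size_for_view is made up for illustration. It assumes the field of view is the vertical one, in degrees (which is how Unity's Camera.fieldOfView is defined), and that the aspect ratio is width divided by height (as in Camera.aspect):

```python
import math


def plane_size_for_view(distance, vertical_fov_deg, aspect):
    """Return (width, height) of a plane that fills a perspective camera's
    view when centered on the view axis, `distance` units in front of it.

    Hypothetical helper for illustration; assumes a vertical field of view
    in degrees and aspect = width / height.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    if not 0 < vertical_fov_deg < 180:
        raise ValueError("vertical_fov_deg must be between 0 and 180")
    if aspect <= 0:
        raise ValueError("aspect must be positive")

    # Half of the visible height is distance * tan(fov / 2).
    height = 2.0 * distance * math.tan(math.radians(vertical_fov_deg) / 2.0)
    # The visible width follows from the aspect ratio.
    width = height * aspect
    return width, height


# Example: plane placed 10 units away, 90 degree vertical FOV, 16:9 view.
width, height = plane_size_for_view(10.0, 90.0, 16.0 / 9.0)
```

With these example values the plane comes out about 20 units tall and about 35.6 units wide. In Unity the same formula would be evaluated with the child camera's field of view and aspect, and the result converted to a local scale; keep in mind that Unity's built-in Plane primitive is 10 x 10 units at scale 1, so the scale factor is the computed size divided by 10. If the video texture has a different aspect ratio than the view, you would additionally have to decide whether to crop or letterbox it, which this sketch does not cover.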
Nice!