This is an R&D topic that I'd like to discuss.
Right now the Vuforia SDK is perfectly suited for tracking one or more large markers/trackables/ImageTargets. These need to fill a pretty big portion of the camera frame, so that the software can compute the pose of the marker in front of the camera. This can be done with an ImageTarget, or a MultiTarget.
But we're also looking at outdoor/larger-scale AR (e.g. block-size, room-size). In many cases it won't be possible to fill the camera image with a trackable. Also, building facades aren't planar, rooms aren't always filled with wall-to-wall paintings, etc.
But what if the environment contains many small elements with clusters of features that, on their own, are too small for pose estimation, but taken together:
- can be matched to a pre-trained "microtarget"
- could be used as input for a pose-estimation algorithm such as 3-Point Pose, 4-Point Pose, SoftPOSIT, or a RANSAC-based scheme?
Of course, one would need to know the 3D location of each of these feature clusters, but standard photogrammetry techniques can provide that. We have several projects in the works that would benefit tremendously if this were possible. Yes, documenting the to-be-augmented environment in such a way would take a sizable amount of work, but that we're willing to do. I suppose we're talking about model-based tracking here?
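To make the pose-estimation step concrete, here's a minimal numpy sketch of the classic Direct Linear Transform: given six or more recognized "microtargets" whose 3D positions are known from photogrammetry, plus their 2D detections in the camera frame, recover the camera pose. This is my own toy illustration, not Vuforia code; the function name and setup are assumptions.

```python
import numpy as np

def estimate_pose_dlt(points_3d, points_2d, K):
    """Recover the camera pose (R, t) from n >= 6 non-coplanar 3D
    feature-cluster positions and their 2D detections, via the DLT.
    points_3d: (n, 3) world coords; points_2d: (n, 2) pixels; K: 3x3 intrinsics."""
    n = len(points_3d)
    # Work in normalized camera coordinates: K^-1 @ pixel
    pix_h = np.hstack([points_2d, np.ones((n, 1))])
    norm = (np.linalg.inv(K) @ pix_h.T).T
    A = []
    for (X, Y, Z), (x, y, _) in zip(points_3d, norm):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    # The smallest right singular vector gives P = [R|t] up to scale
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    M = P[:, :3]
    lam = np.cbrt(np.linalg.det(M))  # signed scale: det(lam * R) = lam^3
    # Project the scaled 3x3 block onto the nearest proper rotation
    U, _, Vt2 = np.linalg.svd(M / lam)
    R = U @ Vt2
    t = P[:, 3] / lam
    return R, t
```

With real, noisy detections one would wrap a minimal solver like this in RANSAC to reject mismatched microtargets, then refine the inliers with a nonlinear reprojection-error minimization.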
Having said that, there are many shapes that humans recognize easily, such as logos, but that are hard to track with SIFT/SURF/FAST-style trackers, simply because they don't have enough corners. For those shapes, my guess is that template matching would be a better fit. Is that guess correct?
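To illustrate what I mean by template matching: a brute-force normalized cross-correlation sketch in numpy (a toy version for clarity; real implementations use FFTs and image pyramids). NCC needs no corners at all, which is exactly why it can handle logo-like shapes that defeat corner-based detectors, and the normalization makes it insensitive to uniform lighting changes.

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide `template` over `image` (both 2D grayscale arrays) and return
    the (row, col) of the best normalized cross-correlation score, plus the
    score itself (1.0 = perfect match, invariant to brightness/contrast)."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat patch: correlation undefined, skip
            score = (p * t).sum() / denom
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best
```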
If these ideas make sense, then I think these features would be great additions for the next version of Vuforia:
- addition of "microtargets", and/or access to the list of detected/tracked features from the tracker
- addition of template matching (or another algorithm) to enable tracking of small patches that can't be tracked properly with 'standard' NFT techniques, under changes of rotation, scale and limited perspective/non-uniform scale. Rotation may not really be needed, since we know the orientation of the device from gravity and can pre-rotate the template before matching.
- optional: addition of a pose-estimation step that takes these microtargets as inputs
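To illustrate the pre-rotation idea from the second bullet: if the device roll is known from the gravity vector, the template can be rotated once before matching, so the matcher itself only has to cope with scale. A hypothetical nearest-neighbor sketch (my own helper, not Vuforia API):

```python
import numpy as np

def prerotate_template(template, roll_rad):
    """Rotate a 2D grayscale template about its centre by the device roll
    angle (radians, from gravity), via an inverse nearest-neighbor warp.
    Pixels that map outside the source stay zero."""
    h, w = template.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a, sin_a = np.cos(roll_rad), np.sin(roll_rad)
    out = np.zeros_like(template)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse rotation: for each output pixel, find its source coordinate
    sx = cos_a * (xs - cx) + sin_a * (ys - cy) + cx
    sy = -sin_a * (xs - cx) + cos_a * (ys - cy) + cy
    sxi, syi = np.round(sx).astype(int), np.round(sy).astype(int)
    valid = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out[ys[valid], xs[valid]] = template[syi[valid], sxi[valid]]
    return out
```

In practice you'd pre-rotate once per frame (the roll changes slowly) and then run the NCC matcher on the rotated template, leaving only scale and limited perspective for the matcher to absorb.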
Looking forward to hearing your thoughts!