You can extract anything: objects, people, drawings, and text. The quality of the salient object detection, background removal, and text detection is now quite incredible 😳😲🤯
The magic here is to use ARCore + AugmentedImages rather than SIFT.
On touch, the phone grabs a fresh desktop screenshot and adds it to ARCore as an augmented image (< 100ms).
Tracking is crazy fast & precise.
Interesting alternative to touch screen for interactive installations!
The text detection is performed on device with @Firebase #MLKit. Super fast, good accuracy and cross-platform.
The secret sauce here is BASNet (Qin et al., CVPR 2019) for salient object detection and background removal.
The accuracy and range of this model are stunning and there are many nice use cases so I packaged it as a micro-service / docker image: github.com/cyrildiagne/ba…
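To make the background removal step concrete, here is a minimal sketch of how a saliency map could be turned into a cut-out. It assumes the model's output has already been converted to a per-pixel score between 0.0 (background) and 1.0 (object); the function name `cut_out` and the list-of-rows image layout are hypothetical and not taken from the BASNet code or the micro-service.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def cut_out(pixels: List[List[RGB]], saliency: List[List[float]]) -> List[List[RGBA]]:
    """Use a saliency map as the alpha channel of an image.

    Assumption: saliency values are per-pixel scores where 0.0 means
    background and 1.0 means salient object. Values outside [0, 1]
    are clamped.
    """
    if len(pixels) != len(saliency):
        raise ValueError("image and saliency map must have the same height")

    result: List[List[RGBA]] = []
    for pixel_row, saliency_row in zip(pixels, saliency):
        if len(pixel_row) != len(saliency_row):
            raise ValueError("image and saliency map must have the same width")
        row: List[RGBA] = []
        for (r, g, b), score in zip(pixel_row, saliency_row):
            clamped = min(1.0, max(0.0, score))
            # Salient pixels stay opaque, background pixels become transparent.
            row.append((r, g, b, round(255 * clamped)))
        result.append(row)
    return result


if __name__ == "__main__":
    image = [[(10, 20, 30), (40, 50, 60)]]
    mask = [[1.0, 0.0]]
    print(cut_out(image, mask))
```

Keeping the soft score as alpha, rather than thresholding it to 0 or 1, would preserve smooth edges around the object if the model produces them.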
And again, the OpenCV SIFT trick to find where the phone is pointing at the screen.
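A rough sketch of the last step of that trick, under an assumption: the SIFT matches between the camera frame and the screenshot have already been turned into a 3x3 homography `H` (for example with OpenCV's `cv2.findHomography`) that maps camera pixels to screenshot pixels. The thread does not say a homography is used, and the helper names `project_point` and `pointed_screen_position` are hypothetical.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_point(H: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def pointed_screen_position(H: Matrix3, cam_width: int, cam_height: int) -> Tuple[float, float]:
    """Estimate where the phone points: project the camera image center.

    Assumption: H maps camera pixel coordinates to screenshot pixel
    coordinates, and "where the phone points" means the camera center.
    """
    return project_point(H, cam_width / 2, cam_height / 2)


if __name__ == "__main__":
    # Toy homography: scale by 2, then shift by (100, 50).
    H = [[2.0, 0.0, 100.0],
         [0.0, 2.0, 50.0],
         [0.0, 0.0, 1.0]]
    print(pointed_screen_position(H, 640, 480))
```

The returned coordinates are in screenshot pixels, so they could be used directly as the position at which to drop the pasted content on the desktop.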