Then last week, @nicoptere introduced me to SIREN via this mind-boggling SDF shader by @suricrasia, which is able to ray-march a generated Stanford bunny in less than 100 lines 🤯. shadertoy.com/view/wtVyWK
The author, @suricrasia, also made a great video about it, along with a notebook to train and export the model:
SIRENs (Sinusoidal Representation Networks) were introduced in the NeurIPS 2020 paper "Implicit Neural Representations with Periodic Activation Functions". vincentsitzmann.com/siren/
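The core idea is tiny: linear layers with sin() activations and a careful weight init. Here's a minimal PyTorch sketch of one layer, following the paper's initialization scheme (my simplification, not the authors' code):

```python
import numpy as np
import torch
from torch import nn

class SineLayer(nn.Module):
    """One SIREN layer: a linear map followed by sin(omega_0 * x).
    omega_0=30 and the uniform init below follow the SIREN paper;
    this is a simplified sketch, not the reference implementation."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                # First layer: uniform in [-1/n, 1/n]
                bound = 1.0 / in_features
            else:
                # Hidden layers: uniform in [-sqrt(6/n)/omega_0, +sqrt(6/n)/omega_0]
                bound = np.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
```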
After a little bit more research, I was not surprised to find out that @quasimondo had already been there, and created "Call of the Siren", an interactive 3D-ish SIREN encoding of one of his prior artwork. Astonishing!
Of course, I had to fire up a notebook and make my own :)
One difference was that I didn't want to train the model on a single sample, but on a full dataset.
So I added an extra Z noise input to the SIREN module and trained the whole thing as a GAN instead of with the SSIM loss. Roughly, the generator looked something like the sketch below.
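This is a simplified sketch of that idea, not the exact notebook code (the layer sizes and names are made up, and it reuses the SineLayer from the sketch above): each pixel coordinate gets concatenated with a per-image noise vector z.

```python
import torch
from torch import nn

class SirenGenerator(nn.Module):
    """Hypothetical SIREN-as-GAN-generator: the pixel coordinate (x, y)
    is concatenated with a noise vector z shared by all pixels of one image."""
    def __init__(self, z_dim=16, hidden=64, layers=3):
        super().__init__()
        net = [SineLayer(2 + z_dim, hidden, is_first=True)]
        net += [SineLayer(hidden, hidden) for _ in range(layers)]
        self.net = nn.Sequential(*net)
        self.out = nn.Linear(hidden, 1)  # grayscale intensity per pixel

    def forward(self, coords, z):
        # coords: (B, H*W, 2) in [-1, 1], z: (B, z_dim)
        z = z.unsqueeze(1).expand(-1, coords.shape[1], -1)
        return self.out(self.net(torch.cat([coords, z], dim=-1)))
```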
After a quick training, I used @suricrasia's code to serialize the model as GLSL code. And voilà!
No magic, the result *really* is just a bunch of mat4 multiplications with sin() activations.
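To make that concrete, here's roughly what the frozen generator boils down to in plain numpy (hypothetical shapes; the GLSL export just does the same thing with mat4/vec4 chunks):

```python
import numpy as np

def siren_forward(coords, z, weights, biases, omega_0=30.0):
    """Frozen-generator forward pass: nothing but matrix multiplies and sin().
    `weights`/`biases` are the trained layer parameters (hypothetical shapes)."""
    h = np.concatenate(
        [coords, np.broadcast_to(z, (coords.shape[0], z.shape[-1]))], axis=-1)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.sin(omega_0 * (h @ W + b))
    return h @ weights[-1] + biases[-1]

# Because the input is just a coordinate grid, the same network can be
# sampled at any resolution, e.g. 512x512 even if trained on 28x28:
xs = np.linspace(-1, 1, 512)
coords = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
```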
As expected, it runs super fast as a shader on GPU, and it's able to generate new digits fairly convincingly, at any resolution (even though it was only trained on 28x28 images).
At very high resolutions, it starts to generate beautiful abstract details.
It's been nice to play with such a compact model & simple pipeline. And a good reminder of how much can be done outside of PyTorch.
If you need more convincing, just check out the incredible work of @zzznah (DeepDream creator) on GLSL NCAs: distill.pub/2020/growing-ca
That's it for this week! Please let me know if you have any feedback or questions.
Have a great day/night! ❤️
• • •
You can extract anything: objects, people, drawings, and text. The quality of the salient object detection, background removal, and text detection is now quite incredible 😳😲🤯
The magic here is to use ARCore + AugmentedImages rather than SIFT.
Phone gets a new desktop screenshot on touch and adds it to ARCore (< 100ms).
Tracking is crazy fast & precise.
Interesting alternative to touch screen for interactive installations!
The text detection is performed on device with @Firebase #MLKit. Super fast, good accuracy, and cross-platform.
The secret sauce here is BASNet (Qin et al., CVPR 2019) for salient object detection and background removal.
The accuracy and range of this model are stunning, and there are many nice use cases, so I packaged it as a micro-service / Docker image: github.com/cyrildiagne/ba…
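If you want to wire it into your own pipeline, calling such a service could look like this (the address, endpoint, and field names here are just illustrative, check the repo's README for the actual API):

```python
import requests

# Hypothetical client call to a BASNet background-removal container.
with open("frame.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/",  # assumed address of the running container
        files={"data": ("frame.jpg", f, "image/jpeg")},
    )

# Assumed to return the salient object cutout with the background removed.
with open("cutout.png", "wb") as out:
    out.write(resp.content)
```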
And again, the OpenCV SIFT trick to find where on the screen the phone is pointing, as sketched below.
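A rough sketch of that trick (not the production code): match the camera frame against a desktop screenshot, estimate a homography, and project the frame center into screenshot coordinates.

```python
import cv2
import numpy as np

def locate_phone_on_screen(camera_frame, screenshot):
    """Find where the phone camera is pointing on the desktop screenshot.
    Simplified sketch: SIFT matching + ratio test + RANSAC homography."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(camera_frame, None)
    kp2, des2 = sift.detectAndCompute(screenshot, None)

    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]  # Lowe's ratio test

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Map the center of the camera frame into screenshot coordinates.
    h, w = camera_frame.shape[:2]
    center = np.float32([[[w / 2, h / 2]]])
    return cv2.perspectiveTransform(center, H)[0, 0]  # (x, y) on the screenshot
```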