Xenova
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)
Aug 22
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

Who's building this? How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳
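To make the four steps above concrete, here is a minimal sketch for a single frame, assuming the image-feature-extraction pipeline from Transformers.js (@huggingface/transformers) with a DINOv3 checkpoint converted to ONNX. The model id, frame URL, selected-patch index, and threshold are all placeholders, not the author's actual code:

// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// 1️⃣ Patch features for one frame (cache this per frame in a real editor).
// The model id below is a guess for an ONNX-converted DINOv3 checkpoint.
const extractor = await pipeline(
  'image-feature-extraction',
  'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX', // hypothetical id
  { device: 'webgpu' },
);

const frameUrl = 'https://example.com/frame.png'; // placeholder frame image
const features = await extractor(frameUrl);       // Tensor of dims [1, numTokens, dim]
const [, numTokens, dim] = features.dims;
const data = features.data;                       // flat Float32Array of token features

// Cosine similarity between token i and token j in the flat feature buffer
function cosine(i, j) {
  let dot = 0, ni = 0, nj = 0;
  for (let d = 0; d < dim; ++d) {
    const a = data[i * dim + d];
    const b = data[j * dim + d];
    dot += a * b; ni += a * a; nj += b * b;
  }
  return dot / Math.sqrt(ni * nj);
}

// 2️⃣–4️⃣ Compare every patch to a user-selected patch and keep matches.
// (DINO-family models prepend CLS/register tokens before the patch tokens,
// so a real implementation would offset past them; skipped here for brevity.)
const selected = 42;    // index of the clicked patch (placeholder)
const THRESHOLD = 0.6;  // assumed similarity cutoff
const highlighted = [];
for (let p = 0; p < numTokens; ++p) {
  if (p !== selected && cosine(p, selected) > THRESHOLD) highlighted.push(p);
}
// `highlighted` now holds the patch indices to overlay in the frame.

Running this per frame against the cached embeddings of the user's selection is all the tracking loop needs; the similarity pass is cheap next to the WebGPU feature extraction.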
Nov 21, 2023
Transformers.js v2.9.0 is now out! 😍 New features:
🎯 Zero-shot Object Detection w/ OwlViT
🕵️‍♂️ Depth Estimation w/ DPT and GLPN
📝 Optical Document Understanding w/ Nougat

... and you can get started in just a few lines of code! 🤯👇
[Image: example output of the zero-shot object detection task — an astronaut in a scene with an American flag, a model rocket, and a helmet, each surrounded by red bounding boxes generated by a vision transformer model.]

1. Zero-shot Object Detection is the task of identifying objects of classes that are unseen during training.

This means you can specify a list of words/phrases at runtime, and the model will generate bounding boxes for any occurrences it finds!

Example code:

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);
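The other two v2.9.0 features follow the same pipeline pattern. A minimal sketch — the model ids are checkpoints published under the Xenova namespace, but treat the exact ids, URLs, and output shapes here as assumptions rather than the thread's own examples:

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Depth estimation with DPT (GLPN checkpoints work the same way)
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');
let imageUrl = 'https://example.com/scene.png'; // placeholder input image
let { depth } = await depth_estimator(imageUrl); // single-channel depth map

// Optical document understanding with Nougat, via the image-to-text pipeline
let ocr = await pipeline('image-to-text', 'Xenova/nougat-small');
let documentUrl = 'https://example.com/page.png'; // placeholder: a scanned page
let [{ generated_text }] = await ocr(documentUrl); // Markdown-like text of the page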