Apple #VisionPro eye-tracking mega-thread! I'm seeing some concern about eye fatigue. @EricPresidentVR brought up this legitimate worry. I think there is some misunderstanding of how AVP differs from other implementations. I'll also dive into some interesting eye UX work. (1/)
Eye gaze is tricky to use as an input because it is so fundamental to our bodies that we don't feel like we actively control it and being made aware of it feels creepy and fatiguing. Doubly so because you literally cannot look away. (2/)
It suffers from the "Midas touch" problem, named after the mythical King Midas, who was granted his wish that anything he touch turn to gold. Midas regretted it when he reached out to comfort his distraught daughter one day, turning her to lifeless gold. Oopsies! (3/)
Apple sidesteps this problem in a clever way. You're almost never directly made aware of your gaze (although some elements apparently can react lightly) until you perform the pinch gesture. Done right, this should feel almost telepathic. Think about GUIs you use today... (4/)
Touch interfaces or mouse-driven ones. They give you nothing to feel for -- you *must* look where you are touching, pointing, clicking. You do so involuntarily. As long as AVP can track gaze accurately and quickly enough, you should not expend any conscious effort "staring" at things. (5/)
AVP's eye tracking setup is quite different from Meta's Quest Pro. Karl Guttag as usual has an excellent overview. On AVP, the IR illuminators and cameras go *through* the optics, allowing them to be placed further back and giving them a better view of your whole eye. (6/)
On Quest Pro, they are embedded on a ring outside of the optics, giving them a more indirect look at the eye. (7/)
Publicly, it has been disclosed that Apple acquired SMI, an eye tracking vendor that was founded way back in 1991! Whether or not their personnel were involved in this project, the company can certainly draw upon deep expertise. (8/)
Bottom line about the AVP gaze interface: if the eye tracking implementation is good enough, you shouldn't have to *do* anything consciously. Just express intent by pinching. You'll already be looking at the UI element you want to interact with naturally, without thinking! (9/)
Another interesting fact mentioned during the unveil is that apps *do not* have access to eye gaze. Pinch is a system-level gesture, and only then does an app get information about where the user was looking when the pinch was detected (see the sketch after the next two tweets). This is interesting for two reasons... (10/)
1) Privacy is the stated reason. Personally not too concerned about privacy and would prefer having access to gaze vectors. But it's a valid point. Even without camera access, knowing exactly what you are looking at on e.g. a shopping or social app is ripe for abuse. (11/)
2) Blocking access to direct gaze also prevents apps from implementing awful UX and annoying users with it! @Alientrap did a great demo of drawing with your eyes, which he knew would feel terrible. Can't even do this on AVP! (12/)
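To make (10/) concrete, here's a rough sketch of what the developer-facing model could look like -- my own guess based on existing RealityKit/SwiftUI conventions and Apple's public framing, not actual sample code:

import SwiftUI
import RealityKit

// Rough guess at the developer-facing shape. The system renders the gaze
// hover highlight itself; the app's code runs only when a pinch lands.
struct PinchTargetView: View {
    var body: some View {
        RealityView { content in
            let sphere = ModelEntity(mesh: .generateSphere(radius: 0.1),
                                     materials: [SimpleMaterial()])
            sphere.components.set(InputTargetComponent())
            sphere.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))
            sphere.components.set(HoverEffectComponent())  // system-drawn highlight; no gaze ray exposed
            content.add(sphere)
        }
        .gesture(
            SpatialTapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    // Only here, at pinch time, does the app learn what was targeted.
                    print("Pinched: \(value.entity.name)")
                }
        )
    }
}

Note the division of labor: hover feedback is rendered by the system, and the app only gets handed the targeted entity once the pinch happens.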
Are there reasons to use gaze directly? Yes, there are circumstances where it might make sense (people with disabilities, specialized interfaces, etc.) There is a body of research and some interesting public demos on actively using eye gaze in a reasonably comfortable way. (13/)
Here's a recent paper comparing three control options: dwell, pursuits, and gestures. The videos are a great introduction to the topic. dl.acm.org/doi/10.1145/35… (14/)
Dwell is what you expect: staring at something for a fixed amount of time. It's as annoying and slow as you'd expect. Pursuit is interesting: you follow a moving target with your eye to indicate intent. Here's a paper from 2013 describing it: perceptualui.org/publications/v… (15/)
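To give a flavor of how pursuits can be detected, here's a toy sketch of the standard trick (my own illustration, not code from either paper): correlate the recent gaze trajectory against each moving target's trajectory and select the target the eye is actually following.

import Foundation

// Toy pursuit-selection sketch (my names, not from any SDK or the papers):
// correlate recent gaze samples against each moving target's trajectory and
// select the target whose motion the eye is most closely following.
struct Sample { let x: Double; let y: Double }

func pearson(_ a: [Double], _ b: [Double]) -> Double {
    let n = Double(a.count)
    let ma = a.reduce(0, +) / n, mb = b.reduce(0, +) / n
    let cov = zip(a, b).map { ($0 - ma) * ($1 - mb) }.reduce(0, +)
    let sa = (a.map { pow($0 - ma, 2) }.reduce(0, +)).squareRoot()
    let sb = (b.map { pow($0 - mb, 2) }.reduce(0, +)).squareRoot()
    return (sa > 0 && sb > 0) ? cov / (sa * sb) : 0
}

// Returns the index of the pursued target, or nil if no correlation clears
// the threshold (i.e. the user isn't following anything).
func pursuedTarget(gaze: [Sample], targets: [[Sample]], threshold: Double = 0.8) -> Int? {
    let scores = targets.map { path -> Double in
        min(pearson(gaze.map(\.x), path.map(\.x)),
            pearson(gaze.map(\.y), path.map(\.y)))   // must track both axes
    }
    guard let best = scores.indices.max(by: { scores[$0] < scores[$1] }),
          scores[best] >= threshold else { return nil }
    return best
}

A stationary stare correlates with nothing, which is why pursuits largely sidestep the Midas touch problem.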
One type of gesture is "enter-and-leave", shown in this figure. Your eye enters a region, optionally lingers for a time, and then leaves, perhaps in a constrained direction. From this great paper: sciencedirect.com/science/articl… (16/)
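Purely illustrative again (my own toy code, not the paper's): an enter-and-leave gesture boils down to a tiny per-region state machine -- enter, optionally dwell, then exit through an allowed edge.

import CoreGraphics
import Foundation

// Toy enter-and-leave detector. The gaze must enter the region, optionally
// linger, and then exit through an allowed edge for the gesture to fire.
// A top-left coordinate origin is assumed.
enum ExitEdge { case left, right, top, bottom }

struct EnterAndLeaveRegion {
    let frame: CGRect
    let minDwell: TimeInterval        // 0 = no lingering required
    let allowedExit: ExitEdge
    var entryTime: Date? = nil

    // Feed gaze samples in; returns true the moment the gesture completes.
    mutating func update(gaze: CGPoint, at time: Date = Date()) -> Bool {
        if frame.contains(gaze) {
            if entryTime == nil { entryTime = time }         // just entered
            return false
        }
        guard let entered = entryTime else { return false }  // never entered
        entryTime = nil
        guard time.timeIntervalSince(entered) >= minDwell else { return false }
        switch allowedExit {                                 // exit direction check
        case .left:   return gaze.x < frame.minX
        case .right:  return gaze.x > frame.maxX
        case .top:    return gaze.y < frame.minY
        case .bottom: return gaze.y > frame.maxY
        }
    }
}

The constrained exit direction is what keeps a glance merely passing through the region from triggering it.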
You can imagine these techniques being combined in interesting ways. In 2017 I saw a really fascinating demo from a Texas company called "Quantum Interface". Their interface worked for touch, "head gaze" (head direction only), and eye gaze. (17/)
Here's an absolutely garbage-quality video I snagged from their old Twitter account @qimotions. Notice how it works: you aim your head at the target, which unfolds some options, and each level of options requires you to change direction a little to hit them. (18/)
This forced change in direction prevents you from accidentally moving through a target and selecting the next one. They even had an interesting demo for HIPAA-compliant interfaces requiring double confirmation... (19/)
You'd select an option by gaze, and a red X and a green check would pop out on either side. If you selected the check on the right, a second check would pop up on the left, forcing you to intentionally change direction to securely confirm. (20/)
I hope this was informative. I think Apple really thought deeply here and has implemented eye gaze in a natural, minimalistic way that, with adequate hardware and software support, should feel subconscious and *not* active. (22/22)
DISCLAIMER ADDENDUM: This and other threads are me speaking as an unaffiliated and independent AR developer, citing public info only.
Let's talk about the most important feature of the Apple #VisionPro that is getting the least attention right now: high-quality spatial audio. Apple understands audio and TDG's VP, Mike Rockwell, was formerly a VP at Dolby. Look at the size of those speaker drivers! (1/)
Audio is a huge part of our perception of space. The soundscape around us helps us localize objects both directly -- when they emit sound, esp. beyond our visual field -- and indirectly, when reflections give us a sense of the dimensions and composition of our 3D space. (2/)
Spatial audio is an important cue that grounds virtual objects in our space. This isn't just important for making a dinosaur in our room feel present. It helps orient abstract objects, like app UI, allowing a unified mental model of the real and virtual. Less mental work. (3/)
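To make "grounding" concrete in code terms, here's a minimal sketch using plain AVAudioEngine -- nothing AVP-specific, and no claim about Apple's actual pipeline: giving a mono source a 3D position relative to the listener is the basic move that anchors it in the room.

import AVFoundation

// Minimal spatial-audio sketch with plain AVAudioEngine.
let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let player = AVAudioPlayerNode()

engine.attach(environment)
engine.attach(player)

// Spatialization wants a mono input; the environment node handles panning,
// distance attenuation, and (on supported outputs) binaural rendering.
let mono = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)
engine.connect(player, to: environment, format: mono)
engine.connect(environment, to: engine.mainMixerNode, format: nil)

// Put the virtual source one meter to the listener's right.
player.position = AVAudio3DPoint(x: 1, y: 0, z: 0)
environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)

do { try engine.start() } catch { print("Engine failed to start: \(error)") }
// player.scheduleFile(...) and player.play() would follow with real audio.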
Natural language interfaces have truly arrived. Here's ChatARKit: an open source demo using #chatgpt to create experiences in #arkit. How does it work? Read on. (1/)
JavaScriptCore is used to create a JavaScript environment. User prompts are wrapped in additional descriptive text that informs ChatGPT of which objects and functions are available to use. The code it produces is then executed directly. You'll find this in Engine.swift. (2/)
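Roughly, the pattern looks like this (a simplified sketch, not the actual Engine.swift; the bridged function here is just a stand-in):

import JavaScriptCore

// Expose a native function to a JSContext, then execute model-generated
// JavaScript that calls it.
let context: JSContext = JSContext()

let createEntity: @convention(block) (String) -> Void = { name in
    print("Would search Sketchfab for and place: \(name)")
}
context.setObject(createEntity, forKeyedSubscript: "createEntity" as NSString)

// The prompt sent to ChatGPT is wrapped with a description of functions like
// createEntity; pretend this string is what the model returned.
let generated = #"createEntity("tree frog");"#
context.evaluateScript(generated)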
3D assets are imported from Sketchfab. When I say "place a tree frog...", it results in: createEntity("tree frog"). Engine.swift implements this and instantiates a SketchfabEntity that searches Sketchfab for "tree frog" and downloads the first model it finds. (3/)
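And the Sketchfab half, very roughly: this is my own illustration against Sketchfab's public v3 search endpoint as I understand it, not the project's actual SketchfabEntity (which also handles auth and downloading the model archive).

import Foundation

// Returns the uid of the first downloadable model matching the query.
struct SearchResponse: Decodable {
    struct Model: Decodable { let uid: String; let name: String }
    let results: [Model]
}

func firstModelUID(matching query: String) async throws -> String? {
    var components = URLComponents(string: "https://api.sketchfab.com/v3/search")!
    components.queryItems = [
        URLQueryItem(name: "type", value: "models"),
        URLQueryItem(name: "q", value: query),
        URLQueryItem(name: "downloadable", value: "true"),
    ]
    let (data, _) = try await URLSession.shared.data(from: components.url!)
    return try JSONDecoder().decode(SearchResponse.self, from: data).results.first?.uid
}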