no computer is
so the idea that this camera input is just like human eyes but better is both wrong and irrelevant
it's the wrong data
it's just more fist tracking with no ability to anticipate one.
and even that is too generous, because, of course, of dear old aunt sally.
THEN you'd be ready to START on the hard part of trying to teach context and anticipation.
this cannot be done.