Made a GPT-3 summarizer that reads websites just like humans do.
It scrolls the page and reads the visible text in chunks, which it then attempts to summarize.
This makes it a bit more robust than crawling HTML. Here you can see it summarizing fancy hotels on Flyertalk:
It's very brittle and has obvious flaws, but to me it's an under-explored path for agents -- we should use more visual information and less textual! (This demo isn't _quite_ the right approach: ideally you'd go end-to-end with ViTs over pixels, whereas this uses JS to find visible text. But you get the idea.)
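For a sense of the scroll-read-summarize loop, here's a minimal sketch. It assumes Playwright for the browser and the legacy GPT-3 completions endpoint; the visible-text JS, the `summarize_chunk` helper, the model name, and the scroll/chunk parameters are all illustrative, not the actual implementation from the demo.

```python
import openai
from playwright.sync_api import sync_playwright

# JS that collects only the text currently visible in the viewport,
# skipping hidden elements -- the "reads visible text" step.
VISIBLE_TEXT_JS = """
() => {
  const parts = [];
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  while (walker.nextNode()) {
    const node = walker.currentNode;
    const el = node.parentElement;
    if (!el) continue;
    const style = window.getComputedStyle(el);
    if (style.display === 'none' || style.visibility === 'hidden') continue;
    const rect = el.getBoundingClientRect();
    const inViewport = rect.bottom > 0 && rect.top < window.innerHeight;
    if (inViewport && node.textContent.trim()) parts.push(node.textContent.trim());
  }
  return parts.join(' ');
}
"""

def summarize_chunk(text: str) -> str:
    # GPT-3 completion call; model and prompt are placeholders.
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize the following webpage text:\n\n{text}\n\nSummary:",
        max_tokens=150,
    )
    return resp["choices"][0]["text"].strip()

def summarize_page(url: str, max_screens: int = 20) -> list[str]:
    summaries = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        for _ in range(max_screens):
            # Read whatever is on screen, summarize it, then scroll one viewport down.
            chunk = page.evaluate(VISIBLE_TEXT_JS)
            if chunk:
                summaries.append(summarize_chunk(chunk))
            at_bottom = page.evaluate(
                "() => window.scrollY + window.innerHeight >= document.body.scrollHeight"
            )
            if at_bottom:
                break
            page.evaluate("() => window.scrollBy(0, window.innerHeight)")
            page.wait_for_timeout(500)  # give lazy-loaded content a moment to render
        browser.close()
    return summaries
```

The per-screen summaries could then be concatenated and summarized once more for a whole-page digest, but that aggregation step is left out here.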