1/ Search Google, take the first 3 results.
2/ Call a @Replit repl to fetch each page's content (headless browsing).
3/ 1st model call: clean up and summarize each site.
4/ 2nd model call (few-shot prompted): generate the final answer with references.
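A minimal sketch of the whole pipeline. All function names here are hypothetical stand-ins; each helper is sketched under the corresponding step below:

```python
# Hypothetical orchestration of the 4-step pipeline described above.
def answer_query(query: str) -> str:
    urls = search_google(query)[:3]             # 1/ first 3 search results
    pages = [fetch_page(u) for u in urls]       # 2/ headless fetch of each page
    summaries = [summarize(p) for p in pages]   # 3/ clean up + summarize each site
    return aggregate_with_references(query, urls, summaries)  # 4/ cited answer
```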
Steps 1/, 2/, and 3/ replace browsing/clicking/scrolling/selecting with a more direct, API-based approach. Less shiny, but more resilient.
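A sketch of that direct approach, assuming the Google Custom Search JSON API for step 1/ and a plain HTTP GET + BeautifulSoup text extraction as a stand-in for the @Replit headless-browsing repl in step 2/. The API key and engine ID are placeholders you must supply:

```python
import requests
from bs4 import BeautifulSoup

GOOGLE_API_KEY = "..."    # placeholder: your Google API key
SEARCH_ENGINE_ID = "..."  # placeholder: your programmable search engine ID

def search_google(query: str) -> list[str]:
    # Step 1/: search via the Custom Search JSON API instead of a browser.
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": GOOGLE_API_KEY, "cx": SEARCH_ENGINE_ID, "q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]

def fetch_page(url: str) -> str:
    # Step 2/: plain GET + text extraction as a stand-in for headless browsing.
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
```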
Note that text-davinci-002 is really good at step 3/ out of the box, without few-shot examples. Hypothesis: the Instruct-series training set probably includes a lot of summarization tasks.
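A sketch of that zero-shot call using the pre-1.0 OpenAI Python SDK (the one current in the text-davinci-002 era); the prompt wording and truncation limit are assumptions:

```python
import openai  # pre-1.0 SDK

def summarize(page_text: str) -> str:
    # Step 3/: zero-shot cleanup + summary, no few-shot examples needed.
    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=(
            "Clean up and summarize the following web page:\n\n"
            f"{page_text[:6000]}\n\nSummary:"  # crude truncation, an assumption
        ),
        max_tokens=256,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()
```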
Step 4/ required few-shot prompting to teach the model how to aggregate content and generate references. Two examples were sufficient! It fails completely otherwise.
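A sketch of that few-shot prompt, assuming a question → numbered sources → cited answer layout. The two worked examples and the exact wording aren't in the thread, so they stay as placeholders:

```python
# Hypothetical step 4/: few-shot aggregation with [n]-style references.
EXAMPLE_1 = "..."  # worked example 1 (not shown in the thread)
EXAMPLE_2 = "..."  # worked example 2 (not shown in the thread)

def aggregate_with_references(query: str, urls: list[str], summaries: list[str]) -> str:
    # Number each source so the model can cite it as [1], [2], [3].
    sources = "\n".join(
        f"[{i}] {summary} ({url})"
        for i, (url, summary) in enumerate(zip(urls, summaries), start=1)
    )
    prompt = (
        f"{EXAMPLE_1}\n\n{EXAMPLE_2}\n\n"
        f"Question: {query}\nSources:\n{sources}\n"
        "Answer (cite sources as [n]):"
    )
    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=400,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()
```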
The original WebGPT used 6,000 human demonstrations. This version uses tools (@Replit-packaged headless browsing) + APIs (Google Search) + 2 examples.
We hypothesize that, for many use cases, demonstration data can be traded for symbolic structure (tools + APIs). We'll evaluate this app against the same benchmarks to test this.