Bryan Wang
Feb 27, 2023, 13 tweets
#LLMs are powerful, but can they make existing GUIs interactable with language? Last summer at @GoogleAI, we found that LLMs can perform diverse language-based mobile UI tasks using few-shot prompting. Exciting implications for future interaction design! #chi2023 Thread 🧵
🧠 Key Takeaway: Using LLMs, designers/researchers can quickly implement and test *various* language-based UI interactions. In contrast, traditional ML pipelines require expensive data collection and model training for a *single* interaction capability.

Learn more about it👇
To adapt LLMs to mobile UIs, we designed prompting techniques and an algorithm that converts Android view hierarchy data into HTML syntax, which is well represented in LLMs’ training data.
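A rough sketch of what such a conversion could look like, not the algorithm from the paper. It assumes the view hierarchy is already parsed into nested dicts; the field names (`class`, `text`, `resource_id`, `children`) and the class-to-tag mapping are hypothetical choices made for illustration.

```python
from html import escape

# Hypothetical mapping from Android widget classes to HTML tags (an assumption).
CLASS_TO_TAG = {"TextView": "p", "Button": "button", "EditText": "input", "ImageView": "img"}
# Tags without a closing tag carry their text in an attribute instead.
VOID_TEXT_ATTR = {"input": "placeholder", "img": "alt"}


def view_to_html(node, counter=None):
    """Depth-first conversion of one view-hierarchy node into an HTML string."""
    if counter is None:
        counter = [0]  # shared counter so every element gets a unique numeric id
    tag = CLASS_TO_TAG.get(node.get("class", ""), "div")  # unknown classes become <div>
    attrs = [f'id="{counter[0]}"']
    counter[0] += 1
    if node.get("resource_id"):
        attrs.append(f'class="{escape(node["resource_id"])}"')
    text = escape(node.get("text", ""))
    if tag in VOID_TEXT_ATTR:
        if text:
            attrs.append(f'{VOID_TEXT_ATTR[tag]}="{text}"')
        return f"<{tag} {' '.join(attrs)}>"
    children = "".join(view_to_html(child, counter) for child in node.get("children", []))
    return f"<{tag} {' '.join(attrs)}>{text}{children}</{tag}>"


screen = {
    "class": "LinearLayout",
    "children": [
        {"class": "TextView", "text": "Sign up"},
        {"class": "EditText", "resource_id": "email", "text": "Email"},
        {"class": "Button", "text": "Next"},
    ],
}
print(view_to_html(screen))
# <div id="0"><p id="1">Sign up</p><input id="2" class="email" placeholder="Email"><button id="3">Next</button></div>
```

The numeric ids give the model a compact way to refer to a specific element, which is handy when the answer has to name a UI object.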
To broadly examine the feasibility of our approach, we experimented with four important language-based UI modeling tasks, including:

1) Screen Question Generation.
2) Screen Summarization.
3) Screen Question-Answering.
4) Mapping Instruction to UI Action.

Findings below.
Task 1: Screen Question Generation. Given a mobile UI with input fields, such as a sign-up page, LLMs can leverage the UI context to generate questions for relevant information. Our study showed LLMs significantly outperformed the heuristic approach regarding question quality.
We also revealed LLMs' ability to combine relevant input fields into a single question for efficient communication. For example, the filters asking for the minimum and maximum price were combined into a single question: “What’s the price range?”
Task 2: Screen Summarization. LLMs can effectively summarize the essential functionalities of a mobile UI. By drawing on UI-specific text, they can generate more accurate summaries than the benchmark model (Screen2Words, UIST ’21), as highlighted by the colored text and boxes in the attached figure.
Interestingly, we observed LLMs using their prior knowledge to deduce information not presented in the UI when creating summaries. In the example, the LLM inferred the subway stations belong to the London Tube system, while the input UI does not contain this information.
Human evaluation rated LLM summaries as more accurate than the benchmark, yet they scored lower on metrics like BLEU. The mismatch between perceived quality and metric scores echoes recent work showing LLMs write better summaries despite automatic metrics not reflecting it.
Task 3: Screen Question-Answering. LLMs can correctly generate answers to questions about a UI, e.g., “What’s the article's headline?”
Our 2-shot LLM generated Exact Match answers for 66.7% of questions, outperforming an off-the-shelf QA model that only correctly answered 36.0%.
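For readers unfamiliar with Exact Match, here is a minimal sketch of how such a score is commonly computed. The normalization steps (lowercasing, stripping punctuation, collapsing whitespace) are an assumption for illustration and may differ from the paper's evaluation; the sample answers are made up.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match_rate(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Illustrative, made-up answers: two of three match after normalization.
preds = ["Dog saves owner", "5 stars", "Settings"]
refs = ["dog saves owner.", "4 stars", "settings"]
score = exact_match_rate(preds, refs)
```

Exact Match is strict by design: an answer that is correct but phrased differently from the reference still counts as a miss.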
Task 4: Mapping Instruction to UI Action. Given a UI and an instruction, LLMs predict the UI object on which to perform the instructed action. While our model didn’t beat the benchmark trained with vast datasets (89.2 partial, 70.6 complete), it reached 80.4 partial and 45.0 complete using only *2* shots.
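To make "2 shots" concrete, here is a hedged sketch of assembling a few-shot prompt for this task. The `Screen:` / `Instruction:` / `Answer:` labels, the exemplar contents, and the idea of answering with an element id are illustrative assumptions, not the prompt format from the paper.

```python
def build_prompt(exemplars, target_html, instruction):
    """Join N solved exemplars with one unsolved query into a single prompt string."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Screen: {ex['html']}\nInstruction: {ex['instruction']}\nAnswer: {ex['answer']}"
        )
    # The final block leaves the answer empty for the model to complete.
    parts.append(f"Screen: {target_html}\nInstruction: {instruction}\nAnswer:")
    return "\n\n".join(parts)


# Two made-up exemplars ("2 shots"); each answer is the id of the target element.
exemplars = [
    {"html": '<button id="0">OK</button><button id="1">Cancel</button>',
     "instruction": "Dismiss the dialog", "answer": "1"},
    {"html": '<input id="0" placeholder="Search"><button id="1">Go</button>',
     "instruction": "Tap the search field", "answer": "0"},
]
prompt = build_prompt(
    exemplars,
    '<p id="0">Wi-Fi</p><button id="1">Turn on</button>',
    "Enable Wi-Fi",
)
```

The resulting string would be sent to an LLM, and the completion after the final `Answer:` read back as the predicted element.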
📄Check out our #CHI2023 paper for more details: arxiv.org/abs/2209.08655
This work was done in collaboration with my fantastic intern mentors @yangli169 and Gang Li from the Interactive Intelligence team at Google Research. Yet another summer well spent with the team! #LLM4Mobile
Prior work found the same mismatch between human ratings and automatic metrics for news summarization!
