As with conversations, context is of paramount importance. It is very hard to derive meaning from any conversation without sufficient context. That is the underlying principle of retrieval-augmented generation (RAG): supplying the LLM with relevant context at inference time.
Ferret-UI is a model designed to understand user interactions with a mobile screen.
The image below is largely self-explanatory: it shows how the mobile screen can be interrogated in natural language. Numerous use cases come to mind.
This solution can be seen as a conversational enablement of a mobile operating system. Alternatively, the information can be used to learn from user behaviour and give users a customised experience.
This is sometimes referred to as ambient orchestration: the mobile OS learns user behaviour and makes suggestions, so that the automation of user routines becomes intelligent and truly orchestrated.