LAMs and the Next Generation of Digital Assistants
New AI devices aim to revolutionize the way we interact with technology, but can they compete with smartphones?
You're walking home from work. You typically enjoy the 30-minute commute, but today it’s bitterly cold. Your hands are stuffed into your pockets, and after enough shivering, you decide to splurge on delivery pizza as the light at the end of this tunnel. Leaving your hands firmly tucked in your warm pockets, you feel for a familiar button and state: "pizza for two at home… order now." Gone are the days when you price-checked between delivery apps and scrolled through listing after listing of enticing images. You know your device will host some sort of auction to find you the lowest-cost delivery vendor for the pizza place you’ve always liked. It'll place your usual order, Hawaiian, and you’ll walk inside to the smell of melted cheese.
The bad news: this isn't reality yet. But if this made you hungry, the pizza part still can be.
The good news: We’re one step closer to this future with a wave of new assistant devices currently entering the market.
AI, and technology as a whole, has been lauded for its ability to "make life easier." It logically follows that many consumers' favorite use case is the virtual assistant. The first virtual assistant, ELIZA, created in 1966, was intended to simulate a conversation with a psychotherapist. Of course, in 1966, no one could have anticipated the complexity of the tasks we rely on technology for today.
We regularly turn to our laptops, if not our smartphones, to manage schedules, book travel, and communicate across a slew of modalities, all computationally complex endeavors. With this increase in task complexity comes the expectation of an assistant with capabilities far beyond conversation.
In the early 2010s, voice recognition came to the smartphone, enabling the voice-activated assistants we are familiar with today. These assistants are intended to integrate more seamlessly into daily life, but concerns still abound about their "always listening" nature, as well as frustration with misheard instructions and an inability to parse some accents.
So, where do we go from here?
At this year's Consumer Electronics Show, a novel attempt at a digital assistant was showcased: Rabbit's R1. The R1 is positioned to create a new way to interact with commonly used apps and the internet writ large. In an interview, founder Jesse Lyu lamented that phones haven't been "cool" since the age of flip phones. While there may be a bit of a "good old days" glow on the statement, Lyu is correct that the form factor of phones hasn’t changed much in the last few years. In contrast, he pitches the R1 as a new, exciting, and different type of assistant.
But is it actually that different from a smartphone? Or from current voice-operated assistants on the market?
There's an argument that in the attention economy, a maximalist approach to design tends to make for the most profitable outcome. Think about blog posts with cooking recipes: they are often pages and pages long, with lengthy forewords and numerous stock images, just for the recipe itself to be only three bullet points. These recipe pages are not typically designed for ease of use; they are designed for search engine optimization and ad space.
In the same vein, the owners of many phone apps are selling screentime, and therefore lack the incentive to streamline their processes. This misalignment of incentives between app owners and users may be leaving the market ripe for an alternative approach, which is where the R1 and similar devices hope to step in.
A “personal AI device” of this sort would respond to voice commands, operating apps on the user's behalf to complete tasks. These tasks could range from something as simple as "Play me Olivia Rodrigo's most recent song" to something as complex as "Book me a 5-day trip to Switzerland."
In the case of Rabbit, the R1 leverages an LLM (Large Language Model) to comprehend what is being asked but, uniquely, uses an LAM (Large Action Model) to execute the task itself. LLMs may be common parlance in our current buzzy climate around generative AI, but LAM is a relatively novel term. An LAM can learn any interface, from any software, through learning by demonstration: it observes a human using the interface and then recreates that process.
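To make that pattern concrete, here is a minimal, purely illustrative Python sketch. Rabbit has not published the R1's internals, so every name below is hypothetical: parse_intent stands in for the LLM that turns an utterance into structured slot values, and replay stands in for the LAM replaying a previously recorded demonstration with those values filled in.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    """One recorded UI action from a human demonstration."""
    action: str                  # e.g. "tap", "type", "confirm"
    target: str                  # the UI element the demonstrator interacted with
    value: Optional[str] = None  # text entered; may contain {slot} placeholders

# Recorded once by watching a human order a pizza in a delivery app.
PIZZA_DEMO = [
    Step("tap", "search_box"),
    Step("type", "search_box", "{restaurant}"),
    Step("tap", "menu_item:{item}"),
    Step("type", "quantity_field", "{quantity}"),
    Step("tap", "checkout_button"),
    Step("confirm", "place_order"),
]

def parse_intent(utterance: str) -> dict:
    """Stand-in for the LLM: map a spoken request to slot values.
    A real system would call a language model here; this is hard-coded
    purely for illustration."""
    return {"restaurant": "Tony's Pizzeria", "item": "Hawaiian", "quantity": "2"}

def replay(demo: list, slots: dict) -> None:
    """Stand-in for the LAM: replay the demonstration with slots filled in.
    A real device would drive the app's actual interface; here we just print."""
    for step in demo:
        target = step.target.format(**slots)
        value = step.value.format(**slots) if step.value else None
        print(f"{step.action:8s} -> {target}" + (f" = {value!r}" if value else ""))

if __name__ == "__main__":
    slots = parse_intent("pizza for two at home… order now")
    replay(PIZZA_DEMO, slots)
```

The division of labor is the point: the language model only produces structured values, while the action model holds the step-by-step knowledge of how one particular app's interface works, learned by watching a person use it.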
Of course, there are drawbacks to the LAM model. The pitch is that this approach prioritizes privacy, since the user decides when the device listens. But the idea of handing over credit card details to an unknown startup that makes purchases on the user's behalf, or even letting an LAM watch enough of your behavior to emulate you, may be plenty daunting to the security-conscious.
Reviews and customer experiences aren't in yet, but similar devices released in the past year have been critiqued for their limitations in comparison to the smartphone. Devices like the R1 lack the visual clarity of smartphones and may not be as sturdy as current phone models. Competing against incumbents may prove a greater challenge than these new AI devices are prepared for.
Ultimately, new devices that can partially, but not fully, replace the smartphone may be impractical for many consumers (particularly in this more challenging economy). Still, a lot can be learned about the market from the enthusiasm surrounding the launch of new digital assistants like the R1 or the Humane Ai Pin. Consumers are clearly excited about a simplified, functional, voice-first design.
Looking forward, I can see this enthusiasm propelling further iterations of these new devices and competitor products alike, opening up new possibilities for how we interact with technology through AI. The days of browsing tedious delivery apps for that pizza are hopefully numbered.
Hillary Umphrey | NExT Futurist, Deloitte Consulting LLP