“If I had asked people what they wanted, they would have said faster horses.” That sentiment, along with derivatives like “people don’t know what they want until you show it to them,” makes predicting the future of technology difficult, since it takes only one innovation to completely shift the paradigm. That’s especially true of the coming wave of AI features for new and existing Google apps.
Google was not blindsided by what’s to come. The company publicly talked about natural language understanding (NLU) and large language models (LLMs) at the last two I/O developer conferences, its biggest event each year. There was the Language Model for Dialogue Applications (LaMDA) in 2021, with a demo of talking to Pluto, and LaMDA 2 last year, which could be tried through the AI Test Kitchen app.
There’s also the Multitask Unified Model (MUM) that could one day answer “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall, what should I do differently to prepare?”, as well as the future ability to take a picture of a broken bike part in Google Lens and get instructions on how to fix it.
Beyond detailing its technology, Sundar Pichai more tellingly said “natural conversation capabilities have the potential to make information and computing radically more accessible and easier to use.” Search, Assistant, and Workspace were specifically named as products where Google hopes to “[incorporate] better conversational features.”
However, as recent discourse proves, that was not enough to make people remember. Instead, Google is guilty of not providing more specific examples, the kind that could have captured the public’s consciousness, of how these new AI features would benefit the products people use every day.
Then again, even if more concrete examples had been provided in May of 2022, they would have been quickly steamrolled by the launch of ChatGPT later that year. The OpenAI demo/product is available to use (and pay for) today, and there’s nothing more tangible than experience. It has spurred many discussions about how direct responses could impact Google’s ad-based business model, with the thinking being that users would no longer need to click on links if they already got the answer as a generated, summarized sentence.
What Google was blindsided by is the speed at which competitors have integrated these new AI advancements into shipping apps. Given the “code red,” it’s apparent that the company didn’t think it would have to roll out anything beyond demos so soon. Safety and accuracy concerns are something Google has explicitly emphasized with its existing previews, and executives are quick to point out how what’s on the market today “can make stuff up,” which would be reputationally damaging if it ever launched on something the scale of Google Search.
The same day Google announced layoffs, a New York Times report emerged describing over 20 AI products that the company was planning to show off this year, as soon as I/O 2023 in May.
These announcements, presumably led by a “search engine with chatbot features,” seem very much intended to match OpenAI toe-to-toe. Particularly telling is an “Image Generation Studio” that looks like a DALL-E, Stable Diffusion, and Midjourney competitor, with a Pixel wallpaper creator possibly being a branch of that, and AI Test Kitchen reportedly adding text-to-image demos. Of course, Google will be wading right into the backlash from artists that generative image AI has sparked.
Besides Search (more on that later), none of what was leaked seems to radically change how an average user interacts with Google products. Of course, that has never been Google’s approach, which has been to infuse existing products – or even just parts of them – with small conveniences as the technology becomes available.
There’s Smart Reply in Gmail, Google Chat, and Messages, while Smart Compose in Docs and Gmail doesn’t quite write the email for you, but its auto-complete suggestions are genuinely useful.
On Pixel, there’s Call Screen, Hold for Me, Direct My Call, and Clear Calling, where AI is used to improve a phone’s key original use cases, while on-device speech recognition makes possible an excellent Recorder app and a faster Assistant. Of course, there’s also computational photography and now Magic Eraser.
That isn’t to say that Google hasn’t used AI to create entirely new apps and services. Google Assistant is the result of natural language understanding advancements, while the computer vision that makes possible search and categorization in Google Photos is something we take for granted over seven years later.
More recently, there’s Google Lens to visually search by taking a picture and appending questions to it, while Live View in Google Maps provides AR directions.
Then there’s Search and AI
Post-ChatGPT, people are imagining a search engine where your questions are directly answered by a sentence generated entirely for you and your specific query, as opposed to getting links or being shown a “Featured Snippet” that quotes a relevant website that might have the answer.
Looking at the industry, it feels like I’m in the minority in my lack of enthusiasm for conversational experiences and direct answers.
One issue I foresee with the experience is not always (or even frequently) wanting to read a full sentence to get an answer, especially when it can be found by reading just one line in a Knowledge Panel, be it a date, time, or other simple fact.
Meanwhile, it will take time to trust the generative and summarization capabilities of chatbot search from any company. At least Featured Snippets allow me to immediately see and decide whether I trust the publication/source that’s producing the quote.
In many ways, that direct sentence is what smart assistants have been waiting for. Today, Google Assistant turns to facts (dates, addresses, etc.) that it already knows from Knowledge Panels and the Knowledge Graph, and to Featured Snippets otherwise. When you’re interacting by voice, it’s safe to assume you can’t readily look at a screen and want an immediate answer.
I’m aware that the history of technology is littered with iterative updates that are trampled in short order by new, game-changing innovations, but it doesn’t feel like the technology is there yet. I think back to the early days of voice assistants that explicitly tried to replicate a human in a box. This upcoming wave of AI has shades of approximating a human answering your question or doing a task for you, but how long does that novelty last?