The landscape of artificial intelligence is shifting from static chat windows to fluid, conversational agents that mimic the natural cadence of human thought. Sesame, a startup founded by the visionaries behind Oculus, has recently unveiled an iOS app that aims to solve the “lag” problem in AI by allowing models to think and search in real time while they speak. We sat down with Vijay Raina, a specialist in enterprise SaaS and software architecture, to discuss how Sesame’s technology is bridging the gap between speed and depth, the significance of its million-user beta, and its ultimate pivot toward wearable hardware in 2027. This interview explores the mechanics of parallel search retrieval, the evolution of agentic tools, and why the future of AI might move entirely off our phone screens and onto our faces.
Traditional AI models often face a trade-off between responding instantly and providing accurate, researched information; how does Sesame’s architecture allow an agent to maintain a natural conversation while simultaneously pulling in new data?
The brilliance of this architecture lies in how it handles the inherent tension between speed and thoughtfulness. Instead of the typical “stop-and-think” approach where a chatbot goes silent while processing, Sesame has built a system that runs multiple parallel searches while the AI is actively speaking. You can actually feel the difference in the flow, as the agent can pivot mid-sentence to incorporate a fresh fact it has just pulled from its search and retrieval systems. This mimics the human experience of remembering a key detail halfway through a thought, making the interaction feel far less like a database query and more like a live brainstorming session. It is a sophisticated dance of data retrieval that ensures the AI is both up to date and rhythmically natural.
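To make the "search while speaking" idea concrete, here is a minimal Python sketch. It is an illustration only, not Sesame's implementation: `fake_search`, `speak`, `converse`, and the timing constant are hypothetical stand-ins for a real search backend and text-to-speech pipeline. It also simplifies the behavior described above by weaving in new facts between sentences rather than mid-sentence.

```python
import asyncio

SPEAK_SECONDS = 0.1  # assumed time to "speak" one sentence (illustrative)


async def fake_search(query: str, delay: float) -> str:
    """Hypothetical search backend; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return f"fresh fact about {query}"


async def speak(sentence: str, spoken: list) -> None:
    """Hypothetical text-to-speech; records the sentence after a delay."""
    await asyncio.sleep(SPEAK_SECONDS)
    spoken.append(sentence)


async def converse(draft: list, queries: dict) -> list:
    """Speak the drafted sentences while searches run in the background."""
    spoken = []
    # Start every search up front so they run in parallel with speech.
    pending = {asyncio.create_task(fake_search(q, d)) for q, d in queries.items()}
    for sentence in draft:
        await speak(sentence, spoken)
        # Pick up any searches that finished while the agent was talking.
        done = {task for task in pending if task.done()}
        pending -= done
        for fact in sorted(task.result() for task in done):
            await speak(f"By the way: {fact}.", spoken)
    # Searches still outstanding are awaited and appended at the end.
    for fact in sorted(await asyncio.gather(*pending)):
        await speak(f"By the way: {fact}.", spoken)
    return spoken
```

Calling `asyncio.run(converse(["A", "B", "C"], {"pricing": 0.01, "launch": 0.2}))` interleaves the fast "pricing" result after the first sentence and the slower "launch" result after the second, so the agent never goes silent waiting on a search.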
With over a million people joining the research preview within its first few weeks, what were the most critical user-driven features that emerged from that high-pressure beta period?
That massive initial surge of a million users provided an incredible feedback loop that forced the team to think about how people actually use these tools in the real world. One of the standout additions was the implementation of search cards with image results, which helps users visualize complex concepts immediately rather than just reading a wall of text. They also realized that people aren’t always in a position to talk out loud, which led to the development of a dedicated texting mode and an incognito mode for private conversations. These features, along with a “deep dive” option for more intense research, show a clear transition from a tech demo to a versatile tool designed for daily life. It is about capturing takeaways through notes and ensuring the agent has access to context without compromising the user’s need for privacy.
The app introduces four distinct agents—Maya, Miles, Simone, and Charlie—each with their own voice and memory; why is this move toward individual personalities so vital for the next generation of software?
Moving away from the generic, one-size-fits-all assistant is a major step toward making AI feel like a collaborator rather than a utility. By giving Maya, Miles, Simone, and Charlie their own distinct points of view and memories, Sesame is creating a sense of continuity that is often missing in SaaS tools. When an agent remembers your past preferences or speaks with a specific tone, it lowers the cognitive friction of the interaction and builds a more intuitive relationship. This isn’t just about cosmetic voices; it’s about the AI having a consistent perspective that helps a user feel understood over long-term use. During the beta, Maya and Miles already proved this concept worked, attracting a huge audience who appreciated having a specific “who” to talk to.
Sesame has hinted that these are “agents” rather than just chatbots because they will eventually take action on a user’s behalf; how does this shift change the way we interact with technology?
The transition to “agentic” AI is perhaps the most exciting part of Sesame’s $250 million Series B story because it removes the burden of the “perfect prompt.” Currently, most AI tools require you to have a very specific idea of what you want and how it should happen, which can be a barrier for many users. A true conversational agent allows you to talk naturally about a goal, and the AI figures out the necessary steps to execute that task for you. This means you won’t have to be an expert at giving commands; you just have to be able to hold a conversation. As these agents learn to take action on your behalf, they move from being a source of information to a proactive partner that gets things done.
Looking ahead to the 2027 goal of intelligent eyewear, how does the current mobile experience across 39 countries serve as a foundational step for hardware-integrated AI?
The iOS app is essentially the training ground for the software brains that will eventually live inside a pair of glasses. For wearable AI to be successful, it has to be voice-first and incredibly low-latency, because you can’t be staring at a loading screen while you’re walking down the street. By launching in 39 countries now, the team is gathering diverse linguistic and contextual data that will make those future glasses feel seamless. The 2027 launch is the ultimate destination, where the screen disappears entirely and the agent lives in your ear and your field of vision. This app is proving that the “thinking while speaking” technology is robust enough to handle the demands of a world where AI is always on and always listening.
What is your forecast for the future of conversational AI agents over the next three years?
I expect we will see a rapid departure from the “search box” mentality as these agents become deeply integrated into our hardware and our daily routines. By 2027, the success of companies like Sesame will likely trigger a massive shift toward “invisible” interfaces where the primary way we interact with our data is through fluid, real-time speech. We will move from asking an AI to “tell me something” to asking it to “do something for me,” and it will happen with the same ease as talking to a friend. The era of the static chatbot is closing, and the age of the proactive, personality-driven agent is just beginning.
