Subj : I tried the most realistic AI voice companion ever created - if C
To   : All
From : TechnologyDaily
Date : Mon Mar 03 2025 21:15:10

I tried the most realistic AI voice companion ever created - if ChatGPT or Gemini ever gets this good, reality is in trouble

Date: Mon, 03 Mar 2025 21:00:00 +0000

Description: Sesame's AI voice successfully mimics human speech almost perfectly.

FULL STORY
======================================================================

I have spent a lot of time talking to AI. I've tested every voice assistant, every chatbot, and every next-generation conversational AI that tech companies love to hype up. But I've never encountered anything quite like Sesame. This AI companion isn't just good, it's eerily accurate at mimicking how people talk because of the very imperfections it imitates.

Let's start with what Sesame actually is. Unlike the AI voices we've come to know from ChatGPT, Gemini, or going back to the early days of Siri and Alexa, Sesame is designed to perform like a human in its failures, not like a perfect customer service agent. The AI's speech is fluid, expressive, and unpredictably human. It briefly chuckles when it says something mildly amusing, hesitates before answering a question, and even seems to change its 'mind' mid-sentence, pausing and starting a new sentence. It not only lets me interrupt it, it can interrupt me as well, and will even apologize for doing so.

The secret sauce is Sesame's Conversational Speech Model (CSM), which blends text and audio into a single process, meaning that it doesn't just generate a sentence and then read it out. Instead, it creates speech in a way that mirrors how humans actually talk, with pauses, ums, tonal shifts, and all. ChatGPT and Gemini's voice options, while impressive, still operate in a structured way, generating text and then converting it into speech. Sesame, on the other hand, speaks as if it's thinking, making its responses feel incredibly natural.
I decided to put Sesame through a series of tests to see just how human-like it could get. The results were impressive, if a little unsettling.

Talk me through it

I wanted to see if Sesame could handle a real-world, anxiety-inducing problem, so I gave it an imaginary but realistically tough scenario: "I have this big opportunity at work, but I'm nervous I'll mess it up."

Instead of immediately launching into a generic pep talk, Sesame paused. Just a tiny beat, but enough to make me feel like it was actually considering what I said. Then, in a voice filled with artificial sympathy, it responded, "That's completely normal. Big opportunities can feel overwhelming. What about it makes you the most nervous?"

It didn't assume anything, and it engaged before offering any bland advice. Sesame suggested, "Have you practiced in front of a friend or recorded yourself? Sometimes hearing yourself back can help you tweak your delivery." The voice isn't exactly foolproof; the hesitations and other foibles felt a little measured, but overall it was an impressive effort.

Interruptions

For this one, I wanted to see how well Sesame handled a truly natural, messy conversation, the kind where people talk over each other, change subjects abruptly, and circle back to things they mentioned ten minutes ago.

I started with: "Tell me a story about a detective who's also a magician." Sesame immediately jumped in with a classic noir setup:

"Okay. Here's one. Picture this smoky jazz club. Dimly lit. The smell of bourbon. And regrets hanging in the air. This detective. Call him Jack Harrow. He's got a fedora at a rakish angle, cigarette smoldering in a holder and a look that could melt steel. But Jack's not just any gumshoe. See, he's got a secret. He's got the touch. A dame shuffles in all nerves."

I interrupted then to ask, "Wait, what's his favorite trick?" Maya responded with:

"Mm. Good question. Jack's signature. It's a sleight of hand that's more than meets the eye. He calls it The Vanishing Witness.
He can make a person seem to disappear right before your eyes. Just for a moment. Long enough to slip them a note. Plant evidence or just make them question what they saw. It throws people off balance. You know, a good magician knows. Perception is everything."

I jumped back in to tell Maya to go back to the story, to which it said, "Right. Where was I? Oh, yes. The dame slides into the booth across from Jack, all worry lines and trembling hands. She whispers something about a stolen ruby, a jealous husband, and a shadowy figure lurking in the fog."

The AI seamlessly continued the story, weaving that detail back in as if I hadn't just derailed the whole thing. That's the part that floored me: most AI assistants would stumble or restart, but Sesame's conversational AI model handled it like a person telling a story at a party.

Party Planning

Speaking of parties, for the final test, I wanted to see if Sesame could help me plan something useful. "I want to plan a surprise for my best friend's birthday, but I don't know what to do."

Instead of rattling off a lifeless list of ideas, Sesame asked, "What kind of things does your friend love?" I told it sci-fi movies and adventurous food, and after a thoughtful pause, it suggested: "How about a Sci-Fi & Snacks night? You could book a themed escape room, then hit a futuristic-style restaurant, maybe one with molecular gastronomy?"

When I told it I had a budget of $100, Sesame adjusted. "You could do a home movie night with their favorite sci-fi films and make themed snacks, like Galactic Popcorn or Alien Sliders. I can find recipes for you if you'd like."

It wasn't just spitting out generic recommendations. It was collaborating with me. And that's what made it feel different.

What's real?

Sesame didn't just pass these tests; I'd say it nailed them. The pauses, the hesitation, the filler words, the way it would change its mind mid-sentence: it was all eerily real.
I'd start talking, expecting the usual AI smoothness, and then hear a sudden "Uh, actually, no, wait, let me rephrase that" and immediately forget I wasn't talking to an actual person.

If AI is this realistic in its speech, would we even know we were talking to an AI? With Sesame, there are telltale audio issues that give the game away, but ChatGPT's Advanced Voice Mode and Google Gemini's own voice options are good enough to mostly skip past those issues. Combine their voice powers with the speech patterns of Sesame, and it might genuinely get difficult to tell when you are talking to an AI, at least in short conversations.

Sesame is still niche, but this technology won't stay niche forever. The cliché today is that younger people never make phone calls, but if they start, they may have to figure out if the person on the other end is real before anything else.

======================================================================
Link to news story:
https://www.techradar.com/computing/artificial-intelligence/i-tried-the-most-realistic-ai-voice-companion-ever-created-if-chatgpt-or-gemini-ever-gets-this-good-reality-is-in-trouble

--- Mystic BBS v1.12 A47 (Linux/64)
 * Origin: tqwNet Technology News (1337:1/100)