Solid breakdown of the voice AI landscape. The point about ambient computing needing voice as a natural interface cuts to the core of why this tech matters beyond just "AI can talk now." What strikes me most is how fast emotion detection is maturing (Hume EVI) - that's the real differentiator for conversational agents vs just stitching speech-to-text and text-to-speech together. The dubbing use case is underrated, especially for solo creators who couldn't afford localization before.
Agreed: I think truly nailing the emotional cues and making live conversations sound natural (interruptions, pauses, etc.) is likely the final piece of the puzzle when it comes to getting more people to embrace voice as a viable interface. We're getting pretty close!
Great summary. I remember how bad voice-to-text was around 2010 and how useless I felt the tech was back then. Fast forward to today and I will frequently use Jippity Voice as a sounding board for thinking out loud, or for taking notes, or for creating an ultra-fast outline for something I want to write. It's silly useful for all those use cases, and prior to 2024 or so, I had a much harder time getting what was in my brain out and into the wider world. Voice helps so much.
Lately, Jippity lets you see the text and images on the screen while using Voice from your phone. I have gotten so used to just using it while I walk or wash dishes or whatever that I haven't really taken advantage of this new form of computing yet.