
All the wild things people are doing with ChatGPT’s new Voice Mode

Nothing Phone 2a and ChatGPT voice mode.
Nadeem Sarwar / Digital Trends

ChatGPT's Advanced Voice Mode arrived on Tuesday for a select few OpenAI subscribers chosen to be part of the highly anticipated feature's alpha release.

The feature was first announced back in May. It is designed to do away with the conventional text-based context window and instead converse using natural, spoken words, delivered in a lifelike manner. It works in a variety of regional accents and languages. According to OpenAI, Advanced Voice "offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions."

There are some limitations to what users can ask Voice Mode to do. The system will speak in one of four preset voices and is not capable of impersonating other people’s voices — either individuals or public figures.

In fact, the feature will outright block outputs that differ from the four presets. What's more, the system will not generate copyrighted audio or music. So of course, the first thing someone did was to have it beatbox.

Advanced Voice as a B-boy

Yo ChatGPT Advanced Voice beatboxes pic.twitter.com/yYgXzHRhkS

— Ethan Sutin (@EthanSutin) July 30, 2024

Alpha user Ethan Sutin posted a thread to X (formerly Twitter) showing a number of Advanced Voice’s responses, including the one above where the AI reels off a short “birthday rap” and then proceeds to beatbox. You can actually hear the AI digitally breathe in between beats.

Advanced Voice as a storyteller

This is awesome actually

I did not expect the ominous sounds https://t.co/SgEPi5Bd3K pic.twitter.com/DnK8AVdWjV

— Kesku (@yoimnotkesku) July 30, 2024

While Advanced Voice is prohibited from creating songs wholesale, it can generate background sound effects for the bedtime stories it recites.

In the example above from Kesku, the AI adds well-timed crashes and slams to its tale of a rogue cyborg after being asked to "Tell me an exciting action thriller story with sci-fi elements and create atmosphere by making appropriate noises of the things happening (e.g: A storm howling loudly)."

look on OpenAI’s works ye mighty and despair!

this is most wild one. You can really feel like a director guiding a Shakespearean actor! pic.twitter.com/GUQ1z8rjIL

— Ethan Sutin (@EthanSutin) July 31, 2024

The AI is also capable of creating realistic characters on the spot, as Sutin’s example above demonstrates.

Advanced Voice as an emotive speaker

Khan!!!!!! pic.twitter.com/xQ8NdEojSX

— Ethan Sutin (@EthanSutin) July 30, 2024

The new feature sounds so lifelike in part because it is capable of emoting as a human would. In the example above, Ethan Sutin recreates the famous Star Trek II scene. In the two examples below, user Cristiano Giardina compels the AI to speak in different tones and different languages.

ChatGPT Advanced Voice Mode speaking Japanese (excitedly) pic.twitter.com/YDL2olQSN8

— Cristiano Giardina (@CrisGiardina) July 31, 2024

ChatGPT Advanced Voice Mode speaking Armenian (regular, excited, angry) pic.twitter.com/SKm73lExdX

— Cristiano Giardina (@CrisGiardina) July 31, 2024

Advanced Voice as an animal lover

🐈 pic.twitter.com/UZ0odgaJ7W

— Ethan Sutin (@EthanSutin) July 30, 2024

The AI's vocal talents don't stop at human languages. In the example above, Advanced Voice is told to make cat sounds, and does so with unerring accuracy.

Trying #ChatGPT’s new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful — reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To

— Manuel Sainsily (@ManuVision) July 30, 2024

In addition to sounding like a cat, users can pepper the AI with questions about their biological feline friends and receive personalized tips and advice in real time.

Advanced Voice as a real-time translator

Real-Time Japanese translation using #ChatGPT’s new advanced voice mode + vision alpha! Yet another useful example! pic.twitter.com/wDXrgYQkZE

— Manuel Sainsily (@ManuVision) July 31, 2024

Advanced Voice can also leverage your device's camera to aid in its translation efforts. In the example above, user Manuel Sainsily points his phone at a Game Boy Advance running a Japanese-language version of a Pokémon game, and has the AI read the on-screen dialog as he plays.

The company notes that video and screen sharing won’t be part of the alpha release but will be available at a later date. OpenAI plans to expand the alpha release to additional Plus subscribers “over the next few weeks” and will bring it to all Plus users “in the fall.”

Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…