Voice and Vision Unleashed: ChatGPT’s Exciting New Capabilities

ChatGPT can now see, hear, and speak

OpenAI is rolling out new voice and image capabilities in ChatGPT. These features offer a more intuitive way to interact, allowing users to hold voice conversations with ChatGPT and share images with it. In this blog post, we'll look at what the new capabilities do, how to turn them on, and how OpenAI is approaching the risks that come with them.

Voice Conversations: Chat with Your Assistant

Imagine having a back-and-forth conversation with your AI assistant, ChatGPT, using just your voice. With this new feature, you can:

  • Chat on the go, whether you’re commuting or taking a stroll.
  • Request a bedtime story for your family.
  • Settle those dinner table debates with a quick voice query.

To get started with voice conversations, follow these steps:

  1. Go to Settings → New Features on the mobile app.
  2. Opt into voice conversations.
  3. Tap the headphone icon in the top-right corner of the home screen.
  4. Choose your preferred voice from a selection of five different options.

The voice capability is powered by a new text-to-speech model that generates human-like audio from text input and a few seconds of sample speech. Each voice was created in collaboration with professional voice actors. For the listening side of the conversation, ChatGPT uses Whisper, OpenAI's open-source speech recognition system, to transcribe your spoken words into text.

Image Sharing: A Picture is Worth a Thousand Words

ChatGPT now understands images, allowing you to share one or more pictures to initiate conversations. Here’s how you can make the most of this feature:

  • Troubleshoot problems like why your grill won’t start.
  • Plan your meals by exploring the contents of your fridge.
  • Analyze complex graphs and data for work-related tasks.

To use image sharing, tap the photo button to capture or select an image. If you're using iOS or Android, tap the plus button first. You can also discuss multiple images or use the drawing tool in the mobile app to guide your assistant's attention to a specific part of an image.

This image understanding capability is powered by multimodal versions of the GPT-3.5 and GPT-4 models, which apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents containing both text and images.

A Gradual Rollout for a Brighter Future

OpenAI's stated goal is to build Artificial General Intelligence (AGI) that is safe and beneficial. Rolling out these capabilities gradually fits that goal: it gives OpenAI time to make improvements and refine its risk mitigations while preparing for more powerful AI systems in the future.

Voice: The new voice technology, which can create realistic synthetic voices from brief real speech samples, opens up a range of creative and accessibility-focused applications. However, it also presents new risks, such as the potential for malicious actors to impersonate public figures or commit fraud. To address this, OpenAI is limiting the technology to a specific use case, voice chat, which was built with voice actors it worked with directly. It is also collaborating with selected partners; for example, Spotify is using the technology for the pilot of its Voice Translation feature.

Image Input: Vision-based models have their challenges, including potential misinterpretations and privacy concerns. To address these issues, extensive testing has been conducted with risk assessors and alpha testers, leading to a safer and more useful image understanding feature.

Using AI for a Better Tomorrow

These new capabilities in ChatGPT aim to assist you in your daily life, whether that means troubleshooting a problem, planning meals, or talking through a question out loud. OpenAI says it is committed to transparency about model limitations and encourages responsible usage.

Stay tuned for more updates as AI continues to evolve and empower users in their day-to-day interactions.
