OpenAI’s GPT-4o Brings ‘Samantha’ from the Movie Her to the Public
In the ever-evolving field of artificial intelligence, OpenAI has consistently pushed the boundaries of what AI can do in interaction and functionality. The launch of GPT-4o on May 13, 2024, marks another significant milestone, building on its predecessor, GPT-4 Turbo, introduced in November 2023. GPT-4o surpasses that model in speed and cost-efficiency, running twice as fast at half the price in the API, and it introduces a suite of enhanced features set to redefine how users interact with AI. The model now powering ChatGPT, reminiscent of the hyper-intelligent AI assistant ‘Samantha’ from the movie Her, has sparked debate over its character-like capabilities as well as controversy over its alleged use of a voice closely resembling Scarlett Johansson’s.
Enhanced Speed and Accessibility
GPT-4o’s performance improvements are substantial. During its unveiling, it demonstrated real-time spoken conversation by moving from a chained pipeline of speech recognition, text generation, and text-to-speech (TTS) to native speech-to-speech (STS) processing. It can respond to audio in as little as 0.23 seconds, with an average of around 0.32 seconds, comparable to human response times in conversation. This makes interactions not only smoother but also more natural, mirroring human conversational patterns more closely than ever before.
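For context, the voice mode that preceded GPT-4o stitched together separate models, roughly as in the sketch below. This is a simplified illustration using the OpenAI Python SDK, not OpenAI’s actual implementation; the file names and prompt are placeholders, and GPT-4o’s native speech-to-speech mode in the ChatGPT app does not go through these separate steps.

```python
# Sketch of the older three-step voice pipeline (transcribe -> text model -> TTS)
# that GPT-4o's native speech-to-speech handling is meant to replace.
# Assumes the `openai` Python SDK (v1.x) and an OPENAI_API_KEY in the environment;
# "question.mp3" is a placeholder input file.
from openai import OpenAI

client = OpenAI()

# 1. Speech -> text: transcribe the user's audio with Whisper.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text -> text: generate a reply with GPT-4o.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# 3. Text -> speech: synthesize the reply with a separate TTS model.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
with open("reply.mp3", "wb") as f:
    f.write(speech.content)
```

Each of these steps adds latency, which is part of why collapsing them into a single model shortens response times so dramatically.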
Broadened Linguistic and Cultural Horizons
Another notable advance in GPT-4o is its improved handling of languages beyond English. A new, more compact tokenizer represents text in 20 non-English languages with far fewer tokens, which makes processing those languages significantly faster and cheaper. The improvement is not just technical: it extends to cultural sensitivity and understanding, enabling GPT-4o to engage more deeply with global users.
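The difference is visible in OpenAI’s open-source tiktoken library: GPT-4o uses the newer o200k_base encoding, while GPT-4 and GPT-4 Turbo use cl100k_base. The sketch below compares token counts for the same sentence; the Korean sample is an arbitrary example, and exact counts will vary by language and text.

```python
# Compare how many tokens the GPT-4 Turbo and GPT-4o tokenizers need
# for the same non-English sentence.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # used by GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # used by GPT-4o

sample = "안녕하세요, 오늘 날씨가 어떤가요?"  # Korean: "Hello, how is the weather today?"

print("GPT-4 Turbo tokens:", len(old_enc.encode(sample)))
print("GPT-4o tokens:     ", len(new_enc.encode(sample)))
```

Fewer tokens per sentence translates directly into lower cost and latency for speakers of those languages.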
Real-Time Image Processing
GPT-4o also shows remarkable real-time image-processing capabilities. It recognizes text and objects in images more reliably, including scripts ranging from Korean and Cyrillic to Arabic calligraphy. In practice, this lets it identify the likely cause of a malfunction from a photograph of a damaged object or determine the species of a plant, extending the usefulness of AI to practical, everyday tasks.
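For developers, this kind of image understanding is exposed through the Chat Completions API, roughly as sketched below. The image URL and prompt are placeholders, and the sketch assumes the OpenAI Python SDK (v1.x) with an API key in the environment.

```python
# Minimal sketch: ask GPT-4o to identify a plant from a photo.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What plant species is shown in this photo?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/plant.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```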
Deeper Integration into Daily Interactions
The improvements extend to more interactive and emotionally intelligent features. GPT-4o can conduct conversations that mimic real-life interactions, including video calls. It can interpret human expressions and emotional tones, tailoring its responses accordingly. This level of interaction is designed to foster a deeper connection between the AI and its users, making it a more helpful and empathetic companion.
Scarlett Johansson’s Voice Controversy with OpenAI
Scarlett Johansson recently expressed concern that OpenAI had used a voice eerily similar to hers, without her consent, for ‘Sky,’ one of the voice options in its GPT-4o-powered chatbot. The voice launched after she declined a request from OpenAI’s CEO, Sam Altman, who had sought to license her voice to make the AI feel more relatable to users, and its resemblance to her own caused confusion among the public and people close to her. After facing backlash and legal pressure from Johansson, OpenAI announced it would pause the use of the ‘Sky’ voice, maintaining that it was never intended to imitate her. The incident has highlighted the implications of voice-imitation technology for privacy and consent, fueling ongoing debates about the ethics of AI development and the need for clearer regulation.