AI News: Exploring OpenAI's Progress Towards AGI and Latest Developments

Explore OpenAI's roadmap for AGI development, from chatbots to AI organizations. Learn about their new Strawberry reasoning technology and the latest news and controversies around OpenAI. Discover how AI is transforming education, video creation, and more. Stay up-to-date on the fast-paced world of AI news and developments.

January 12, 2025

Discover the latest advancements in artificial intelligence, from OpenAI's progress towards AGI to the release of new AI models and tools that can revolutionize your work. This blog post provides a comprehensive overview of the most significant AI news and developments you need to know.

OpenAI's Five Levels Towards AGI
Strawberry: OpenAI's New Reasoning Technology
Concerns About OpenAI's Practices
Potential Updates to DALL-E Image Model
New Demos from Sora
HubSpot's Resources for Using ChatGPT at Work
Andrej Karpathy's New Education Company Eureka Labs
Anthropic Releases Claude on Android
Google's New AI Features: Gemini Answers, Google Vids, and YouTube Music Sound Search
Controversy Over AI Training Data and Swiped YouTube Videos
Microsoft Designer Integration in Apps
Mistral Releases Codestral Mamba Language Model
Amazon Launches Rufus AI Shopping Assistant
Meta Limiting Multimodal Models in the EU
Controlling Stable Diffusion with a MIDI Device
AI App Turns Selfies Into 3D Printable Characters
AI Accurately Determines Sex from Dental X-Rays
OpenAI Releases GPT-4o Mini
Nvidia and Mistral Collaborate on Mistral NeMo Model
Google AI Featured in 2024 Olympics

OpenAI's Five Levels Towards AGI

OpenAI has outlined five levels of progress towards Artificial General Intelligence (AGI):

Level 1, Chatbots: AI with conversational language. This level represents the current state of chatbots and language models like ChatGPT, Claude, and LLaMA.
Level 2, Reasoners: AI that can do human-level problem solving. OpenAI says it is very close to this level, which involves systems that can reason and solve problems as well as a human can.
Level 3, Agents: Systems that can take actions on our behalf. This level includes AI agents that can perform tasks like booking flights and responding to emails.
Level 4, Innovators: AI that can aid in invention. Systems at this level can create novel ideas and assist in the invention process.
Level 5, Organizations: AI that can do the work of an organization. The final level involves AI systems that can perform the work of an entire organization.

OpenAI believes it is currently on the cusp of level two, and it expects to progress through these levels on the path towards true AGI.

Strawberry: OpenAI's New Reasoning Technology

OpenAI has been working on a new reasoning technology codenamed "Strawberry". According to a leaked internal document, teams inside OpenAI are developing this project with the aim of creating a model that can not just generate answers to queries, but also plan ahead and navigate the internet autonomously to perform "deep research".

The key capabilities that OpenAI is targeting with Strawberry include:

Handling long-horizon tasks and complex problems that require planning and a series of actions over an extended period of time.
Using these capabilities to conduct research by browsing the web autonomously, with the assistance of a computer agent that can take actions based on its findings.

While details on the precise state of Strawberry are limited, it seems to be a significant step towards OpenAI's goal of progressing through the five levels of AI capability it outlined, moving closer to true Artificial General Intelligence (AGI). The project was reportedly previously known as "Q*" (pronounced "Q-star"), though this has not been officially confirmed.

OpenAI has been characteristically tight-lipped about its upcoming models, so more information on Strawberry is likely to emerge only closer to its public release.

Concerns About OpenAI's Practices

OpenAI has faced scrutiny over its business practices, with whistleblowers claiming the company illegally keeps employees from talking to government regulators about problems at work and strips them of their right to whistleblower rewards.

A letter sent to the SEC chair alleges that OpenAI has a policy that prohibits employees from making protected disclosures. This is not the first time OpenAI's policies and contracts have come under scrutiny: the company was previously found to be requiring people to sign non-disparagement agreements that could cost them their vested equity if they spoke badly about the company.

OpenAI disputes these claims, stating that it has a policy protecting employee whistleblower rights. Even so, the reports suggest the company may be overhauling its contracts in response to the increased public scrutiny that has come with its growth in size and prominence. These allegations raise concerns about OpenAI's treatment of its employees and its transparency around potential issues within the organization.

Potential Updates to DALL-E Image Model

There is speculation that OpenAI's DALL-E image model may have recently received an update. This is based on observations that the model's ability to generate text in images has improved, with the text appearing more legible than before.

Specifically, a post from the user "angry penguin" on X (formerly Twitter) shows an image created by DALL-E that clearly displays the text "evolve" - a significant improvement compared to the model's previous struggles with generating readable text.

Additionally, if you prompt DALL-E to create an image of a "robot holding a sign that says Please Subscribe," the resulting image now renders the text much more clearly than it did previously.

These observations suggest that OpenAI may have updated the DALL-E model to improve its text rendering. However, the company has not officially confirmed any update or its details.

It's also worth mentioning that if you don't have access to a ChatGPT Plus account, you can use the DALL-E 3 model for free on the Bing Image Creator website. This may be a convenient option for those who want to experiment with DALL-E without a paid subscription.

New Demos from Sora

We've been getting more demo videos from Sora, which is just making people more anxious to actually get their hands on it. However, we do have some tools like Runway Gen-3 and Luma's Dream Machine that can create pretty good AI-generated videos, which has dampened the excitement for Sora a little bit.

Despite this, I'm still excited about it, because Sora can create much longer videos and OpenAI tends to set the bar with almost everything it puts out. The new demo videos shared on Matthew Berman's X account look pretty impressive, showcasing black and white clips and scenes of crashing ocean waves. While we may have some tools to scratch that itch for now, Sora itself is still highly anticipated.

HubSpot's Resources for Using ChatGPT at Work

If you're somebody that uses AI at work or you're thinking about using AI at work, you need to check out HubSpot's completely free bundle called "Five Essential Resources for Using ChatGPT at Work".

This resource includes:

Interesting flowcharts on when you should or shouldn't use ChatGPT
A template to ensure ChatGPT-generated content follows your brand's voice
An AI-generated content refinement checklist
A comprehensive checklist for adopting AI in the workplace
A guide on how to supercharge your day with ChatGPT, including 100 ways to try it today

The link to this totally free resource from HubSpot is in the description below. If you haven't embraced AI yet, remember what Nvidia CEO Jensen Huang said - AI will be the most transformative technology of the 21st century and will affect every industry and aspect of our lives. So if you're not using AI to speed up and improve the quality of your work, your competitors probably are.

Andrej Karpathy's New Education Company Eureka Labs

Andrej Karpathy, who previously worked at OpenAI and recently stepped away, just announced a new venture. He said he's "excited to share that I'm starting an AI plus education company called Eureka Labs."

At Eureka Labs, they're building a new kind of school that is "AI native." They say that subject matter experts who are deeply passionate, great at teaching, infinitely patient, and fluent in all of the world's languages are also very scarce and cannot personally tutor all 8 billion of us on demand.

The announcement suggests that Eureka Labs is creating an online education platform where the teacher still designs the course materials, but they are supported, leveraged, and scaled with an AI teaching assistant. This AI assistant is optimized to help guide the students through the course content.

While the details are limited, the concept seems to be that the teacher's expertise and teaching materials are trained into the AI model, allowing the AI to provide personalized tutoring and support to students at scale, in any language. This could massively expand the reach and impact of individual subject matter experts.

Anthropic Releases Claude on Android

If you're a fan of Anthropic's Claude and you don't have an iPhone, well, good news - they just released it on Android. It's been available on iOS for a couple of months now, and they've just rolled out the Android version.

Personally, I'm still a bit more of a fan of the ChatGPT app, mainly because the conversational voice portion of the ChatGPT app is really impressive. When I'm on my computer, I usually use either Claude or Perplexity. But when I'm on my phone, I still tend to go with the ChatGPT app.

That said, I understand that most people probably don't want to pay for separate chat subscriptions. So if you really like the ability to have a voice conversation with an AI, the ChatGPT app is still the way to go. But if you don't care about that and you just want the best model in your hand, Claude is probably the better option. And now they have an Android app as well.

Google's New AI Features: Gemini Answers, Google Vids, and YouTube Music Sound Search

Gemini, Google's AI assistant, now answers general questions when your Android phone is locked. This feature allows you to quickly get information without having to unlock your device.

Google has also announced Google Vids, an AI-powered video creation app designed for work and deeply integrated with the Google Workspace suite. Google Vids allows you to create slide-style videos by providing a prompt, selecting a style, and adding a voice-over and stock footage.

Additionally, YouTube is rolling out a new feature called YouTube Music Sound Search. This feature allows you to hum or sing a song, and YouTube will identify the track. It works similarly to Shazam, but is integrated directly into the YouTube platform.

These new AI-powered features from Google aim to enhance productivity, creativity, and user experience across various Google services and platforms.

Controversy Over AI Training Data and Swiped YouTube Videos

There has been some controversy this week about the source of training data for various AI models. An article on Proof News claims that Apple, Nvidia, and Anthropic have used thousands of swiped YouTube videos to train their AI models.

The issue stems from EleutherAI, a nonprofit open-source research group that compiled a large dataset called "the Pile" from publicly available data. It turns out that a significant portion of this data was transcripts copied directly from YouTube videos.

Many YouTubers, including popular creators like MKBHD, MrBeast, and PewDiePie, have found their content in this dataset. Proof News even created a search engine to allow people to check if their videos were included.

After the controversy came to light, Apple acknowledged using the Pile for some research purposes, but stated that the model used in Apple Intelligence is not trained on this data. Microsoft's Designer platform, which uses AI to create images, has also been implicated in using swiped YouTube content.

The use of copyrighted material without permission for training AI models has raised concerns about the ethical and legal implications of these practices. As AI technology continues to advance, the issue of data provenance and the rights of content creators will likely remain an ongoing challenge for the industry.

Microsoft Designer Integration in Apps

Microsoft has rolled out its platform called Designer, which is very similar to Canva. It's a platform for creating various content like YouTube thumbnails, banner ads, Instagram images, and more. This Designer platform is now being integrated into a wide range of Microsoft apps.

The key features of this integration include:

Copilot Sidebar: Users can access the Copilot sidebar within Microsoft apps to create specific images in a desired style.
Image Generation: The Designer platform can generate images based on user prompts, which can then be directly pulled into the user's document, PowerPoint, or other Microsoft tool.
Mobile App: Microsoft has also released free Designer mobile apps for both iOS and Android, allowing users to easily create and edit images on the go.

Other notable features of the Designer platform update include:

Restyle Feature: Users can upload an image and the platform will restyle it into a different style.
Expanded Capabilities: The Designer platform has gained a number of new features and capabilities to enhance the image creation and editing experience.

Overall, the integration of Microsoft Designer into various Microsoft apps provides users with a powerful AI-powered tool for generating and manipulating images directly within the apps they use every day.

Mistral Releases Codestral Mamba Language Model

Mistral, the French AI company that develops large language models, has released a new model called Codestral Mamba. This model is designed specifically for code generation and is open-source.

The key features of Codestral Mamba include:

7 billion parameter model
Handles input of up to 256,000 tokens (roughly 192,000 words)
Offers fast response times even with longer input text
Open-source and available for developers to use

This new model from Mistral is aimed at providing a powerful code generation tool for developers. With its large context window and efficient performance, Codestral Mamba could be a valuable addition to the AI-powered coding tools available.

Developers interested in trying out this new language model can access it through Mistral's open-source release.

Amazon Launches Rufus AI Shopping Assistant

Amazon has rolled out a new AI-powered shopping assistant called Rufus. Rufus is a chatbot built directly into the Amazon app that can answer questions about shopping and even politics.

Rufus is trained on Amazon's data, allowing it to provide recommendations and information about products available on the platform. Users can ask Rufus questions like "What are the best lawn games for a kids' birthday party?" and it will suggest relevant products and where to find them on Amazon.

The assistant can also answer questions on topics beyond just shopping, such as information about political candidates for the 2024 election. This demonstrates Rufus' broader capabilities as a general-purpose chatbot, similar to ChatGPT.

The launch of Rufus is Amazon's latest effort to integrate AI technology into its shopping experience, aiming to provide customers with a more intelligent and personalized assistant for their purchasing decisions.

Meta Limiting Multimodal Models in the EU

It sounds like Meta is not going to be offering its multimodal models in the European Union. The company will be releasing a multimodal Llama model in the coming months, but it will not be available to EU users.

The main reason cited is the "unpredictable nature of the European regulatory environment." Specifically, Meta's issue is not with the AI Act, which is still being finalized, but rather with how it can train models using data from European customers while complying with GDPR, the EU's existing data protection law.

In contrast, the UK has laws nearly identical to GDPR, but Meta says it is not seeing the same level of regulatory uncertainty there and plans to launch the new model for UK users.

Controlling Stable Diffusion with a MIDI Device

This section discusses an interesting project shared on X by user johanis_stelzer, where they have connected a MIDI device to their computer to control various aspects of Stable Diffusion.

The user has hooked up a MIDI device to their computer and is using the knobs on the device to change different parameters within Stable Diffusion. This allows them to dynamically adjust elements of the generated images in real-time, providing a unique and interactive way to explore the capabilities of the text-to-image model.

The user has also shared the code for this project on GitHub, allowing others to replicate the setup and experiment with controlling Stable Diffusion through a MIDI interface. This integration of physical hardware with AI-powered image generation opens up new possibilities for artists and creators to explore the creative potential of these technologies.

Overall, this project showcases the ingenuity and experimentation happening in the AI community, as developers and enthusiasts continue to push the boundaries of what is possible with these powerful tools.
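
The project's own code is not reproduced here, but the core idea of turning a knob position into a generation setting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: MIDI control-change values are 7-bit numbers from 0 to 127, and the knob assignments and parameter names below (guidance_scale, denoising_strength) are hypothetical choices. A real setup would also need a MIDI library to read incoming messages and a Stable Diffusion pipeline to consume the values.

```python
def cc_to_param(cc_value: int, lo: float, hi: float) -> float:
    """Linearly map a 7-bit MIDI control-change value (0-127) onto [lo, hi]."""
    if not 0 <= cc_value <= 127:
        raise ValueError("MIDI CC values are 7-bit: expected 0..127")
    return lo + (hi - lo) * cc_value / 127


# Hypothetical knob assignments: controller number -> (parameter name, min, max).
KNOB_MAP = {
    1: ("guidance_scale", 1.0, 15.0),
    2: ("denoising_strength", 0.0, 1.0),
}


def apply_knob(params: dict, controller: int, cc_value: int) -> dict:
    """Return a copy of params updated for one knob movement.

    Controllers that are not in KNOB_MAP are ignored.
    """
    updated = dict(params)
    if controller in KNOB_MAP:
        name, lo, hi = KNOB_MAP[controller]
        updated[name] = cc_to_param(cc_value, lo, hi)
    return updated


if __name__ == "__main__":
    params = {"guidance_scale": 7.5, "denoising_strength": 0.5}
    # Turning knob 1 all the way up selects the maximum guidance scale.
    params = apply_knob(params, controller=1, cc_value=127)
    print(params["guidance_scale"])  # 15.0
```

Returning a new dictionary on each knob event keeps the mapping step free of side effects, so the same function can feed a live preview loop or be tested without any hardware attached.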

AI App Turns Selfies Into 3D Printable Characters

According to the article, a new AI-powered app from Tencent allows users to turn their selfies into 3D printable characters. The app can generate a 3D model from a single selfie that is detailed enough to be 3D printed.

The article states that this is a fascinating application of AI technology, as it enables the creation of personalized 3D characters from just a simple selfie. While details on the app are limited, the ability to transform a 2D image into a 3D printable model showcases the impressive capabilities of AI-powered image generation.

The author notes that they are interested in trying out the app themselves, given their interest in both 3D printing and AI. They plan to explore the app further as more information becomes available, to see the quality of the 3D models it can produce from user selfies.

AI Accurately Determines Sex from Dental X-Rays

AI systems have achieved a 96% accuracy in determining the sex of individuals from their dental X-rays. Researchers trained an AI model on a large dataset of dental images and found that it could accurately identify the sex of the individuals, with the exception of children under the age of 6 who had not yet lost their baby teeth.

The primary use case for this technology is in forensics, where it can help identify the sex of skeletal remains based on dental evidence. The high accuracy rate of the AI system makes it a valuable tool for investigators and anthropologists working to determine the identity of unknown individuals.

While the technology is still in its early stages, the findings demonstrate the potential for AI to revolutionize various fields, including forensics and medical diagnostics. As the technology continues to evolve, it is likely that we will see even more impressive applications of AI in the years to come.

OpenAI Releases GPT-4o Mini

OpenAI has just launched a new model called GPT-4o mini, which is a smaller and more cost-efficient version of its powerful GPT-4o language model.

The key details about GPT-4o mini:

It is replacing the older GPT-3.5 model, providing faster and smarter capabilities.
It supports text and vision inputs/outputs, with plans to add support for video and audio in the future.
It has a 128,000 token context window, allowing for large amounts of input text.
The output is limited to 16,000 tokens.
In model evaluation benchmarks, GPT-4o mini performs second best overall, behind the full GPT-4o model.
The new model is available to ChatGPT Plus and Team users starting today.
The free version of ChatGPT is still using the older GPT-3.5 model for now.
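
To see what the context window and output limit mean together, here is a rough Python sketch. It assumes the context window is shared between the prompt and the generated output, which is how such limits commonly work but is not stated above, and the function name is hypothetical rather than part of any OpenAI library.

```python
CONTEXT_WINDOW = 128_000  # total tokens, as listed above
MAX_OUTPUT = 16_000       # output token limit, as listed above


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT,
                      context_window: int = CONTEXT_WINDOW,
                      max_output: int = MAX_OUTPUT) -> int:
    """Return how many prompt tokens fit once room for the reply is reserved.

    Assumption: prompt tokens and output tokens share one context window.
    """
    if not 0 <= reserved_output <= max_output:
        raise ValueError("reserved_output must be between 0 and max_output")
    return context_window - reserved_output


if __name__ == "__main__":
    print(max_prompt_tokens())       # 112000
    print(max_prompt_tokens(1_000))  # 127000
```

Under that assumption, asking for the longest possible reply leaves about 112,000 tokens for the prompt, while a short reply leaves almost the whole window for input.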

This release of a smaller, more efficient version of GPT-4o allows OpenAI to better compete with other large language models from companies like Nvidia and Anthropic. It provides a powerful yet cost-effective option for businesses and developers who want to leverage advanced language AI capabilities.

Nvidia and Mistral Collaborate on Mistral NeMo Model

Nvidia and Mistral, the French AI company, have teamed up to create Mistral NeMo, a 12 billion parameter large language model. This model is designed to be efficient and capable of local deployment, making it attractive for businesses with limited internet connectivity or strict data privacy requirements.

The Mistral NeMo model has a context window of 128,000 tokens, the same as the newly released GPT-4o mini from OpenAI. This large context window allows the model to handle extensive input and output text.

While the model is better suited to laptops and desktop PCs than to smartphones, it makes it possible to run a powerful large language model locally, without the need for constant internet access. This could be particularly appealing for companies with concerns about data privacy or those operating in environments with limited connectivity.

The Mistral NeMo model is immediately available, and a downloadable version is promised in the near future. Businesses interested in leveraging a high-performance, locally deployable large language model can explore this collaboration between Nvidia and Mistral.

Google AI Featured in 2024 Olympics

Google is the official AI sponsor for Team USA at the 2024 Summer Olympics. The tech giant plans to have advertisements featuring its various AI products across the Olympic broadcasts. Viewers can expect to see a heavy presence of Google's AI technology during the coverage of the games.
