Elevating Airbnb Listings: GPT-4o Mini vs. Anthropic's Claude 3.5 Sonnet in Agentic RAG
Explore the performance of GPT-4o Mini vs. Anthropic's Claude 3.5 Sonnet in creating agents for Airbnb listing descriptions. Learn how to set up a practical dataset, embeddings, and an agentic RAG pipeline using Llama Index and ChromaDB. Discover the strengths and limitations of each model in this comprehensive comparison.
December 22, 2024
Discover how the latest GPT-4o Mini model from OpenAI performs in creating agents for Agentic RAG, a powerful technique for information retrieval. This blog post provides a practical comparison with the Anthropic Claude 3.5 Sonnet model, using a real-world Airbnb dataset to showcase the capabilities and limitations of each model in an agentic workflow.
Introducing GPT-4o Mini: A Cost-Effective Model for Agentic RAG
Setting Up the Environment and Data
Embedding the Airbnb Dataset
Creating the Vector Store and Query Engine Tool
Implementing the Agent Worker
Comparing GPT-4o Mini and Claude 3.5 Sonnet as Agents
Conclusion
Introducing GPT-4o Mini: A Cost-Effective Model for Agentic RAG
GPT-4o Mini is the latest and most cost-effective model from OpenAI, and it's one of the best performing models in its price range. However, the question remains: how good is this model for creating agents, specifically agentic Retrieval-Augmented Generation (RAG)?
In this section, we will explore the performance of GPT-4o Mini in comparison to the Anthropic Claude 3.5 Sonnet model, which is a popular choice for agentic RAG. We will use a practical dataset, the Airbnb embeddings dataset from MongoDB, to test the capabilities of these models.
The dataset contains various metadata columns, and we will preprocess the data to create a suitable input for the RAG pipeline. We will use Llama Index to set up the RAG implementation, leveraging its agentic capabilities.
For the LLM or agent, we will use both GPT-4o Mini and Claude 3.5 Sonnet, and for the vector database, we will rely on ChromaDB. We will walk through the setup process, including installing the required libraries, setting up environment variables, and configuring the LLM and embedding models.
After setting up the environment, we will dive into the data loading and processing steps, where we will create the necessary data structures for the RAG pipeline. This includes converting the data into a pandas DataFrame, dropping the existing text embeddings, and creating a metadata template to be used in the embedding process.
Finally, we will set up the vector store using ChromaDB and define the query engine tool that will be available to the agent. We will then create the agent worker and interact with it using the chat function, comparing the performance of GPT-4o Mini and Claude 3.5 Sonnet in agentic RAG tasks.
By the end of this section, you will have a better understanding of the capabilities of GPT-4o Mini in the context of agentic RAG and how it compares to the more powerful Claude 3.5 Sonnet model.
Setting Up the Environment and Data
To get started, we first need to set up the required libraries and environment variables. We'll install the necessary packages, including Llama Index, OpenAI, and ChromaDB.
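As a sketch of the install step (exact package names are an assumption; they vary across LlamaIndex versions, which split integrations into separate packages after 0.10):

```shell
pip install llama-index llama-index-llms-openai \
    llama-index-embeddings-openai llama-index-vector-stores-chroma \
    chromadb datasets pandas
```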
Next, we'll configure the environment variables, including the OpenAI API key and the Hugging Face token (if needed).
We'll then set up the LLM (large language model) and embedding models. For the LLM, we'll be using the GPT-4o Mini model. For the embeddings, we'll use the OpenAI text-embedding-3-small model, which allows us to customize the embedding size to reduce computation and storage costs.
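A minimal sketch of this configuration step, assuming the post-0.10 llama_index package layout, an OPENAI_API_KEY in the environment, and an illustrative embedding size of 256:

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Assumes OPENAI_API_KEY is set in the environment.
Settings.llm = OpenAI(model="gpt-4o-mini")

# text-embedding-3-small accepts a `dimensions` parameter, letting us
# shrink the vectors to cut computation and storage costs.
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=256,  # illustrative size, not a recommendation
)
```

Registering the models on the global `Settings` object means the index and agent built later pick them up without being passed explicitly.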
After setting up the models, we'll move on to loading and processing the data. We'll be using the Airbnb embeddings dataset from MongoDB, focusing on the initial 2,000 data points to keep the processing time and costs manageable.
We'll drop the existing text embeddings and create our own embedding vectors using the OpenAI embeddings. We'll also extract relevant metadata from the dataset, such as the listing name, summary, house rules, property type, room type, and number of bedrooms and beds. This metadata will be used to enrich the text that the LLM will see during the retrieval process.
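Concretely, the drop-and-reembed preprocessing boils down to something like this; the sample rows and the `text_embeddings` column name are illustrative assumptions about the dataset's schema:

```python
import pandas as pd

# Illustrative rows standing in for the MongoDB Airbnb embeddings dataset.
rows = [
    {"name": "Cozy Manhattan loft", "summary": "Sunny studio near the park.",
     "text_embeddings": [0.12, 0.98, 0.33]},
    {"name": "Brooklyn brownstone room", "summary": "Quiet private room.",
     "text_embeddings": [0.45, 0.11, 0.77]},
]

df = pd.DataFrame(rows)
# Drop the precomputed embeddings; we will create our own vectors instead.
df = df.drop(columns=["text_embeddings"])
```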
To prepare the data for the vector store, we'll split the text into chunks of 5,000 characters and create nodes that contain the text, embedding vectors, and metadata. Finally, we'll set up the ChromaDB vector store to store the nodes, which will be used as the knowledge base for the agentic RAG (Retrieval-Augmented Generation) workflow.
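The 5,000-character split itself needs no library support; a minimal sketch:

```python
def chunk_text(text: str, chunk_size: int = 5000) -> list[str]:
    """Split text into fixed-size character chunks (the last may be shorter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text("x" * 12_000)
```

Each chunk would then be wrapped in a node carrying its embedding vector and metadata before being written to ChromaDB.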
Embedding the Airbnb Dataset
To embed the Airbnb dataset, we first convert the dataset into a list of JSON documents. We then create a metadata template that includes important information such as the name of the Airbnb listing, summary, house rules, property type, room type, bed type, number of bedrooms, and number of beds. This metadata is added to the text that will be embedded.
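One way to sketch that template; the field names are assumptions based on the dataset's schema, and the fallback value for missing fields is a design choice:

```python
METADATA_TEMPLATE = (
    "Name: {name}\n"
    "Summary: {summary}\n"
    "House rules: {house_rules}\n"
    "Property type: {property_type}\n"
    "Room type: {room_type}\n"
    "Bed type: {bed_type}\n"
    "Bedrooms: {bedrooms}\n"
    "Beds: {beds}\n"
)

def enrich_text(listing: dict) -> str:
    """Prepend the metadata header to the listing text before embedding."""
    fields = ["name", "summary", "house_rules", "property_type",
              "room_type", "bed_type", "bedrooms", "beds"]
    header = METADATA_TEMPLATE.format(**{f: listing.get(f, "N/A") for f in fields})
    return header + "\n" + listing.get("description", "")

doc = enrich_text({"name": "Cozy Manhattan loft", "bedrooms": 1,
                   "description": "A sunny studio."})
```

Embedding this enriched text (rather than the raw description alone) lets the retriever match queries against structured facts like room type and bed count.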
Next, we split the text into chunks of 5,000 characters to ensure that each embedding vector can capture the relevant information. We then compute the embeddings using the OpenAI text-embedding-3-small model, which allows us to customize the embedding size to reduce computation and storage costs.
After computing the embeddings, we store them in a ChromaDB vector store, which will serve as the knowledge base for our agent. We create a QueryEngineTool that will provide the agent with access to the vector store, allowing it to retrieve the most relevant chunks of text based on the user's query.
By preprocessing the data, creating a metadata template, and setting up the vector store, we have prepared the Airbnb dataset for use with the Llama Index agent. This process ensures that the agent has access to the necessary information to provide accurate and informative responses to user queries.
Creating the Vector Store and Query Engine Tool
To get started, we first need to set up the required libraries and environment variables. We'll install the necessary packages, including Llama Index, ChromaDB, and the OpenAI models.
Next, we'll set up our LLM and embedding models. For the LLM, we'll be using the GPT-4o Mini model, and for the embeddings, we'll use the OpenAI text-embedding-3-small model.
We'll then load and preprocess the Airbnb dataset, dropping the existing text embeddings and creating our own metadata-enriched text chunks. These chunks will be embedded and stored in a ChromaDB vector store.
To create the query engine tool, we'll use the Llama Index QueryEngineTool class, which provides access to the vector store and allows us to retrieve the top-k most similar chunks for a given query. We'll define this tool as part of the agent's knowledge base.
Finally, we'll create the agent worker, which combines the LLM and the query engine tool, enabling us to interact with the agent and retrieve the best Airbnb listings for a given location.
The key steps in this process are:
- Set up the required libraries and environment variables.
- Configure the LLM and embedding models.
- Load and preprocess the Airbnb dataset.
- Create the vector store using ChromaDB.
- Define the query engine tool and add it to the agent's knowledge base.
- Create the agent worker by combining the LLM and the query engine tool.
With these steps, we've set up the necessary infrastructure to use the GPT-4o Mini model for agentic RAG tasks on the Airbnb dataset.
Implementing the Agent Worker
To create the agent worker, we first define the tools that will be available to the agent. In this case, we use the QueryEngineTool class from Llama Index, which provides access to the vector store we created earlier.
```python
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,  # query engine over the ChromaDB-backed index
    name="knowledge_base",
    description=(
        "Provides information about Airbnb listings and reviews. "
        "Use a detailed plain text question as input to the tool."
    ),
)
tools = [query_engine_tool]
```
Next, we create the agent worker using the FunctionCallingAgentWorker class from Llama Index. We provide the list of tools and the language model (GPT-4o Mini in this case) to the agent worker.
```python
agent_worker = FunctionCallingAgentWorker.from_tools(
    tools,
    llm=llm,
    verbose=True,
)
agent = agent_worker.as_agent()
```
Now, we can use the agent's chat function to interact with it. We can provide the agent with a prompt, and it will use the tools to generate a response.
```python
prompt = "Tell me the best listing for a place in New York."
result = agent.chat(prompt)
print(result)
```
The agent's response will include the thought process and the final answer. In this case, the GPT-4o Mini agent's response is not as detailed or insightful as the response from the Claude 3.5 Sonnet agent.
To compare the performance, we can also try a different prompt, such as "What is the worst Airbnb listing in Miami?". The Claude 3.5 Sonnet agent provides a more thoughtful and nuanced response, acknowledging the limitations of the knowledge base and offering general insights about the differences between vacation rentals in New York and Miami.
Overall, the implementation of the agent worker using Llama Index is straightforward, but the performance of the agent depends on the capabilities of the underlying language model. The Claude 3.5 Sonnet model appears to be better suited for agentic workflows than the GPT-4o Mini model.
Comparing GPT-4o Mini and Claude 3.5 Sonnet as Agents
In this section, we compare the performance of GPT-4o Mini and Claude 3.5 Sonnet as agents in a practical agentic RAG (Retrieval-Augmented Generation) workflow using the Airbnb embeddings dataset from MongoDB.
The key findings are:
- GPT-4o Mini as an Agent: While GPT-4o Mini is a capable model, it struggles with the agentic workflow. The model's thought process is not well-articulated, and its responses lack the level of detail and accuracy expected from an effective agent.
- Claude 3.5 Sonnet as an Agent: In contrast, Claude 3.5 Sonnet demonstrates superior performance as an agent. It rewrites prompts effectively, uses the knowledge base tool to gather relevant information, and provides detailed and accurate responses, even when the knowledge base does not have specific information about the requested topic.
- Importance of Powerful LLMs for Agentic Workflows: The comparison highlights the importance of using a more powerful and capable LLM, such as Claude 3.5 Sonnet, for agentic workflows. The agent's ability to understand context, rewrite prompts, and generate high-quality responses is crucial for effective task completion and user interaction.
In summary, while GPT-4o Mini is a cost-effective model, it may not be the best choice for agentic workflows that require a more sophisticated and articulate agent. Claude 3.5 Sonnet, on the other hand, demonstrates superior performance in this use case, showcasing the benefits of using a more capable LLM for agent-based applications.
Conclusion
The comparison between GPT-4o Mini and Anthropic's Claude 3.5 Sonnet for agentic RAG (Retrieval-Augmented Generation) tasks highlights the importance of model capability when it comes to agent-based workflows. While GPT-4o Mini is a capable and cost-effective model, it falls short in the agentic aspects of the task, as demonstrated by its simplistic prompt rewriting and less detailed responses.
In contrast, the Anthropic Claude 3.5 Sonnet model showcases more robust and sophisticated agentic behavior. It effectively rewrites the prompts, gathers relevant information from the knowledge base, and provides detailed and insightful responses, even when faced with a query about a location not present in the dataset.
This comparison underscores the need to carefully consider the model's capabilities when designing agent-based workflows. While GPT-4o Mini may be suitable for certain tasks, more complex agentic applications may require a more powerful and specialized model, such as the Anthropic Claude 3.5 Sonnet, to achieve the desired level of performance and user experience.