
OpenAI Assistants with File Search: Your Mini RAG, API-Ready

Large Language Models are smart, but they don’t know the specific knowledge you have about your company, project, prospect, … . Giving your LLM access to your library via RAG is one solution, but it takes a bit of setup and complexity. If you’ve played with “Projects” in ChatGPT before, you’ll know how powerful it is to define a system prompt and upload custom files, essentially building your own knowledge assistant, aka a mini RAG. Great news: this is also available in the API (in beta), making life simple and easy for a lot of developers.

I’ve always been charmed by simplicity and accessibility. Sometimes you don’t need MCP or automation tools to do something that can be done simply. In my opinion, this is where OpenAI is good: sometimes keeping things simple.


Why You Should Care

Imagine you have domain-specific content (like internal documentation, product manuals, or client-specific data), and you want an LLM to use that knowledge—but without:

  • 🚫 Re-uploading or inlining all that data with every API call
  • 🚫 Setting up your own full Retrieval-Augmented Generation (RAG) pipeline

With this new API feature, OpenAI handles the hard part for you. You define an assistant once, upload files once, and then just ask questions. It’s simple, scalable, and saves both time and money.

Pre-requisites

Before jumping in, here’s what you need:

  1. A valid OpenAI API account
  2. Access to the Projects system
  3. Credits or a payment method set up via your billing overview

How It Works: A Mini RAG System via API

You’ll use a combination of:

  • Assistants (defined in the OpenAI platform)
  • Vector stores (to hold your files)
  • file_search (OpenAI’s built-in retrieval tool)

This means:

  • Files are automatically chunked, embedded, and indexed
  • You can store up to 10,000 files per assistant
  • Advanced reranking, query rewriting, and multithreaded search are built in
  • And yes, it comes at a cost: $0.10/GB/day, so monitor your usage!
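To put that last point in perspective, here’s a quick back-of-the-envelope calculator using the $0.10/GB/day rate quoted above (check OpenAI’s pricing page for current numbers):

```python
def vector_store_cost(size_gb: float, days: int, rate_per_gb_day: float = 0.10) -> float:
    """Estimated vector store storage cost in USD, at $0.10 per GB per day."""
    return round(size_gb * days * rate_per_gb_day, 2)

# A 2 GB store kept active for a 30-day month:
print(vector_store_cost(2, 30))  # 6.0
```

So even a few gigabytes of documents stays in single-digit dollars per month, far cheaper than resending all of that text as prompt context on every call.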

Also cool: vector stores are shared across assistants and threads, making them reusable and great for scaling.

Setup Guide

1. Create a Project

Go to Projects, click “New Project”, and give it a name (e.g., Test). This helps organize and monitor usage.

📌 Make sure this project is selected in the top bar at all times during setup. Your assistant and vector store will be tied to the active project at creation time.

2. Create Your Vector Store

Go to Storage → Vector stores tab → “Create”.

  • Upload your documents (PDF, TXT, CSV, etc.) via the “Add files” button.
  • Once uploaded, they are automatically embedded and indexed.

You’ll now see file usage and associated cost in your dashboard. Expect a small fee, but it’s much cheaper than resending the full context on every call.

⚠️ Important: File storage isn’t free. You’re charged $0.10 / GB / day for active vector store usage.
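The UI steps above can also be scripted. A minimal sketch, assuming an authenticated `openai.OpenAI` client; note that depending on your SDK version the vector store methods live under `client.beta.vector_stores` (as here) or `client.vector_stores`:

```python
def create_vector_store(client, name, file_paths):
    """Create a vector store and upload local files to it.

    client: an authenticated openai.OpenAI instance.
    file_paths: list of local paths (PDF, TXT, CSV, ...).
    """
    store = client.beta.vector_stores.create(name=name)
    # upload_and_poll blocks until chunking, embedding, and indexing finish
    batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id,
        files=[open(path, "rb") for path in file_paths],
    )
    print(f"Store {store.id}: batch status {batch.status}")
    return store.id
```

Once the batch reports `completed`, the files are searchable, exactly as if you had uploaded them in the dashboard.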

3. Create Your Assistant

Go to Assistants and click “Create Assistant”:

  • Give it a name
  • Add your system instruction (prompt that defines the assistant’s tone and behavior)
  • Choose a model
  • Turn on File Search
  • Link the vector store you created earlier
  • Leave the other settings as default (unless you have specific needs)

You don’t need to set any file_ids—the assistant uses the linked vector store instead.
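If you prefer code over the dashboard, the same assistant can be created via the Assistants v2 API. A sketch, assuming an authenticated `openai.OpenAI` client; the name, instructions, and model below are just example values:

```python
def create_file_search_assistant(client, vector_store_id):
    """Create an assistant wired to an existing vector store.

    client: an authenticated openai.OpenAI instance.
    """
    assistant = client.beta.assistants.create(
        name="Docs Assistant",
        instructions="Answer questions using the attached company documents.",
        model="gpt-4o-mini",  # any model that supports file_search
        tools=[{"type": "file_search"}],
        # Link the vector store directly; no file_ids needed
        tool_resources={"file_search": {"vector_store_ids": [vector_store_id]}},
    )
    return assistant.id
```

Note how the vector store is linked through `tool_resources` rather than per-file attachments.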

Test in Playground

  • After saving your assistant, click “Playground” to test it out.
  • Ask a natural language question that requires the uploaded content. For example, I uploaded internal docs about my consulting company and asked: “What services does Syville Consulting offer?” (in Dutch: “Welke diensten biedt Syville Consulting aan?”)
  • The assistant correctly pulled the answer from the vector store. 🎯

Use It in the API

Once everything’s set up, you can call the assistant from Python.
Here’s a snippet from my full code example on GitHub:

import openai

# OPENAI_api_key, OPENAI_project_id, and OPENAI_assistant_id are your own values
client = openai.OpenAI(
    api_key=OPENAI_api_key,
    project=OPENAI_project_id
)

# Check if assistant has file_search enabled
assistant = client.beta.assistants.retrieve(OPENAI_assistant_id)
print(assistant.tools)

# You should see something like:
# [FileSearchTool(type='file_search', ...)]

# Create a thread
thread = client.beta.threads.create()

# Add a message to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What types of services does Syville Consulting offer?"
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=OPENAI_assistant_id
)

The assistant will now use the file_search tool and retrieve from your vector store without needing any extra parameters.
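One thing the snippet above leaves out: a run is asynchronous, so you still have to wait for it to finish and then read the reply. A sketch of the remaining steps, assuming the same kind of authenticated `openai.OpenAI` client and using the SDK’s `create_and_poll` helper:

```python
def ask_assistant(client, assistant_id, question):
    """Send one question to the assistant and return its text answer.

    client: an authenticated openai.OpenAI instance.
    """
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    # create_and_poll blocks until the run reaches a terminal state
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status: {run.status}")
    # Messages come back newest-first; the first one is the assistant's reply
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value
```

With that helper, the whole mini RAG collapses to a single function call per question.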

Tips and Watch-outs

  • You don’t need to attach file_ids when using file_search (that was required in Assistants v1)
  • ✅ The assistant can only retrieve from the vector store you explicitly link
  • Ask naturally — retrieval is only triggered when relevant
  • ✅ You can now limit tokens, context size, and message history to control performance and cost
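That last point can be sketched in code: the run endpoint accepts token caps and a truncation strategy. The limits below are example values, assuming the same kind of authenticated `openai.OpenAI` client; tune them to your own cost/quality trade-off:

```python
def run_with_limits(client, thread_id, assistant_id):
    """Start a run with caps on context and output size.

    client: an authenticated openai.OpenAI instance.
    """
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        max_prompt_tokens=2000,     # cap on input context per run
        max_completion_tokens=500,  # cap on generated output per run
        # Only keep the last few thread messages in context
        truncation_strategy={"type": "last_messages", "last_messages": 5},
    )
```

This keeps long-running threads from silently ballooning your per-call cost.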

Final Thoughts

This system makes life simple for anyone building smart assistants without needing a full dev stack. Whether you’re a startup founder, developer, or product leader: if you have valuable documents and want to leverage them quickly with an LLM, this is for you.

And it’s only getting better.
