Large Language Models are smart, but they don't know the specific knowledge you have about your company, project, or prospects. Giving your LLM access to your library via RAG is one solution, but it takes setup and adds complexity. If you've played with "Projects" in ChatGPT before, you know how powerful it is to define a system prompt and upload custom files: essentially building your own knowledge assistant, a mini RAG. Great news: this is also available in the API (in beta), which makes life simple and easy for a lot of developers.
I've always been charmed by simplicity and accessibility. Sometimes you don't need MCP or automation tools for something that can be done simply. In my opinion, this is where OpenAI is good: sometimes keeping things simple.
Why You Should Care
Imagine you have domain-specific content (like internal documentation, product manuals, or client-specific data), and you want an LLM to use that knowledge—but without:
- 🚫 Re-uploading or inlining all that data with every API call
- 🚫 Setting up your own full Retrieval-Augmented Generation (RAG) pipeline
With this new API feature, OpenAI handles the hard part for you. You define an assistant once, upload files once, and then just ask questions. It’s simple, scalable, and saves on both time and costs.
Pre-requisites
Before jumping in, here’s what you need:
- A valid OpenAI API account
- Access to the Projects system
- Credits or a payment method set up via your billing overview
How It Works: A Mini RAG System via API
You’ll use a combination of:
- Assistants (defined in the OpenAI platform)
- Vector stores (to hold your files)
- file_search (OpenAI’s built-in retrieval tool)
This means:
- Files are automatically chunked, embedded, and indexed
- You can store up to 10,000 files per assistant
- Advanced reranking, query rewriting, and multithreaded search are built in
- And yes, it comes at a cost: $0.10/GB/day, so monitor your usage!
Also cool: vector stores are shared across assistants and threads, making them reusable and great for scaling.
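To make that pricing concrete, here's a quick back-of-the-envelope calculation (the flat 30-day month is my simplification):

```python
# Back-of-the-envelope storage cost for vector stores at $0.10/GB/day.
stored_gb = 1.0              # total size of uploaded files, in GB
rate_per_gb_day = 0.10       # USD, OpenAI's published storage rate

daily_cost = stored_gb * rate_per_gb_day
monthly_cost = daily_cost * 30

print(f"${daily_cost:.2f}/day, roughly ${monthly_cost:.2f}/month for {stored_gb:g} GB")
# → $0.10/day, roughly $3.00/month for 1 GB
```

So for typical documentation sets (well under a gigabyte), storage is cents per month, far less than re-sending that content as context on every call.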
Setup Guide
1. Create a Project
Go to Projects, click “New Project”, and give it a name (e.g., Test). This helps organize and monitor usage.
📌 Make sure this project is selected in the top bar at all times during setup. Your assistant and vector store will be tied to the active project at creation time.
2. Create Your Vector Store
Go to Storage → Vector stores tab → “Create”.
- Upload your documents (PDF, TXT, CSV, etc.) via the “Add files” button.
- Once uploaded, they are automatically embedded and indexed.
You'll now see file usage and the associated cost in your dashboard. Expect a small fee, but it's much cheaper than resending the full context on every call.
⚠️ Important: File storage isn’t free. You’re charged $0.10 / GB / day for active vector store usage.
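If you'd rather script step 2 than click through the dashboard, the same vector store can be created via the SDK. A minimal sketch, assuming the `openai` Python package with the Assistants v2 beta endpoints (`client.beta.vector_stores`); the helper name `create_knowledge_base` is mine:

```python
from contextlib import ExitStack

def create_knowledge_base(client, name, file_paths):
    """Create a vector store and upload local files into it.

    `client` is an openai.OpenAI instance. Files are chunked,
    embedded, and indexed server-side once the batch completes.
    """
    store = client.beta.vector_stores.create(name=name)
    with ExitStack() as stack:
        streams = [stack.enter_context(open(p, "rb")) for p in file_paths]
        client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id=store.id, files=streams
        )
    return store.id
```

Either way, the result is the same: a vector store ID you can link to an assistant in the next step.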
3. Create Your Assistant
Go to Assistants and click “Create Assistant”:
- Give it a name
- Add your system instruction (prompt that defines the assistant’s tone and behavior)
- Choose a model
- Turn on File Search
- Link the vector store you created earlier
- Leave the other settings as default (unless you have specific needs)
You don’t need to set any file_ids—the assistant uses the linked vector store instead.
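Step 3 can also be done in code. A sketch of the equivalent API call, assuming the Assistants v2 beta endpoint (`client.beta.assistants.create`); the wrapper name and default model choice are mine:

```python
def create_rag_assistant(client, name, instructions, vector_store_id,
                         model="gpt-4o"):
    """Create an assistant with file_search linked to one vector store."""
    assistant = client.beta.assistants.create(
        name=name,
        instructions=instructions,
        model=model,
        tools=[{"type": "file_search"}],
        tool_resources={
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
    )
    return assistant.id
```

Note how the vector store is linked through `tool_resources` rather than per-file IDs, which is exactly why no `file_ids` are needed.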
Test in Playground
- After saving your assistant, click “Playground” to test it out.
- Ask a natural language question that requires the uploaded content.
- For example, I uploaded internal docs about my consulting company and asked (in Dutch): "Welke diensten biedt Syville Consulting aan?" ("What services does Syville Consulting offer?")
- The assistant correctly pulled the answer from the vector store. 🎯
Use It in the API
Once everything’s set up, you can call the assistant from Python.
Here's a snippet from my full code example on GitHub:

```python
import openai

client = openai.OpenAI(
    api_key=OPENAI_api_key,
    project=OPENAI_project_id,
)

# Check if the assistant has file_search enabled
assistant = client.beta.assistants.retrieve(OPENAI_assistant_id)
print(assistant.tools)
# You should see something like:
# [FileSearchTool(type='file_search', ...)]

# Create a thread
thread = client.beta.threads.create()

# Add a message to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What types of services does Syville Consulting offer?",
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=OPENAI_assistant_id,
)
```
The assistant will now use the file_search tool and retrieve from your vector store without needing any extra parameters.
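One thing the run-creation call doesn't give you by itself: runs are asynchronous, so you still have to wait for completion and then read the reply from the thread. A sketch of that last mile (the helper names are mine; the `retrieve` and `list` calls are the v2 beta endpoints):

```python
import time

def wait_for_run(client, thread_id, run_id, poll_interval=1.0):
    """Poll until the run leaves the queued/in_progress states."""
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id
        )
        if run.status not in ("queued", "in_progress"):
            return run
        time.sleep(poll_interval)

def latest_reply(client, thread_id):
    """Return the text of the newest message in the thread."""
    messages = client.beta.threads.messages.list(
        thread_id=thread_id, order="desc", limit=1
    )
    return messages.data[0].content[0].text.value
```

After `wait_for_run` returns a run with status `completed`, `latest_reply` gives you the assistant's answer, grounded in your vector store.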
Tips and Watch-outs
- ✅ You don't need to attach file_ids when using file_search (that was the v1 behavior)
- ✅ The assistant can only retrieve from the vector store you explicitly link
- ✅ Ask naturally — retrieval is only triggered when relevant
- ✅ You can now limit tokens, context size, and message history to control performance and cost
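That last point maps to concrete parameters on run creation. A sketch of the knobs, assuming the Assistants v2 API; the values here are arbitrary examples, not recommendations:

```python
# Parameters you can pass to client.beta.threads.runs.create(...)
# to cap cost and context size per run:
run_params = {
    "max_prompt_tokens": 2000,     # cap on tokens fed into the model
    "max_completion_tokens": 500,  # cap on tokens the model may generate
    "truncation_strategy": {       # keep only the last N thread messages
        "type": "last_messages",
        "last_messages": 5,
    },
}

# run = client.beta.threads.runs.create(
#     thread_id=thread.id,
#     assistant_id=OPENAI_assistant_id,
#     **run_params,
# )
```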
Final Thoughts
This system makes life simple for anyone building smart assistants without needing a full dev stack. Whether you're a startup founder, developer, or product leader: if you have valuable documents and want to leverage them quickly with an LLM, this is for you.
And it’s only getting better.