✅ chatbot_api.py: A Complete Guide to Building Your Own AI Chatbot API (2025 Edition)


🔍 Introduction

In the rapidly evolving field of conversational AI, creating your own custom chatbot API has become more feasible than ever—thanks to powerful open-source large language models like DeepSeek, flexible backends like FastAPI, and lightweight deployment tools like Ollama and llama.cpp.


This guide walks you through creating a Python-based chatbot API using chatbot_api.py, a clean, production-ready server built with FastAPI that integrates a local LLM (e.g., DeepSeek), supports chat history, streaming, multi-user sessions, and even plugin-style extensibility.

Whether you’re developing a customer support bot, developer assistant, or private enterprise agent, this guide will help you:

  • Understand how chatbot_api.py is structured

  • Deploy it with DeepSeek or another model

  • Extend its functionality

  • Secure it for production

  • Optimize its performance

✅ Table of Contents

  1. Why Build Your Own Chatbot API?

  2. Prerequisites and Tools

  3. Directory Structure of chatbot_api.py

  4. Full Code Walkthrough

  5. Chat History and Sessions

  6. Adding Streaming Support

  7. Switching Between Models (DeepSeek, GPT, Claude)

  8. Frontend Integration Tips

  9. Authentication and Rate Limiting

  10. Deployment Options (Docker, VPS, Serverless)

  11. Testing and Debugging

  12. Conclusion + Download the Template

1. 🤖 Why Build Your Own Chatbot API?

✅ Advantages:

  • Cost control: No token billing or per-seat pricing

  • Privacy: Fully local—no user data sent to external servers

  • Customization: Add roles, tools, memory, vector search, etc.

  • Model flexibility: Use DeepSeek, Mistral, LLaMA, GPT, or Claude

2. 🛠️ Prerequisites and Tools

To follow along, you’ll need:

  • Python 3.9+

  • FastAPI

  • Uvicorn

  • Ollama / llama.cpp / LMDeploy

  • (Optional) Docker

  • (Optional) NGINX or Caddy for HTTPS

Install the Python dependencies:

bash
pip install fastapi uvicorn requests

Install Ollama:

bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-chat
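
Once the model is pulled, it's worth sanity-checking that Ollama is serving it before wiring up the API. A quick one-off request against its generate endpoint (port 11434 is Ollama's default and is what chatbot_api.py uses later):

python
import requests

# One-off completion to confirm the local Ollama server and the DeepSeek model respond
resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-chat",
    "prompt": "Reply with the single word: ready",
    "stream": False
})
print(resp.json()["response"])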

3. 📁 Directory Structure

text
chatbot-api/
├── chatbot_api.py
├── models/
│   └── deepseek.py
├── utils/
│   └── prompt_formatter.py
├── sessions/
│   └── memory.py
├── templates/
│   └── base.html
├── config.py
└── requirements.txt

4. 🧠 chatbot_api.py: Full Code Walkthrough

Here’s a simplified version of the core FastAPI app:

python
from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()

class ChatRequest(BaseModel):
    user_id: str
    message: str

@app.post("/chat")
def chat_endpoint(req: ChatRequest):
    # Forward the user's message to the local Ollama server running DeepSeek
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "deepseek-chat",
        "prompt": req.message,
        "stream": False
    })
    reply = response.json()["response"]
    return {"reply": reply}
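
With the server running (see the uvicorn command in section 10), you can exercise the endpoint from any HTTP client. A minimal sketch using requests; the user_id value here is arbitrary:

python
import requests

# Call the /chat endpoint of the chatbot API (assumes it is listening on port 8000)
resp = requests.post("http://localhost:8000/chat", json={
    "user_id": "demo-user",
    "message": "What is FastAPI?"
})
print(resp.json()["reply"])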

5. 🧾 Chat History and Memory

To simulate context retention, add a simple in-memory session store:

python
# Maps user_id -> list of {"user": ..., "bot": ...} exchanges
user_sessions = {}

def update_memory(user_id, message, response):
    if user_id not in user_sessions:
        user_sessions[user_id] = []
    user_sessions[user_id].append({"user": message, "bot": response})

def get_history(user_id):
    history = user_sessions.get(user_id, [])
    return "\n".join([f"User: {h['user']}\nBot: {h['bot']}" for h in history])

Then prepend this history to each prompt before sending it to DeepSeek.
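
For example, the /chat handler from section 4 could build the prompt like this (a minimal sketch; the exact "User:/Bot:" formatting is a stylistic choice):

python
@app.post("/chat")
def chat_endpoint(req: ChatRequest):
    # Prepend the stored conversation so the model sees the previous turns
    history = get_history(req.user_id)
    full_prompt = f"{history}\nUser: {req.message}\nBot:" if history else req.message
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "deepseek-chat",
        "prompt": full_prompt,
        "stream": False
    })
    reply = response.json()["response"]
    update_memory(req.user_id, req.message, reply)
    return {"reply": reply}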

6. 🔁 Streaming Support (Optional)

For real-time updates:

python
from fastapi.responses import StreamingResponse

@app.post("/stream")
def chat_stream(req: ChatRequest):
    def token_stream():
        # Relay Ollama's line-delimited JSON chunks to the client as they arrive
        with requests.post("http://localhost:11434/api/generate", json={
            "model": "deepseek-chat",
            "prompt": req.message,
            "stream": True
        }, stream=True) as r:
            for chunk in r.iter_lines():
                yield chunk + b"\n"
    return StreamingResponse(token_stream(), media_type="application/x-ndjson")

Wrapping the generator in FastAPI's StreamingResponse, as above, keeps the HTTP response well-formed and lets clients read tokens as they are generated.
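
On the client side, the stream can be consumed line by line. A minimal sketch with requests; each non-empty line is one JSON chunk in Ollama's generate format:

python
import json
import requests

# Read the /stream endpoint chunk by chunk instead of waiting for the full reply
with requests.post("http://localhost:8000/stream",
                   json={"user_id": "demo-user", "message": "Tell me a joke."},
                   stream=True) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)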

7. 🔄 Switching Between Models

To toggle models dynamically:

python
@app.post("/chat")
def chat(req: ChatRequest):
    # Route "dev" users to DeepSeek; everyone else gets Mistral
    model = "deepseek-chat" if req.user_id.startswith("dev") else "mistral"
    payload = {
        "model": model,
        "prompt": req.message,
        "stream": False
    }
    response = requests.post("http://localhost:11434/api/generate", json=payload)
    return {"reply": response.json()["response"]}

8. 🖥️ Frontend Integration Tips

You can easily build a React/Vue chatbot UI or even embed this API in:

  • Telegram Bots

  • WhatsApp Webhooks

  • Slack apps

  • Custom dashboards

Return CORS-friendly responses:

python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # In production, restrict this
    allow_methods=["*"],
    allow_headers=["*"],
)

9. 🔐 Authentication and Rate Limiting

Add basic token auth:

python
from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")

@app.post("/chat")
def chat(req: ChatRequest, api_key: str = Depends(api_key_header)):
    if api_key != "my-secret-key":
        raise HTTPException(status_code=403, detail="Forbidden")
    ...

Add rate limiting with slowapi or Redis-based counters.
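
With slowapi, a minimal sketch looks like the following; the 5-requests-per-minute limit is an arbitrary example value:

python
from fastapi import FastAPI, Request
from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # rate-limit per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

class ChatRequest(BaseModel):
    user_id: str
    message: str

@app.post("/chat")
@limiter.limit("5/minute")  # at most 5 chat requests per minute per client
def chat(request: Request, req: ChatRequest):
    ...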

10. 🚀 Deployment Options

Option A: Local Dev Server

bash
uvicorn chatbot_api:app --host 0.0.0.0 --port 8000

Option B: Dockerized App

dockerfile
FROM python:3.10
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "chatbot_api:app", "--host", "0.0.0.0", "--port", "8000"]

Run with:

bash
docker build -t chatbot-api .
docker run -p 8000:8000 chatbot-api
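
One caveat when the API runs in a container: localhost:11434 then points at the container itself, not at the host where Ollama is listening. A common workaround is to read the Ollama base URL from an environment variable, for example in config.py (the variable name OLLAMA_URL and its default are assumptions for illustration):

python
# config.py
import os

# Base URL of the Ollama server; override with `-e OLLAMA_URL=...` at docker run time
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")

chatbot_api.py can then build its request URL as f"{OLLAMA_URL}/api/generate" instead of hard-coding localhost.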

11. 🧪 Testing and Debugging

Use tools like:

  • Postman or Insomnia to test POST requests

  • pytest for unit tests on memory and prompt formatting (see the sketch after this list)

  • LangSmith to evaluate responses

  • Docker logs or Sentry for error tracking
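
As a starting point for the pytest suite, FastAPI's TestClient lets you call the endpoints without starting a server. A minimal sketch, assuming the memory helpers from section 5 are importable from chatbot_api:

python
from fastapi.testclient import TestClient
from chatbot_api import app, update_memory, get_history

client = TestClient(app)

def test_memory_round_trip():
    # A stored exchange should come back formatted as User/Bot lines
    update_memory("tester", "hello", "hi there")
    history = get_history("tester")
    assert "User: hello" in history
    assert "Bot: hi there" in history

def test_chat_rejects_incomplete_payload():
    # Missing fields should be caught by FastAPI's validation layer (HTTP 422)
    response = client.post("/chat", json={"user_id": "tester"})
    assert response.status_code == 422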

12. 🧳 Conclusion + Template Download

chatbot_api.py is the core of a customizable, scalable, and cost-efficient AI chatbot service. By combining the power of FastAPI and DeepSeek (or any open-source LLM), you can deploy your own secure chatbot stack in just a few hours.