✅ chatbot_api.py: A Complete Guide to Building Your Own AI Chatbot API (2025 Edition)


🔍 Introduction

In the rapidly evolving field of conversational AI, creating your own custom chatbot API has become more feasible than ever—thanks to powerful open-source large language models like DeepSeek, flexible backends like FastAPI, and lightweight deployment tools like Ollama and llama.cpp.


This guide walks you through creating a Python-based chatbot API using chatbot_api.py, a clean, production-ready server built with FastAPI that integrates a local LLM (e.g., DeepSeek), supports chat history, streaming, multi-user sessions, and even plugin-style extensibility.

Whether you’re developing a customer support bot, developer assistant, or private enterprise agent, this guide will help you:

  • Understand how chatbot_api.py is structured

  • Deploy it with DeepSeek or another model

  • Extend its functionality

  • Secure it for production

  • Optimize its performance

✅ Table of Contents

  1. Why Build Your Own Chatbot API?

  2. Prerequisites and Tools

  3. Directory Structure of chatbot_api.py

  4. Full Code Walkthrough

  5. Chat History and Sessions

  6. Adding Streaming Support

  7. Switching Between Models (DeepSeek, GPT, Claude)

  8. Frontend Integration Tips

  9. Authentication and Rate Limiting

  10. Deployment Options (Docker, VPS, Serverless)

  11. Testing and Debugging

  12. Conclusion + Download the Template

1. 🤖 Why Build Your Own Chatbot API?

✅ Advantages:

  • Cost control: No token billing or per-seat pricing

  • Privacy: Fully local—no user data sent to external servers

  • Customization: Add roles, tools, memory, vector search, etc.

  • Model flexibility: Use DeepSeek, Mistral, LLaMA, GPT, or Claude

2. 🛠️ Prerequisites and Tools

To follow along, you’ll need:

  • Python 3.9+

  • FastAPI

  • Uvicorn

  • Ollama / llama.cpp / LMDeploy

  • (Optional) Docker

  • (Optional) NGINX or Caddy for HTTPS

Install the Python dependencies:

bash
pip install fastapi uvicorn requests

Install Ollama:

bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-chat
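
Once the model is pulled, it's worth sanity-checking that Ollama is serving it before wiring up the API. A quick one-off request against its generate endpoint (port 11434 is Ollama's default and is what chatbot_api.py uses later):

python
import requests

# One-off completion to confirm the local Ollama server and the DeepSeek model respond
resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-chat",
    "prompt": "Reply with the single word: ready",
    "stream": False
})
print(resp.json()["response"])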

3. 📁 Directory Structure

text
chatbot-api/
├── chatbot_api.py
├── models/
│   └── deepseek.py
├── utils/
│   └── prompt_formatter.py
├── sessions/
│   └── memory.py
├── templates/
│   └── base.html
├── config.py
└── requirements.txt

4. 🧠 chatbot_api.py: Full Code Walkthrough

Here’s a simplified version of the core FastAPI app:

python
from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()

class ChatRequest(BaseModel):
    user_id: str
    message: str

@app.post("/chat")
def chat_endpoint(req: ChatRequest):
    # Forward the user's message to the local Ollama server running DeepSeek
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "deepseek-chat",
        "prompt": req.message,
        "stream": False
    })
    reply = response.json()["response"]
    return {"reply": reply}
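
With the server running (see the uvicorn command in section 10), you can exercise the endpoint from any HTTP client. A minimal sketch using requests; the user_id value here is arbitrary:

python
import requests

# Call the /chat endpoint of the chatbot API (assumes it is listening on port 8000)
resp = requests.post("http://localhost:8000/chat", json={
    "user_id": "demo-user",
    "message": "What is FastAPI?"
})
print(resp.json()["reply"])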

5. 🧾 Chat History and Memory

To simulate context retention, add a simple in-memory session store:

python
# Maps user_id -> list of {"user": ..., "bot": ...} exchanges
user_sessions = {}

def update_memory(user_id, message, response):
    if user_id not in user_sessions:
        user_sessions[user_id] = []
    user_sessions[user_id].append({"user": message, "bot": response})

def get_history(user_id):
    history = user_sessions.get(user_id, [])
    return "\n".join([f"User: {h['user']}\nBot: {h['bot']}" for h in history])

Then prepend this history to each prompt before sending it to DeepSeek.
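
For example, the /chat handler from section 4 could build the prompt like this (a minimal sketch; the exact "User:/Bot:" formatting is a stylistic choice):

python
@app.post("/chat")
def chat_endpoint(req: ChatRequest):
    # Prepend the stored conversation so the model sees the previous turns
    history = get_history(req.user_id)
    full_prompt = f"{history}\nUser: {req.message}\nBot:" if history else req.message
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "deepseek-chat",
        "prompt": full_prompt,
        "stream": False
    })
    reply = response.json()["response"]
    update_memory(req.user_id, req.message, reply)
    return {"reply": reply}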

6. 🔁 Streaming Support (Optional)

For real-time updates:

python
from fastapi.responses import StreamingResponse

@app.post("/stream")
def chat_stream(req: ChatRequest):
    def token_stream():
        # Relay Ollama's line-delimited JSON chunks to the client as they arrive
        with requests.post("http://localhost:11434/api/generate", json={
            "model": "deepseek-chat",
            "prompt": req.message,
            "stream": True
        }, stream=True) as r:
            for chunk in r.iter_lines():
                yield chunk + b"\n"
    return StreamingResponse(token_stream(), media_type="application/x-ndjson")

Wrapping the generator in FastAPI's StreamingResponse, as above, keeps the HTTP response well-formed and lets clients read tokens as they are generated.
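
On the client side, the stream can be consumed line by line. A minimal sketch with requests; each non-empty line is one JSON chunk in Ollama's generate format:

python
import json
import requests

# Read the /stream endpoint chunk by chunk instead of waiting for the full reply
with requests.post("http://localhost:8000/stream",
                   json={"user_id": "demo-user", "message": "Tell me a joke."},
                   stream=True) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)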

7. 🔄 Switching Between Models

To toggle models dynamically:

python
@app.post("/chat")
def chat(req: ChatRequest):
    # Route "dev" users to DeepSeek; everyone else gets Mistral
    model = "deepseek-chat" if req.user_id.startswith("dev") else "mistral"
    payload = {
        "model": model,
        "prompt": req.message,
        "stream": False
    }
    response = requests.post("http://localhost:11434/api/generate", json=payload)
    return {"reply": response.json()["response"]}

8. 🖥️ Frontend Integration Tips

You can easily build a React/Vue chatbot UI or even embed this API in:

  • Telegram Bots

  • WhatsApp Webhooks

  • Slack apps

  • Custom dashboards

Return CORS-friendly responses:

python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # In production, restrict this
    allow_methods=["*"],
    allow_headers=["*"],
)

9. 🔐 Authentication and Rate Limiting

Add basic token auth:

python
from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")

@app.post("/chat")
def chat(req: ChatRequest, api_key: str = Depends(api_key_header)):
    if api_key != "my-secret-key":
        raise HTTPException(status_code=403, detail="Forbidden")
    ...

Add rate limiting with slowapi or Redis-based counters.
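
With slowapi, a minimal sketch looks like the following; the 5-requests-per-minute limit is an arbitrary example value:

python
from fastapi import FastAPI, Request
from pydantic import BaseModel
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # rate-limit per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

class ChatRequest(BaseModel):
    user_id: str
    message: str

@app.post("/chat")
@limiter.limit("5/minute")  # at most 5 chat requests per minute per client
def chat(request: Request, req: ChatRequest):
    ...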

10. 🚀 Deployment Options

Option A: Local Dev Server

bash
uvicorn chatbot_api:app --host 0.0.0.0 --port 8000

Option B: Dockerized App

dockerfile
FROM python:3.10
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "chatbot_api:app", "--host", "0.0.0.0", "--port", "8000"]

Run with:

bash
docker build -t chatbot-api .
docker run -p 8000:8000 chatbot-api
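
One caveat when the API runs in a container: localhost:11434 then points at the container itself, not at the host where Ollama is listening. A common workaround is to read the Ollama base URL from an environment variable, for example in config.py (the variable name OLLAMA_URL and its default are assumptions for illustration):

python
# config.py
import os

# Base URL of the Ollama server; override with `-e OLLAMA_URL=...` at docker run time
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")

chatbot_api.py can then build its request URL as f"{OLLAMA_URL}/api/generate" instead of hard-coding localhost.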

11. 🧪 Testing and Debugging

Use tools like:

  • Postman or Insomnia to test POST requests

  • pytest for unit tests on memory and prompt formatting (see the sketch after this list)

  • LangSmith to evaluate responses

  • Docker logs or Sentry for error tracking
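
As a starting point for the pytest suite, FastAPI's TestClient lets you call the endpoints without starting a server. A minimal sketch, assuming the memory helpers from section 5 are importable from chatbot_api:

python
from fastapi.testclient import TestClient
from chatbot_api import app, update_memory, get_history

client = TestClient(app)

def test_memory_round_trip():
    # A stored exchange should come back formatted as User/Bot lines
    update_memory("tester", "hello", "hi there")
    history = get_history("tester")
    assert "User: hello" in history
    assert "Bot: hi there" in history

def test_chat_rejects_incomplete_payload():
    # Missing fields should be caught by FastAPI's validation layer (HTTP 422)
    response = client.post("/chat", json={"user_id": "tester"})
    assert response.status_code == 422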

12. 🧳 Conclusion + Template Download

chatbot_api.py is the core of a customizable, scalable, and cost-efficient AI chatbot service. By combining the power of FastAPI and DeepSeek (or any open-source LLM), you can deploy your own secure chatbot stack in just a few hours.