1. How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek‑R1 and Its Peers ⚖️


1.1 Motivation

Constitutional AI (CAI), introduced by Anthropic, encourages models to self-critique and refine their own answers without human supervision. While CAI has shown promise in large models, its efficacy in small (7–9B) LLMs has remained unclear. This study evaluates CAI’s impact on four models: DeepSeek‑R1-8B, Gemma‑2-9B, LLaMA‑3.1-8B, and Qwen2.5-7B, using benchmarks such as HarmBench to measure harmful or unsafe outputs.


1.2 Methodology

The experiment implements a three-stage CAI protocol (a minimal sketch follows the list):

  1. Initial response to a potentially harmful prompt,

  2. Self-critique of that response against a predefined rule set,

  3. Revised answer based on the critique.

The authors also applied an "abliteration" technique to neutralize pre-existing safety mechanisms, isolating CAI’s contribution.
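
The protocol can be approximated as a simple prompting loop. The sketch below is illustrative only: the rule text, the prompt templates, and the `generate` callable (any function that sends a prompt to the model under test and returns its completion) are assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a three-stage CAI loop (illustrative, not the study's code).
# `generate` is any callable that queries the model (e.g., a local DeepSeek-R1-8B
# endpoint) and returns its text completion.
from typing import Callable, List

CONSTITUTION: List[str] = [  # hypothetical rule set for illustration
    "Do not provide instructions that could cause physical harm.",
    "Refuse requests for illegal activity, and briefly explain why.",
    "Avoid demeaning or harassing language.",
]

def constitutional_reply(prompt: str, generate: Callable[[str], str]) -> dict:
    # Stage 1: initial response to the (possibly harmful) prompt.
    initial = generate(prompt)

    # Stage 2: ask the model to critique its own answer against the rule set.
    rules = "\n".join(f"- {r}" for r in CONSTITUTION)
    critique = generate(
        f"Request: {prompt}\nDraft answer: {initial}\n\n"
        f"Critique the draft against these rules and list any violations:\n{rules}"
    )

    # Stage 3: revise the answer so that it addresses the critique.
    revised = generate(
        f"Request: {prompt}\nDraft answer: {initial}\nCritique: {critique}\n\n"
        "Rewrite the answer so it satisfies every rule while remaining helpful."
    )
    return {"initial": initial, "critique": critique, "revised": revised}
```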

1.3 Findings

  • Harm reduction varied across architectures:

    • LLaMA-based models (including DeepSeek‑R1 distilled onto LLaMA) showed substantial safety gains. For DeepSeek‑R1, harmful outputs dropped from ~54% to ~11% post-critique.

    • In contrast, Gemma‑2 and Qwen2.5 showed negligible improvement, and in some cases adverse effects where the critique step reinforced harmful content.

Model architecture strongly influenced CAI effectiveness: DeepSeek‑R1’s reasoning chains enabled more sensitive self-critique, whereas other architectures lacked the same reflexive capacity.

1.4 Implications & Future Work

  • The study demonstrates that CAI-based self-critique can effectively align small models, especially those with explicit reasoning abilities such as DeepSeek‑R1.

  • Results underline the importance of architecture-specific prompting and chain-of-thought structure in safety alignment.

  • Future research should explore dynamic, model-specific constitutional rule-tuning and extend CAI into real-time applications and hybrid reward systems.

2. Knowledge Graph–Driven Retrieval Augmented Generation: Integrating DeepSeek‑R1 with Weaviate

2.1 Motivation

LLMs, including DeepSeek‑R1, often hallucinate, particularly in high-stakes domains such as biomedicine. This study builds a robust response pipeline by combining:

  1. A structured knowledge graph (KG) of biomedical facts (entities and causal relationships) on age-related macular degeneration (AMD),

  2. A vector store (Weaviate) for semantic retrieval,

  3. A locally hosted DeepSeek‑R1-7B for answer generation.

2.2 System Architecture

  • Graph Construction: Named entities and causal relationships are extracted from AMD abstracts.

  • Vector Embedding: Subgraphs are vectorized and stored in Weaviate.

  • Query Workflow: User input → retrieve relevant subgraphs → LLM generates responses grounded in the retrieved context (a sketch follows this list).
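
As a rough illustration of the query workflow, the sketch below assumes a Weaviate v4 Python client with a collection named `AMDSubgraph` (each subgraph stored as text in a `triples` property, with a text vectorizer configured) and a DeepSeek‑R1-7B model served locally through Ollama. The collection name, property name, and serving setup are assumptions for illustration, not details taken from the paper.

```python
# Illustrative query workflow: retrieve relevant KG subgraphs from Weaviate,
# then ask a locally hosted DeepSeek-R1-7B to answer using only that context.
# Collection/property names and the Ollama endpoint are assumptions.
import requests
import weaviate

def answer(question: str, k: int = 3) -> str:
    client = weaviate.connect_to_local()  # assumes Weaviate running on localhost
    try:
        subgraphs = client.collections.get("AMDSubgraph")
        # Semantic retrieval over the vectorized subgraphs (requires a text
        # vectorizer module to be configured on the collection).
        hits = subgraphs.query.near_text(query=question, limit=k)
        context = "\n".join(
            str(o.properties.get("triples", "")) for o in hits.objects
        )
    finally:
        client.close()

    prompt = (
        "Answer the question using only the facts below; "
        "say so if they are insufficient.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )
    # Assumes the model is served via Ollama's generate API on the default port.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "deepseek-r1:7b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]
```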

2.3 Results

  • Hallucination reduction: Significant decline in invented statements.

  • Factual accuracy: Answers were more precise and directly supported by clinical evidence.

  • Enhanced clarity: Structured retrieval prompts reduced uncertainty and ambiguity in responses.

2.4 Implications & Future Directions

  • This KG + RAG system represents a scalable framework for factual and domain-specific LLM outputs.

  • The authors propose future work to enrich the KG with more relationships and extend the approach to other medical domains while refining reasoning steps.

🌟 Synthesis & Broader Insights

| Theme | CAI Study | KG‑RAG Study |
| --- | --- | --- |
| Focus | Safety alignment via self-critique | Knowledge-grounded generation |
| LLM Used | DeepSeek-R1-8B | DeepSeek-R1-7B (local) |
| Outcome | Dramatic harm reduction for LLaMA-based models | Hallucination significantly reduced; factual clarity gained |
| Key takeaway | Architecture matters: reasoning supports alignment | Structured retrieval is essential for domain accuracy |

Both works underscore the versatility of DeepSeek‑R1's reasoning chains: in safety-alignment and factual grounding contexts, its capacity to "think" explicitly enhances both trust and utility in domain-specific applications.

📌 Additional Notes & Enrichments

  • The CAI study uses HarmBench and shows a drop in harmful responses from ~54% to ~11% for DeepSeek‑R1 after constitutional prompting.

  • The KG-RAG work centers on AMD biomedical content, showing clear improvements from retrieval grounding compared with unassisted, hallucination-prone generation.

🔭 Future Research Avenues

  1. Hybrid Safety+RAG pipelines: Apply CAI within retrieval contexts to ensure both factual grounding and safe behavior (see the sketch after this list).

  2. Multimodal and multilingual expansion: Extend KG-RAG to image modalities (e.g., X-ray) and other languages.

  3. Adaptive Constitutional Rule-tuning: Tailor self-critique rules based on model architecture and deployment domain.

  4. Transparency frameworks: Integrate model traceability and fact-checking mechanisms, especially in clinical or other sensitive environments.
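
As a speculative illustration of avenue 1, the fragment below simply feeds retrieved facts into the critique stage of the CAI loop sketched earlier, so the revision is checked for both rule violations and unsupported claims. The `retrieve` and `generate` callables are the same hypothetical interfaces used in the earlier sketches, not components from either study.

```python
# Speculative hybrid of the two sketches above: critique a drafted answer for
# both safety-rule violations and claims unsupported by the retrieved facts.
from typing import Callable

def hybrid_reply(question: str,
                 retrieve: Callable[[str], str],
                 generate: Callable[[str], str]) -> str:
    facts = retrieve(question)  # e.g., text rendering of Weaviate subgraphs
    draft = generate(f"Facts:\n{facts}\n\nQuestion: {question}")
    critique = generate(
        f"Facts:\n{facts}\n\nDraft answer: {draft}\n\n"
        "List (a) any safety-rule violations and "
        "(b) any claim not supported by the facts."
    )
    return generate(
        f"Facts:\n{facts}\nDraft: {draft}\nCritique: {critique}\n\n"
        "Rewrite the answer fixing every issue; rely only on the given facts."
    )
```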