RAG·15 Jun 2026·10 MIN READ

Self Adaptive Context RAG

A RAG system that adapts to the query and corrects its own reasoning — adaptive web scraping, FAISS/ChromaDB retrieval, and a two-stage self-correcting LLM that critiques and improves its first answer.

Retrieval-Augmented Generation (RAG) pairs the recall of search systems with the language understanding of LLMs. This is an advanced take I call Self Adaptive Context RAG — a system that retrieves relevant information, adapts its context understanding to the query, and corrects its own reasoning.

Source code (Colab): open the notebook.

The problem with traditional RAG

  • Static retrieval that doesn't adapt to query complexity
  • Limited understanding of real-time information
  • No self-correction — errors propagate
  • Poor integration between web scraping and vector databases

This system tackles all four with an architecture that adapts to query type and self-corrects.

Architecture: five components

1. Intelligent web scraper

  • Dual methods — HTTP requests and Selenium WebDriver
  • MD5-based caching with configurable expiry
  • Priority URL selection — reputable news sources first
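  • Adaptive retry: falls back when one method fails

import os
import requests

class WebScraper:
    def __init__(self):
        self.session = requests.Session()  # reused across plain HTTP fetches
        self.driver = None                 # Selenium WebDriver, not started yet
        # `config` is assumed to be a shared Config instance that defines cache_dir
        os.makedirs(config.cache_dir, exist_ok=True)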
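
The MD5-based caching with configurable expiry could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's code: `PageCache`, `expiry_hours` and the JSON file layout are hypothetical, and the 48-hour default simply mirrors the cache window mentioned under Optimizations.

```python
import hashlib
import json
import os
import time


class PageCache:
    """File-based page cache keyed by the MD5 hash of the URL (hypothetical helper)."""

    def __init__(self, cache_dir: str, expiry_hours: float = 48.0):
        self.cache_dir = cache_dir
        self.expiry_seconds = expiry_hours * 3600
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url: str) -> str:
        # MD5 is used only as a short, stable file name, not for security.
        key = hashlib.md5(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, f"{key}.json")

    def get(self, url: str):
        """Return the cached text, or None if the entry is missing or expired."""
        path = self._path(url)
        if not os.path.exists(path):
            return None
        with open(path, "r", encoding="utf-8") as f:
            entry = json.load(f)
        if time.time() - entry["fetched_at"] > self.expiry_seconds:
            return None
        return entry["text"]

    def put(self, url: str, text: str) -> None:
        """Store the page text together with the time it was fetched."""
        with open(self._path(url), "w", encoding="utf-8") as f:
            json.dump({"url": url, "fetched_at": time.time(), "text": text}, f)
```

A scraper would call `get` before fetching and `put` after a successful fetch, so repeated queries within the expiry window skip the network entirely.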

2. Advanced text chunker

  • Token-aware chunking for accurate length
  • Overlap strategy for context continuity
  • Metadata preservation (source, timestamps)
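
As a rough sketch of the overlap strategy: the function below slides a fixed-size window over the token stream and carries source metadata on every chunk. The names `Chunk` and `chunk_text` and the `overlap=32` default are assumptions for illustration, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text, source, chunk_size=256, overlap=32):
    """Split text into overlapping windows of at most chunk_size tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Assumption: whitespace tokens stand in for a model-specific tokenizer.
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(Chunk(" ".join(window), {"source": source, "start_token": start}))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk repeats the last `overlap` tokens of its predecessor, so a sentence cut at a boundary still appears whole in one of the two chunks. A scrape timestamp could be added to the metadata dict in the same way as `source`.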

3. Embedding manager

Powered by SentenceTransformer — batch processing for throughput, with robust handling of empty or malformed text.
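
One way to combine batching with tolerant input handling is sketched below. The encoder is passed in as a callable so the example stays self-contained; `embed_texts` here is a hypothetical stand-in and its signature is an assumption, not the notebook's actual method.

```python
from typing import Callable, List, Optional, Sequence


def embed_texts(
    texts: Sequence[object],
    encode_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[Optional[List[float]]]:
    """Embed texts in batches; empty or non-string inputs map to None."""
    results: List[Optional[List[float]]] = [None] * len(texts)
    # Keep only usable strings, remembering their original positions.
    valid = [(i, t.strip()) for i, t in enumerate(texts)
             if isinstance(t, str) and t.strip()]
    for start in range(0, len(valid), batch_size):
        batch = valid[start:start + batch_size]
        vectors = encode_batch([t for _, t in batch])
        for (i, _), vec in zip(batch, vectors):
            results[i] = list(vec)
    return results
```

With sentence-transformers installed, `encode_batch` could be the `encode` method of a `SentenceTransformer("all-MiniLM-L6-v2")` instance. Keeping the output aligned with the input positions means a malformed page never shifts embeddings onto the wrong chunk.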

4. Vector database

  • FAISS and ChromaDB support
  • Hash-based duplicate detection
  • Similarity-threshold filtering — only genuinely relevant hits
  • Persistent storage across sessions
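
To make the duplicate detection and threshold filtering concrete, here is a small in-memory stand-in. It does not use FAISS or ChromaDB at all: `TinyVectorStore` is a hypothetical class that only illustrates the two ideas, with pure-Python cosine similarity and the 0.15 threshold from the configuration.

```python
import hashlib
import math


class TinyVectorStore:
    """In-memory stand-in for FAISS/ChromaDB, for illustration only."""

    def __init__(self, similarity_threshold=0.15):
        self.similarity_threshold = similarity_threshold
        self._seen = set()   # content hashes of stored documents
        self._docs = []      # list of (text, vector) pairs

    def add_documents(self, texts, vectors):
        """Add documents, skipping exact duplicates; return how many were added."""
        added = 0
        for text, vec in zip(texts, vectors):
            digest = hashlib.md5(text.encode("utf-8")).hexdigest()
            if digest in self._seen:
                continue
            self._seen.add(digest)
            self._docs.append((text, vec))
            added += 1
        return added

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, top_k=5):
        """Return up to top_k (score, text) pairs at or above the threshold."""
        scored = [(self._cosine(query_vector, vec), text) for text, vec in self._docs]
        kept = [(s, t) for s, t in scored if s >= self.similarity_threshold]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        return kept[:top_k]
```

Hashing catches only byte-identical chunks; near-duplicates from syndicated articles would still need a similarity-based check.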

5. Self-correcting LLM interface

The centerpiece: a two-stage reasoning process — an initial answer, then a critical self-revision.

The self-correction innovation

Stage 1 — initial reasoning:

def _create_reasoning_prompt(self, query, context):
    return f"""
    You are an expert AI assistant with access to relevant information.
    Context Information:
    {context}
    User Question: {query}
    Instructions:
    1. Analyze the provided context carefully
    2. Identify key information relevant to the question
    3. Reason through the problem step by step
    4. Provide a well-structured, comprehensive answer
    """

Stage 2: self-correction. The system critiques its own answer for factual accuracy, completeness, logical consistency and proper context use, then emits an improved final answer that fixes errors, fills gaps and sharpens structure. In my testing, this second pass improved answer quality and reduced hallucinations, at the cost of an extra LLM call.
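
The two stages can be wired together as in the sketch below. The critique prompt wording and the names `create_critique_prompt` and `answer_with_self_correction` are my assumptions for illustration; the notebook's actual Stage 2 prompt is not shown here. The LLM is passed in as a plain callable so any backend can be plugged in.

```python
from typing import Callable


def create_critique_prompt(query: str, context: str, draft: str) -> str:
    # Hypothetical wording; the four review criteria come from the Stage 2 description.
    return (
        "You are reviewing a draft answer.\n"
        f"Context Information:\n{context}\n"
        f"User Question: {query}\n"
        f"Draft Answer:\n{draft}\n"
        "Check the draft for factual accuracy, completeness, logical "
        "consistency and proper use of the context. Then write an improved "
        "final answer that fixes errors and fills gaps."
    )


def answer_with_self_correction(
    query: str, context: str, generate: Callable[[str], str]
) -> dict:
    """Run stage 1 (draft) and stage 2 (critique and revise) with any LLM callable."""
    draft_prompt = f"Context Information:\n{context}\nUser Question: {query}"
    draft = generate(draft_prompt)
    final = generate(create_critique_prompt(query, context, draft))
    return {"draft": draft, "final": final}
```

Returning both the draft and the final answer makes it easy to log how much the second pass actually changed.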

Real-world performance

Query: "Did a plane crash in Ahmedabad?"

  • Processing time: 123.24s · Pages scraped: 9
  • Chunks generated: 54 · Relevant chunks retrieved: 5
  • Similarity scores: 0.662, 0.662, 0.662

It correctly surfaced a real-time incident — an aircraft losing altitude shortly after departure and crashing into Meghani Nagar — pulled live from current sources.

Query: "What's going on with Trump and Musk?"

  • Processing time: 180.69s · Pages scraped: 5
  • Chunks generated: 22 · Relevant chunks retrieved: 5
  • Similarity scores: ~0.65

Here the system synthesized a multi-source analysis of recent developments.

Implementation highlights

Centralized configuration:

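from dataclasses import dataclass

@dataclass
class Config:
    embedding_model_name: str = "all-MiniLM-L6-v2"  # SentenceTransformer model
    llm_model_name: str = "deepseek-r1:14b"
    max_tokens: int = 4096
    chunk_size: int = 256               # tokens per chunk
    top_k_retrieval: int = 5            # chunks returned per query
    similarity_threshold: float = 0.15  # minimum score for a hit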

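The pipeline entry point is an async coroutine that awaits the scraper before chunking, embedding and retrieval:

async def process_query(self, query, use_web_search=True):
    # Abridged. self.chunker, chunk_documents and chunk.text are hypothetical names for elided steps.
    scraped_data = await self.scraper.search_and_scrape(query)
    all_chunks = self.chunker.chunk_documents(scraped_data)
    texts = [chunk.text for chunk in all_chunks]
    embeddings = self.embedder.embed_texts(texts)
    self.vector_db.add_documents(all_chunks, embeddings)
    query_embedding = self.embedder.embed_texts([query])[0]
    relevant = self.vector_db.search(query_embedding, config.top_k_retrieval)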

Resilience throughout: timeout management, retry with exponential backoff, graceful degradation to cached results, and proper cleanup of Selenium drivers and connections.
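
Retry with exponential backoff and graceful degradation can be sketched generically as follows. The helper names, the delay defaults and the jitter factor are assumptions for illustration; the fetchers are any async callables, for example a requests-based fetch, a Selenium fetch and a cache lookup, tried in that order.

```python
import asyncio
import random


async def retry_with_backoff(make_call, retries=3, base_delay=0.5, max_delay=8.0):
    """Await make_call(), retrying failures with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries coming from concurrent tasks.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))


async def fetch_with_fallbacks(url, fetchers, **retry_kwargs):
    """Try each async fetcher in order; return the first result, or None if all fail."""
    for fetch in fetchers:
        try:
            return await retry_with_backoff(lambda: fetch(url), **retry_kwargs)
        except Exception:
            continue  # degrade to the next method
    return None
```

Returning `None` instead of raising lets the pipeline continue with whatever pages did load, which matters when a single query touches several sites.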

Optimizations

  • MD5 content hashing avoids duplicate processing; a configurable 48h cache; vectors persisted across sessions
  • Batched embeddings (64 per batch by default), asyncio-parallel scraping, and streaming to limit memory use
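
For the asyncio-parallel scraping, a bounded-concurrency pattern like the one below is a common choice. This is a sketch under assumptions: `scrape_all` and `max_concurrency` are hypothetical names, and `fetch` is any async callable that returns page text.

```python
import asyncio


async def scrape_all(urls, fetch, max_concurrency=4):
    """Fetch URLs concurrently, at most max_concurrency at a time.

    Returns a dict mapping each URL to its text, or None where the fetch failed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True keeps one failing page from cancelling the rest.
    results = await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
    return {u: (None if isinstance(r, Exception) else r) for u, r in zip(urls, results)}
```

The semaphore caps the number of simultaneous requests, which keeps the scraper polite toward target sites and bounds memory use when many pages are in flight.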

Lessons learned

  • Context quality over quantity: a few highly relevant chunks beat many loose ones; the 0.15 threshold was tuned through testing.
  • Self-correction pays off: in my tests, two-stage reasoning consistently beat single-pass answers and was worth the extra compute.
  • Robustness matters: real web scraping is unpredictable; requests → Selenium → cache fallbacks are essential.
  • Monitoring is crucial: logging similarity scores, chunk counts and timings makes optimization tractable.
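
The monitoring point can be made concrete with a small per-query metrics collector. `QueryMetrics` and the logger name are hypothetical; this is one simple way to capture the timings and counts reported under Real-world performance, not the notebook's logging code.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("rag.metrics")


class QueryMetrics:
    """Collects per-stage timings and counters for one query (hypothetical helper)."""

    def __init__(self):
        self.timings = {}
        self.counters = {}

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and store the elapsed seconds under name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = time.perf_counter() - start

    def record(self, name, value):
        self.counters[name] = value

    def log_summary(self):
        logger.info("timings=%s counters=%s", self.timings, self.counters)
```

Wrapping each pipeline step in `with metrics.stage("scrape"):` and recording values such as pages scraped, chunks generated and similarity scores gives one log line per query to compare runs.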

Conclusion

Self Adaptive Context RAG combines adaptive scraping, sophisticated text processing, vector retrieval and self-correcting reasoning into accurate, contextual answers — handling real-time information without sacrificing accuracy. The future of AI isn't only bigger models; it's smarter systems that adapt and correct themselves. Notebook: open in Colab.

RAG · Self-Correction · Vector Search · Web Scraping · FAISS · LLM