Self Adaptive Context RAG

Retrieval-Augmented Generation (RAG) pairs the recall of search systems with the language understanding of LLMs. This is an advanced take I call Self Adaptive Context RAG — a system that retrieves relevant information, adapts its context understanding to the query, and corrects its own reasoning.

Source code (Colab): open the notebook.

The problem with traditional RAG

Static retrieval that doesn't adapt to query complexity
Limited context understanding of real-time information
No self-correction — errors propagate
Poor integration between web scraping and vector databases

This system tackles all four with an architecture that adapts to query type and self-corrects.

Architecture: five components

1. Intelligent web scraper

Dual methods — HTTP requests and Selenium WebDriver
MD5-based caching with configurable expiry
Priority URL selection — reputable news sources first
Adaptive retry: falls back when one method fails
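Priority URL selection can be as simple as ranking search results by domain before scraping. The sketch below assumes a hypothetical `PRIORITY_DOMAINS` list, because the original does not say which sources count as reputable. Within each tier it keeps the search engine's original order.

```python
from urllib.parse import urlparse

# Hypothetical preference list (earlier = higher priority); not from the original notebook.
PRIORITY_DOMAINS = ["reuters.com", "apnews.com", "bbc.com"]


def domain_rank(url: str) -> int:
    """Index of the first priority domain matching the URL's host,
    or len(PRIORITY_DOMAINS) when none match (lowest priority)."""
    host = (urlparse(url).hostname or "").lower()
    for rank, domain in enumerate(PRIORITY_DOMAINS):
        # Match the domain itself or any subdomain, e.g. www.bbc.com
        if host == domain or host.endswith("." + domain):
            return rank
    return len(PRIORITY_DOMAINS)


def prioritize_urls(urls: list[str], limit: int = 10) -> list[str]:
    """Drop duplicate URLs, sort by domain priority, and keep the first `limit`.
    sorted() is stable, so URLs in the same tier keep their search order."""
    unique = list(dict.fromkeys(urls))
    return sorted(unique, key=domain_rank)[:limit]
```

For example, `prioritize_urls(["https://example.com/a", "https://www.bbc.com/news/x", "https://apnews.com/y"])` puts the AP link first, then BBC, then the unranked site.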

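import os
import requests

# `config` is the pipeline's global Config instance (assumed to define cache_dir)
class WebScraper:
    def __init__(self):
        self.session = requests.Session()  # reused HTTP session for the fast path
        self.driver = None                 # Selenium WebDriver, only created when needed
        os.makedirs(config.cache_dir, exist_ok=True)  # on-disk cache for scraped pages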
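The MD5-based cache can be sketched as a directory of JSON files keyed by the MD5 hash of the URL, with expiry checked against each entry's stored timestamp. The `PageCache` class, its JSON layout and the 48-hour default (taken from the Optimizations section) are assumptions, not the notebook's code.

```python
import hashlib
import json
import os
import time
from typing import Optional


class PageCache:
    """On-disk page cache keyed by the MD5 hash of the URL (hypothetical helper)."""

    def __init__(self, cache_dir: str, expiry_hours: float = 48.0):
        self.cache_dir = cache_dir
        self.expiry_seconds = expiry_hours * 3600
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url: str) -> str:
        # MD5 serves only as a compact, filesystem-safe key, not for security
        key = hashlib.md5(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, f"{key}.json")

    def get(self, url: str) -> Optional[str]:
        """Return cached content, or None if the entry is missing or expired."""
        path = self._path(url)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as f:
            entry = json.load(f)
        if time.time() - entry["fetched_at"] > self.expiry_seconds:
            return None  # stale: the caller should scrape again
        return entry["content"]

    def put(self, url: str, content: str) -> None:
        """Store content together with the time it was fetched."""
        with open(self._path(url), "w", encoding="utf-8") as f:
            json.dump({"url": url, "fetched_at": time.time(), "content": content}, f)
```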

2. Advanced text chunker

Token-aware chunking, so chunk sizes are measured in model tokens rather than characters
Overlap strategy for context continuity
Metadata preservation (source, timestamps)
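A chunker with overlap and metadata might look like the sketch below. To stay self-contained it treats whitespace-separated words as tokens. A real token-aware version would count tokens with the embedding model's tokenizer, and rejoining with spaces only round-trips for whitespace tokens. The 256-token size matches `chunk_size` in the configuration; the `chunk_text` name and the 32-token overlap are assumptions.

```python
# Sketch: whitespace words stand in for model tokens; overlap=32 is an assumed default.
def chunk_text(text: str, metadata: dict, chunk_size: int = 256, overlap: int = 32) -> list[dict]:
    """Split text into windows of at most chunk_size tokens. Consecutive windows
    share `overlap` tokens, and every chunk carries a copy of the metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + chunk_size]
        # Metadata first, so it can never overwrite the chunk's own text/index
        chunks.append({**metadata, "text": " ".join(window), "chunk_index": index})
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

With 600 words, the defaults produce three chunks starting at words 0, 224 and 448. Each pair of neighbours shares 32 words, so a sentence cut at a boundary still appears whole in one of them.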

3. Embedding manager
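The original gives no detail for this component. One plausible minimal shape is sketched below. It assumes the manager wraps an encoder such as sentence-transformers' `SentenceTransformer("all-MiniLM-L6-v2").encode` and embeds in batches of 64, the default mentioned under Optimizations. `embed_texts` is the method name used in the pipeline code. The encoder is injected as a plain callable so the sketch runs without the model. L2 normalization is an assumed design choice that lets a dot product act as cosine similarity.

```python
import math
from typing import Callable, Sequence

Vector = list[float]


# Sketch: the encoder is injected; batching and normalization are assumed design choices.
class EmbeddingManager:
    """Sends texts through an encoder in fixed-size batches and L2-normalizes the output."""

    def __init__(self, encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
                 batch_size: int = 64):
        self.encode = encode
        self.batch_size = batch_size

    def embed_texts(self, texts: Sequence[str]) -> list[Vector]:
        vectors: list[Vector] = []
        for start in range(0, len(texts), self.batch_size):
            batch = texts[start:start + self.batch_size]
            for vec in self.encode(batch):
                norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
                vectors.append([x / norm for x in vec])
        return vectors
```

With sentence-transformers installed, `EmbeddingManager(SentenceTransformer("all-MiniLM-L6-v2").encode)` would plug in the real model, which returns one vector per input text.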

4. Vector database

FAISS and ChromaDB support
Hash-based duplicate detection
Similarity-threshold filtering, so only hits above a minimum score are returned
Persistent storage across sessions
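Hash-based duplicate detection and threshold filtering can be illustrated with a small in-memory store. This is a stand-in for FAISS or ChromaDB and does not use their APIs. It assumes unit-length embeddings, so cosine similarity reduces to a dot product. It reuses the `add_documents` and `search` names from the pipeline code, with the 0.15 threshold and top-k of 5 from the configuration. Applying the threshold inside `search` is an assumption about where the filtering happens.

```python
import hashlib
from typing import Sequence


# Sketch: brute-force stand-in for FAISS/ChromaDB, not their API.
class InMemoryVectorStore:
    """Cosine-similarity search over unit vectors with MD5-based deduplication."""

    def __init__(self, similarity_threshold: float = 0.15):
        self.similarity_threshold = similarity_threshold
        self.chunks: list[dict] = []
        self.vectors: list[Sequence[float]] = []
        self.seen_hashes: set[str] = set()

    def add_documents(self, chunks: Sequence[dict], embeddings: Sequence[Sequence[float]]) -> int:
        """Index chunks whose text has not been seen before; return how many were added."""
        added = 0
        for chunk, vector in zip(chunks, embeddings):
            digest = hashlib.md5(chunk["text"].encode("utf-8")).hexdigest()
            if digest in self.seen_hashes:
                continue  # identical text is already indexed
            self.seen_hashes.add(digest)
            self.chunks.append(chunk)
            self.vectors.append(vector)
            added += 1
        return added

    def search(self, query_vector: Sequence[float], top_k: int = 5) -> list[tuple[dict, float]]:
        """Return up to top_k (chunk, score) pairs scoring at or above the threshold."""
        scored = [
            (chunk, sum(q * v for q, v in zip(query_vector, vector)))
            for chunk, vector in zip(self.chunks, self.vectors)
        ]
        relevant = [pair for pair in scored if pair[1] >= self.similarity_threshold]
        relevant.sort(key=lambda pair: pair[1], reverse=True)
        return relevant[:top_k]
```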

5. Self-correcting LLM interface

The centerpiece: a two-stage reasoning process — an initial answer, then a critical self-revision.

The self-correction innovation

Stage 1 — initial reasoning:

def _create_reasoning_prompt(self, query, context):
    """Stage 1: build the initial-answer prompt from the retrieved context."""
    return f"""
    You are an expert AI assistant with access to relevant information.
    Context Information:
    {context}
    User Question: {query}
    Instructions:
    1. Analyze the provided context carefully
    2. Identify key information relevant to the question
    3. Reason through the problem step by step
    4. Provide a well-structured, comprehensive answer
    """

Stage 2 — self-correction: the system critiques its own answer for factual accuracy, completeness, logical consistency and proper use of the context, then produces an improved final answer that fixes errors, fills gaps and tightens the structure. In my testing, this second pass noticeably improved answer quality and reduced hallucinations.
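Stage 2 can be sketched as a second prompt that feeds the draft answer back to the model, plus a small driver that runs both passes. The prompt wording, the `_create_correction_prompt` and `answer_with_self_correction` names, and the `generate` callable (any function that sends a prompt to the LLM and returns its text) are assumptions. Only the four critique criteria come from the description above.

```python
from typing import Callable


# Sketch: prompt wording and function names are assumptions, not the notebook's code.
def _create_correction_prompt(query: str, context: str, draft: str) -> str:
    """Ask the model to critique its draft against the context and rewrite it."""
    return f"""
    You are reviewing a draft answer before it is shown to the user.
    Context Information:
    {context}
    User Question: {query}
    Draft Answer:
    {draft}
    Instructions:
    1. Check every claim in the draft against the context (factual accuracy)
    2. Note anything the question asks for that the draft leaves out (completeness)
    3. Flag contradictions or unsupported reasoning steps (logical consistency)
    4. Flag claims that ignore or misread the context (proper context use)
    5. Write an improved final answer that fixes these problems
    """


def answer_with_self_correction(
    query: str,
    context: str,
    generate: Callable[[str], str],
    create_reasoning_prompt: Callable[[str, str], str],
) -> dict:
    """Two passes: draft from the Stage 1 prompt, then critique and revise."""
    draft = generate(create_reasoning_prompt(query, context))
    final = generate(_create_correction_prompt(query, context, draft))
    return {"draft": draft, "final": final}
```

Passing the bound method `self._create_reasoning_prompt` as `create_reasoning_prompt` reuses the Stage 1 prompt unchanged. Keeping the draft alongside the final answer makes it easy to log how often the second pass changes anything.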

Real-world performance

Query: "Did a plane crash in Ahmedabad?"

Processing time: 123.24s · Pages scraped: 9
Chunks generated: 54 · Relevant chunks retrieved: 5
Similarity scores: 0.662, 0.662, 0.662

It correctly surfaced a real-time incident — an aircraft losing altitude shortly after departure and crashing into Meghani Nagar — pulled live from current sources.

Query: "What's going on with Trump and Musk?"

Processing time: 180.69s · Pages scraped: 5
Chunks generated: 22 · Relevant chunks retrieved: 5
Similarity scores: ~0.65

The answer synthesized a multi-source analysis of recent developments.

Implementation highlights

Centralized configuration:

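from dataclasses import dataclass

@dataclass
class Config:
    embedding_model_name: str = "all-MiniLM-L6-v2"  # sentence-transformers embedding model
    llm_model_name: str = "deepseek-r1:14b"         # LLM used for both reasoning passes
    max_tokens: int = 4096
    chunk_size: int = 256                            # tokens per chunk
    top_k_retrieval: int = 5                         # chunks retrieved per query
    similarity_threshold: float = 0.15               # minimum similarity for a chunk to be kept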

The query pipeline is an async method that awaits the scraper, then chunks, embeds, indexes and searches the results:

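async def process_query(self, query, use_web_search=True):
    # Scrape live pages only when web search is enabled; otherwise search stored vectors
    scraped_data = await self.scraper.search_and_scrape(query) if use_web_search else []
    # Hypothetical chunker call: split each scraped page into chunk dicts with a "text" key
    all_chunks = [chunk for page in scraped_data for chunk in self.chunker.chunk_text(page)]
    if all_chunks:
        embeddings = self.embedder.embed_texts([c["text"] for c in all_chunks])
        self.vector_db.add_documents(all_chunks, embeddings)
    query_embedding = self.embedder.embed_texts([query])[0]
    return self.vector_db.search(query_embedding, config.top_k_retrieval)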
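The excerpt above awaits a single `search_and_scrape` call. The parallel scraping mentioned under Optimizations could live inside that method as an `asyncio.gather` over pages, bounded by a semaphore so a long result list does not open too many connections at once. In this sketch `scrape_all`, the `fetch` coroutine (whatever downloads one page), the concurrency limit of 5 and the 15-second timeout are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Optional


# Sketch: fetch, concurrency limit and timeout are assumptions.
async def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
    timeout: float = 15.0,
) -> list[Optional[str]]:
    """Fetch all URLs concurrently; failed or timed-out pages become None."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Optional[str]:
        async with semaphore:  # at most max_concurrency fetches in flight
            try:
                return await asyncio.wait_for(fetch(url), timeout)
            except Exception:
                return None  # one bad page should not sink the whole query

    # gather preserves input order, so results line up with urls
    return await asyncio.gather(*(fetch_one(u) for u in urls))
```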

Resilience throughout: timeout management, retry with exponential backoff, graceful degradation to cached results, and proper cleanup of Selenium drivers and connections.
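The fallback chain (requests, then Selenium, then cache) with exponential backoff could be organised as an ordered list of fetch strategies. Each strategy is retried a few times with doubling delays before the chain falls through to the next one. `fetch_with_fallbacks`, the retry count and the base delay are assumptions. In the real scraper the strategies would wrap the `requests` session, the Selenium driver and the cache lookup. Injecting `sleep` keeps the backoff testable.

```python
import time
from typing import Callable, Optional, Sequence


# Sketch: strategies, retry count and delays are assumptions.
def fetch_with_fallbacks(
    url: str,
    strategies: Sequence[Callable[[str], Optional[str]]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each strategy in order (e.g. requests, Selenium, cache).

    Each strategy gets `retries` attempts, waiting base_delay, 2*base_delay,
    4*base_delay, ... between attempts. Returns the first non-empty result,
    or None if every strategy fails (graceful degradation).
    """
    for strategy in strategies:
        for attempt in range(retries):
            try:
                result = strategy(url)
                if result:
                    return result
            except Exception:
                pass  # treat errors like empty results and retry
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```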

Optimizations

MD5 content hashing to skip duplicate processing; configurable cache expiry (48h); vectors persisted across sessions
Batched embeddings (batch size 64 by default), parallel scraping with asyncio, and streaming to limit memory use

Lessons learned

Context quality over quantity: a few highly relevant chunks beat many loosely related ones; the 0.15 similarity threshold was tuned through testing.
Self-correction is game-changing: two-stage reasoning consistently beats single-pass answers and is worth the extra compute.
Robustness matters: real-world web scraping is unpredictable, so the requests → Selenium → cache fallback chain is essential.
Monitoring is crucial: logging similarity scores, chunk counts and timings makes optimization tractable.
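Logging those numbers takes little code. The sketch below uses the standard `logging` module and a timing context manager. The logger name, stage names and message format are assumptions.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

# Sketch: logger name, stage names and message format are assumptions.
logger = logging.getLogger("rag.metrics")


@contextmanager
def timed(stage: str, timings: dict) -> Iterator[None]:
    """Record the wall-clock duration of a pipeline stage in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start
        logger.info("%s took %.2fs", stage, timings[stage])


def log_retrieval(scores: list[float], chunk_count: int, timings: dict) -> None:
    """Log the per-query figures worth tracking: chunk counts, similarity scores, total time."""
    logger.info("chunks generated: %d, retrieved: %d", chunk_count, len(scores))
    logger.info("similarity scores: %s", ", ".join(f"{s:.3f}" for s in scores))
    logger.info("total time: %.2fs", sum(timings.values()))
```

Wrapping each stage in `with timed("scrape", timings):` and calling `log_retrieval` at the end yields the same kind of per-query figures reported in the performance section.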

Conclusion

Self Adaptive Context RAG combines adaptive scraping, sophisticated text processing, vector retrieval and self-correcting reasoning into accurate, contextual answers — handling real-time information without sacrificing accuracy. The future of AI isn't only bigger models; it's smarter systems that adapt and correct themselves. Notebook: open in Colab.