Flagship Case Study

AI News Intel: From messy source material to evidence-grounded intelligence.

The live intelligence interface uses newspaper editions as the source. The same architecture — OCR, segmentation, hybrid retrieval, evidence-quality gate, source-aware answer — fits any messy source-of-truth a business actually depends on.

  • Live system
  • Source-aware
  • Confidence labels
  • Privacy-conscious

Problem

Why messy source material defeats most assistants.


System at a glance

From messy source material to evidence-grounded intelligence.

Same architecture across source types. Newspapers are the live demo above; the same path fits scanned reports, transactions, transcripts, and operational records.

  1. Source intake

    Images, PDFs, records

  2. OCR / parsing

    Layout-aware extraction

  3. Segmentation

    Per-document addressing

  4. Hybrid retrieval

    Lexical + semantic

  5. Evidence quality gate

    Strong vs weak signal

  6. Source-aware answer / dashboard

    Briefings, answers, decisions

Newspaper editions are the live implementation. The same pattern works for any complex source material.
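
One way to picture the segmentation step's "per-document addressing" is to give every extracted block a stable address that later stages can cite. The sketch below is illustrative only: the OcrBlock and AddressedSegment shapes, the addressSegments helper, the address format, and the sample edition date are all assumptions, not the system's actual schema.

```typescript
// Hypothetical shapes: field names and the address format are assumptions.
interface OcrBlock {
  page: number; // 1-based page number within the source
  text: string; // raw text from layout-aware extraction
}

interface AddressedSegment {
  address: string; // stable address, e.g. "2024-05-01/p2/s1"
  page: number;
  text: string;
}

// Give each non-empty block an address built from the source date, the page,
// and the block's 1-based position among kept blocks on that page.
function addressSegments(sourceDate: string, blocks: OcrBlock[]): AddressedSegment[] {
  const keptPerPage = new Map<number, number>();
  const segments: AddressedSegment[] = [];
  for (const block of blocks) {
    const text = block.text.trim();
    if (text.length === 0) continue; // skip blocks that are only whitespace
    const index = (keptPerPage.get(block.page) ?? 0) + 1;
    keptPerPage.set(block.page, index);
    segments.push({ address: `${sourceDate}/p${block.page}/s${index}`, page: block.page, text });
  }
  return segments;
}

// Sample data invented for the example.
const demoSegments = addressSegments("2024-05-01", [
  { page: 1, text: "Port traffic rises " },
  { page: 1, text: "   " },
  { page: 2, text: "Rainfall forecast revised" },
]);
console.log(demoSegments.map((segment) => segment.address));
```

Because the address is derived from the source rather than from a database row, a citation could still point at the same place if the index were rebuilt.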

Pipeline

Seven steps from source to answer.

01

Edition intake

Daily newspaper images land in a watched Drive folder for ingestion on schedule.

Drive + scheduled ingest

Retrieval & Evidence

Answers grounded in what the source actually says.

Designed to reduce unsupported answers — not to claim they're impossible.

Hybrid retrieval

Lexical and semantic signals combine so that a phrasing mismatch is far less likely to lose a relevant document.
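
The page does not say how the two signals are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the system's method. The fuseRankings helper, the document ids, and the constant k = 60 are illustrative.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) to a
// document's score. Assumed fusion method, shown for illustration only.
function fuseRankings(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by id so the order is deterministic.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}

// Hypothetical result lists from a keyword index and an embedding index.
const lexicalHits = ["doc-a", "doc-b", "doc-c"];
const semanticHits = ["doc-d", "doc-b", "doc-a"];
const fused = fuseRankings([lexicalHits, semanticHits]);
console.log(fused.map((hit) => hit.id)); // doc-a, doc-b, doc-d, doc-c
```

A document found by only one retriever (doc-c, doc-d) still appears in the fused list, which is the property the hybrid approach relies on.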

Evidence quality gate

Thin or off-topic retrieval is treated as low-confidence and triggers a cautious response.
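
A minimal sketch of such a gate, assuming retrieval scores are normalized to the range 0 to 1. The RetrievedHit shape, the evidenceGate helper, and all three thresholds are hypothetical values chosen for the example, not the system's tuned settings.

```typescript
interface RetrievedHit {
  id: string;
  score: number; // assumed to be normalized to [0, 1]
}

interface GateDecision {
  sufficient: boolean;
  reason: string;
}

// Illustrative thresholds (assumptions).
const MIN_TOP_SCORE = 0.5; // below this, the best match counts as off-topic
const SUPPORT_SCORE = 0.35; // a hit at or above this counts as supporting
const MIN_SUPPORTING_HITS = 2; // fewer than this counts as thin evidence

function evidenceGate(hits: RetrievedHit[]): GateDecision {
  if (hits.length === 0) {
    return { sufficient: false, reason: "no evidence retrieved" };
  }
  const topScore = Math.max(...hits.map((hit) => hit.score));
  if (topScore < MIN_TOP_SCORE) {
    return { sufficient: false, reason: "best match is weak or off-topic" };
  }
  const supporting = hits.filter((hit) => hit.score >= SUPPORT_SCORE).length;
  if (supporting < MIN_SUPPORTING_HITS) {
    return { sufficient: false, reason: "evidence is thin" };
  }
  return { sufficient: true, reason: "evidence passes the gate" };
}

const thinEvidence = evidenceGate([
  { id: "doc-a", score: 0.8 },
  { id: "doc-b", score: 0.1 },
]);
console.log(thinEvidence.reason); // "evidence is thin"
```

When the decision is not sufficient, the caller would switch to a cautious response instead of composing a full answer.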

Confidence labels

Every answer carries HIGH / MEDIUM / LOW so readers know how much weight to give it.
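
How a label is derived is not described here. The sketch below shows one plausible mapping from evidence strength to the three labels. The confidenceLabel helper, its two inputs, and the cut-offs are assumptions made for illustration.

```typescript
type ConfidenceLabel = "HIGH" | "MEDIUM" | "LOW";

// Map evidence strength to a reader-facing label.
// topScore: best retrieval score, assumed normalized to [0, 1].
// supportingSources: number of distinct sources backing the answer.
// The cut-offs below are illustrative, not the system's real values.
function confidenceLabel(topScore: number, supportingSources: number): ConfidenceLabel {
  if (topScore >= 0.75 && supportingSources >= 2) return "HIGH";
  if (topScore >= 0.5 && supportingSources >= 1) return "MEDIUM";
  return "LOW";
}

console.log(confidenceLabel(0.9, 3)); // "HIGH"
console.log(confidenceLabel(0.9, 1)); // "MEDIUM"
console.log(confidenceLabel(0.4, 5)); // "LOW"
```

Requiring both a strong match and more than one source for HIGH means a single well-matched passage cannot earn the top label on its own.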

Source drawer

Cited source and section live one tap away from the answer for human verification.

System Diagrams

Deployment, API, retrieval.


Frontend deployment flow

  1. Local code

    Next.js

  2. Commit + push

    main branch

  3. Build pipeline

    Typecheck + build

  4. Static export

    out/

  5. Private hosting

    public site

  6. Live website

    vachanambati.com

The release path keeps build checks and publishing repeatable without exposing deployment credentials.

Experience & Reliability

Product decisions that protect the reading moment.

Query-first interface
In-page answer deck
Sources hidden by default
Overscroll containment
Quick briefings + free-form
Visible elegant scrollbar
OCR-ready ingestion
Private service layer
Automated release flow
Evidence-grounded answers
Weak-evidence fallback
Cross-source retrieval roadmap

System Interfaces

Four public capabilities power the experience.

Private implementation details, admin tooling, credentials, and operational routes stay out of the public surface.

  • GET Latest edition

    Most recent processed source set with availability flags.

  • GET Edition picker

    Available source dates plus the latest, ready for review.

  • POST Question answering

    Evidence-grounded answer for a free-form question with confidence and sources.

  • GET Section briefing

    Pre-computed briefing by section, theme, or business context.
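
The routes and payloads are not published, so the sketch below only illustrates what a client-side check of the question-answering response could look like. The AnswerPayload and SourceRef shapes and their field names are assumptions inferred from the description (answer, confidence, sources), not the real contract.

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical response shape for the question-answering capability.
interface SourceRef {
  edition: string; // source date or identifier
  section: string; // section the evidence came from
}

interface AnswerPayload {
  answer: string;
  confidence: Confidence;
  sources: SourceRef[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime guard: check parsed JSON before the UI trusts it.
function isAnswerPayload(value: unknown): value is AnswerPayload {
  if (!isRecord(value)) return false;
  const answer = value["answer"];
  const confidence = value["confidence"];
  const sources = value["sources"];
  return (
    typeof answer === "string" &&
    (confidence === "HIGH" || confidence === "MEDIUM" || confidence === "LOW") &&
    Array.isArray(sources) &&
    sources.every(
      (item: unknown) =>
        isRecord(item) &&
        typeof item["edition"] === "string" &&
        typeof item["section"] === "string",
    )
  );
}

// Sample body invented for the example.
const sampleBody: unknown = JSON.parse(
  '{"answer":"Port traffic rose.","confidence":"MEDIUM","sources":[{"edition":"2024-05-01","section":"Business"}]}',
);
console.log(isAnswerPayload(sampleBody)); // true
```

Rejecting a response that lacks a confidence label or sources keeps the interface from showing an answer it cannot back up.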

Failure Cases Solved

Real problems, real fixes.

Each item below was a concrete bug diagnosed in production and fixed.

Noisy OCR, weak evidence

Hybrid retrieval + quality gate triggers cautious answers instead of confident-sounding noise.

Generic short answers

Composer expanded for synthesis-grade evidence; structured Key Developments + Why It Matters.

Source clutter in main response

Citation tags + trailing source blocks stripped from the answer; clean prose, drawer for sources.
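
A rough sketch of that cleanup step follows. The real tag and block formats are not shown on this page, so the example assumes inline tags such as "[S1]" and a trailing block that starts with a "Sources:" line. The separateSources helper and the sample answer are hypothetical.

```typescript
interface CleanedAnswer {
  prose: string; // answer text with citations removed
  citedTags: string[]; // tags kept aside for the source drawer, in first-seen order
}

// Assumed formats: inline "[S<number>]" tags and a trailing "Sources:" block.
function separateSources(rawAnswer: string): CleanedAnswer {
  // Drop everything from the first "Sources:" line that follows a line break.
  const withoutBlock = rawAnswer.replace(/\n+Sources:[\s\S]*$/i, "");
  const citedTags: string[] = [];
  const prose = withoutBlock
    .replace(/\s*\[(S\d+)\]/g, (_match: string, tag: string) => {
      if (citedTags.indexOf(tag) === -1) citedTags.push(tag); // keep each tag once
      return ""; // remove the tag and the whitespace before it
    })
    .trim();
  return { prose, citedTags };
}

// Sample answer invented for the example.
const cleaned = separateSources(
  "Port traffic rose in April [S1]. Freight rates eased [S2][S1].\n\nSources:\n[S1] Business, p. 4\n[S2] Trade, p. 7",
);
console.log(cleaned.prose); // "Port traffic rose in April. Freight rates eased."
console.log(cleaned.citedTags); // S1, S2
```

The tags are collected rather than discarded, so the drawer can still show which sources backed the answer.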

Small answer box / scroll fight

Larger deck + overscroll containment + visible scrollbar — reading no longer fights the page.

Manual release friction

Replaced by a repeatable release path with typecheck and build gates before publishing.

Large binary timed out CI

Multi-MB static document excluded from CI; uploaded once manually via the host file manager.

Business Applications

Same retrieval pattern. Different business systems.

Most businesses do not lack data. They lack a reliable intelligence layer over messy records.

Finance

Transaction & receipt intelligence

Turn months of transactions, invoices, and MSME receipts into searchable, source-linked answers and finance dashboards.

Trade

Trade-document intelligence

Bills of lading, LCs, shipping documents, and corridor research collapsed into one auditable query surface.

Operations

Internal evidence search

Internal files, policy notes, and operational records made searchable with evidence-grounded responses for analysts.

Analytics

Decision dashboards

Same retrieval pattern feeding analytics surfaces — answers carry their sources, decisions carry their basis.

Research

Transcript & policy assistants

Long-form transcripts, policy notes, and research evidence packs converted into question-answer interfaces.

MSME

Operational intelligence for small businesses

Most MSMEs do not lack data — they lack a reliable intelligence layer over scattered records. Same architecture, fitted to their stack.

Roadmap

Where this goes next.

Cross-source retrieval
OCR quality diagnostics
Public API docs
Admin ingestion dashboard
Beyond newspapers
Business + trade modes

Build a system like this for your documents, workflows, or records.

AI News Intel is one example. The same intelligence layer fits any business that needs reliable answers over messy data.