A RAG assistant that turned scattered SOPs into instant answers for dispatchers

A logistics & freight company · LLM/RAG · Data Engineering · Cloud · Hybrid

Context

A logistics and freight company ran its operations on a large body of internal knowledge — standard operating procedures, routing rules, carrier requirements, exception handling — spread across documents, wikis, and shared drives that had accumulated over years. Dispatchers needed answers fast, in the middle of live operations, but finding the right procedure meant knowing where it lived. New dispatchers took a long time to get productive, and even experienced ones escalated questions that the documentation could have answered if anyone could find it.

Challenge

The knowledge existed; it just wasn't reachable in the moment it was needed. Documents were fragmented and inconsistent, some superseded but never removed, so even a keyword search returned a pile of maybe-relevant results across different systems. For a dispatcher handling a live exception, that's useless — they need the answer, grounded in current procedure, in seconds. And some of the operational documents were sensitive enough that the client wasn't comfortable indexing them in a public or shared service; those had to stay in a private store. The system also had to be honest: a confidently wrong answer about a routing rule or a carrier requirement could send a shipment the wrong way.

Approach

We scoped this as a production-RAG problem, which meant the work was mostly in the unglamorous parts: getting fragmented documents into a clean, current, retrievable form, and making the assistant trustworthy enough that dispatchers would actually rely on it.

On the data side, we consolidated the scattered sources and addressed the duplication and stale-document problem, because retrieving over a messy corpus is how RAG systems produce confidently outdated answers. We chunked documents along their natural structure — individual SOP steps, distinct rules — rather than arbitrary windows, and preserved metadata so answers could cite their source and the system could prefer current procedures.
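
To make the chunking idea concrete, here is a minimal sketch that splits an SOP into one chunk per numbered step and copies source and revision metadata onto each chunk. The numbered-step layout, the `Chunk` fields, the `chunk_sop` helper, and the sample document are all assumptions for illustration, not the client's actual formats or pipeline.

```python
import re
from dataclasses import dataclass

# Hypothetical chunk record; the field names are illustrative assumptions.
@dataclass
class Chunk:
    text: str
    source: str   # document the chunk came from, used for citation
    step: int     # step number within the SOP
    updated: str  # ISO date of the source revision, used to prefer current docs

# Assumes each step starts a line with "1.", "2.", ...; real SOPs will vary.
STEP_RE = re.compile(r"^(\d+)\.\s+", re.MULTILINE)

def chunk_sop(body: str, source: str, updated: str) -> list[Chunk]:
    """Split an SOP into one chunk per numbered step, keeping metadata."""
    matches = list(STEP_RE.finditer(body))
    chunks = []
    for i, m in enumerate(matches):
        # A step runs until the next step marker, or to the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        text = body[m.end():end].strip()
        chunks.append(Chunk(text, source, int(m.group(1)), updated))
    return chunks

# Made-up sample SOP, for illustration only.
sop = """1. Confirm the carrier's hazmat certification.
2. If certification is missing, hold the load
   and notify the shift lead.
3. Record the exception in the dispatch log."""

chunks = chunk_sop(sop, source="hazmat-exceptions.md", updated="2024-03-01")
for c in chunks:
    print(c.step, c.source, "|", c.text)
```

Because each chunk is a whole step, a retrieved passage reads as a complete instruction, and the `source` and `updated` fields travel with it for citation and recency filtering.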

On the trust side, we built grounding and a refusal path on purpose. Answers are generated only from retrieved documents and cite them, so a dispatcher can verify; and when retrieval confidence is low, the assistant says it doesn't have a covering document instead of guessing. We assembled an evaluation set of real dispatcher questions tied to the documents that answer them, so retrieval quality was measured and tuned rather than eyeballed.
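
The refusal path can be sketched in a few lines. This is an illustration under stated assumptions, not the production logic: the `Hit` record, the `MIN_SCORE` threshold value, and the `fake_generate` stand-in for the LLM call are all hypothetical, and the scores are assumed to be similarities where higher means more relevant.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    source: str
    score: float  # assumed similarity score; higher means more relevant

REFUSAL = "I don't have a document covering that."
MIN_SCORE = 0.75  # hypothetical threshold; in practice tuned on the evaluation set

def answer(question: str, hits: list[Hit], generate) -> str:
    """Answer only from hits above the threshold; otherwise refuse."""
    grounded = [h for h in hits if h.score >= MIN_SCORE]
    if not grounded:
        return REFUSAL
    # Only the retrieved text is passed to the generator as context.
    context = "\n".join(f"[{h.source}] {h.text}" for h in grounded)
    sources = ", ".join(sorted({h.source for h in grounded}))
    return f"{generate(question, context)} (sources: {sources})"

def fake_generate(question: str, context: str) -> str:
    """Stand-in for the LLM call so the sketch runs without a model."""
    return "Hold the load and notify the shift lead."

# Made-up retrieval results, for illustration only.
strong = [Hit("If certification is missing, hold the load.", "hazmat-exceptions.md", 0.91)]
weak = [Hit("Parking rules for the yard.", "yard-rules.md", 0.32)]

print(answer("Carrier has no hazmat cert, what now?", strong, fake_generate))
print(answer("Who approves detention charges?", weak, fake_generate))
```

The design choice is that refusal is decided before generation: if nothing clears the threshold, the model is never asked, so it has no opportunity to guess.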

On deployment, we matched the architecture to the sensitivity of the data: a hybrid approach that kept sensitive operational documents in a private store while using cloud services for the rest.
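
As a rough sketch of the split, the snippet below routes each document to a private or cloud index based on a sensitivity label and queries both at search time. The `Doc` and `Index` classes, the `sensitive` flag, and the substring search are hypothetical stand-ins; the case study does not name the actual stores or services used.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    text: str
    sensitive: bool  # assumed to be labeled during consolidation

@dataclass
class Index:
    """In-memory stand-in for a real index, private or cloud-hosted."""
    name: str
    docs: dict = field(default_factory=dict)

    def add(self, doc: Doc) -> None:
        self.docs[doc.doc_id] = doc

    def search(self, term: str) -> list:
        # Naive substring match; a placeholder for real retrieval.
        return [d for d in self.docs.values() if term.lower() in d.text.lower()]

private_index = Index("private")
cloud_index = Index("cloud")

def route(doc: Doc) -> Index:
    """Index a document in the store that matches its sensitivity label."""
    target = private_index if doc.sensitive else cloud_index
    target.add(doc)
    return target

def search_all(term: str) -> list:
    """Query both stores and tag each result with where it came from."""
    return [(idx.name, d.doc_id)
            for idx in (private_index, cloud_index)
            for d in idx.search(term)]

# Made-up documents, for illustration only.
route(Doc("carrier-rates", "Negotiated carrier rates by lane", sensitive=True))
route(Doc("dock-hours", "Dock hours and carrier check-in steps", sensitive=False))
print(search_all("carrier"))
```

Routing at ingestion time means a sensitive document is only ever written to the private index; the query layer can still merge results from both.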

Architecture

A hybrid RAG system designed around both retrieval quality and the sensitivity of the documents.

Data consolidation: fragmented SOPs and operational docs were consolidated and de-duplicated, with stale material handled so retrieval favors current procedures.
Structure-aware retrieval: documents chunked along their natural units (SOP steps, rules) with source and recency metadata preserved for citation and filtering.
Grounded generation with refusal: answers generated only from retrieved context, cited inline, with a retrieval-confidence threshold that triggers an honest "I don't have a document covering that" instead of a fabricated answer.
Hybrid deployment for data sensitivity: sensitive operational documents are indexed in a private store, while less-sensitive content and supporting services run in the cloud — the split was driven by the client's data-sensitivity requirements, not convenience.
Retrieval evaluation: a golden set of real dispatcher questions used to measure and tune retrieval quality before and after changes.
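
The retrieval evaluation item above can be illustrated with a simple hit-rate metric: for each golden question, check whether the document known to answer it appears in the top k results. This is a sketch under assumptions; the metric choice, the `hit_rate_at_k` helper, the toy word-overlap retriever, and the sample questions are illustrative and not taken from the project.

```python
def hit_rate_at_k(golden, retrieve, k=3):
    """Fraction of questions whose expected source is in the top-k results."""
    hits = 0
    for question, expected_source in golden:
        if expected_source in retrieve(question)[:k]:
            hits += 1
    return hits / len(golden)

# Made-up corpus: source name -> text. Illustration only.
corpus = {
    "hazmat-exceptions.md": "hold the load when carrier hazmat certification is missing",
    "reroute-rules.md": "reroute a shipment when the primary lane is closed",
    "yard-rules.md": "parking and yard safety rules",
}

def toy_retrieve(question: str) -> list[str]:
    """Rank sources by shared-word count; a placeholder for real retrieval."""
    words = set(question.lower().replace("?", "").split())
    return sorted(corpus,
                  key=lambda s: len(words & set(corpus[s].split())),
                  reverse=True)

# Golden set: (question, source that answers it). The third question points
# at a document missing from the corpus, so it counts as a miss.
golden = [
    ("What if a carrier lacks hazmat certification?", "hazmat-exceptions.md"),
    ("How do I reroute a shipment?", "reroute-rules.md"),
    ("Who approves detention charges?", "detention-policy.md"),
]

print(f"hit rate@1: {hit_rate_at_k(golden, toy_retrieve, k=1):.2f}")
```

Running the same golden set before and after a chunking or indexing change gives a number to compare, and misses like the third question surface gaps in the corpus itself.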

This is practical LLM value: not a flashy demo, but a system dispatchers trust because it cites its sources and admits when it doesn't know — built with a deployment split that respects which documents can live where.

Results

Shorter onboarding for new dispatchers, because answers are reachable without knowing where every document lives.
Fewer escalations, as the assistant deflects questions the documentation can answer.
Measurable query deflection on operational questions, freeing experienced staff from repetitive lookups.
A trustworthy assistant: grounded, cited answers with an explicit refusal path, so wrong-but-confident answers are designed out.

Stack

LLM with RAG · structure-aware chunking and retrieval · private document store (sensitive docs) · cloud services (non-sensitive content and serving) · retrieval-evaluation harness · consolidated data pipeline · hybrid (private + cloud) infrastructure.

Production RAG lives or dies on retrieval quality and an honest refusal path: the unglamorous 80%.
