What is RAG in simple terms?

RAG stands for Retrieval-Augmented Generation. Instead of asking an AI to generate text purely from its training data — which can produce plausible-sounding but inaccurate claims — RAG first retrieves relevant source documents and then generates text grounded in those specific sources. For immigration petitions, this means the AI generates petition content based on the client's actual documents, not from generic knowledge about what petitions usually say.

Does RAG eliminate hallucinations completely?

No — RAG significantly reduces hallucinations but does not eliminate them entirely. The AI is still generating text, and it can misquote, misinterpret, or over-generalize from the retrieved documents. This is why Immigration Copilot includes a validation step: every factual claim in the generated petition is checked against the source documents. RAG reduces the hallucination problem; validation catches what remains.

Is pgvector a standard choice for legal RAG applications?

pgvector is a PostgreSQL extension for storing and querying vector embeddings. It is increasingly used in legal and enterprise RAG applications because it integrates with existing PostgreSQL databases (no separate vector database infrastructure needed), supports similarity search over large document collections, and can be queried with regular SQL alongside structured data. For a system that also stores case data, document metadata, and petition content in the same database, pgvector is a natural choice.

What is the difference between RAG and fine-tuning for legal document generation?

Fine-tuning modifies a model's weights using training examples — it teaches the model to produce output in a specific style or format. RAG provides the model with specific source documents at generation time — it grounds the model's output in specific facts. For immigration petitions, both techniques are useful but solve different problems: fine-tuning can produce better-formatted petition language and more accurate legal structure; RAG ensures the specific factual claims are grounded in the client's actual documents. Immigration Copilot uses both.

Why does the petition need exhibit citations if RAG is grounding it in the actual documents?

RAG ensures that the generated text is based on real documents from the client's file — not invented facts. But USCIS adjudicators reviewing the petition cannot see the AI's retrieval context; they see only the petition letter and the exhibits. The petition letter must cite specific exhibits so the adjudicator can verify each claim. RAG grounds the generation in the correct documents; exhibit citations make that grounding visible and verifiable to the adjudicator.

How does the system ensure that RAG retrieves the right documents for each petition section?

Retrieval is guided by semantic similarity search: the system creates a query representing the petition section being drafted (e.g., 'Criterion 4 judging activities evidence'), and retrieves documents whose vector embeddings are most similar to that query. The KB structure further guides retrieval by pre-organizing documents into criterion categories, ensuring that Criterion 4 evidence is specifically available when drafting the Criterion 4 section. Both semantic search and structured KB retrieval work together.

Can the attorney see which documents the AI retrieved when generating each section?

Yes. Immigration Copilot shows the source documents and KB entries used to generate each section, alongside the generated text. This allows the attorney to verify that the generation was grounded in the correct evidence and to identify any sections where the retrieval pulled less relevant documents. Transparent retrieval attribution is essential for attorney trust in the system.

What AI models does Immigration Copilot use for petition generation?

Immigration Copilot uses Claude Opus (via Amazon Bedrock) for petition section generation — the most capable model in the Claude family, optimized for high-quality long-form writing with complex reasoning. For document classification and lighter tasks, it uses Claude Haiku for speed and cost efficiency. The petition generation uses Claude Opus because petition quality directly affects the outcome of a USCIS adjudication — this is not a task where model quality tradeoffs are acceptable.


How RAG Powers EB1A Petition Drafting Without Hallucinations

How RAG prevents hallucinations in EB1A petition drafting: the technical architecture, pgvector retrieval, Claude Opus generation, and attorney review requirements.

May 1, 2026·Updated May 7, 2026·11 min read

EB1A petition drafting is a factually dense writing task. A 40-page petition letter may contain 200+ specific factual claims: award names and organizations, publication titles and journals, citation counts, compensation figures, role descriptions, and assessments of significance. Every claim must be grounded in the client's actual documents and supported by a specific exhibit. A claim that cannot be traced to an exhibit is a liability: the attorney must either remove it (weakening the petition) or leave it unsupported (and potentially false).

This is why Retrieval-Augmented Generation matters for immigration petition drafting more than for almost any other writing task: the hallucination problem is severe, the cost of errors is high, and the solution is structurally clean — ground the generation in the actual documents.

Grounded

What RAG adds to standard AI generation — source-based output

Standard AI generation can invent plausible-sounding facts from training data. RAG constrains the generation to the specific documents retrieved from the client's file, making hallucination structurally much harder.

Traceable

The property that makes RAG output attorney-reviewable

Because generation is grounded in specific retrieved documents, every generated claim can be traced to its source. Attorneys can verify generation quality by checking which documents the system retrieved for each section.

Validated

The step after RAG generation that catches remaining errors

RAG reduces hallucinations but doesn't eliminate them. A post-generation validation step checks every factual claim against source documents, catching misquotations and over-generalizations.

Why Standard AI Generation Fails for Immigration Petitions

Large language models generate text by predicting likely continuations of their inputs based on patterns in their training data. Given the prompt "Write the Criterion 5 section for a machine learning researcher's EB1A petition," a standard model produces text that sounds like an EB1A petition, because it has seen many EB1A petitions in training data, but the specific facts it generates are statistically plausible patterns, not facts drawn from the client's record.

The hallucination that ends petitions. A model asked to write about a researcher's contributions will generate specific-sounding claims: a citation count that is plausible but wrong by 200, an award that the alien received from a prestigious institution — except the institution name doesn't match any actual document. These errors look reasonable on first read. They are fatal when an adjudicator requests the supporting document and finds it doesn't match.

The pattern that replaces the client. Standard generation with only a general prompt produces generic EB1A petition language that could apply to any researcher in the field. It fails to incorporate the client's specific evidence — the particular award that is unusually prestigious, the specific contribution whose citation count is especially strong, the expert letter from a National Academy member that provides unusually credible testimony. Generic output fails to present the strongest possible case.

Under 8 CFR 204.5(h), every assertion in the petition letter must be supported by documentary evidence. If the AI generates a claim that cannot be traced to an exhibit, the attorney either weakens the petition by removing it or misrepresents a fact by leaving it unsupported. Neither outcome is acceptable. RAG addresses the problem at the source by constraining generation to the client's own documents.

How RAG Works: The Technical Architecture

RAG has three phases: ingestion, retrieval, and generation.

Ingestion. Each client document is processed into two representations:

Structured facts (the KB). Key facts are extracted by AI and stored in the structured client knowledge base as discrete, queryable facts with exhibit references. "Won Outstanding Paper Award at ICLR 2023 (Exhibit 1)" is one KB fact. This structured layer is the primary retrieval source for known facts.

Vector embeddings. Each document is also converted to a vector — a mathematical representation of the document's semantic content — by an embedding model. These vectors are stored in a vector database (Immigration Copilot uses pgvector in PostgreSQL). Documents with similar semantic meaning have similar vectors.
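The idea that similar documents have similar vectors can be illustrated with a minimal sketch. The file names and three-dimensional vectors below are made up for illustration; a real embedding model produces far longer vectors, and nothing here reflects Immigration Copilot's actual code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example only.
doc_vectors = {
    "peer_review_invitation.pdf": [0.9, 0.1, 0.0],
    "journal_referee_thanks.pdf": [0.8, 0.2, 0.1],
    "salary_letter.pdf": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of a query about judging activities.
query_vector = [1.0, 0.0, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
```

In this toy data the two peer-review documents point in nearly the same direction as the query and rank ahead of the salary letter, which is the behavior similarity search relies on.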

Retrieval. When generating a petition section, the system creates a query for that section and searches for relevant documents in two ways:

Structured KB lookup. For the Criterion 4 (judging) section, the system retrieves all KB entries tagged as Criterion 4 evidence — directly, without semantic approximation.

Vector similarity search. For broader context, the system queries the vector database for documents whose embeddings are most similar to the section query. This retrieves documents the structured KB might not have explicitly tagged as directly relevant but that contain useful contextual information.
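A vector similarity query against pgvector might look like the following sketch. The table and column names (`documents`, `case_id`, `exhibit_no`, `title`, `embedding`) are assumptions made for illustration, not Immigration Copilot's schema. The `<=>` operator is pgvector's cosine-distance operator, and the `%(name)s` placeholders follow the psycopg convention.

```python
def build_similarity_query(top_k: int = 5) -> str:
    """Return parameterized SQL that filters by case, then ranks by cosine distance.

    All table and column names are hypothetical. Ascending order on a
    distance puts the most similar documents first.
    """
    if top_k < 1:
        raise ValueError("top_k must be positive")
    return (
        "SELECT id, exhibit_no, title, "
        "embedding <=> %(query_vec)s::vector AS cosine_distance "
        "FROM documents "
        "WHERE case_id = %(case_id)s "
        "ORDER BY embedding <=> %(query_vec)s::vector "
        f"LIMIT {int(top_k)}"
    )


def to_vector_literal(values: list[float]) -> str:
    """Format a list as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


sql = build_similarity_query(top_k=3)
params = {"case_id": 42, "query_vec": to_vector_literal([0.12, 0.5, 0.33])}
# With an open psycopg cursor this would run as: cursor.execute(sql, params)
```

Filtering on `case_id` in the same statement is what keeps one client's documents out of another client's retrieval results, which is the practical benefit of keeping vectors and case data in one database.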

Generation. The retrieved KB entries and documents are passed to the generation model (Claude Opus) together with the section prompt, the regulatory standard for the relevant criterion, and the required citation format. The model generates petition-quality prose grounded in the retrieved evidence.
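As a rough sketch of the generation input, the four ingredients named above can be assembled into a single prompt. The `KBFact` type, the `build_section_prompt` function, and the prompt layout are hypothetical; the article does not describe the actual prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBFact:
    text: str     # one discrete fact from the knowledge base
    exhibit: str  # exhibit label, e.g. "Exhibit 3"


def build_section_prompt(
    section: str,
    standard: str,
    citation_format: str,
    facts: list[KBFact],
    context_passages: list[str],
) -> str:
    """Assemble one generation prompt from retrieved evidence (illustrative layout)."""
    fact_lines = "\n".join(f"- {f.text} ({f.exhibit})" for f in facts)
    context = "\n---\n".join(context_passages) or "(none)"
    return (
        f"Draft the petition section: {section}\n\n"
        f"Regulatory standard:\n{standard}\n\n"
        f"Citation format: {citation_format}\n\n"
        f"Knowledge base facts (use only these and the context below):\n{fact_lines}\n\n"
        f"Supporting context:\n{context}\n\n"
        "Do not state any fact that is absent from the material above. "
        "Cite the exhibit for every factual claim."
    )


prompt = build_section_prompt(
    section="Criterion 4 (judging)",
    standard="<regulatory text for the criterion goes here>",
    citation_format="(Exhibit N)",
    facts=[KBFact("Reviewed 14 manuscripts for a peer-reviewed journal", "Exhibit 3")],
    context_passages=["Letter from the journal editor confirming reviewer service."],
)
```

Carrying the exhibit label with each fact means the model never has to guess which exhibit supports a claim; it only has to copy the label it was given.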

RAG architecture components and their roles
Component	Role
pgvector (vector storage and similarity search)	PostgreSQL extension storing document embeddings alongside structured case data. Enables semantic similarity search over the document collection: 'find documents semantically similar to this query.' All vector operations run within the existing PostgreSQL database, so no separate vector database infrastructure is required. Queries combine structured SQL filters (by case ID, document type, criterion) with vector similarity search.
Amazon Titan Embeddings (vector generation)	Embedding model via AWS Bedrock that converts document text to vector representations. Runs in the same AWS infrastructure as the generation models, avoiding cross-cloud data transfer. Each document is embedded once at ingestion time; the stored vector is reused for all subsequent similarity searches.
Structured KB (fact storage and retrieval)	The client knowledge base stores discrete facts with criterion annotations and exhibit references. Primary retrieval source for known facts that should appear in specific petition sections. Complements vector search: KB retrieval is precise (fetch all Criterion 4 facts); vector search is broad (fetch semantically relevant context).
Claude Opus via Amazon Bedrock (generation)	The generation model that produces petition-quality prose from the retrieved context. Claude Opus is used for generation specifically because petition quality directly affects USCIS outcomes: this is the highest-stakes writing task in the pipeline, and model quality matters. The model is instructed to ground every claim in the retrieved context and cite exhibits by number.
Post-generation validation	After generation, each factual claim in the generated text is verified against the source KB entries and documents. Claims that cannot be traced to source evidence are flagged for attorney review. This is the safety net that catches hallucinations RAG didn't prevent: misquotations, overstatements, or facts the model inferred rather than retrieved.
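The post-generation validation component can be sketched in a deliberately simplified form. The article does not describe how Immigration Copilot implements validation, so the function name, the regex-based sentence splitting, and the substring number check below are all assumptions; a production validator would need far more robust claim extraction.

```python
import re

EXHIBIT_RE = re.compile(r"Exhibit\s+(\d+)")
NUMBER_RE = re.compile(r"\d[\d,]*")


def flag_unsupported(generated: str, sources: dict[int, str]) -> list[str]:
    """Return sentences that need attorney review.

    A sentence is flagged when it cites no exhibit, cites an exhibit that is
    not among the sources, or contains a number absent from the cited text.
    The number check is a crude substring match, for illustration only.
    """
    flagged = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", generated) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in EXHIBIT_RE.findall(sentence)]
        if not cited or any(n not in sources for n in cited):
            flagged.append(sentence)
            continue
        cited_text = " ".join(sources[n] for n in cited)
        # Drop the exhibit references so their numbers are not checked as claims.
        claim_text = EXHIBIT_RE.sub("", sentence)
        numbers = [m.rstrip(",") for m in NUMBER_RE.findall(claim_text)]
        if any(num not in cited_text for num in numbers):
            flagged.append(sentence)
    return flagged


# Hypothetical source texts keyed by exhibit number.
sources = {
    3: "Letter confirming review of 14 manuscripts between 2020 and 2023.",
    5: "Citation report: 1,950 total citations.",
}
draft = (
    "The beneficiary reviewed 14 manuscripts for the journal (Exhibit 3). "
    "Her work has been cited 2,150 times (Exhibit 5). "
    "She received a national award in 2021."
)
flagged = flag_unsupported(draft, sources)
```

Here the second sentence is flagged because the citation count does not match Exhibit 5, and the third because it cites no exhibit at all. The sketch only flags sentences; deciding what to do with them stays with the attorney.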


RAG for Different Petition Sections

Different petition sections require different retrieval strategies. Understanding how retrieval works for each section helps attorneys understand what the generation is grounded in:

Criterion sections (C1–C10). Primary retrieval: all KB entries tagged for that criterion, plus supporting context documents. Because the structured KB lookup is exhaustive, every documented piece of evidence for a criterion is available to the generator; no qualifying evidence is overlooked merely because similarity search ranked something else higher.

Career summary and biographical section. Primary retrieval: biographical KB entries, employment history facts, and career timeline. The generator produces a chronological professional narrative grounded in the actual career record, not a generic research career pattern.

Kazarian Step 2 / Final Merits section. Primary retrieval: high-level KB summary, most impressive evidence from each criterion, comparison data (salary percentile, award prestige notes). The generator synthesizes across the full evidence record to produce the totality argument.

RFE response sections (when applicable). Primary retrieval: the specific RFE ground, the KB entries related to the challenged criterion, and additional evidence not included in the original filing. The generator produces targeted responses to specific USCIS concerns grounded in evidence not previously presented.
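One way to picture the per-section strategies above is as a small retrieval plan for each section type. This is a hypothetical data structure: the tag names, query strings, and `top_k` values are invented for illustration and are not taken from Immigration Copilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPlan:
    kb_tags: tuple[str, ...]  # structured KB tags fetched exhaustively
    vector_query: str         # text that gets embedded for similarity search
    top_k: int                # number of extra context documents to retrieve


def plan_for_criterion(number: int, topic: str) -> RetrievalPlan:
    """Build the plan for one criterion section (C1 through C10)."""
    if not 1 <= number <= 10:
        raise ValueError("EB1A criteria are numbered 1 through 10")
    return RetrievalPlan(
        kb_tags=(f"criterion_{number}",),
        vector_query=f"Criterion {number} {topic} evidence",
        top_k=5,
    )


# Plans for non-criterion sections; every value here is an assumption.
SECTION_PLANS = {
    "career_summary": RetrievalPlan(
        kb_tags=("biography", "employment_history", "career_timeline"),
        vector_query="career summary and professional history",
        top_k=5,
    ),
    "final_merits": RetrievalPlan(
        kb_tags=("kb_summary", "strongest_evidence", "comparison_data"),
        vector_query="totality of evidence for final merits determination",
        top_k=10,
    ),
}

judging_plan = plan_for_criterion(4, "judging activities")
```

An RFE response plan would depend on the specific ground raised in the RFE, so it is better built per request than stored as a fixed entry.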

RAG Limitations Attorneys Must Understand

RAG significantly reduces hallucinations but is not infallible. Attorneys should understand what can still go wrong:

Source document quality. RAG is only as good as the documents retrieved. If a client's documents contain errors — a publication title misspelled on the award certificate, an incorrect date in an employment letter — the AI may faithfully reproduce those errors. Review source documents for accuracy before generation.

Retrieval misses. Vector similarity search doesn't find every relevant document — it finds documents semantically similar to the query. A highly technical document about a contribution may not semantically match a broad query. The structured KB mitigates this, but edge cases exist.

Misinterpretation. The AI may retrieve the correct document but interpret it incorrectly. A document describing a nomination for an award might be interpreted as receipt of the award. A grant proposal might be interpreted as a grant award. Validation catches these, but attorney review of generated sections remains essential.

No substitution for legal judgment. RAG produces petition text grounded in evidence. It does not evaluate whether that evidence meets the USCIS legal standard for any criterion. Whether a specific award is "nationally recognized," whether a salary is "high relative to others in the field," and whether a contribution is of "major significance" are legal questions that require attorney analysis.

RAG grounds the facts — attorneys provide the legal analysis. These are different tasks and both are required.

An attorney using Immigration Copilot does not review a finished petition — they review a well-grounded first draft. The generated text presents specific facts from real documents with specific exhibit citations. The attorney's review adds what no AI can provide: the legal analysis of whether the documented facts satisfy USCIS standards, the strategic judgment about which evidence to emphasize, and the professional judgment about what the petition is and is not claiming. RAG improves the starting point; attorney analysis determines the quality of the finish.

Why This Architecture for Immigration

The RAG + structured KB architecture is particularly well-suited to immigration petition drafting for three specific reasons:

Every claim needs a citation. 8 CFR 204.5(h) requires documentary support for every assertion. RAG's traceable generation — where every generated claim can be attributed to a specific source document — maps directly onto this requirement. The exhibit citation in the generated text is not an afterthought; it is the output of a retrieval process that fetched the source document.

The factual record is large but bounded. A client's 150-document file is large enough that passing every document to the model in full is impractical, but small enough that a well-indexed vector search retrieves the right documents reliably. The problem is well-scoped for this architecture.

The legal standards are fixed. EB1A has 10 criteria with specific regulatory text and USCIS policy guidance. These standards are stable and well-documented. The generation model can be instructed precisely on the legal standard to apply to the retrieved evidence, producing output that addresses USCIS standards specifically.

RAG is the difference between an AI that writes about immigration and an AI that writes about this client's immigration case

Generic AI can produce text that sounds like an EB1A petition. RAG-powered AI produces text that argues this specific client's case using this specific client's evidence with citations to this client's exhibits. The difference is not cosmetic — it is the difference between a petition that presents a strong, specific, well-documented argument and one that presents generic legal assertions that sound like they might apply to anyone. The attorney's job is still to evaluate, refine, and add legal judgment. But the starting point RAG provides is qualitatively better than any alternative.

For the document classification layer that feeds the RAG pipeline, see how AI classifies EB1A supporting documents. For the knowledge base construction that organizes classified evidence for retrieval, see how AI builds an EB1A client knowledge base. For the full end-to-end preparation workflow and how all these components fit together, see EB1A drafting efficiency: from 200 hours to 40 and the EB1A petition guide.


