Five Milestones That Shaped Legal AI
Dec 19, 2025
Professor Kevin D. Ashley opened his keynote at the Pittsburgh Legal AI Summit 2025 by framing a fundamental transition in legal AI. In the early days of computational law, researchers painstakingly encoded legal knowledge through manual rules and logical structures. Today, large language models capture legal reasoning by learning patterns from massive text corpora.
His keynote traced an arc from the top-down expert systems of the 1980s, through the bottom-up machine learning revolution of the 2010s, to today's hybrid systems that combine both approaches. For legal teams, understanding this history can be the difference between making smart decisions about AI tools and repeating expensive mistakes.
From Expert Systems to Large Language Models
Expert Systems: The Promise and the Problem
In the 1980s, the first wave of legal AI came as expert systems: computer programs that captured lawyers' expertise as logical rules. Don Waterman built one for RAND Corporation that advised on settling asbestos lawsuits. Today's descendants guide legal clinic interviews and help companies with regulatory compliance.
These systems had a fatal flaw. The manual process of extracting rules from legal experts was extraordinarily expensive and time-consuming. Every domain required painstaking knowledge engineering. This is why early legal tech, despite genuine capabilities, never scaled. For legal departments evaluating AI tools today, this history explains why "just automate the rules" has always been harder than it sounds.
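To make the architecture concrete, here is a minimal sketch of what that era's rule-based approach looked like in code. Everything below is invented for illustration: the rules, thresholds, and field names are placeholders standing in for the hundreds of hand-engineered rules a real system like Waterman's would have contained.

```python
# Minimal sketch of a 1980s-style rule-based legal expert system.
# Every rule had to be hand-authored by a knowledge engineer after
# interviewing domain experts -- the scaling bottleneck Ashley describes.
# The rules, thresholds, and field names below are invented placeholders.

def assess_claim(case: dict) -> str:
    """Apply hand-coded rules to a structured case description."""
    if not case.get("exposure_documented"):
        return "Recommend contesting: no documented exposure."
    if case.get("diagnosis") == "mesothelioma":
        return "Recommend settlement: strong causal link."
    if case.get("years_of_exposure", 0) >= 10:
        return "Recommend opening settlement negotiations."
    return "Refer for individual attorney review."

print(assess_claim({
    "exposure_documented": True,
    "diagnosis": "asbestosis",
    "years_of_exposure": 12,
}))
```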
The Rules Problem: Why Legal Language Resists Formalization
The challenge ran deeper than acquisition costs. Ashley illustrated this with a Louisiana criminal statute: "No person shall engage in or institute a local telephone call... of an anonymous nature and therein use obscene, profane, vulgar, lewd, lascivious or indecent language, suggestions or proposals of an obscene nature and threats of any kind whatsoever."
Does this prohibit calls with obscene language OR threats, or does it require BOTH? The difference determines what conduct is criminal. Yet the statute doesn't specify. Professor Layman Allen developed "normalization" techniques in the 1970s to extract all possible logical interpretations, revealing an uncomfortable truth: legislatures often don't realize their own statutes are logically ambiguous.
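Written out explicitly, the two readings diverge on concrete facts. The sketch below is purely illustrative; the predicate names are mine, not Allen's notation.

```python
# The two readings of the Louisiana statute, written as explicit logic.
# Predicate names are invented for illustration, not Allen's notation.

def reading_a(anonymous: bool, obscene_language: bool, threats: bool) -> bool:
    # Reading A: an anonymous call with obscene language OR threats violates.
    return anonymous and (obscene_language or threats)

def reading_b(anonymous: bool, obscene_language: bool, threats: bool) -> bool:
    # Reading B: the call must contain obscene language AND threats.
    return anonymous and obscene_language and threats

# An anonymous call containing only a threat is criminal under one
# reading and lawful under the other.
print(reading_a(anonymous=True, obscene_language=False, threats=True))  # True
print(reading_b(anonymous=True, obscene_language=False, threats=True))  # False
```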
The problem extends beyond logic. Legal rules use "open-textured" concepts with fuzzy boundaries. Ashley used the classic example: "Vehicles are not permitted in this park." Are baby carriages prohibited? Ten-speed bikes? Motorcycles?
Scholar Lon Fuller argued such questions should be resolved by considering the rule's purpose. If the goal is limiting noise, a ten-speed bike shouldn't count as a vehicle. If it's safety, maybe it should. Legal reasoning can't be reduced to pure rule application. It requires interpretation and argument about purposes and consequences.
Argument Models: Mapping How Lawyers Think
Researchers then built systems that captured patterns of legal argument: arguing by analogy to precedents, distinguishing unfavorable cases, citing counterexamples, and appealing to underlying values. Ashley's VJAP program could predict trade secret case outcomes and explain its predictions with arguments structured like a lawyer's brief.
These systems represented the high-water mark of "symbolic AI." They captured genuine expertise. But they shared the same problem as expert systems: building them required manually reading cases, identifying relevant factors, and encoding argument patterns. Every legal domain needed its own model, painstakingly constructed.
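A rough sketch of the representation underneath such systems: cases annotated with stereotypical fact patterns, or "factors," and argument moves computed by comparing factor sets. The factor names and precedents below are invented; a real model like VJAP also links factors to the values they serve.

```python
# Hypothetical factor-based case representation. Factor names and
# precedents are invented; models like VJAP also tie factors to the
# underlying values they promote.

PRECEDENTS = {
    "Alpha v. Beta (plaintiff won)": {"security-measures", "info-misappropriated"},
    "Gamma v. Delta (defendant won)": {"info-publicly-known"},
}

def argue_by_analogy(current_factors: set[str]) -> None:
    """Cite precedents sharing factors; distinguish them where they differ."""
    for name, factors in PRECEDENTS.items():
        shared = current_factors & factors
        if not shared:
            continue  # nothing to analogize
        print(f"Analogize to {name}: shared factors {sorted(shared)}")
        unshared = factors - current_factors
        if unshared:
            print(f"  Distinguish: precedent also had {sorted(unshared)}")

argue_by_analogy({"security-measures", "info-publicly-known"})
```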
Machine Learning: When Data Replaced Rules
Then came the breakthrough. In 2019, researchers showed that neural networks could predict case outcomes directly from judicial opinion text. No manual knowledge engineering required. Training on nearly 12,000 European Court of Human Rights cases, their models learned to predict violations with reasonable accuracy.
This was the bottom-up revolution. Instead of hand-coding legal knowledge, you could train models on thousands of cases and let them discover patterns. Case outcome prediction became viable at scale.
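In spirit, the bottom-up pipeline is as simple as the sketch below: text in, outcome label out, with no legal rules anywhere in the code. The toy examples and the bag-of-words classifier are stand-ins; the ECHR work used neural models trained on thousands of real opinions.

```python
# Toy outcome-prediction pipeline in the bottom-up spirit: learn from
# labeled opinion text with no hand-coded legal knowledge. The texts,
# labels, and classifier are illustrative stand-ins, not the ECHR setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "applicant detained without prompt judicial review",
    "medical reports document bruises sustained in custody",
    "complaint concerns length of civil proceedings; delays were justified",
    "domestic courts gave reasoned judgments within a reasonable time",
]
labels = [1, 1, 0, 0]  # 1 = violation found, 0 = no violation

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["applicant reports bruises after detention without review"]))
```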
But there was a problem. These neural networks were black boxes. They could highlight predictive words like "concussion" or "bruises," but Karl Branting made the critical observation: "Useful decision support should help the user understand the connection between relevant portions of the case record and the issues and reasoning of the case." Highlighting words isn't legal explanation.
Large Language Models Change Everything
The key difference with LLMs is this: they can follow complex natural language instructions. You can give GPT-4 or Claude a detailed prompt explaining a legal reasoning task, and it will attempt to perform it. No custom programming required.
This capability is now being applied across all the previous milestones. Researchers in Montreal showed that LLMs can extract logical rules directly from statutory text and convert them into expert system pathways. Their JusticeBot system, which helps laypeople with rental disputes and adoption questions, used to require legal experts to manually translate regulations into decision trees. Now GPT-4 can generate these automatically, at least for straightforward provisions.
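A sketch of what that extraction step looks like in practice, under the assumption that you have some LLM endpoint to call. The `call_llm` helper and the provision text are placeholders, and the generated tree still needs review by a lawyer before anyone relies on it.

```python
# Sketch of prompting an LLM to turn a statutory provision into an
# explicit decision tree. `call_llm` is a hypothetical stand-in for
# whichever model API you use; the provision text is a simplified
# placeholder, and the output must be reviewed by a lawyer.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model API")

PROMPT = """Convert the following provision into a decision tree for a
legal-aid intake tool. Use nested yes/no questions, quote the clause
each question is based on, and flag any ambiguity instead of resolving
it yourself.

Provision: A landlord may apply to terminate the lease if the tenant
has failed to pay rent within 21 days of receiving written notice of
non-payment.
"""

# decision_tree_text = call_llm(PROMPT)  # then: human review before deployment
```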
LLMs can read judicial opinions and identify legally significant factors almost as well as human annotators. They can construct multi-step legal arguments following classical patterns: drawing analogies to precedents, distinguishing unfavorable cases, citing counterexamples, and rebutting opponents' arguments. Researchers are even using LLMs to induce general legal principles from sets of analogous decisions.
What LLMs Can Actually Do
Ashley tested Claude with the "vehicles in the park" problem: Are baby carriages vehicles? Claude's response showed genuine legal reasoning. It considered the rule's intent ("prevent motorized vehicles or larger conveyances that could damage park grounds"), cited legal precedent distinguishing vehicles from baby carriages, and noted practical consequences ("impractical and potentially discriminatory against families with young children").
This is purposive interpretation considering policy consequences. Claude absorbed these reasoning patterns from training on legal texts, not from explicit programming.
There's a catch. When Ashley posed the ambiguous Louisiana statute, Claude correctly identified the interpretive question but then resolved it too definitively: "Therefore, the answer is: No, Harry has NOT violated the statute." In reality, the ambiguity remains unresolved without case law or legislative history. LLMs can be overconfident about close questions.
The most striking capability is generating structured legal arguments. Ashley's students showed that GPT-4 can construct three-ply arguments in trade secret cases: making a point by citing precedent, responding by distinguishing that precedent and citing a counterexample, then rebutting by distinguishing the counterexample. This requires tracking which factors appear in which cases and understanding their legal significance.
Performance isn't perfect. Even advanced reasoning models sometimes claim there are no common factors when common factors exist, or generate arguments after determining they should abstain. But multi-agent approaches, where multiple LLM instances critique each other's outputs, substantially improve results.
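A multi-agent setup can be as simple as the loop sketched here: one model drafts, a second critiques against a checklist of known failure modes, and the first revises. `call_llm` is again a hypothetical stand-in for a real model API.

```python
# Hypothetical multi-agent critique loop. `call_llm` stands in for a
# real model API; the checklist paraphrases the failure modes noted
# above (phantom shared factors, arguing when abstention is correct).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model API")

def draft(case_summary: str) -> str:
    return call_llm(f"Draft a three-ply trade secret argument for:\n{case_summary}")

def critique(argument: str) -> str:
    return call_llm(
        "Critique the argument below. Verify that every cited precedent "
        "actually shares a factor with the current case, and that no "
        "argument is made where abstention is appropriate.\n\n" + argument
    )

def revise(argument: str, notes: str) -> str:
    return call_llm(f"Revise the argument.\n\nArgument:\n{argument}\n\nCritique:\n{notes}")
```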
Where Is the Knowledge?
This raises a fundamental question. In earlier expert systems, legal knowledge was explicitly represented. You could point to the rules and factors encoded in the system. With LLMs, the "knowledge" is embedded in billions of statistical parameters learned from massive text corpora.
As Ashley put it: "The wonder of the age is that the LLMs, with their contextual embeddings trained over enormous corpora, seem to capture information about the meanings of legal concepts. That is, information about the ways in which the legal concepts are used in legal documents, arguments, and explanations."
This has implications for legal departments. Auditing becomes harder because you can't inspect an LLM's reasoning like you could review rules. Flexibility increases because LLMs learn patterns of language use rather than rigid rules. But verification becomes essential. The system might produce plausible arguments citing non-existent cases or misstating principles.
This means building robust verification workflows: human review by qualified lawyers, citation checking, factual validation. Don't assume outputs are reliable because they sound sophisticated.
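As one concrete piece of such a workflow, the citation check can be partly automated before documents ever reach a reviewer. The sketch below assumes access to some citation database or research API; `citation_exists` is a placeholder for that lookup, and the regex only catches a simple reporter-style format.

```python
# Sketch of an automated citation check ahead of human review.
# `citation_exists` is a hypothetical placeholder for a lookup against
# your research platform; the regex only matches a simple reporter
# format like "123 F.3d 456" and will miss many real citation styles.
import re

CITATION_PATTERN = re.compile(r"\b\d{1,3}\s+[A-Z][\w.]*\s+\d{1,4}\b")

def citation_exists(citation: str) -> bool:
    raise NotImplementedError("query your citation database or API here")

def flag_citations_for_review(draft: str) -> list[str]:
    """Return every citation that could not be positively verified."""
    flagged = []
    for citation in CITATION_PATTERN.findall(draft):
        try:
            verified = citation_exists(citation)
        except NotImplementedError:
            verified = False  # fail closed: unverifiable means flag it
        if not verified:
            flagged.append(citation)
    return flagged

print(flag_citations_for_review("As held in 123 F.3d 456, the duty applies."))
```

Failing closed, so that "cannot verify" is treated the same as "does not exist," keeps the human reviewer as the final check rather than an optional one.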
The Real Limits
Ashley was direct about current limitations. LLMs hallucinate, fabricating case citations or inventing facts. This isn't a glitch. It's fundamental to how these models work. They generate plausible text, not verified truths.
They're also overconfident, resolving ambiguous questions too definitively without acknowledging interpretive uncertainty. They struggle with complex interconnected legal frameworks and "hard cases" where multiple principles conflict and resolution requires value judgments.
Research on inducing rules from Chinese civil cases explicitly acknowledged: "Our methodology is intended mainly for simple civil cases like private lending, where the facts are clear, and the rights and obligations are well-defined." Hard cases involving ethics and contested values remain beyond algorithmic reach.
Ashley emphasized a principle constant through 50 years of AI and Law research: these tools should assist legal decision-makers, not replace them. The goal is keeping well-informed humans in the loop.
The Hybrid Future
The most sophisticated current research points toward hybrid systems combining LLM flexibility with structured legal knowledge. Ashley highlighted work from Zhejiang University integrating bottom-up neural networks (LLMs extract facts and predict claims) with top-down logic models (rule-based systems determine which claims legally apply). This hybrid approach outperforms either pure approach alone.
For legal departments, the lesson is clear: look for tools that blend approaches. The best legal AI products will use LLMs for document understanding and first drafts, incorporate structured knowledge bases of legal rules and precedents, employ symbolic reasoning for logical inference, and include robust verification mechanisms.
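Schematically, such a hybrid pipeline separates the two roles cleanly: the LLM handles the unstructured text, and an auditable rule layer makes the legal determination. Everything in the sketch below is illustrative; `call_llm` is a stand-in and the single rule is invented.

```python
# Schematic hybrid pipeline: an LLM extracts structured facts from free
# text (bottom-up), then explicit, auditable rules decide the legal
# question (top-down). `call_llm` is a hypothetical stand-in and the
# rule below is invented for illustration.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model API")

def extract_facts(dispute_text: str) -> dict:
    """Bottom-up step: turn free text into structured facts."""
    return json.loads(call_llm(
        "Return JSON with keys rent_overdue_days (int) and "
        "written_notice_given (bool) for this dispute:\n" + dispute_text
    ))

def apply_rules(facts: dict) -> str:
    """Top-down step: a rule a lawyer can read and audit."""
    if facts["written_notice_given"] and facts["rent_overdue_days"] > 21:
        return "Termination claim available (illustrative rule only)."
    return "Termination claim not available on these facts."
```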
What This Means for Legal Teams
Based on the research Ashley surveyed, certain applications show clear promise. Intake automation works well for routine matters in well-defined domains. The JusticeBot research demonstrates this. Contract drafting assistance handles template filling and routine variations, with human lawyers reviewing final documents. Legal research support excels at finding relevant cases and organizing results. Document review for classification and privilege identification builds on proven machine learning applications. Matter budgeting benefits from case outcome predictions that provide probability distributions.
Guardrails are essential. Qualified lawyers must review AI outputs before anyone relies on them. This is risk management, not an efficiency drag. Every case citation must be checked. Several lawyers have already faced sanctions for filing briefs with AI-generated fake citations.
Use AI for routine, well-defined matters first. Don't apply it to novel legal questions or cases requiring nuanced value judgments until the technology matures further. Lawyers need training in working effectively with AI assistants: what to trust, what to verify, how to recognize hallucination or overconfidence.
Moving Beyond Hype
Ashley's historical perspective offers an important corrective to both uncritical enthusiasm and reflexive skepticism. We're at an inflection point where genuinely new capabilities can be combined with accumulated knowledge from five decades of research to create more capable tools.
For legal teams, several principles emerge. Study the history to understand why previous legal AI efforts fell short. This helps avoid repeating mistakes. Focus on problems, not technologies. Don't ask "How can we use LLMs?" Ask "What legal process is expensive, repetitive, and high-volume?" Then consider whether LLMs could assist.
Expect hybrid solutions. Pure LLM approaches will likely be outperformed by systems combining neural flexibility with structured legal knowledge. When evaluating vendors, ask about their architecture. Build verification into workflows. The capability to verify AI outputs is as important as the AI's generative capability itself.
Participate in the learning curve. The technology is advancing rapidly. What failed six months ago might work now, and what works now might soon be superseded. This requires ongoing engagement and adjustment.
Looking Ahead
Ashley closed with several open questions. Can prompting make implicit knowledge explicit enough? Or will we need to return to representing legal knowledge explicitly, just with LLMs helping to extract it? Will prompting generalize across legal domains? Could legal reasoning turn out to be an emergent property of scale? Ashley is skeptical but admits: "In another few years, who knows?"
These aren't just academic questions. The answers will determine which legal AI investments pay off and which prove to be expensive dead ends.
We're past the point where legal teams can afford to ignore AI developments. The technology isn't perfect, but it's passed the threshold of usefulness for specific, well-scoped applications. The teams who will thrive are those who understand both capabilities and limitations, who can distinguish genuine progress from hype, who build appropriate guardrails, and who maintain realistic expectations.
As Ashley's five-decade perspective makes clear, legal AI has been a story of gradual progress punctuated by breakthrough moments. We're in one of those breakthrough moments now. But breakthroughs don't eliminate the need for human judgment, legal expertise, or careful implementation. They just give us more powerful tools to apply that judgment more efficiently and effectively.





