Close enough.
That old saying about horseshoes and hand grenades exists because precision matters in some contexts and proximity matters in others. A horseshoe that lands within a foot of the stake still scores points. A hand grenade within a few feet of its target still accomplishes its purpose. But your bank balance? Your medical diagnosis? The load-bearing calculation on a bridge? In those moments, "close enough" isn't a feature. It's a catastrophic failure waiting to happen.
Which brings us to the elephant in every conference room: the growing tendency to treat large language models as a PhD in your pocket.
The Confidence Problem
AI systems speak with remarkable fluency. They never hesitate. They don't pepper their responses with "um" or "I think maybe." They deliver answers wrapped in the same authoritative tone whether they're explaining photosynthesis or inventing a court case that never existed.
A 2024 Stanford study asked various language models about legal precedents. The models collectively invented over 120 non-existent court cases, complete with convincingly realistic names like "Thompson v. Western Medical Center (2019)" and detailed but entirely fabricated legal reasoning. The models didn't flag uncertainty. They didn't say "I'm not sure about this citation." They stated fiction as fact with the same confidence they use for everything else.
By 2025, judges worldwide issued hundreds of decisions addressing AI hallucinations in legal filings. The pattern repeats across industries. In enterprise settings, 47% of AI users admitted to making at least one major business decision based on content that turned out to be hallucinated. Not a typo or minor error. Complete fabrication presented as authoritative truth.
The Fundamental Question You Need to Ask
Before you hand any task to an AI system, you need to answer one question: Do I need a deterministic outcome or a probabilistic one?
This isn't a technical distinction for engineers. This is the core framework for understanding when AI helps and when it introduces risk.
When you ask a calculator what 1+1 equals, you get 2. Not "probably 2." Not "based on patterns in mathematical operations, the most likely answer is 2." You get a calculated result derived from a deterministic process that will produce the same answer every single time without exception.
When you ask an AI what 1+1 equals, you get 2 because the model has learned from enormous volumes of text that this pattern typically resolves to that output. It's not calculating. It's pattern matching. For basic arithmetic, the distinction rarely matters because the patterns are so overwhelming that the model essentially always produces correct responses.
But the mechanism matters. A calculator knows 1+1=2. An AI statistically predicts that 2 is the most probable token to follow "1+1=". The difference between knowing and predicting defines the boundary between deterministic and probabilistic systems.
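The difference can be made concrete in a few lines of Python. The "model" below is just a hand-written toy distribution standing in for learned token probabilities; the tokens and weights are invented for illustration, not taken from any real system:

```python
import random

def calculate(a: int, b: int) -> int:
    """Deterministic: the same inputs always produce the same output."""
    return a + b

# Toy stand-in for a language model: a distribution over next tokens
# for the prompt "1+1=". These probabilities are invented for illustration.
NEXT_TOKEN_DIST = {"2": 0.97, "3": 0.02, "11": 0.01}

def predict_next_token(dist: dict[str, float]) -> str:
    """Probabilistic: samples from a distribution, so output can vary."""
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights, k=1)[0]

assert calculate(1, 1) == 2               # always 2, no exceptions
print(predict_next_token(NEXT_TOKEN_DIST))  # usually "2", but not guaranteed
```

The calculator path is a computation; the model path is a weighted draw. Most draws land on "2", which is exactly why the two mechanisms are so easy to confuse.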
Where Probability Becomes Problematic
Even the best current models hallucinate around 0.7% to 1.5% of the time on straightforward summarization tasks. That sounds small until you realize it means roughly one error for every 100 interactions. For complex reasoning and specialized domains, rates climb dramatically. Legal information shows hallucination rates around 6.4% even among top-performing models. Medical queries? Scientific reasoning? The numbers get worse.
Recent research provides mathematical proof that hallucinations remain inevitable under current architectures. Large language models cannot learn all possible computable functions due to fundamental computational limitations. This isn't a training problem that more data will solve. The architecture itself creates structural incentives to produce fluent output even when the model has insufficient signal to determine truth.
Here's the uncomfortable reality: these systems generate statistically probable responses based on training patterns rather than retrieving verified facts. When the training data contains strong enough patterns, probability and truth align. When patterns are weak, sparse, or absent, the model still produces confident output because confidence and accuracy operate independently in these systems.
The Deterministic Test
Any time you're about to use AI for a task, run it through what I call the deterministic test. Ask yourself: If I run this same query 100 times, do I need identical answers every time?
If yes, you probably need a deterministic system. Traditional software. Databases. Calculators. Rule engines. Systems designed to produce consistent, reproducible results through explicit logic rather than learned patterns.
If an insurance company needs to calculate premium adjustments, that's deterministic. The formula should produce the same result regardless of when you run it or how you phrase the question. Using AI to "calculate" premiums introduces variability where none should exist.
If a legal team needs to verify that a citation exists, that's deterministic. Either the case was decided or it wasn't. There's no "probably" in court records.
If a financial system needs to sum transaction values, that's deterministic. Numbers add the same way every time. Pattern matching has no place in reconciliation.
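The test itself can be written down. This is a minimal sketch, where `ask` is a hypothetical stand-in for whatever system you are evaluating:

```python
import random
from typing import Any, Callable

def is_deterministic(ask: Callable[[], Any], runs: int = 100) -> bool:
    """Run the same query `runs` times; True only if every answer is identical."""
    return len({repr(ask()) for _ in range(runs)}) == 1

# A calculator-style function passes the test:
assert is_deterministic(lambda: 1 + 1)

# A sampling-based system generally fails it:
assert not is_deterministic(lambda: random.choice(["2", "3"]))
```

If your task fails this harness but the business requirement demands that it pass, the task belongs in deterministic software, not in a model.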
Where Probability Shines
Not every task requires deterministic precision. Many valuable applications work beautifully with probabilistic systems because exactness isn't the point.
Brainstorming and ideation. You want an AI to suggest marketing angles for a product launch? The value isn't in getting the "correct" angle. There is no correct angle. You want creative options to consider. Probability-based generation excels here precisely because it can surface unexpected combinations.
First-draft content. Writing that will be reviewed, refined, and edited benefits from AI assistance. The draft doesn't need to be perfect. It needs to provide a starting point. Human review catches errors. Human judgment shapes the final product.
Research summarization with verification. AI can surface relevant information and synthesize across sources. When you verify the underlying claims before acting on them, hallucination risks become manageable. The AI accelerates discovery. You confirm accuracy.
Classification and routing. Sorting customer support tickets by topic? Tagging content by category? Minor errors in classification rarely cause catastrophic outcomes, and the efficiency gains justify accepting some probability-based uncertainty.
Translation and communication assistance. Perfect word-for-word accuracy matters less than conveying meaning effectively. Probabilistic language models handle nuance and context in ways that rule-based translation never could.
The pattern: probabilistic systems work well when the cost of occasional errors remains low, when human oversight provides verification, or when the task itself has no single correct answer.
The Verification Burden
Here's where most organizations stumble. They want AI efficiency without accepting the verification burden that probabilistic outputs require.
If you're willing to verify AI outputs before acting on them, the hallucination rates matter less. Verification catches errors. The AI accelerates the work. The human ensures accuracy. This is the "human at the helm" approach that makes AI genuinely useful.
If you're not willing to verify, you need to honestly assess whether the task tolerates error rates in the 1% to 10% range. Some tasks do. Many don't. Pretending otherwise doesn't reduce risk. It hides risk until something breaks publicly.
Research shows that asking models "Are you hallucinating right now?" reduces subsequent hallucination rates by about 17%. Simple prompting strategies help. Retrieval-augmented approaches that ground responses in source documents can reduce hallucinations by 40% to 71%. But reduction isn't elimination. The fundamental probabilistic nature remains.
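The retrieval-augmented idea can be sketched minimally: answer only when supporting text exists in a trusted corpus, and refuse rather than guess. The keyword-overlap retriever below is a toy illustration of the grounding principle, not a real RAG pipeline, and the corpus is invented for the example:

```python
def retrieve(query: str, corpus: list[str], min_overlap: int = 2) -> list[str]:
    """Toy retriever: keep documents sharing at least `min_overlap` words with the query."""
    q_words = set(query.lower().split())
    return [doc for doc in corpus if len(q_words & set(doc.lower().split())) >= min_overlap]

def grounded_answer(query: str, corpus: list[str]) -> str:
    """Answer only from retrieved sources; refuse rather than guess."""
    sources = retrieve(query, corpus)
    if not sources:
        return "No supporting source found; refusing to guess."
    # A real pipeline would have the model synthesize from `sources`;
    # here we simply return the best-available snippet.
    return sources[0]

corpus = [
    "Miranda v. Arizona (1966) established that suspects must be informed of their rights.",
]
print(grounded_answer("What did Miranda v. Arizona establish?", corpus))
print(grounded_answer("What did Thompson v. Western Medical Center decide?", corpus))
```

The second query, about a case that exists nowhere in the corpus, produces a refusal instead of a confident fabrication. That behavioral change, not any gain in raw intelligence, is what retrieval grounding buys you.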
Matching the Tool to the Task
The most intelligent use of AI doesn't treat it as a universal solution. It treats AI as a powerful tool with specific strengths and inherent limitations.
Use deterministic systems for: financial calculations, compliance verification, legal citations, medical dosing, identity verification, access control, audit trails, and any outcome where "close enough" creates liability.
Use probabilistic systems for: creative generation, draft content, research acceleration, classification tasks, sentiment analysis, summarization with verification, brainstorming, and any outcome where human review or error tolerance makes variability acceptable.
The categories aren't always clean. Many workflows benefit from combining both. An AI might draft contract language that a legal team then reviews with deterministic verification tools. The AI accelerates creation. Traditional systems ensure compliance. Neither alone handles the full workflow optimally.
Beyond Blind Trust
The PhD-in-your-pocket framing does everyone a disservice. It sets expectations for authoritative correctness that probabilistic systems cannot reliably deliver. It encourages skipping verification because surely something this confident must be accurate.
AI systems are more artificial than intelligent in a precise technical sense. They simulate intelligent output through pattern recognition without the underlying verification processes that actual expertise requires. A PhD researcher knows when their knowledge has gaps. They cite sources. They express uncertainty in proportion to the gaps in their evidence. Language models produce uniform confidence regardless of their actual reliability on a given query.
This doesn't make AI useless. Far from it. But treating probabilistic systems as deterministic oracles creates risk that compounds over time. Every unverified output that happens to be correct reinforces the behavior of not verifying. Until something important breaks.
The old saying names only horseshoes and hand grenades for a reason: most things in life aren't like them. Most things require more precision than "close enough."
AI gives you horseshoes-and-hand-grenades accuracy with calculator-level confidence. Knowing the difference determines whether it helps you or quietly undermines everything downstream from its output.
—
W.S. Benks is AI Systems Architect and Automation Research Lead at HT Blue, where he designs agentic frameworks that connect people, data, and intelligent processes while keeping humans firmly at the helm.
