Definition Leakage: How Fast Legal English Decays

Every legal definition is a compression algorithm applied to continuous reality. The leakage rate measures how fast the boundary fails.

1. The Concept

A legal definition is a compression: it takes the continuous, high-dimensional space of real-world economic activity and maps it to a discrete category (taxable/non-taxable, qualified/unqualified, income/not-income). Like any compression, it loses information at the boundary. The “leakage rate” is the fraction of cases in a given period that arrive at the boundary, where the definition fails to classify them.

Leakage Rate
L(d, t) = cases_litigating_boundary(d, t) / total_cases(t)

where d = definition, t = time period
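As a rough illustration, the ratio above can be computed from tagged case records. This sketch is not from the source: the record layout (`period`, `disputed_definitions`) and the sample cases are hypothetical placeholders, and it assumes someone has already decided which cases litigate a definitional boundary.

```python
def leakage_rate(cases, definition, period):
    """L(d, t): share of cases in `period` that litigate the boundary of `definition`."""
    in_period = [c for c in cases if c["period"] == period]
    if not in_period:
        raise ValueError(f"no cases recorded for period {period!r}")
    boundary = sum(1 for c in in_period if definition in c["disputed_definitions"])
    return boundary / len(in_period)


# Hypothetical records for illustration only (not real docket data).
cases = [
    {"period": 2020, "disputed_definitions": {"trade or business"}},
    {"period": 2020, "disputed_definitions": set()},
    {"period": 2020, "disputed_definitions": {"fair market value"}},
    {"period": 2020, "disputed_definitions": {"trade or business", "fair market value"}},
    {"period": 2021, "disputed_definitions": set()},
]

# 2 of the 4 cases in 2020 dispute "trade or business".
print(leakage_rate(cases, "trade or business", 2020))  # 0.5
```

Because the denominator is every case in the period, the result is a share rather than a raw count, so it is comparable across periods with different caseloads.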

The leakage rate is a function of:

  1. Precision of the definition — more words = narrower boundary = less leakage… in theory
  2. Dimensionality of economic reality — more complex economy = more edge cases = more leakage
  3. Time since enactment — older definitions face novel situations their drafters didn’t anticipate

2. The Dimensionality Connection

This is where the Dimensionality Illusion paper connects directly.

A definition is a low-dimensional projection of high-dimensional reality.

When economic reality is low-dimensional (simple transactions, traditional businesses), the projection works. When reality is high-dimensional (crypto, derivatives, digital assets, international structures, gig economy), the projection FAILS — the definition cannot classify novel arrangements because they occupy dimensions the definition wasn’t designed to capture.

The CISA Failure
This is EXACTLY the CISA failure from the Dimensionality Illusion paper: the compressed representation appears to cover the case, but the critical distinction lives in a dimension the compression discarded.
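The projection idea can be made concrete with a toy example. In this minimal sketch, the three features and the three arrangements are invented for illustration and are not taken from the Dimensionality Illusion paper; the point is only that two distinct arrangements become indistinguishable once the dimension separating them is dropped.

```python
def project(point, kept_dims):
    """Keep only the coordinates a definition looks at; discard the rest."""
    return tuple(point[i] for i in kept_dims)


def collisions(points, kept_dims):
    """Return groups of labels whose points become identical after projection."""
    groups = {}
    for label, point in points.items():
        groups.setdefault(project(point, kept_dims), []).append(label)
    return [labels for labels in groups.values() if len(labels) > 1]


# Hypothetical arrangements described by three made-up binary features:
# (cash_invested, pooled_with_others, holder_controls_outcome)
arrangements = {
    "passive_share": (1, 1, 0),
    "governance_token": (1, 1, 1),
    "solo_purchase": (1, 0, 1),
}

# A definition that only looks at the first two features merges two arrangements.
print(collisions(arrangements, kept_dims=(0, 1)))
# [['passive_share', 'governance_token']]

# With all three features kept, nothing collides.
print(collisions(arrangements, kept_dims=(0, 1, 2)))
# []
```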

3. Case Studies — Leakage in Action

For each term: the definition, the leakage pattern, and the dimensionality explanation.

3a. “Trade or Business” — Infinite Leakage

Definition: NONE. Used 339+ times in the IRC. Never defined by statute.

Leakage rate: maximal — every new economic arrangement requires fresh litigation.

Commissioner v. Groetzinger (1987): the Supreme Court declined to supply a general definition and said the question must be resolved on the facts of each case. The non-definition is an infinite-bandwidth channel: it can mean anything, therefore it means nothing until a court decides.

Key cases: Groetzinger (gambling), Whipple v. Commissioner (lending to corporations), Higgins v. Commissioner (investment management). Each case arrived at the boundary, and the boundary was not there.

3b. “Fair Market Value” — 35 Words, Constant Leakage

Definition (Treas. Reg. § 20.2031-1(b)): “the price at which the property would change hands between a willing buyer and a willing seller, neither being under any compulsion to buy or to sell and both having reasonable knowledge of relevant facts”

35 words, and still litigated constantly.

Dimensionality: the definition projects a multi-dimensional negotiation onto a single number. Every dimension it discards is a potential litigation vector.

3c. “Substantially All” — The Vagueness Leak

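Definition: not defined in the IRC. For reorganization rulings, IRS guidelines (Rev. Proc. 77-37) treat a transfer of at least 90% of the fair market value of net assets and at least 70% of the fair market value of gross assets as satisfying the requirement.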

But “substantially” is inherently vague — it maps a continuous variable (percentage) to a binary outcome (yes/no) without specifying the threshold.

This is a quantization problem: where do you put the boundary? 85%? 90%? 95%? Every boundary generates disputes at the margin.
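A small sketch of the quantization point: wherever the threshold sits, some cases fall just on either side of it. The transfer percentages and the 2-point "dispute band" below are hypothetical choices made for illustration, and `is_substantially_all` is an illustrative name, not a statutory test.

```python
def is_substantially_all(pct, threshold):
    """Binary outcome from a continuous percentage."""
    return pct >= threshold


def margin_cases(percentages, threshold, band):
    """Cases within `band` percentage points of the threshold (likely disputes)."""
    return [p for p in percentages if abs(p - threshold) <= band]


# Hypothetical asset-transfer percentages, for illustration only.
transfers = [62.0, 71.5, 84.0, 86.5, 89.0, 90.5, 93.0, 97.0]

for threshold in (85.0, 90.0, 95.0):
    print(threshold, margin_cases(transfers, threshold, band=2.0))
# 85.0 [84.0, 86.5]
# 90.0 [89.0, 90.5]
# 95.0 [93.0, 97.0]
```

In this toy data, moving the threshold changes which cases sit at the margin but not how many: the disputes relocate rather than disappear.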

3d. “Ordinary and Necessary” — The Moral Leak

Definition: not defined. Welch v. Helvering (1933): “life in all its fullness must supply the answer.”

This is the moral-lexeme problem (D8 from dimensions.html): “ordinary” could mean akushala (unskillful), schlecht (low quality), or kakos (base). English doesn’t distinguish.

Leakage: any expense that is plausibly “ordinary” in one context is arguably “extraordinary” in another. The definition can never be precise because the underlying concept is gradient, not binary.

3e. “Security” (Securities Act) — The Compression Failure

The Howey test (SEC v. W.J. Howey Co., 1946), as commonly summarized: “an investment of money in a common enterprise with a reasonable expectation of profits to be derived from the efforts of others”

In this framing, the 4-factor test is a PCA-4 projection of economic reality: four components are retained and every other dimension is discarded.

It worked for stocks and bonds (low-dimensional financial instruments). It fails for crypto (NFTs, staking, yield farming, governance tokens) because crypto occupies dimensions the Howey test wasn’t designed to capture.

This is the canonical CISA failure: distinct economic arrangements that the 4-factor compression merges into a single “security” category.
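One way to picture the four factors is as a conjunctive check in which each factor can be satisfied, not satisfied, or contested. The sketch below is an illustration only: the three-valued encoding, the `classify` function, and the two sample assessments are assumptions made here, not legal conclusions and not a method described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HoweyAssessment:
    # Each factor is True, False, or None when the answer is contested.
    investment_of_money: Optional[bool]
    common_enterprise: Optional[bool]
    expectation_of_profits: Optional[bool]
    efforts_of_others: Optional[bool]

    def factors(self):
        return (
            self.investment_of_money,
            self.common_enterprise,
            self.expectation_of_profits,
            self.efforts_of_others,
        )


def classify(assessment: HoweyAssessment) -> str:
    factors = assessment.factors()
    if any(f is False for f in factors):
        return "not a security"  # one failed factor defeats the test
    if all(f is True for f in factors):
        return "security"
    return "contested"  # a boundary case: this is where leakage shows up


# Hypothetical inputs for illustration, not findings about any real instrument.
corporate_stock = HoweyAssessment(True, True, True, True)
staking_arrangement = HoweyAssessment(True, True, True, None)

print(classify(corporate_stock))      # security
print(classify(staking_arrangement))  # contested
```

Anything that determines whether the contested factor resolves one way or the other lives outside the four fields, which is the sense in which the test cannot see it.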

4. Leakage Over Time

Estimated Leakage Rate by Definition Over Time
Conceptual chart. X-axis: time since the definition was enacted. Y-axis: estimated leakage rate (cases per year disputing the definition). “Trade or business” is flat at the maximum because it was never defined. “Fair market value” shows steady moderate leakage. “Qualified plan” starts low, then increases as new plan designs emerge. “Security” under Howey stays low until 2017, then spikes with crypto.

The curves show that leakage accelerates as economic dimensionality increases. The crypto era is a dimensionality explosion that broke multiple definitions simultaneously.

5. The Compression-Leakage Tradeoff

More precise definitions leak less initially but more catastrophically when the boundary fails.

This is the bias-variance tradeoff from machine learning, applied to legal language:

Bias-Variance in Legal Definitions
Total_error  = bias² + variance + irreducible_noise
Legal analog: Total_leakage = vagueness² + boundary_failure + irreducible_ambiguity
The R_def Connection
The growth of the IRC’s R_def ratio from 0.5 to 2.0–3.0 is the legal system’s attempt to reduce bias (vagueness) by adding variance (precise definitions). As the ML community knows, that trade lowers total error only up to a point, and it can never push total error below the irreducible-noise floor, which here is the grammatical ambiguity of English.
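The legal analog can be explored with a toy model. Everything numeric in this sketch is an assumption made for illustration: a precision knob `p` between 0 and 1, vagueness falling as `1 - p`, boundary failure rising as `p**2`, and an ambiguity floor of 0.1. None of these functional forms come from the source.

```python
AMBIGUITY_FLOOR = 0.1  # hypothetical irreducible ambiguity


def total_leakage(p, ambiguity=AMBIGUITY_FLOOR):
    """Total_leakage = vagueness^2 + boundary_failure + irreducible_ambiguity."""
    vagueness = 1.0 - p          # assumed: more precision, less vagueness
    boundary_failure = p ** 2    # assumed: more precision, more brittle boundary
    return vagueness ** 2 + boundary_failure + ambiguity


# Sweep precision from 0.0 to 1.0 in steps of 0.1.
grid = [i / 10 for i in range(11)]
best_p = min(grid, key=total_leakage)

print(best_p, round(total_leakage(best_p), 3))        # 0.5 0.6
print(all(total_leakage(p) >= AMBIGUITY_FLOOR for p in grid))  # True
```

Under these assumed curves, leakage is lowest at an intermediate precision, and no setting of the knob gets below the floor: adding words past the optimum makes things worse, not better.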

6. Implications

Definitions are not permanent — they decay as economic reality evolves away from the drafter’s assumptions.

The half-life of a tax definition, meaning the time before major litigation challenges its boundary, could be computed from case law data. The working estimate here is 15–25 years.
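A sketch of how such a half-life might be estimated, given a list of years between enactment and the first major boundary challenge. The durations below are hypothetical placeholders, not values drawn from case law, and both estimators ignore definitions that have not yet been challenged, which a real analysis would need to handle.

```python
import math
from statistics import median

# Hypothetical placeholder durations (years from enactment to first major challenge).
years_to_first_challenge = [8, 12, 18, 22, 30]


def empirical_half_life(durations):
    """Time by which half of the observed definitions had been challenged."""
    return median(durations)


def exponential_half_life(durations):
    """Half-life under an assumed constant yearly hazard: ln(2) * mean duration."""
    return math.log(2) * sum(durations) / len(durations)


print(empirical_half_life(years_to_first_challenge))               # 18
print(round(exponential_half_life(years_to_first_challenge), 1))   # 12.5
```

The two numbers differ because the exponential model assumes a challenge is equally likely in every year, while the median makes no such assumption; which is more appropriate depends on how challenges actually cluster over time.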

AI and NLP tools could potentially detect high-leakage definitions before they fail — flag definitions with low-dimensional projections applied to high-dimensional economic domains.

The detection problem: A definition is most dangerous when it appears to work. The Howey test classified securities correctly for 70 years. The failure was invisible until the dimensionality of financial instruments exploded. Proactive leakage detection would require measuring the dimensionality of the economic domain against the dimensionality of the definition — and flagging gaps before litigation reveals them.
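The gap check described above might be sketched as a set comparison between the factors a definition examines and the features observed in a domain. This is a minimal illustration under strong assumptions: the feature names are invented, the 0.5 cutoff is arbitrary, and the hard part (deciding what the relevant features of a domain are) is taken as given input.

```python
def dimensionality_gap(definition_factors, domain_features):
    """Domain features that no factor in the definition looks at."""
    return sorted(set(domain_features) - set(definition_factors))


def flag_high_leakage(definition_factors, domain_features, max_uncovered_share=0.5):
    """Flag a definition when too large a share of the domain is uncovered."""
    features = set(domain_features)
    if not features:
        raise ValueError("domain_features must not be empty")
    gap = dimensionality_gap(definition_factors, features)
    return len(gap) / len(features) > max_uncovered_share, gap


# Hypothetical feature names, for illustration only.
howey_factors = {
    "money_invested", "common_enterprise", "profit_expectation", "efforts_of_others",
}
stock_domain = howey_factors | {"voting_rights"}
token_domain = howey_factors | {
    "governance_rights", "consumptive_use", "protocol_automation",
    "decentralization", "staking_lockup",
}

print(flag_high_leakage(howey_factors, stock_domain))  # (False, ['voting_rights'])
print(flag_high_leakage(howey_factors, token_domain)[0])  # True
```

In this toy setup the definition covers four of five stock features but only four of nine token features, so only the token domain is flagged.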