Definition Leakage: How Fast Legal English Decays

Every legal definition is a compression algorithm applied to continuous reality. The leakage rate measures how fast the boundary fails.

1. The Concept

A legal definition is a compression: it takes the continuous, high-dimensional space of real-world economic activity and maps it to a discrete category (taxable/non-taxable, qualified/unqualified, income/not-income). Like any compression, it loses information at the boundary. The “leakage rate” is the frequency at which real-world cases arrive at the boundary and the definition fails to classify them.

Leakage Rate

L(d, t) = cases_litigating_boundary(d, t) / total_cases(t)

where d = definition, t = time period
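
As a minimal sketch of the ratio above, assume each case record names the definition it turns on and carries a flag for whether the dispute concerns that definition's boundary. The `Case` type, its field names, and the sample records are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    definition: str   # hypothetical label for the term at issue
    year: int         # filing year
    boundary: bool    # True if the dispute is about where the boundary lies

def leakage_rate(cases, definition, start, end):
    """L(d, t): boundary cases for `definition` in [start, end] over all cases in that period."""
    in_period = [c for c in cases if start <= c.year <= end]
    if not in_period:
        return 0.0
    boundary = sum(1 for c in in_period if c.definition == definition and c.boundary)
    return boundary / len(in_period)

# Made-up records, for illustration only.
sample = [
    Case("trade or business", 2001, True),
    Case("trade or business", 2002, False),
    Case("fair market value", 2002, True),
    Case("fair market value", 2003, False),
]
rate = leakage_rate(sample, "trade or business", 2001, 2003)
```

The denominator here is every case in the period, which follows the formula as written; restricting it to cases that apply definition d would be a different, equally defensible measure.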

2. The Dimensionality Connection

When economic reality is low-dimensional (simple transactions, traditional businesses), the projection works. When reality is high-dimensional (crypto, derivatives, digital assets, international structures, gig economy), the projection fails: the definition cannot classify novel arrangements because they occupy dimensions the definition was never designed to capture.

The CISA Failure

This has the same structure as the CISA failure from the Dimensionality Illusion paper: the compressed representation appears to cover the case, but the critical distinction lives in a dimension the compression discarded.
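
A toy illustration of the projection argument, not a model of any statute: take every arrangement describable by three yes/no features, keep only two of them, and see which distinct arrangements become indistinguishable. The feature indices are arbitrary placeholders.

```python
from collections import defaultdict
from itertools import product

def project(point, kept_dims):
    """Keep only the coordinates listed in `kept_dims`; discard the rest."""
    return tuple(point[i] for i in kept_dims)

# All 8 arrangements over three binary features (placeholder dimensions 0, 1, 2).
points = list(product([0, 1], repeat=3))
kept = (0, 1)  # the "definition" only looks at dimensions 0 and 1

# Group arrangements by what the definition can see.
buckets = defaultdict(list)
for p in points:
    buckets[project(p, kept)].append(p)

# Any bucket holding more than one point is a collision: the arrangements
# differ in reality but look identical to the definition.
collisions = {k: v for k, v in buckets.items() if len(v) > 1}
```

Every bucket collides here because dimension 2 is never inspected. If a dispute turns on that dimension, the definition has nothing to say about it.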

3. Case Studies — Leakage in Action

For each term: the definition, the leakage pattern, and the dimensionality explanation.

3a. “Trade or Business” — Infinite Leakage

Definition: NONE. Used 339+ times in the IRC. Never defined by statute.

Leakage rate: maximal — every new economic arrangement requires fresh litigation.

Commissioner v. Groetzinger (1987): the Supreme Court declined to adopt a general definition and said the question turns on the facts of each case. The non-definition is an infinite-bandwidth channel: it can mean anything, so it means nothing until a court decides.

Key cases: Groetzinger (gambling), Whipple v. Commissioner (1963, lending to corporations), Higgins v. Commissioner (1941, investment management). Each case arrived at the boundary, and the boundary was not there.

3b. “Fair Market Value” — 35 Words, Constant Leakage

Definition (Treas. Reg. § 20.2031-1(b)): “the price at which the property would change hands between a willing buyer and a willing seller, neither being under any compulsion to buy or to sell and both having reasonable knowledge of relevant facts”

35 words. Still litigated constantly because:

“Willing buyer” — what about distressed sales?
“Reasonable knowledge” — how much information is reasonable?
“No compulsion” — what about regulatory requirements to sell?

Dimensionality: the definition projects a multi-dimensional negotiation onto a single number. Every dimension it discards is a potential litigation vector.

3c. “Substantially All” — The Vagueness Leak

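Definition: not defined in the IRC. In the reorganization context, IRS ruling guidelines (Rev. Proc. 77-37) treat a transfer of at least 90% of the fair market value of net assets and at least 70% of the fair market value of gross assets as sufficient.

But “substantially” is inherently vague: the statute maps a continuous variable (percentage) to a binary outcome (yes/no) without specifying the threshold, and the guideline percentages are a ruling benchmark, not a statutory line.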

This is a quantization problem: where do you put the boundary? 85%? 90%? 95%? Every boundary generates disputes at the margin.
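
The margin effect can be sketched with a few lines of code. The percentages below are made up, and the 2-point `margin` is an arbitrary assumption about how close to the line a case must be before it is worth disputing.

```python
def classify(pct_transferred, threshold):
    """Binary outcome: does the transfer count as 'substantially all'?"""
    return pct_transferred >= threshold

def marginal_cases(percentages, threshold, margin=2.0):
    """Cases close enough to the threshold that the classification is contestable."""
    return [p for p in percentages if abs(p - threshold) <= margin]

# Made-up transfer percentages, for illustration only.
transfers = [62.0, 84.0, 86.5, 89.0, 90.5, 91.0, 94.0, 96.0, 99.0]

# Moving the threshold changes which cases are marginal, not whether any are.
near_by_threshold = {t: marginal_cases(transfers, t) for t in (85.0, 90.0, 95.0)}
```

In this sample each candidate threshold has at least two cases within the margin, which is the point of the paragraph above: relocating the boundary relocates the disputes.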

3d. “Ordinary and Necessary” — The Moral Leak

Definition: not defined. Welch v. Helvering (1933): “life in all its fullness must supply the answer.”

This is the moral-lexeme problem (D8 from dimensions.html): “ordinary” could mean akushala (unskillful), schlecht (low quality), or kakos (base). English doesn’t distinguish.

Leakage: any expense that is plausibly “ordinary” in one context is arguably “extraordinary” in another. The definition can never be precise because the underlying concept is gradient, not binary.

3e. “Security” (Securities Act) — The Compression Failure

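The Howey Test (SEC v. W.J. Howey Co., 1946), as commonly summarized: an investment of money in a common enterprise with a reasonable expectation of profits to be derived from the efforts of others.

In the analogy, this 4-factor test acts as a PCA-4 projection of economic reality: it keeps four components and discards everything else.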

It worked for stocks and bonds (low-dimensional financial instruments). It fails for crypto (NFTs, staking, yield farming, governance tokens) because crypto occupies dimensions the Howey test wasn’t designed to capture.

This is the canonical CISA failure: the 4-factor compression merges distinct economic arrangements into a single “security” category.
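
One way to make the merging concrete is to write the four prongs as a predicate. This is a sketch under loud assumptions: the prong values assigned to each arrangement are illustrative, not legal conclusions, and `holder_governs_protocol` is a hypothetical extra dimension invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrangement:
    name: str
    investment_of_money: bool
    common_enterprise: bool
    expectation_of_profit: bool
    efforts_of_others: bool
    # Hypothetical extra dimension that the four prongs never inspect.
    holder_governs_protocol: bool = False

def meets_four_prongs(a: Arrangement) -> bool:
    """True only when all four summarized prongs are satisfied."""
    return (a.investment_of_money and a.common_enterprise
            and a.expectation_of_profit and a.efforts_of_others)

# Prong values are assumptions made for illustration, not legal conclusions.
grove_contract = Arrangement("orange grove service contract", True, True, True, True)
token = Arrangement("hypothetical governance token", True, True, True, True,
                    holder_governs_protocol=True)

same_label = meets_four_prongs(grove_contract) == meets_four_prongs(token)
differ_off_axis = grove_contract.holder_governs_protocol != token.holder_governs_protocol
```

Both arrangements receive the same label even though they differ on the uninspected field. Whether that difference should matter legally is exactly what the predicate cannot express.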

4. Leakage Over Time

[Chart omitted: leakage over time. Values are conceptual, not empirical.]

5. The Compression-Leakage Tradeoff

More precise definitions leak less initially but fail more catastrophically when the boundary gives way.

This is the bias-variance tradeoff from machine learning, applied to legal language:

Bias-Variance in Legal Definitions

Total_error  = bias² + variance + irreducible_noise
Legal analog: Total_leakage = vagueness² + boundary_failure + irreducible_ambiguity

The R_def Connection

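The IRC’s growth from 0.5 to 2.0–3.0 R_def ratio is the legal system’s attempt to reduce bias (vagueness) by accepting more variance (precise definitions whose boundaries can fail). As the ML decomposition shows, that trade lowers total error only when the variance added is smaller than the squared bias removed, and it can never push error below the irreducible term, which here is the grammatical ambiguity of English.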
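
A worked arithmetic check of the decomposition, using made-up magnitudes purely to show how the terms interact. Nothing here is measured; the analogy is structural.

```python
def total_error(bias, variance, noise):
    """Standard decomposition: bias squared plus variance plus irreducible noise."""
    return bias ** 2 + variance + noise

NOISE = 0.2  # made-up irreducible term, identical in both regimes

# Made-up values: a vague definition (high bias, low variance) and a
# precise one (low bias, high variance).
vague = total_error(bias=0.6, variance=0.1, noise=NOISE)
precise = total_error(bias=0.2, variance=0.5, noise=NOISE)

# Precision helps only if the squared bias removed exceeds the variance added.
bias_removed = 0.6 ** 2 - 0.2 ** 2
variance_added = 0.5 - 0.1
precision_helped = bias_removed > variance_added
```

With these numbers the precise regime ends up with the larger total, because more variance was added than squared bias was removed, and neither regime can go below `NOISE`.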

6. Implications

Definitions are not permanent — they decay as economic reality evolves away from the drafter’s assumptions.

The half-life of a tax definition, hypothesized here at 15–25 years before major litigation challenges the boundary, could be computed from case law data.
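
A rough sketch of how that computation might start, assuming a table of enactment years paired with the year of the first major boundary challenge (`None` if no challenge has happened yet). The records are made up, and taking a plain median over challenged definitions is a simplifying assumption.

```python
from statistics import median

def half_life_years(records):
    """Median years from enactment to first major boundary challenge.

    Definitions with no challenge yet (None) are dropped, which biases the
    estimate downward; this is a known limitation of the sketch.
    """
    gaps = [challenged - enacted
            for enacted, challenged in records
            if challenged is not None]
    if not gaps:
        raise ValueError("no challenged definitions to estimate from")
    return median(gaps)

# Made-up (enacted, first_challenged) pairs, for illustration only.
records = [(1950, 1962), (1960, 1985), (1970, 1988), (1980, 2010), (1990, None)]
estimate = half_life_years(records)
```

A more careful treatment would keep the unchallenged definitions as censored observations and use a survival-analysis estimator such as Kaplan-Meier rather than discarding them.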

AI and NLP tools could potentially detect high-leakage definitions before they fail — flag definitions with low-dimensional projections applied to high-dimensional economic domains.

The detection problem: A definition is most dangerous when it appears to work. The Howey test classified securities without visible strain for roughly 70 years. The failure was invisible until the dimensionality of financial instruments exploded. Proactive leakage detection would require measuring the dimensionality of the economic domain against the dimensionality of the definition, and flagging gaps before litigation reveals them.
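
The flagging step could look something like the sketch below. It assumes someone has already estimated two counts per definition: the number of factors the definition tests and the number of features that meaningfully vary in the domain. Producing those counts is the hard part and is not attempted here; the names, numbers, and `min_gap` cutoff are hypothetical.

```python
def flag_dimension_gaps(definitions, min_gap=3):
    """Return (name, gap) pairs where the domain outruns the definition.

    `definitions` maps a name to (definition_dims, domain_dims).
    Results are sorted with the largest gap first.
    """
    flagged = []
    for name, (definition_dims, domain_dims) in definitions.items():
        gap = domain_dims - definition_dims
        if gap >= min_gap:
            flagged.append((name, gap))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up dimension counts, for illustration only.
candidates = {
    "definition A": (4, 12),
    "definition B": (3, 4),
    "definition C": (1, 5),
}
flagged = flag_dimension_gaps(candidates)
```

The ranking is only as good as the dimension estimates behind it, so this is best read as a triage list for human review and not as a prediction of litigation.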

Uncertainty

Leakage rate data would need an actual case law corpus to compute precisely. The figures in the chart are conceptual, not empirical.
The ML analogy (bias-variance tradeoff, PCA projection) is structural, not mathematical. Legal definitions do not literally perform PCA; the mapping is illustrative.
The Howey-crypto connection is the author’s analytical framing. Courts and regulators are actively disputing whether Howey applies to specific crypto instruments, and the legal outcome is undetermined.
The 15–25 year half-life estimate is a hypothesis, not a measurement. Computing it rigorously would require systematic analysis of case law filing dates against definition enactment dates.
The CISA failure analogy assumes that the Dimensionality Illusion framework applies to legal classification. This is a cross-domain transfer; the domains may differ in ways the analogy does not capture.