Every legal definition is a compression algorithm applied to continuous reality. The leakage rate measures how fast the boundary fails.
A legal definition is a compression: it takes the continuous, high-dimensional space of real-world economic activity and maps it to a discrete category (taxable/non-taxable, qualified/unqualified, income/not-income). Like any compression, it loses information at the boundary. The “leakage rate” is the frequency at which real-world cases arrive at the boundary and the definition fails to classify them.
L(d, t) = cases_litigating_boundary(d, t) / total_cases(t)
where d = definition, t = time period
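The ratio above can be computed directly once cases are tagged with the definitions whose boundaries they litigate. The following is a minimal sketch under that assumption; the record layout (`period`, `boundary_terms`) and the sample data are hypothetical and invented for illustration, not drawn from any real case database.

```python
def leakage_rate(cases, definition, period):
    """L(d, t): share of period-t cases that litigate the boundary of `definition`."""
    in_period = [c for c in cases if c["period"] == period]
    if not in_period:
        raise ValueError(f"no cases recorded for period {period!r}")
    at_boundary = sum(1 for c in in_period if definition in c["boundary_terms"])
    return at_boundary / len(in_period)


# Hypothetical records: each case lists the defined terms whose boundary it disputes.
cases = [
    {"period": 2020, "boundary_terms": {"trade or business"}},
    {"period": 2020, "boundary_terms": set()},
    {"period": 2020, "boundary_terms": {"fair market value", "trade or business"}},
    {"period": 2020, "boundary_terms": set()},
    {"period": 2021, "boundary_terms": {"fair market value"}},
]

print(leakage_rate(cases, "trade or business", 2020))  # 0.5
```

Deciding which cases count as "litigating the boundary" is the hard part, and this sketch takes that tagging as given.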
The leakage rate is a function of:
This is where the Dimensionality Illusion paper connects directly.
A definition is a low-dimensional projection of high-dimensional reality:
When economic reality is low-dimensional (simple transactions, traditional businesses), the projection works. When reality is high-dimensional (crypto, derivatives, digital assets, international structures, gig economy), the projection FAILS — the definition cannot classify novel arrangements because they occupy dimensions the definition wasn’t designed to capture.
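One way to picture the failure is to treat a case as a set of named features and a definition as the short list of features it reads. The sketch below assumes that framing; the dimension names (`cash_paid`, `token_transferred`, and so on) are hypothetical, chosen only to show a novel arrangement whose content sits entirely in dimensions the definition discards.

```python
def project(case, kept):
    """Project a case onto the dimensions the definition reads."""
    return {k: case.get(k, 0) for k in kept}


def discarded_dimensions(case, kept):
    """Dimensions where the case has content that the definition never sees."""
    return sorted(k for k, v in case.items() if k not in kept and v)


# Hypothetical: the definition was drafted with only these two dimensions in mind.
KEPT = ("cash_paid", "goods_delivered")

shop_sale = {"cash_paid": 1, "goods_delivered": 1}
token_swap = {
    "cash_paid": 0,
    "goods_delivered": 0,
    "token_transferred": 1,
    "smart_contract": 1,
    "cross_border": 1,
}

print(discarded_dimensions(shop_sale, KEPT))   # []
print(discarded_dimensions(token_swap, KEPT))  # ['cross_border', 'smart_contract', 'token_transferred']
print(project(token_swap, KEPT))               # {'cash_paid': 0, 'goods_delivered': 0}
```

The traditional sale survives projection intact. The novel arrangement projects to all zeros, so the definition has nothing left to classify.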
For each term: the definition, the leakage pattern, and the dimensionality explanation.
Definition of “trade or business”: NONE. Used 339+ times in the IRC. Never defined by statute.
Leakage rate: maximal — every new economic arrangement requires fresh litigation.
Commissioner v. Groetzinger (1987): the Supreme Court declined to supply a general definition and said the question must be resolved on the facts of each case. The non-definition is an infinite-bandwidth channel: it can mean anything, therefore it means nothing until a court decides.
Key cases: Groetzinger (gambling), Whipple v. Commissioner (lending to corporations), Higgins v. Commissioner (investment management). Each case arrived at the boundary, and the boundary was not there.
Definition of “fair market value”: “the price at which the property would change hands between a willing buyer and a willing seller, neither being under any compulsion to buy or to sell and both having reasonable knowledge of relevant facts”
35 words. Still litigated constantly because:
Dimensionality: the definition projects a multi-dimensional negotiation onto a single number. Every dimension it discards is a potential litigation vector.
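Definition of “substantially all”: not defined in the IRC. IRS ruling guidelines for reorganizations (Rev. Proc. 77-37) treat 90% of the fair market value of net assets and 70% of the fair market value of gross assets as sufficient.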
But “substantially” is inherently vague — it maps a continuous variable (percentage) to a binary outcome (yes/no) without specifying the threshold.
This is a quantization problem: where do you put the boundary? 85%? 90%? 95%? Every boundary generates disputes at the margin.
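The quantization point can be made concrete by sliding the threshold and counting the cases that land near it. This is a minimal sketch with invented percentages and an arbitrary two-point "dispute margin", both of which are assumptions for illustration only.

```python
def substantially_all(percent, threshold):
    """Binary outcome imposed on a continuous percentage."""
    return percent >= threshold


def marginal_cases(percents, threshold, margin=2):
    """Cases close enough to the threshold that either side can plausibly argue."""
    return [p for p in percents if abs(p - threshold) <= margin]


# Hypothetical share of assets transferred in eight transactions (whole percents).
percents = [50, 84, 86, 89, 91, 94, 96, 99]

for threshold in (85, 90, 95):
    print(threshold, marginal_cases(percents, threshold))
# 85 [84, 86]
# 90 [89, 91]
# 95 [94, 96]
```

Moving the threshold does not remove the marginal cases. It only changes which taxpayers are in them.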
Definition of “ordinary and necessary”: not defined. Welch v. Helvering (1933): “life in all its fullness must supply the answer.”
This is the moral-lexeme problem (D8 from dimensions.html): “ordinary” could mean akushala (unskillful), schlecht (low quality), or kakos (base). English doesn’t distinguish.
Leakage: any expense that is plausibly “ordinary” in one context is arguably “extraordinary” in another. The definition can never be precise because the underlying concept is gradient, not binary.
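The Howey test (SEC v. W.J. Howey Co., 1946), as commonly summarized, defines an “investment contract”, and so a security, as “an investment of money in a common enterprise with a reasonable expectation of profits to be derived from the efforts of others.”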
This 4-factor test is a PCA-4 projection of economic reality.
It worked for stocks and bonds (low-dimensional financial instruments). It fails for crypto (NFTs, staking, yield farming, governance tokens) because crypto occupies dimensions the Howey test wasn’t designed to capture.
This is the canonical CISA failure: distinct economic arrangements that the 4-factor compression merges into a single “security” category.
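The merge can be sketched as a four-field projection followed by a single yes/no answer. The example below is illustrative only: the `Arrangement` type, the `other_features` field, and the two sample arrangements are hypothetical, and the factor values are invented inputs rather than legal conclusions about any real instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrangement:
    name: str
    investment_of_money: bool
    common_enterprise: bool
    expectation_of_profit: bool
    efforts_of_others: bool
    other_features: frozenset = frozenset()  # everything the test ignores


def howey_projection(a):
    """Reduce an arrangement to the four factors the test reads."""
    return (a.investment_of_money, a.common_enterprise,
            a.expectation_of_profit, a.efforts_of_others)


def is_security(a):
    """All four factors must hold."""
    return all(howey_projection(a))


# Hypothetical arrangements that differ only in features the test discards.
a = Arrangement("arrangement_a", True, True, True, True,
                frozenset({"transferable_share"}))
b = Arrangement("arrangement_b", True, True, True, True,
                frozenset({"governance_vote", "protocol_reward"}))

print(is_security(a), is_security(b))              # True True
print(howey_projection(a) == howey_projection(b))  # True
print(a.other_features == b.other_features)        # False
```

Once both arrangements project to the same four values, nothing downstream can tell them apart, whatever their other features are.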
More precise definitions leak less initially but more catastrophically when the boundary fails:
This is the bias-variance tradeoff from machine learning, applied to legal language:
Total_error = bias² + variance + irreducible_noise
Legal analog: Total_leakage = vagueness² + boundary_failure + irreducible_ambiguity
Definitions are not permanent — they decay as economic reality evolves away from the drafter’s assumptions.
The half-life of a tax definition (estimated: 15–25 years before major litigation challenges the boundary) could be computed from case law data.
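One simple way to run that computation is to assume exponential decay: if each definition has a constant yearly chance of facing its first major boundary challenge, the half-life is ln(2) times the mean time to challenge. That model is an assumption made here for illustration and is not stated in the text above. The observations are invented, and the sketch ignores definitions that have not yet been challenged (censoring), which a real estimate would need to handle.

```python
import math


def estimate_half_life(years_to_first_challenge):
    """Half-life under an assumed exponential decay model.

    With no censoring, the maximum-likelihood decay rate is 1 / mean(years),
    so the half-life is ln(2) * mean(years).
    """
    if not years_to_first_challenge:
        raise ValueError("need at least one observation")
    mean_years = sum(years_to_first_challenge) / len(years_to_first_challenge)
    return math.log(2) * mean_years


# Hypothetical: years from enactment to first major boundary litigation.
observed = [10, 20, 30]
print(round(estimate_half_life(observed), 2))  # 13.86
```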
AI and NLP tools could potentially detect high-leakage definitions before they fail — flag definitions with low-dimensional projections applied to high-dimensional economic domains.
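As a rough illustration of such a flag, the sketch below compares the number of factors a definition reads with an estimate of how many dimensions its domain has, and flags low ratios. Everything here is hypothetical: the field names, the 0.25 cutoff, and the counts are invented, and producing the dimension estimates is the real NLP problem that this toy takes as given.

```python
def flag_high_leakage(definitions, max_ratio=0.25):
    """Return (name, ratio) pairs for definitions that read only a small
    share of their domain's estimated dimensions, lowest ratio first."""
    flagged = []
    for d in definitions:
        ratio = d["definition_dims"] / d["domain_dims"]
        if ratio <= max_ratio:
            flagged.append((d["name"], ratio))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inputs: factor counts and domain-dimension estimates are invented.
definitions = [
    {"name": "definition_x", "definition_dims": 4, "domain_dims": 40},
    {"name": "definition_y", "definition_dims": 3, "domain_dims": 5},
    {"name": "definition_z", "definition_dims": 1, "domain_dims": 8},
]

print(flag_high_leakage(definitions))
# [('definition_x', 0.1), ('definition_z', 0.125)]
```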