Posted by: holdenlee | January 30, 2014

## Mathematical Compression

### 1 Information compression in mathematics

### 1.1 For personal learning

I strongly believe that the best way to learn large amounts of mathematics—besides the problem-solving component—is to find a way to compress mathematical knowledge. A good math student does not memorize all the proofs and formulas; rather she remembers

• what’s essential, and
• the methods for computing the rest.

An example is trig formulas. You can get away with knowing just a few of them and deriving the rest. Derive half-angle from double-angle, and double-angle from the sum identities (or directly); if you forget the sum identities, you can even draw a picture with two triangles (like these pictures I drew when preparing for an ESP class). There are other ways to remember them: complex numbers, linear algebra. Multiple representations and many paths to the same result mean a lot less has to be memorized. When we talk about being smart at math, we refer to the “being able to compute” bit, not the “memorizing” bit. [I’m not using “compute” in the sense of rote computation, i.e., what we think computers do today.]
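As a sketch, the derivation chain above written out in symbols (only the sum identities need to be memorized):

```latex
\begin{align*}
\sin(\alpha+\beta) &= \sin\alpha\cos\beta + \cos\alpha\sin\beta
  &&\Rightarrow\; \sin 2\theta = 2\sin\theta\cos\theta,\\
\cos(\alpha+\beta) &= \cos\alpha\cos\beta - \sin\alpha\sin\beta
  &&\Rightarrow\; \cos 2\theta = 1 - 2\sin^2\theta
  \;\Rightarrow\; \sin^2\tfrac{\theta}{2} = \frac{1-\cos\theta}{2}.
\end{align*}
```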

One thing I started doing last term is “index-carding” all the mathematics I learn: my goal is to be able to state/derive/explain any major result I’ve learned using only the information on the index cards, plus a reasonable amount of time. The index cards aren’t traditional flash cards where I write down statements of theorems; they’re more like memory joggers. Oftentimes I turn statements into pictures. Here are two examples.

### 1.2 For the mathematical community

The main response to the math.SE post Why are mathematical proofs that rely on computers controversial? is that we don’t just want a proof that something is true (which a computer can spit out as a list of symbols); we want to understand deeply why it’s true, be able to give a simple explanation, and find links between different problems and areas of mathematics that in a way “reduce the intellectual diameter” of the field (to borrow a term from Trefethen, p. 135). (As I’ll explain, though, I think the allergy to computers is an artifact of automated/interactive theorem provers still being primitive compared to human mathematicians, and not an absolute.)

One answer cites Gian-Carlo Rota who wrote that “eventually every mathematical problem is proved trivial. The quest for ultimate triviality is characteristic of the mathematical enterprise.” We want short, intuitive proofs where it seems things can’t be otherwise, which we can grok in our mind all at once.

In fact, one can say that the whole goal of science is to find the shortest description of the world! As Scott Aaronson writes (in Quantum computing since Democritus, p. 155)

In science, you can always cook up a theory to “explain” the data you’ve seen so far: just list all the data you’ve got, and call that your “theory”! The obvious problem here is overfitting. Since your theory doesn’t achieve any compression of the original data… there’s no reason to expect your theory to predict future data. In other words, your theory is worthless.
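This can be made concrete with an off-the-shelf compressor as a rough proxy for description length (zlib and the particular byte strings here are just illustrative choices):

```python
import os
import zlib

# Data generated by a simple rule compresses well; data with no structure
# (here, random bytes) barely compresses at all -- a "theory" that merely
# lists such data achieves nothing.
structured = bytes(i % 7 for i in range(10_000))  # generated by a short rule
patternless = os.urandom(10_000)                  # no rule to discover

print(len(zlib.compress(structured)))   # a small fraction of 10,000
print(len(zlib.compress(patternless)))  # roughly 10,000
```

The gap between the two compressed sizes is exactly the sense in which the structured data admits a “theory” and the random data does not.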

### 2 A mathematical version of Hutter’s thesis

Hutter’s thesis is that artificial intelligence is equivalent to information compression. Indeed, one piece of evidence that we are intelligent is the fact that we can look at the world and derive simple rules from it about how things work, and that we can answer a lot of commonsense questions by inferring, without storing all the answers in a lookup table. The idea of “finding the shortest consistent description” has been formalized by Solomonoff induction and a hypothetical AI, AIXI.
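A toy version of “find the shortest consistent description”: try candidate rules in order of description length and keep the first one that reproduces the data. (Solomonoff induction enumerates *all* programs; the hand-picked hypothesis list below is only an illustration.)

```python
# Hypothetical miniature of Solomonoff-style induction over a tiny,
# hand-picked hypothesis space, ordered roughly by description length.
data = [0, 1, 4, 9, 16, 25]

hypotheses = [
    ("i",    lambda i: i),
    ("2*i",  lambda i: 2 * i),
    ("i**2", lambda i: i ** 2),
    ("i**3", lambda i: i ** 3),
]

# First (i.e., shortest) rule that reproduces every data point.
best = next(name for name, f in hypotheses
            if [f(i) for i in range(len(data))] == data)
print(best)  # i**2
```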

Hutter’s benchmark task for AI is to compress 100MB of Wikipedia text.

I propose another test: take a proof of a major mathematical result (or body of results) and see how much it can be compressed, under the constraint that it can be decompressed, line by line, into something verifiable by an interactive theorem prover such as Isabelle (which is built on basic logic, but in a more readable fashion). Again, the measure would be a function of both space and time: any mathematical result takes essentially no space to state if you simply write a program that prints every possible mathematical sentence, i.e., compression is achieved at the expense of computation. There could be two versions of the test: one to compress a fixed mathematical result, the other to write a program that compresses an arbitrary formalized mathematical result together with its proof.
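The degenerate end of this space/time tradeoff is easy to simulate (the three-letter target and lowercase alphabet are arbitrary choices):

```python
import string
from itertools import product

# The degenerate "compressor" mentioned above: store nothing, and have the
# decompressor enumerate every string until the target appears.
# Space is constant; time grows exponentially in the target's length.
def enumerate_until(target: str, alphabet: str = string.ascii_lowercase) -> int:
    steps = 0
    for n in range(len(target) + 1):
        for candidate in product(alphabet, repeat=n):
            steps += 1
            if "".join(candidate) == target:
                return steps
    raise ValueError("target uses symbols outside the alphabet")

print(enumerate_until("qed"))  # 11627 strings examined for a 3-letter target
```

Already at three letters the decompressor examines over ten thousand strings; a real proof would be astronomically out of reach, which is why any sensible score must charge for time as well as space.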

The mathematical result should be difficult enough (say, of undergraduate difficulty) that it would be worth the space to develop some underlying theory, rather than beelining for the result, though not so difficult that it would take years to actually formalize.

• Why mathematics? I think that the current formulation of Hutter’s thesis depends too much on things that have little to do with AI. For instance, inconsistent use of synonyms, different ways of phrasing the same idea, inconsistent document structure, etc. all add overhead to compressing a Wikipedia article. A more reasonable benchmark would be that any question asked about the 100MB of Wikipedia articles could be answered correctly by the machine; as long as the answer is correct, the phrasing is irrelevant. I believe this allows more compression, and allows more of a tradeoff between compression and computation. Indeed, a person queried on his knowledge would not spout it out in Wikipedia-like fashion, but rather might phrase it differently each time while still convincing the questioner that he is correct (cf. probabilistically checkable proofs). Of course, such a test is much harder to implement in practice (what constitutes a reasonable question and answer? another kind of Turing test?), but one place where it can be implemented is mathematics, where questions and answers can be made very precise. I believe that if we understand how to automatically compress mathematics better, it will shed light on reasoning in less precise realms, because the process of mathematical discovery is itself nonrigorous (see Trefethen, p. 286).
• Why compression? Automated theorem proving is a subject in its own right, but because of the degree of automation, many of the theorems that can be proved by it are basic by mathematical standards (or require some kind of semi-brute-force search that computers are good at). Allowing starting material—i.e. a compressed file—may lead to additional insights on the kind of reasoning used in mathematical proofs that wouldn’t be as easily seen at a lower level. In a way, this proceeds in parallel to automated theorem proving, and I could see both kinds of research feeding into each other and into AI.

A good compressor (at least in the asymptotic limit) would have to have some of the intelligence that human mathematicians have:

• the ability to think about proofs nonlinearly, i.e., to plan the overall structure rather than generate the proof line by line.
• the ability to take techniques that appear in different proofs, abstractify them, and then call the abstractified technique. (Isabelle does this, for instance, with induction, but I am not aware of it being done for more complicated techniques.) This is hard because many techniques are very vague, and one can’t rely on exact matches.
• etc.
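As a rough analogue of what Isabelle does, here is how a packaged technique gets invoked by name in Lean 4 syntax (a sketch, not a claim about Isabelle’s internals):

```lean
-- "Induction" is an abstracted technique the proof simply calls by name,
-- rather than spelling out the recursion scheme each time.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```

A compressor that could mint new tactics of this kind from recurring proof patterns, rather than relying on a fixed library of them, is what the bullet above is asking for.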

Note that while compressed text files are not readable by people, for math this could potentially be different: at some intermediate stage of compression, the proof should actually resemble the way a person would describe it (obviously with different syntax). It would be nice for the compressor/decompressor to narrate what it’s doing in understandable language, as SHRDLU did (the decompressor would essentially be “planning” the written-out proof from the compressed version).

Late remark: This appears to be an existing field of study, [proof compression](http://en.wikipedia.org/wiki/Proof_compression), though possibly a niche one.