Type Systems and Static Analysis

What is the difference between a type system and a static analysis?

Technically, a type system is a particular kind of static analysis. In practice, though, people use the term “static analysis” for static analyses that are not type systems. So, what constitutes a type system?

In my view, a type system is decidable, composable, predictable static analysis.

This essay focuses mainly on the use of type systems and static analyses for correctness. The use of static analysis for optimization and performance improvement is well-established and uncontroversial.

Decidable

Typechecking must be decidable. The user may be required to add annotations to a program, so long as there is a simple rule explaining where such annotations are required. Once annotated, the process of determining if the program is well-typed must be decidable.
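
To make this concrete, here is a minimal sketch, not from the original essay, in TypeScript under an assumed strict configuration (noImplicitAny): the simple rule is that function parameters must be annotated, and everything else is inferred by a check that always terminates.

```typescript
// The simple annotation rule: function parameters must be written down.
// Everything else -- locals and the return type -- is inferred,
// and the check is guaranteed to terminate.
function scale(xs: number[], factor: number) {
  const scaled = xs.map((x) => x * factor); // inferred as number[]
  return scaled;                            // return type inferred as number[]
}
```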

Composable

It must be possible to check the type of an expression using only its immediate subexpressions, their types, and the list of their free variables.

This is a crucial precondition for the next requirement (predictability): it ensures that if you change a subroutine in a manner which does not alter its type, you will not break the well-typedness of any program which invokes it. It is the most basic form of “contract” between caller and callee.
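
As an illustration (a hypothetical TypeScript sketch; the names are made up), the call site below is checked against mean's signature alone, so any change to the body that preserves that signature leaves the caller well-typed.

```typescript
// The type of the call expression `mean(data)` is computed from the types of
// its immediate subexpressions: `mean` ((xs: number[]) => number) and `data` (number[]).
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

const data = [1, 2, 3];
const m: number = mean(data);

// Swapping mean's body for a different implementation cannot break the call
// site above, as long as the signature (xs: number[]) => number is kept.
```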

Predictable

If type checking fails, the type checker must be able to explain the failure, and the reason for it, to the user.
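
For instance (a hypothetical TypeScript example, with the compiler's message paraphrased), the checker can point to the exact expression at fault and state which rule it violates:

```typescript
function greet(name: string): string {
  return "Hello, " + name;
}

// Fails to typecheck, and the checker can say precisely why; paraphrasing tsc:
//   "Argument of type 'number' is not assignable to parameter of type 'string'."
greet(42);
```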

There is a great paper, “A Few Billion Lines of Code Later,” about how unpredictable static analyses cause all sorts of problems in “real world” use that you'd never encounter in a research environment. I believe that this phenomenon explains why unpredictable analyses are so popular in academia but so rarely used for correctness in real life (compared to type systems, which are ubiquitous).

Some great quotes:

Losing already inspected bugs is always a problem, even if you replace them with many more new errors. While users know in theory that the tool is “not a verifier,” it’s very different when the tool demonstrates this limitation, good and hard, by losing a few hundred known errors after an upgrade.

Users really want the same result from run to run. Even if they changed their code base. … Analysis changes can easily cause the set of defects found to shift. The new-speak term we use internally is “churn.” A big change from academia is that we spend considerable time and energy worrying about churn when modifying checkers. We try to cap churn at less than 5% per release. This goal means large classes of analysis tricks are disallowed since they cannot obviously guarantee minimal effect on the bugs found.

As the number of analysis steps increases, so, too, does the chance of analysis mistake, user confusion, or the perceived improbability of event sequence. No analysis equals no mistake. Further, explaining errors is often more difficult than finding them. … The heuristic we follow: Whenever a checker calls a complicated analysis subroutine, we have to explain what that routine did to the user, and the user will then have to (correctly) manually replicate that tricky thing in his/her head. … For many years we gave up on checkers that flagged concurrency errors; while finding such errors was not too difficult, explaining them to many users was.