Laziness in Pure Functional Languages

Some collected comments on the advantages/disadvantages of laziness, assuming that you've already committed yourself to a language which is free from side effects.

I have three major beefs with laziness: it's not reflected in the types, it leads to space leaks far too often, and it makes it impossible to get stack traces (which I think are more important than debuggers). I also have two minor beefs with it.

Not Reflected in Types

For all of the Haskell community's advocacy of types, I find it amazing that Haskell functions' types do not express the strictness of the function! This is a glaring, gigantic oversight!

Strict languages like ML don't have this problem – every function is strict, so every type implicitly says “and is strict in all arguments”. Haskell can't say “and is lazy in all arguments” because you need to use strict primitives to build nontrivial programs. Total languages like Coq don't have this problem either because their programs' behavior is not influenced by choice of evaluation strategy.
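
To make this concrete, here is a minimal sketch (the function names are mine) of two functions with identical types but different strictness; nothing in the type Int -> Int -> Int tells you whether the second argument will be forced:

lazyIn2 :: Int -> Int -> Int
lazyIn2 x _ = x              -- never forces its second argument

strictIn2 :: Int -> Int -> Int
strictIn2 x y = y `seq` x    -- forces its second argument before returning

-- lazyIn2   3 undefined  ~~>  3
-- strictIn2 3 undefined  ~~>  bottom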

Space Leaks

A contemptible observation is that sufficiently large programs seem only to work with added strictness annotations. It might be a reason why monads are so popular at the moment since they often 'linearize' the computation, resulting in more predictable behavior.
marco, in this comment

Here is some advice on circumventing the disadvantages of laziness in Haskell. This focuses mainly on the unpredictability of space complexity.
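
As a concrete (and standard) illustration of that unpredictability, here is a sketch of the classic foldl space leak: the lazy fold builds a chain of millions of unevaluated (+) thunks that is only forced at the very end, while the strict variant runs in constant space.

import Data.List (foldl')

leaky :: Integer
leaky = foldl  (+) 0 [1..10000000]   -- accumulates ~10 million thunks before forcing them

fine :: Integer
fine  = foldl' (+) 0 [1..10000000]   -- forces the accumulator at each step; constant space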

Stack Traces

Note: I use the phrase “stack trace” below in a loose manner, meaning basically “explanation of what computations led up to the error”. For imperative languages this is a stack trace, but I could be convinced that some similar-yet-not-identical notion is appropriate for functional languages. So “stack trace” below doesn't necessarily literally mean a silicon-level call stack of addresses.

The other major disadvantage to lazy-by-default (as I see it) is the lack of anything like a “stack trace” to include with runtime errors. If I had to choose between living life without a debugger and living life with a runtime that didn't give me stack traces, I'd choose the former (I hardly ever use a debugger anyway). Stack traces are just way too useful, both to developers and for collecting bug reports. This is one place where Java has really advanced the state of the art in computing – it popularized the idea that the runtime should be able to give you high-quality stack traces without having to resort to a debugger.

Incidentally, the inability to provide stack traces is often cited as a reason for not supporting tail calls. In my opinion this isn't a serious enough problem to justify that choice: you can simply put a marker on the stack every time a tail call is made, and delete contiguous sequences of markers whenever the stack exceeds some fixed limit, replacing each deleted sequence with a marker indicating that a deletion was performed. The resulting debugging annotations will be asymptotically bounded by (and in practice much smaller than) the program's true space requirements.
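
Here is a rough sketch (the types and names are hypothetical) of that marker idea: tail calls push a marker onto a debug stack, and once the stack exceeds a fixed limit, each contiguous run of markers is collapsed into a single entry recording that a deletion happened.

data Entry = Frame String      -- an ordinary call frame
           | TailCall String   -- marker left behind by a tail call
           | Elided            -- stands in for a deleted run of markers
  deriving (Eq, Show)

compact :: Int -> [Entry] -> [Entry]
compact limit stack
  | length stack <= limit = stack
  | otherwise             = squeeze stack
  where
    squeeze []     = []
    squeeze (e:es)
      | isMarker e = Elided : squeeze (dropWhile isMarker es)
      | otherwise  = e      : squeeze es
    isMarker (TailCall _) = True
    isMarker _            = False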

Current Stack Trace Technology for Haskell

This seems like a good place to summarize the current state of the art in Haskell pseudo-stack-traces.

GHCi's debugger has a built-in feature, :trace, that records the last 50 evaluation steps (inspected with :history). This often means fewer than 50 frames of the “call stack”. This is an “accurate but bounded” stack.
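
A rough sketch of the workflow, using GHCi debugger commands (output omitted):

-- stop at the point where an exception is thrown
:set -fbreak-on-exception
-- run while recording the evaluation history
:trace main
-- show the most recent recorded steps (50 by default)
:history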

There is a tool called Hat which streams an execution record of the program to disk and then reconstructs an accurate stack trace from it. Needless to say, the space requirements are gigantic, performance suffers, and it may introduce Heisenbugs. It cannot be left “always on” like the JVM's stack traces. Finally, Hat works by source-to-source translation, so it often cannot cope with recently-added language features (this is not an inherent difficulty with the Hat approach, however; it is merely an artifact of the way it is implemented).

The paper Finding the needle: stack traces for GHC describes a tool that maintains an abstracted stack whose size is bounded by the size of the program's text. Unfortunately, when it enters a function it elides any call frame involving that function further down the stack. Imagine an execution sequence that arrives at this call stack (most recent call at the top, main at the bottom):

foo999
...
foo1
foo0
foo999
...
foo1
foo0
main

The “abstractor” will elide the bottom 1,000 calls to foo0-foo999. This is an essential feature of the way that the strategy in their paper avoids an asymptotic space explosion.

Now imagine that the top 1,000 calls return, and then the original call to foo999 throws an exception or otherwise wishes to examine the stack. The elided stack will contain nothing but main!

So, there are completely reasonable situations in which the abstracted stack contains no useful information at all. This is unfortunate.
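
To make the failure mode concrete, here is a minimal sketch (my own hypothetical representation, not the paper's code) of the abstraction step described above: pushing a function removes any frames already recorded for that same function, which is what bounds the stack by the program text, but also what throws away the lower copies of foo0-foo999.

type AbstractStack = [String]      -- newest frame first

pushAbstract :: String -> AbstractStack -> AbstractStack
pushAbstract f stack = f : filter (/= f) stack

-- After the second round of calls to foo0..foo999, the frames for the first
-- round have all been filtered out; once the second round returns (popping its
-- own frames), nothing but the frame for main remains.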

Cost Model

I didn't notice this (at least on a conscious level) until reading this post by Bob Harper, though what follows is my own (possibly inaccurate) elaboration of his brief comment.

Haskell really has no language-based cost model, because pattern-matching – that is, case discrimination – may trigger an arbitrarily complicated computation. Since the operational part of the language boils down to lambda, application, and case, this means that there is no cost model at all: lambdas cost nothing, and the cost of application and case can be anything at all.
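
For instance, here is a small sketch: the two case expressions below are syntactically identical, but the second forces a thunk that walks a hundred-million-element list, so its cost cannot be read off the program text.

cheap, costly :: Int
cheap  = let n = 1 + 1                          in case n of 0 -> 0; _ -> 1
costly = let n = length [1 .. 100000000 :: Int] in case n of 0 -> 0; _ -> 1
-- The two case expressions look identical; only the second one is expensive,
-- because the case is what forces the thunk bound to n.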

Minor: Full Abstraction / Concurrency

Laziness breaks full abstraction, and adding in the necessary operator to restore full abstraction forces you into accepting a sort of crippled concurrency. People often forget that lazy PCF isn't fully abstract without an interleaving operator. This operator is either missing or else a hacky second-class citizen in all implementations.
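
The best-known example of such an operator is “parallel or” (por). Here is a sketch: any sequential definition, like the one below, must pick an argument to force first, and therefore misses one row of por's intended behavior; satisfying all three rows requires interleaving the evaluation of the two arguments, which is exactly the operator at issue.

-- Intended behavior:
--   por True  _      =  True
--   por _     True   =  True
--   por False False  =  False
porSeq :: Bool -> Bool -> Bool
porSeq True _ = True
porSeq _    b = b
-- porSeq undefined True  ~~>  bottom, although por undefined True should be True.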

Lazy PCF can in fact be used as a semantics for concurrency (for example, Kahn process networks use basically the same non-flat denotational domain as lazy PCF), but the more I learn about this the less I think that it's a complete semantics for concurrency. If you've decided that you want to have concurrency, you probably want operators like “fair merge” which have no denotational semantics in the lazy PCF domain. So I think that “models concurrency well” is a terrible argument for laziness.

Minor: Strict Constructors, Non-Strict Constructors, and newtype

Haskell needs to distinguish between the following three definitions, which are non-equivalent under lazy evaluation. Strict evaluation does not have this sort of complication.

data    D1 = D1 Int
data    D2 = D2 !Int
newtype N  = N Int

This example comes from the Haskell 98 Report. Basically, the strictness annotation on the constructor field is required in order to control whether or not D1 has an additional member D1 bottom: D2 bottom is indistinguishable from bottom :: D2. The newtype definition for N is, at the level of values, equivalent to D2: each type is isomorphic to Int, whereas D1 contains an extra element (D1 bottom).
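
A small sketch of that extra element: matching the D1 constructor forces only the outer box, so D1 bottom is a perfectly good value of type D1, distinguishable from bottom itself.

isBoxed :: D1 -> Bool
isBoxed (D1 _) = True
-- isBoxed (D1 undefined)  ~~>  True     (D1 bottom is a real value of type D1)
-- isBoxed undefined       ~~>  bottom   (bottom has no constructor to match)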

However, as noted here, although the types N and D2 are isomorphic, Haskell interprets pattern matching on these two types differently. That is, the first two functions below will return 3 when given a diverging argument, but the last one will diverge:

foo1 (       N n) = 3       -- foo1 bottom ~~> 3
foo2 (     d::D2) = 3       -- foo2 bottom ~~> 3
foo3 ((D2 d)::D2) = 3       -- foo3 bottom ~~> bottom

Similar problems crop up with lazy (“irrefutable”) patterns. So-called “irrefutable patterns” are really just syntax for a type declaration – they're not actually patterns. For example, these two definitions are equivalent:

foo ~(x,y)            = 3
foo  (p :: (Int,Int)) = 3

… because either one can be applied to bottom and still yield a result. So there isn't really a pattern match going on.

Summary: a newtype is essentially a one-constructor data declaration with one field marked as strict (!) for which all pattern-matches are automatically treated as irrefutable.
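
A small sketch of that summary: adding a ~ to a D2 match makes it behave like the newtype match, because the irrefutable pattern delays (and, since d is unused, eliminates) the forcing that the D2 constructor pattern would otherwise perform.

fooN :: N -> Int
fooN (N n)   = 3         -- fooN bottom  ~~>  3
fooD :: D2 -> Int
fooD ~(D2 d) = 3         -- fooD bottom  ~~>  3, matching foo1's behavior above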

I understand why all of this is necessary, but it's just way too complicated for the negligible amount of power it adds.