Tuesday, August 16, 2011

The Meaning and Usefulness of Closures

In the previous post, I talked about the Fixed Point Combinator in λ-Calculus, and mentioned that combinator is another word for a closed lambda term, i.e., an expression with no free variables:

λx.x, λxy.x, λxy.y, etc.

Take for example the term λxy.y, which is really shorthand for λx.λy.y:

FV(λx.λy.y)   =   FV(λy.y) − {x}   =   FV(y) − {y} − {x}   =   {y} − {y} − {x}   =   Ø.

The opposite of a combinator – any expression that uses free variables, is called a closure. Even though the meaning of the term is slightly different in different contexts, the idea is basically the same:

Examining the term λx.(f x), we have the free variables FV(λx.f x) …

= FV(f x) − {x} = FV(f) ∪ FV(x) − {x} = {f} ∪ {x} − {x} = {f, x} − {x}
= {f}.

In Haskell, the definition for a function, func:
func f = \x -> f x
… uses the variable f, bound outside of the abstraction. It is therefore a closure.

Closures are commonly associated with functional programming, but most languages that support higher-orderism also allow locally defined functions that "closes over" variables from the scope in which they are defined. Here is an example in Javascript:
function factory() {
    var v = 'hello';
    return function() {
        alert(v);
    };
}

function yield(f) { f(); }

var a = factory();
yield(a);

Here, the variable a is used to store the function returned by factory. We can examine its contents by adding the line:
var a = factory();
alert(a);
This will display:
function() {
    alert(v);
};

But, not all anonymous functions are closures. The element of interest here is the variable v. Shouldn't the duration of v end when factory returns? Well, it doesn't – v is still defined within the scope of this anonymous function that we later pass on as argument to yield, which in turn calls it. It is as if the variable v had become part of the function. To confirm that v is not in the global scope, we can add this line:
var a = factory();
alert(a());                  // hello
alert(v);                    // undefined
It is not visible from outside of the function.

Say hello to the closure

So, a closure here is a function, together with a portion of the environment in which it was created. The free variables are the non-local names, in this case the variable v, whose lifetime has now extended to (at least) that of the function itself.

Note that, the closure is formed during the stage of execution where the inner function is returned from factory, a function literal itself does not constitute a closure.

Here is a similar example of a C++ class, using the new C++0x lambda syntax:
#include <tr1/functional>
#include <string>
#include <iostream>

class MightyLambda
{
public:
    MightyLambda()
    {
        std::tr1::function<void(void)> fn = factory();
        yield(fn);
    }

    ~MightyLambda() {}

protected:
    template <typename F> void yield(const F &f) { f(); }

    std::tr1::function<void(void)> factory() const
    {
        std::string str = "hello!";
        auto f = [str]() {
            std::cout << str;
        };
        return f;
    }
};
The result is the same as in the Javascript example above; factory returns an anonymous function. We then pass that function to yield, which in turn invokes the function.

In C++ we need to explicitly declare the variables that should be made available from within the lambda. This happens in the initial square brackets of the expression, and is called a capture specification. It is subject to a number of rules which I won't go into here.

Common use

In Javascript, closures are often used to:
  • Make data and/or functions private (encapsulation):
obj = (function() {
   var v;            // private

   return { setValue: function(val) { v = val; },
            value:    function() { return v; }}
});

o = obj();
o.setValue(123);
alert(o.value());
  • Bind values to callbacks or functions that execute after a delay:
function callLater(f) {
  window.setTimeout(f, 1000);
}

for (i = 1; i <= 3; ++i) {
  callLater(function() { alert(i) });     // displays 3, 3, 3!
}
This will display the numbers 3, 3, and 3, which is usually not what we want. If we instead pass a closure to our timeout function:
function callLater(f) {
  window.setTimeout(f, 1000);
}

function callback(v) {
  return function() {
    alert(v);
  }
}

for (i = 1; i <= 3; ++i) {
  callLater(callback(i));         // displays 1, 2, 3
}
…the result is what we expected, namely for the alert to display the value of the variable at the time of initiating the timeout.
  • Create curried functions; take for example, this function:
function multiply(x, y) {
   return x * y;
}
Now, to make it more interesting we can allow for only one argument to be specified:
function multiply(x, y) {
   if (arguments.length == 1) {
      return function(z) { return x * z };
   }
   return x * y;
}

double = multiply(2);         // double is a "curried" function
alert(double(4));
By applying only on of the arguments to multiply, we create a curried version of the function, that returns any value times 2. Did you spot the closure?
return function(z) { return x * z };
Here, x is the free variable.

A much more detailed explanation of closures in Javascript is available here: http://jibbering.com/faq/notes/closures/

0 comments:

Post a Comment