Thursday, October 6, 2011

Lambdas in C++0x, Part II

Let's recall the syntax of a lambda expression, from the first part of this post:
  [C](P) M -> R { body }(Q)
  C = capture specification
  P = parameter specification
  M = mutable specification
  R = return type specification
  Q = actual call parameters
Lambda expressions are locally defined functions, also known as anonymous functions. A minimal example could look something like this:
int main()
{
    auto f = []() { std::cout << "hello"; };
    f();
    return 0;
}
The lambda can also be invoked as it is defined, on-the-fly:
[]() { std::cout << "hello"; }();
Here is a lambda that takes an int as argument:
auto fn = [](int x) { std::cout << "x = " << x << "\n"; };
fn(123);
Side note: I tend to use the terms parameter and argument more or less interchangeably. To make things clear, we should, however, make a distinction between the two:
  • A parameter is a variable declared in the prototype or declaration of a function.
  • The argument is the value that is actually passed to the function.

A return value can be specified, using the trailing return type specification (→ R):
auto fn = [](int x) -> int { return x + 1; };
int n = fn(1);
std::cout << "->" << n << "\n";
The auto keyword saves us some work when the compiler is able to infer the type of an expression automatically. Alternatively, we could write:
std::tr1::function<int(int)> fn = [](int x) -> int { return x + 1; };
In this post, I will continue to examine the capture specification and mutable keyword.

Capture

In order to access variables declared outside of a lambda expression (referred to as free variables), these must first appear in the expression's capture specification, i.e., the part between the initial square brackets. The following will therefore not work:
int a = 1;
auto fn = []() { std::cout << a; };   // error: 'a' is not captured
We first need to capture a:
int a = 1;
auto fn = [a]() { std::cout << a; }
Variables can be captured either by value, or by reference. In the above example, a is captured by value. By default, variables captured by value are treated as read-only (more about this later).
int a = 1;
auto fn = [a]() { a += 1; std::cout << a; }

// error: assignment of data-member ... in read-only structure
To specify a variable to be captured by reference, the name is prepended with an ampersand (&), just like an ordinary lvalue reference:
int a = 1;
auto fn = [&a]() { a += 1; std::cout << a; };
fn();
std::cout << a; // a is now 2
The compiler does some basic pattern matching when reading the capture specification, so we can define simple rules:
  []       capture nothing
  [&]      capture all variables by reference
  [=]      capture all variables by value
  [=, &b]  capture all variables by value, except b
  [&, a]   capture everything by reference, except a
  [a, b]   capture a, and b by value
  ... etc

Mutable specification

As mentioned earlier, variables captured by value are treated as const. To ensure that copies are made and variables can be modified in the local scope of the lambda, the mutable keyword must be used:
int a = 1;
auto fn = [a]() mutable { a +=1; std::cout << a; }
fn()                   // 2
std::cout << a;        // 1
Note that the variable here is passed by value, hence the original a is not modified. This is, of course, no different from ordinary functions.

That's more or less it, as far as the C++0x lambda syntax is concerned. Equipped with this knowledge, we can start to take advantage of some of the programming techniques from functional languages, to write more "elegant" code.

Currying and partial function application

Currying can be described as the process of transforming a function that takes multiple arguments into one that takes a single argument, and returns another function. The term "currying" is, of course, a reference to the famous logician Haskell Curry.

The following is an example in Haskell (the language, not the logician):
let add x y = x + y
let increment = add 1
Even though it appears as if add takes two arguments, looking at its type:
:type add
shows us that,
add :: (Num a) => a -> a -> a
it is a function that takes an argument of type a and returns another function, which takes an argument of type a, having the return type a.

So (a → a → a) can actually be written (a → (a → a)). Examining the type of increment,
:type increment
increment :: Integer -> integer
reveals that it is a function, which takes an integer, and returns another integer. So, increment is a partially applied version of add.

Partial application is sometimes used in Javascript, here is the same add function:
function add(x, y) {
    return x + y;
}
Because of its relaxed type system and the dynamic nature of the Javascript language, we can define the following function:
function add(x, y) {
    if (arguments.length == 1) {
        return function(z) {
            return add(x, z);
        };
    }
    return x + y;
}
See my previous post about closures for a more in-depth discussion on this. The above solution offers a great deal of flexibility in how to use the add function:
a = add(4, 5);
alert(a);                    // 9

addToFour = add(4);

alert(addToFour);            // function(z) { return x + z; }
alert(addToFour(5));         // 9
This also works:
alert(add(4)(5));            // 9
In C++, we need to be a bit more strict. Here is a templated version of the same add function again:
template <typename T>
T add(T x, T y)
{
    return x + y;
}
We can overload this function to allow a single parameter:
template <typename T>
auto add(T x) -> std::tr1::function<T(T)>
{
    return [x](T y) {
        return add(x, y);
    };
}
Calling the function with two arguments results in no surprises:
float z = add(3, 4);
std::cout << z << "\n";
Calling the overloaded version returns a function, which in turn can be called to produce the final output value:
std::cout << add(1)(2) << "\n";            // 3
With the added ability to create anonymous functions in C++0x, partial application is also possible:
auto addToThree = add(3);

std::cout << addToThree(5) << "\n";        // 8
std::cout << addToThree(1) << "\n";        // 4
Here is the add function, instead defined inline, using a lambda expression,
auto add = [](float x, float y) {
    return x + y;
};
and the partially applied increment function:
auto increment = [add](float x) {
    return add(1, x);
};

std::cout << increment(3) << "\n";          // 4
Note that the variable add is captured in the expression. Here is another example, with a partially applied function passed as parameter to another function:
#include <tr1/functional>

// a function that expects as argument a function that takes
// one float as argument and returns another float

void yield(const std::tr1::function<float(float)> &fn)
{
    std::cout << "calling function with 3 : " << fn(3) << "\n";
};

int main(int argc, char **argv)
{
    // a local function that takes two floats as arguments

    auto add = [](float x, float y) {
        return x + y;
    };

    // Here, we pass a partially applied version of the add
    // function as parameter to yield 

    yield([add](float x) { return add(1, x); });

    return 0;
}

Closures

Closures are functions, together with a portion of the environment in which they are defined. The basic pattern looks something like this:
auto factory() -> std::tr1::function<void(void)>
{
    std::string a = "hello";
    return [a]() {
        std::cout << a;
    };
}

int main()
{
    auto fn = factory();
    fn();
}
What we are interested in here is the variable a. It is captured in the lambda and its lifetime extends beyond the point where the program exits the execution scope of factory.

Example: Map

To show how functional programming concepts can be used in C++0x, here is an implementation of the well-known map function. Map is a higher-order function that applies a given function to each element of a list, or similar container, returning a result of the same type.
template <typename T>
std::vector<T> map(const std::tr1::function<T(T)> &fn,
                   const std::vector<T> &array)
{
    std::vector<T> result;
    typename std::vector<T>::const_iterator i;
    for (i = array.cbegin(); i != array.cend(); ++i)
        result.push_back(fn(*i));
    return result;
};

int main(int argc, char **argv)
{
    std::vector<int> n;
    n.push_back(1);
    n.push_back(2);
    n.push_back(3);
    n.push_back(4);

    std::tr1::function<int(int)> fn = [](int x) { return x+1; };

    std::vector<int> n2 = map(fn, n);

    std::vector<int>::const_iterator i;
    for (i = n2.cbegin(); i != n2.cend(); ++i) {
        std::cout << *i << "\n";
    }

    return 0;
}
Here, we pass map a vector <1, 2, 3, 4> and a, locally defined, function that adds one to its input. As result, map returns a vector with the following values:
2
3
4
5
To combine this with a more conventional object-oriented approach, we can encapsulate the functionality of the map function in a class:
#include <tr1/functional>

template <typename T>
class Mapper
{
public:
    explicit Mapper(const std::vector<T> &array)
        : m_array(array)
    {
    }

    std::vector<T> map(const std::tr1::function<T(T)> &fn) const
    {
        std::vector<T> result;
        typename std::vector<T>::const_iterator i;
        for (i = m_array.cbegin(); i != m_array.cend(); ++i)
            result.push_back(fn(*i));
        return result;
    }

private:
    std::vector<T> m_array;
};

int main(int argc, char **argv)
{
    std::vector<int> n;
    n.push_back(1);
    n.push_back(2);
    n.push_back(3);
    n.push_back(4);

    Mapper<int> mapper(n);
    std::vector<int> m = mapper.map([](int x) { return x*3; });

    std::vector<int>::const_iterator i;
    for (i = m.cbegin(); i != m.cend(); ++i) {
        std::cout << *i << "\n";
    }   

    return 0;
}
This program generates:
3
6
9
12
To further improve readability and even efficiency of this program, we can take advantage of two other C++0x features; new and improved initialization lists and type inference:

C++0x initialization lists: C++0x allows constructors and other functions to take initializion lists as parameters. The following is an example of a vector being constructed using a sequence of integers:
std::vector<int> v = {1, 2, 3, 5};
Type inference: We already use the auto keyword to infer the type of anonymous functions. Here, it can also be used in the iterator assignment to save us some typing. This code,
std::vector<int>::const_iterator i;    
for (i = modified.cbegin(); i != modified.cend(); ++i)        
    std::cout << *i << "\n";
can be simplified to:
for (auto i = modified.cbegin(); i != modified.cend(); ++i)        
    std::cout << *i << "\n";
Here is a revised version of the program:
#include <tr1/functional>

template <typename T>
class Mapper
{
public:
    explicit Mapper(const std::vector<T> &array)
        : m_array(array)
    {
    }

    std::vector<T> map(const std::tr1::function<T(T)> &fn) const
    {
        std::vector<T> result;
        for (auto i = m_array.cbegin(); i != m_array.cend(); ++i)
            result.push_back(fn(*i));
        return result;
    }

private:
    std::vector<T> m_array;
};

int main()
{
    Mapper<int> mapper({1, 2, 3, 4});

    std::vector<int> m = mapper.map([](int x) { return x*3; });

    for (auto i = m.cbegin(); i != m.cend(); ++i) {
        std::cout << *i << "\n";
    }

    return 0;
}
The addition of functional programming features to C++ adds a whole new dimension to the language, and makes the term multi-paradigm more relevant than ever.

0 comments:

Post a Comment