Wednesday, August 31, 2011

Behind the Scenes: Polymorphism and Function Resolution in C++

This post describes how function resolution is typically (but not always) implemented by compilers.

For non-virtual functions, the compiler matches a function call with the corresponding function definition at compile-time. This is known as static binding.

Dynamic Polymorphism

When a member function is declared virtual, it can be overridden in a class that directly, or indirectly, derives from the class where it was declared (the base class).

class Base
{
public:
    virtual ~Base() {}
    virtual void hello() = 0;     // hello is pure virtual
};

class A : public Base
{
public:
    void hello() { std::cout << "A says 'hello'\n"; }
};

class B : public Base
{
public:
    void hello() { std::cout << "B says 'hello'\n"; }
};

void sayHello(Base& helloer)
{
    helloer.hello();
}

int main()
{
    Base *a = new A;
    Base *b = new B;
    sayHello(*a);
    sayHello(*b);
    delete a;
    delete b;
}

This program generates the following output:

A says 'hello'
B says 'hello'

Even though helloer is a reference to the base class, which function that will actually be called depends on the type of object we pass to sayHello(). This means, of course, that the function call to hello() must be resolved at run-time. This is referred to as dynamic binding. Internally this is achieved using two constructs known as v-tables (virtual tables, or dispatch tables) and v-pointers.

A v-table is created for every class that: declares one or more virtual functions, or overrides a virtual function. (A)
Such class is called a polymorphic class.
Functions that override virtual functions from a parent class are adjusted in the v-table of the derived class. (B)
The function call is resolved at run-time, from the matching index in the v-table, pointed to by the actual object's v-pointer. (C)
Each object of a polymorphic class has its own v-pointer, but there is only one v-table per class.
The v-tables and v-pointers are added by the compiler, and not visible to the programmer.

class Fruit
{
public:
    Fruit()
        : m_color("fruit-colored"),
          m_peeled(false)
    {}

    virtual ~Fruit() {}
    virtual void eat();    
    virtual void peel();
    void setColor(const std::string &color) { m_color = color; }
    const std::string &color() const { return m_color; }

protected:
    std::string m_color;
    bool m_peeled;
};

void Fruit::eat()
{
    std::cout << "Now eating this " << m_color << " fruit!\n";
}

void Fruit::peel()
{
    if (!m_peeled)
         m_peeled = true;
}

class Orange : public Fruit
{
public:
     Orange()
         : Fruit()
     {
         m_color = "orange";
     }
     void eat();
};

void Orange::eat()
{
    if (m_peeled) {
        Fruit::eat();
    } else {
        std::cout << "Can't eat orange before peeling it!\n";
    }
}

int main()
{
    Fruit *fruit1 = new Fruit;
    Fruit *fruit2 = new Orange;
    Fruit *fruit3 = new Orange;

    fruit1->eat();       // Now eating this fruit-colored fruit!
    fruit2->eat();       // Can't eat orange before peeling it! 

    fruit2->peel();
    fruit2->eat();       // Now eating this orange fruit! 

    std::cout << fruit2->color();        // orange

    delete fruit1;
    delete fruit2;
    delete fruit3;

    return 0;
}

Run this example from codepad: http://codepad.org/QUyLRq6l

Friday, August 26, 2011

Lambda Calculus Cheat Sheet

Click here to download in PDF format

Tuesday, August 16, 2011

The Meaning and Usefulness of Closures

In the previous post, I talked about the Fixed Point Combinator in λ-Calculus, and mentioned that combinator is another word for a closed lambda term, i.e., an expression with no free variables:

λx.x, λxy.x, λxy.y, etc.

Take for example the term λxy.y, which is really shorthand for λx.λy.y:

FV(λx.λy.y) = FV(λy.y) − {x} = FV(y) − {y} − {x} = {y} − {y} − {x} = Ø.

The opposite of a combinator – any expression that uses free variables, is called a closure. Even though the meaning of the term is slightly different in different contexts, the idea is basically the same:

Examining the term λx.(f x), we have the free variables FV(λx.f x) …

= FV(f x) − {x} = FV(f) ∪ FV(x) − {x} = {f} ∪ {x} − {x} = {f, x} − {x}
= {f}.

In Haskell, the definition for a function, func:

func f = \x -> f x

… uses the variable f, bound outside of the abstraction. It is therefore a closure.

The Pattern of Recursion and Fixed Point Combinators

Recursion is an interesting, and important computation tool that enables us to formulate patterns of self-similarity and express infinity, using a finite set of elements. There is a strong relation between recursion and the principle of induction in mathematics. Imperative programming languages usually allow recursion in a program to take place by letting us define a function that calls itself. Here is a simple recursive function in C++:

#include <iostream.h>

void count(int n)
{
    std::cout << "Counting: " << n << "\n";
    if (n < 5) {
        count(n+1);
    }
}

int main(int argc, char **argv)
{
    count(1);
    return 0;
}

Run this example from codepad:
http://codepad.org/YmxQcTIX

In functional programming, recursion has a much more central position. Purely functional languages do not define any looping constructs, but instead rely entirely on recursion to achieve iterative algorithms. Many data types are defined recursively. In Haskell, recursion even allows us to define (conceptually) infinite data structures:

n = 5 : n
take 3 n                -- [5, 5, 5]

Codepad: http://codepad.org/sRbFkSH5

Here, we take the first three elements from an infinite list of fives. The list n is defined as a single element, 5, followed by a list – itself.

n is a 5, followed by n, which is a 5 followed by n, which is a 5 followed …

A Reminder About Implicit Conversion and Default Value Arguments

Implicit conversion is usually undesirable, and as a general rule, single-parameter constructors should be explicit unless there is a specific reason otherwise. Using the explicit keyword, we declare a constructor to be non-converting:

class Stuff
{
public:
    explicit Stuff(int n)
    {
    }

    // ...
}

Stuff stuff = 5;            
// error: conversion from ‘int’ to non-scalar type ‘Stuff’ requested

Stuff stuff(5);             // ok

But what if the constructor takes more than one argument, and default values are specified for the additional, or for all arguments? We can still call this constructor using one parameter only. Consider the following example:

class Stuff
{
public:
    Stuff(int x, int y = 0, int z = 1)
    {
    }

    // ....
}

Stuff stuff = 5;            // aargh!

The explicit keyword should be used here as well. So, the rule actually applies to:

single argument constructors, and
multiple argument constructors, if all OR all except one of the arguments have a default value.

Wednesday, July 27, 2011

Lambdas in C++0x, Part I

The new C++0x standard adds support for anonymous functions; so called lambda expressions. The concept of anonymous, or locally defined functions plays a central role in functional programming and has its origin in the Lambda Calculus (λ-Calculus); a theory formulated by Alonzo Church in 1936 (yes, that was before FORTRAN). The basic idea here is to treat functions as first class objects – functions can be stored in variables, passed as arguments, and be used as return values etc.

Lambda expressions are similar to functors (function objects), but with the added advantage of now being an integral part of the language. We can think of them as syntactic sugar for a functor. Just like function objects, lambdas are type-safe and can maintain state.

Using lambda expressions

The lambda term λx.3 represents the anonymous function x → 3, i.e., a function which takes a single input, x and (always) returns 3.

Using the new C++0x lambda syntax, this can be expressed as:

[](int x) { return 3; }

…where the initial square brackets ( [ ] ) are called the lambda introducer and tells the compiler that a lambda expression follows. Since x is irrelevant here, we may simply omit the parameter specification in this case. This leaves us with a very basic function that returns the value 3:

[] { return 3; }

Type Inference in C++: The Return of Auto

With the new C++ standard (C++0x), the auto keyword has returned to the language, like a phoenix rising from the ashes – with new and completely different meaning.

With this new auto, we can ask the compiler to deduce the type of a declared variable from the expression used to initialize it. This feature is known as type inference, and is found in many functional programming languages. Type inference (or type deduction) introduces some of the advantages of dynamic type systems, without incurring the performance penalty and various other negative side-effects associated with relaxed type safety, such as increased risk of run time failure.

A very basic, and not very useful example could look something like:

int x = 1;
auto y = x + 1;

The important thing to note here is that auto is a keyword and not a type. This is to say that, unlike in a dynamically typed language, we can’t assign a value of a different type once the variable has been declared. The type of y (int) must be known at compile time, we just leave to the compiler to figure out what it is.

The Syntactic Sugar

Wednesday, August 31, 2011

Behind the Scenes: Polymorphism and Function Resolution in C++

Dynamic Polymorphism

Friday, August 26, 2011

Lambda Calculus Cheat Sheet

Tuesday, August 16, 2011

The Meaning and Usefulness of Closures

Saturday, August 6, 2011

The Pattern of Recursion and Fixed Point Combinators

Saturday, July 30, 2011

A Reminder About Implicit Conversion and Default Value Arguments

Wednesday, July 27, 2011

Lambdas in C++0x, Part I

Using lambda expressions

Monday, July 25, 2011

Type Inference in C++: The Return of Auto

Blog Archive

About Me

Labels