Wednesday, November 9, 2011

Reinventing the Wheel, Part I: String Reverse Algorithm

A string reverse algorithm takes a string of characters, and produces a string with the characters in reverse order.


In doing this, we could
  • use a more high-level C++ approach,
std::string reverse(const std::string &str);

// ...

std::string reversed = reverse("uncategorized");
printf(reversed);                                 // dezirogetacnu
where the function returns a reversed copy of the original string. (Btw, to reverse a std::string, use the std::reverse algorithm, see bottom of this post.)

Alternatively, we can
  • write a C-styled function, using a "raw" pointer and then modify the string, in-place:
void reverseInPlace(char *str);

// ...

char s[] = "november";
reverseInPlace(s);
printf(s);                                // rebmevon
Since we are going for minimal memory footprint and optimal performance, we will pursue the latter approach.

Saturday, November 5, 2011

Non-Public and Virtual Inheritance in C++

In addition to public inheritance, C++ also supports three more rarely used forms of derivation; private, protected and virtual.

Public inheritance

class A {};
class B : public A {};
Public inheritance models an is-a relationship (e.g., Employee is a Person). Class members can be inherited or redefined in the derived class. Dynamic polymorphism (run-time binding) is achieved through the use of virtual member functions.
class A
{
public:
    virtual void stuff() { std::cout << "A::stuff()" << std::endl; }
};

class B : public A
{
public:
    void stuff() { std::cout << "B::stuff()" << std::endl; }
};

// ...

A* x = new B;
x->stuff();        // B::stuff()
The scope resolution operator (::) can be used to qualify hidden names:
struct A { int x; };
struct B : public A { int x; };   // A::x is now hidden
                                  // in the scope of B
// ...
B b;
b.x = 1;            // refers to B::x
b.A::x = 2;         // uses the qualified name

Composition

Many times, inheritance introduces strong coupling, and a general rule of OO design is that you should prefer composition over inheritance. Composition, or containment, expresses the type of relationship where one object of a specific type contains an object of another type. This is also known as a has-a relationship.
class A
{
public:
    void doStuff() { std::cout << "doing stuff!" << std::endl; }
};

class B
{
private:
    A m_a;              // B has an A
};

Friendship

In C++, it is also possible to declare classes friends of each other. Doing so gives the "befriended" class access to private and protected members of the class in which the friend keyword appears.
class B;     // forward declaration
class A
{
private:
    void doStuff() { std::cout << "doing stuff!" << std::endl; }

    friend class B;
};

class B
{
public:
    void doStuffWithA()
    {
        m_a.doStuff();         // Possible due to friendship
    }

private:
    A m_a;
};
Friendship is a powerful feature that should be used sparingly, and only in cases where classes are closely related. Now, let's take a look at private and protected inheritance, and how they relate to containment.

Private inheritance

class A {};
class B : private A {};
Private inheritance results in all public functions of a class being inherited as private.
class A
{
public:
    void doStuff() { std::cout << "doing stuff!" << std::endl; }
};

class B : private A {};

// ...

B b;
b.doStuff();         // error: ‘void A::doStuff()’ is inaccessible
As it turns out, this is very similar to composition, and the has-a kind of relationship described earlier.

Has-a vs. is-implemented-in-terms-of

Composition can actually signify two different types of relationship, namely
  • has-a, and
  • is-implemented-in-terms-of.

Scott Meyers explains the difference in this interview by Bill Venners. It goes something like this: In software design, and object oriented programming in particular, we are dealing with two separate domains. Objects can be described as either
  • real world objects, or
  • implementation artifacts.

The former is referred to as the application domain and the latter as implementation domain. Classes in the application domain would be, for example, User, Calendar, Document, and Brush. The implementation domain could include Mutex, List, Array, Buffer, SmartPointer etc. Implementation artifacts are abstractions that only make sense in the context of the program itself. A has-a relationship means the type of composition that corresponds to the application domain, whereas is-implemented-in-terms-of relates more to the implementation. Perhaps an Employee has a Role,
class Role
{
    // ...
};

class Employee
{
private:
    Role m_role;
};
but Staff is implemented in terms of a List (of Employees).
class Staff : private std::list<Employee>
{
    // ...
};
The distinction is not 100% obvious to me, but … nevertheless, it should be clear that the reason for not using public inheritance here is to avoid the interface of std::list being exposed through our Staff class.

Private inheritance is similar to containment, but in the case of containment, the containing class operates only on the public interface of the contained object. Using private inheritance, the derived class can also access protected members of the base class.

Other differences between (simple) composition and private inheritance are, that:
  • Using composition, a class can contain more than one object.
  • Using private inheritance, the derived class can override the base class' virtual functions.

Composition gives us control over how, and to what extent, the public interface of a contained object is exposed:
class A
{
public:
    void doStuff() {}
};

class B
{
public:
    void doStuff()
    {
        m_a.doStuff();
    }

private:
    A m_a;
};
Private inheritance makes this even more straightforward. The interface of the derived class is inaccessible from the outside world, however, a using directive can relax the terms of this contract:
class B : private A
{
public:
    using A::doStuff;
};
When should we use private inheritance then? The C++ FAQ Lite says:
Use composition when you can, private inheritance when you have to.

Protected inheritance

class A {};
class B : protected A {};
Protected inheritance is similar to private, with the difference that the base class' public functions now are being inherited as protected instead of private.
class A
{
public:
    void hello() {}
};

class B : private A {};

class C : public B
{
public:
    C() { hello(); }  // error: ‘void A::hello()’ is inaccessible
};
Using private inheritance for class B, class C, derived from B, does not have access to A's public members, hence the error:
error: ‘void A::hello()’ is inaccessible
On the other hand, if we change the declaration of B to,
class B : protected A {};
the program will compile.

Virtual inheritance

Multiple inheritance is a source of much confusion, and usually best avoided. Virtual inheritance exists to address a problem called the Diamond Problem. This is a situation that arises due to multiple inheritance, when the class structure becomes ambiguous as a result of a class being inherited more than once:
class Base
{
public:
     virtual void doStuff() {}
};

class A : public Base {};
class B : public Base {};

class C : public A, public B {};

// ...

C c;
c.doStuff();
This produces the following error:
request for member ‘doStuff’ is ambiguous
The problem comes from the fact that both A and B inherit Base, hence doStuff() could refer to both A::doStuff() and B::doStuff(). Looking at the class hierarchy, the diamond shape appears.


This ambiguity can be resolved. One way would be to simply qualify all calls.
c.A::doStuff();
This, however, is usually not the best solution. Instead, we can make Base a virtual base class to A and B, and therefore behave as if it was a direct base class of C.
class Base
{
public:
     virtual void doStuff() {}
};

class A : virtual public Base {};
class B : virtual public Base {};

class C : public A, public B {};

// ...

C c;
c.doStuff();         // ok!
A derived class can have both virtual and non-virtual base classes.

See this article for a more in-depth discussion: http://www.cprogramming.com/tutorial/virtual_inheritance.html

Thursday, November 3, 2011

The Reference-to-Pointer Syntax in C++

The syntax of pointer declarations is sometimes difficult to decipher, just consider the following three, elementary cases:
  • int *p[2]
  • int (*p)[2]
  • int (*p)()

 

Example 1

int *p[2];
This is actually an array of two int pointers. We can also think of this as a pointer to two consecutive int pointers.

int *p[2];

int *a = new int;
*a = 21;
int *b = new int;
*b = 22;

p[0] = a;
p[1] = b;

std::cout << *p[0] << std::endl;          // 21
std::cout << *p[1] << std::endl;          // 22

int **q = p;

std::cout << **q << std::endl;
std::cout << **(q + 1) << std::endl;

delete a;
delete b;
http://codepad.org/6c07RgKf

 

Example 2

int (*p)[2];
This is a pointer to an array of two ints. Just as in the first example, we can think of this a pointer to a pointer, since an array is basically a const pointer.


int n[2];
n[0] = 4;
n[1] = 7;

int (*p)[2] = &n;

std::cout << (*p)[0] << std::endl;       // 4
std::cout << (*p)[1] << std::endl;       // 7

// or ...

std::cout << **p << std::endl;           // 4
std::cout << *(*p + 1) << std::endl;     // 7
http://codepad.org/UDcH5cIQ

 

Example 3

int (*p)();
This is a pointer to a function returning an int.
int takeThisInt()
{
    return 123;
}

// ...

int (*p)() = &takeThisInt;
int x = p();                      // note the parentheses
std::cout << x << std::endl;      // 123
http://codepad.org/iL701mg5

Monday, October 31, 2011

Quiz Time: X-Y Path Permutations


Q: In an x-y coordinate system (or Cartesian coordinate system), how many paths are there from the point of origin to a point with coordinates ⟨x, y⟩, where
  • x ∧ y ∈ Z + (x and y are positive integers),
  • a path is built up from exactly x + y segments, each with a length of one (1), and
  • a line segment must be either horizontal or vertical?

See below for answer.
 

Tuesday, October 25, 2011

Behind the Scenes: Name Hiding in C++

Name hiding happens when an overloaded member function is declared in the scope of a derived class. Here is an example:
class Base
{
public:
    virtual ~Base() {}

    virtual void someFunction() const;
    virtual void someFunction(int x) const;
};

void Base::someFunction() const {}
void Base::someFunction(int x) const {}

class Derived : public Base
{
public:
    void someFunction() const;
};

void Derived::someFunction() const {}

int main(int argc, char **argv)
{
    Derived d;
    d.someFunction(123);         

    return 0;
}
This, in fact, will not compile and gcc reports the following error:
error: no matching function for call to 'Derived::someFunction(int)'
While if we remove Derived::someFunction(), the code compiles without errors. First of all, note that there are two orthogonal language features involved here: function overloading and inheritance (overloading vs. overriding).


The scope of Derived becomes nested within the scope of its base class, however, introducing a name in the derived class obscures all overloaded versions of the same function in the Base class, hence the compiler error. This is actually standard behavior, for reasons which will be discussed below.

Thursday, October 6, 2011

Lambdas in C++0x, Part II

Let's recall the syntax of a lambda expression, from the first part of this post:
  [C](P) M -> R { body }(Q)
  C = capture specification
  P = parameter specification
  M = mutable specification
  R = return type specification
  Q = actual call parameters
Lambda expressions are locally defined functions, also known as anonymous functions. A minimal example could look something like this:
int main()
{
    auto f = []() { std::cout << "hello"; };
    f();
    return 0;
}
The lambda can also be invoked as it is defined, on-the-fly:
[]() { std::cout << "hello"; }();
Here is a lambda that takes an int as argument:
auto fn = [](int x) { std::cout << "x = " << x << "\n"; };
fn(123);
Side note: I tend to use the terms parameter and argument more or less interchangeably. To make things clear, we should, however, make a distinction between the two:
  • A parameter is a variable declared in the prototype or declaration of a function.
  • The argument is the value that is actually passed to the function.

A return value can be specified, using the trailing return type specification (→ R):
auto fn = [](int x) -> int { return x + 1; };
int n = fn(1);
std::cout << "->" << n << "\n";
The auto keyword saves us some work when the compiler is able to infer the type of an expression automatically. Alternatively, we could write:
std::tr1::function<int(int)> fn = [](int x) -> int { return x + 1; };
In this post, I will continue to examine the capture specification and mutable keyword.

Monday, September 26, 2011

Copy Semantics and the Rule of Three

Commonly referred to as "the three amigos", the
  • copy constructor,
  • copy assignment operator, and
  • destructor
are sometimes generated by the compiler, when not defined directly by the programmer. Given a class definition such as the one below, the compiler will provide its own, synthesized version of the copy constructor and assignment operator for us.
class Basic
{
public:
    Basic()
        : m_data(new char[1024])
    {
    }

    ~Basic()
    {
        if (m_data)
            delete[] m_data;
         m_data = 0;
     }

private:
     char *m_data;
};
Here is how these default, compiler adaptations deal with different member types when we copy, assign to, or delete an object:

Default copy constructor:
  • Primitive types: Value is copied
  • Pointer types: Pointer is copied (shallow copy)
  • Objects: Calls object's copy constructor

Default assignment operator:
  • Primitive types: Value is copied
  • Pointer types: Pointer is copied (shallow copy)
  • Objects: Calls object's assignment operator

Default destructor:
  • Primitive types: Does nothing
  • Pointer types: Does nothing
  • Objects: Calls object's destructor

With C++11 also comes two new amigos (more about these later), they are the:
  • move constructor, and
  • move assignment operator.

A shallow copy simply means that the value of the pointer is copied, i.e., the address of the data, not the data itself. In our class definition above, the compiler-generated copy constructor and assignment operator are implicit. Consequently, the two objects in the following statements will end up with m_data pointers pointing to the same data.
Basic obj1;
Basic obj2(obj1);

The problem with our compiler-generated copy constructor (or, the problem with the Basic class destructor, depending on how you look at it) manifests when the objects are destroyed.
int main(int argc, char **argv)
{
    Basic obj1;
    Basic obj2(obj1);
    return 0;
}
This program will crash, and here is why:

  • The first object falls out of scope, and the data pointed to by m_data is released (in the destructor). So far, so good.
  • The second object falls out of scope. Since this object's m_data pointer is still pointing to the address of the memory we just released, the program will attempt to release the same memory again, which will cause the program to crash.

Rule of three

Dr. Dobbs describes the problem in great detail, in an article by Andrew Koenig and Barbara E. Moo. The observations are summarized in the following two rules, collectively referred to as "the rule of three":
  • If a class has a nonempty destructor, it almost always needs a copy constructor and an assignment operator.
  • If a class has a nontrivial copy constructor or assignment operator, it usually needs both of these members and a destructor as well.

Simply put, the rule states that; if we define one of the three members, we should usually define all three of them. To do so, there are essentially four different approaches. Each will be described in more detail below.
  • No copying
  • Deep copy
  • Reference counting
  • Copy-on-write

    Thursday, September 15, 2011

    The Evil Side of Returning a Member as const Reference

    Returning a class member (e.g., from a getter function) has negative impact on performance when large objects and copying is involved. In some code I have come across, instead of simply returning a copy, a const reference is used as the return value. Here is an example:
    class SomeClass
    {
    public:
        // ...
        const std::string &someString() const { return m_someStr; }
    
    private:
        std::string m_someStr; 
    }
    Before elaborating any further on this, let's recall the different ways of passing arguments to functions.

    Wednesday, August 31, 2011

    Behind the Scenes: Polymorphism and Function Resolution in C++

    This post describes how function resolution is typically (but not always) implemented by compilers.
    For non-virtual functions, the compiler matches a function call with the corresponding function definition at compile-time. This is known as static binding.

















    Dynamic Polymorphism

    When a member function is declared virtual, it can be overridden in a class that directly, or indirectly, derives from the class where it was declared (the base class).
    class Base
    {
    public:
        virtual ~Base() {}
        virtual void hello() = 0;     // hello is pure virtual
    };
    
    class A : public Base
    {
    public:
        void hello() { std::cout << "A says 'hello'\n"; }
    };
    
    class B : public Base
    {
    public:
        void hello() { std::cout << "B says 'hello'\n"; }
    };
    
    void sayHello(Base& helloer)
    {
        helloer.hello();
    }
    
    int main()
    {
        Base *a = new A;
        Base *b = new B;
        sayHello(*a);
        sayHello(*b);
        delete a;
        delete b;
    }
    This program generates the following output:
    A says 'hello'
    B says 'hello'
    Even though helloer is a reference to the base class, which function that will actually be called depends on the type of object we pass to sayHello(). This means, of course, that the function call to hello() must be resolved at run-time. This is referred to as dynamic binding. Internally this is achieved using two constructs known as v-tables (virtual tables, or dispatch tables) and v-pointers.

















    • A v-table is created for every class that: declares one or more virtual functions, or overrides a virtual function. (A)
      1. Such class is called a polymorphic class.
      2. Functions that override virtual functions from a parent class are adjusted in the v-table of the derived class. (B)
      3. The function call is resolved at run-time, from the matching index in the v-table, pointed to by the actual object's v-pointer. (C)
      4. Each object of a polymorphic class has its own v-pointer, but there is only one v-table per class.
      5. The v-tables and v-pointers are added by the compiler, and not visible to the programmer.
      class Fruit
      {
      public:
          Fruit()
              : m_color("fruit-colored"),
                m_peeled(false)
          {}
      
          virtual ~Fruit() {}
          virtual void eat();    
          virtual void peel();
          void setColor(const std::string &color) { m_color = color; }
          const std::string &color() const { return m_color; }
      
      protected:
          std::string m_color;
          bool m_peeled;
      };
      
      void Fruit::eat()
      {
          std::cout << "Now eating this " << m_color << " fruit!\n";
      }
      
      void Fruit::peel()
      {
          if (!m_peeled)
               m_peeled = true;
      }
      
      class Orange : public Fruit
      {
      public:
           Orange()
               : Fruit()
           {
               m_color = "orange";
           }
           void eat();
      };
      
      void Orange::eat()
      {
          if (m_peeled) {
              Fruit::eat();
          } else {
              std::cout << "Can't eat orange before peeling it!\n";
          }
      }
      int main()
      {
          Fruit *fruit1 = new Fruit;
          Fruit *fruit2 = new Orange;
          Fruit *fruit3 = new Orange;
      
          fruit1->eat();       // Now eating this fruit-colored fruit!
          fruit2->eat();       // Can't eat orange before peeling it! 
      
          fruit2->peel();
          fruit2->eat();       // Now eating this orange fruit! 
      
          std::cout << fruit2->color();        // orange
      
          delete fruit1;
          delete fruit2;
          delete fruit3;
      
          return 0;
      }
      Run this example from codepad: http://codepad.org/QUyLRq6l

      Friday, August 26, 2011

      Lambda Calculus Cheat Sheet





















      Click here to download in PDF format

      Tuesday, August 16, 2011

      The Meaning and Usefulness of Closures

      In the previous post, I talked about the Fixed Point Combinator in λ-Calculus, and mentioned that combinator is another word for a closed lambda term, i.e., an expression with no free variables:

      λx.x, λxy.x, λxy.y, etc.

      Take for example the term λxy.y, which is really shorthand for λx.λy.y:

      FV(λx.λy.y)   =   FV(λy.y) − {x}   =   FV(y) − {y} − {x}   =   {y} − {y} − {x}   =   Ø.

      The opposite of a combinator – any expression that uses free variables, is called a closure. Even though the meaning of the term is slightly different in different contexts, the idea is basically the same:

      Examining the term λx.(f x), we have the free variables FV(λx.f x) …

      = FV(f x) − {x} = FV(f) ∪ FV(x) − {x} = {f} ∪ {x} − {x} = {f, x} − {x}
      = {f}.

      In Haskell, the definition for a function, func:
      func f = \x -> f x
      … uses the variable f, bound outside of the abstraction. It is therefore a closure.

      Saturday, August 6, 2011

      The Pattern of Recursion and Fixed Point Combinators

      Recursion is an interesting, and important computation tool that enables us to formulate patterns of self-similarity and express infinity, using a finite set of elements. There is a strong relation between recursion and the principle of induction in mathematics. Imperative programming languages usually allow recursion in a program to take place by letting us define a function that calls itself. Here is a simple recursive function in C++:
      #include <iostream.h>
      
      void count(int n)
      {
          std::cout << "Counting: " << n << "\n";
          if (n < 5) {
              count(n+1);
          }
      }
      
      int main(int argc, char **argv)
      {
          count(1);
          return 0;
      }
      Run this example from codepad:
      http://codepad.org/YmxQcTIX

      In functional programming, recursion has a much more central position. Purely functional languages do not define any looping constructs, but instead rely entirely on recursion to achieve iterative algorithms. Many data types are defined recursively. In Haskell, recursion even allows us to define (conceptually) infinite data structures:
      n = 5 : n
      take 3 n                -- [5, 5, 5]
      Codepad: http://codepad.org/sRbFkSH5

      Here, we take the first three elements from an infinite list of fives. The list n is defined as a single element, 5, followed by a list – itself.
      n is a 5, followed by n, which is a 5 followed by n, which is a 5 followed …

      Saturday, July 30, 2011

      A Reminder About Implicit Conversion and Default Value Arguments

      Implicit conversion is usually undesirable, and as a general rule, single-parameter constructors should be explicit unless there is a specific reason otherwise. Using the explicit keyword, we declare a constructor to be non-converting:
      class Stuff
      {
      public:
          explicit Stuff(int n)
          {
          }
      
          // ...
      }
      
      Stuff stuff = 5;            
      // error: conversion from ‘int’ to non-scalar type ‘Stuff’ requested
      
      Stuff stuff(5);             // ok
      But what if the constructor takes more than one argument, and default values are specified for the additional, or for all arguments? We can still call this constructor using one parameter only. Consider the following example:
      class Stuff
      {
      public:
          Stuff(int x, int y = 0, int z = 1)
          {
          }
      
          // ....
      }
      
      Stuff stuff = 5;            // aargh!
      The explicit keyword should be used here as well. So, the rule actually applies to:
      • single argument constructors, and
      • multiple argument constructors, if all OR all except one of the arguments have a default value.

      Wednesday, July 27, 2011

      Lambdas in C++0x, Part I

      The new C++0x standard adds support for anonymous functions; so called lambda expressions. The concept of anonymous, or locally defined functions plays a central role in functional programming and has its origin in the Lambda Calculus (λ-Calculus); a theory formulated by Alonzo Church in 1936 (yes, that was before FORTRAN). The basic idea here is to treat functions as first class objects – functions can be stored in variables, passed as arguments, and be used as return values etc.
      Lambda expressions are similar to functors (function objects), but with the added advantage of now being an integral part of the language. We can think of them as syntactic sugar for a functor. Just like function objects, lambdas are type-safe and can maintain state.

      Using lambda expressions

      The lambda term λx.3 represents the anonymous function x → 3, i.e., a function which takes a single input, x and (always) returns 3.

      Using the new C++0x lambda syntax, this can be expressed as:
      [](int x) { return 3; }
      …where the initial square brackets ( [ ] ) are called the lambda introducer and tells the compiler that a lambda expression follows. Since x is irrelevant here, we may simply omit the parameter specification in this case. This leaves us with a very basic function that returns the value 3:
      [] { return 3; }
      
      
      

      Monday, July 25, 2011

      Type Inference in C++: The Return of Auto

      With the new C++ standard (C++0x), the auto keyword has returned to the language, like a phoenix rising from the ashes – with new and completely different meaning.

      With this new auto, we can ask the compiler to deduce the type of a declared variable from the expression used to initialize it. This feature is known as type inference, and is found in many functional programming languages. Type inference (or type deduction) introduces some of the advantages of dynamic type systems, without incurring the performance penalty and various other negative side-effects associated with relaxed type safety, such as increased risk of run time failure.

      A very basic, and not very useful example could look something like:
      int x = 1;
      auto y = x + 1;
      The important thing to note here is that auto is a keyword and not a type. This is to say that, unlike in a dynamically typed language, we can’t assign a value of a different type once the variable has been declared. The type of y (int) must be known at compile time, we just leave to the compiler to figure out what it is.

      Saturday, July 23, 2011

      Storage Class Specifiers and Name Scope in C++

      The following is part of a contribution I made to one of the pages on the
      Qt Centre Community Wiki, titled "Going Out of Scope":


      In C++, as in many other languages, variables have scope; i.e., a variable or function name can be used only within certain parts of a program. This "area of visibility" is referred to as the name's scope. Additionally, C and C++ also employ the concepts storage duration and linkage. Storage duration determines how and when during program execution an identifier exists in memory – the lifetime of its storage. Linkage affects a name's visibility across translation units (the contents of a single source file, plus the contents of any header files included by it).

      The two concepts that are most relevant here are scope and storage duration.