Monday, September 26, 2011

Copy Semantics and the Rule of Three

Commonly referred to as "the three amigos", the
  • copy constructor,
  • copy assignment operator, and
  • destructor
are generated by the compiler whenever the programmer does not declare them. Given a class definition such as the one below, which declares only a destructor, the compiler provides its own synthesized versions of the copy constructor and assignment operator.
class Basic
{
public:
    Basic()
        : m_data(new char[1024])
    {
    }

    ~Basic()
    {
        // delete[] on a null pointer is a no-op, so no check is needed.
        delete[] m_data;
        m_data = 0;
    }

private:
    char *m_data;
};
Here is how these default, compiler-generated versions deal with different member types when we copy, assign to, or destroy an object:

Default copy constructor:
  • Primitive types: Value is copied
  • Pointer types: Pointer is copied (shallow copy)
  • Objects: Calls object's copy constructor

Default assignment operator:
  • Primitive types: Value is copied
  • Pointer types: Pointer is copied (shallow copy)
  • Objects: Calls object's assignment operator

Default destructor:
  • Primitive types: Does nothing
  • Pointer types: Does nothing
  • Objects: Calls object's destructor
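
The three lists above can be checked with a small experiment. The following is only a sketch: Tracker, Holder, and demonstrateDefaults are hypothetical names introduced for illustration, and Tracker merely counts how often its own special members run when the compiler-generated members of Holder are used.

```cpp
#include <cassert>

// Hypothetical helper type that counts how often its special members run.
struct Tracker
{
    static int copies;
    static int assignments;
    static int destructions;

    Tracker() {}
    Tracker(const Tracker &) { ++copies; }
    Tracker &operator=(const Tracker &) { ++assignments; return *this; }
    ~Tracker() { ++destructions; }
};

int Tracker::copies = 0;
int Tracker::assignments = 0;
int Tracker::destructions = 0;

// Holder declares no copy constructor, assignment operator, or destructor,
// so the compiler generates all three.
struct Holder
{
    int     m_value;   // primitive type: the value is copied
    int    *m_pointer; // pointer type: only the address is copied
    Tracker m_tracker; // object: its own special members are called
};

// Exercises the generated members and checks what happened to each member.
void demonstrateDefaults()
{
    Tracker::copies = Tracker::assignments = Tracker::destructions = 0;

    int shared = 42;
    {
        Holder a;
        a.m_value = 7;
        a.m_pointer = &shared;

        Holder b(a);                        // generated copy constructor
        assert(b.m_value == 7);
        assert(b.m_pointer == a.m_pointer); // same address: shallow copy
        assert(Tracker::copies == 1);

        Holder c;
        c.m_value = 0;
        c.m_pointer = 0;
        c = a;                              // generated assignment operator
        assert(c.m_value == 7);
        assert(c.m_pointer == &shared);
        assert(Tracker::assignments == 1);
    }

    // a, b, and c are gone: each generated destructor called ~Tracker() ...
    assert(Tracker::destructions == 3);
    // ... but did nothing to the int that m_pointer referred to.
    assert(shared == 42);
}
```

Calling demonstrateDefaults() from main should pass every assertion, since each Holder operation forwards to exactly one Tracker operation and leaves the pointed-to int alone.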

C++11 also brings two new amigos (more about these later):
  • move constructor, and
  • move assignment operator.

A shallow copy simply means that the value of the pointer is copied, i.e., the address of the data, not the data itself. In our class definition above, the compiler-generated copy constructor and assignment operator are implicit. Consequently, the two objects in the following statements will end up with m_data pointers pointing to the same data.
Basic obj1;
Basic obj2(obj1);
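
Since m_data is private in Basic, the sharing cannot be observed from the outside. As a rough illustration, the sketch below uses a hypothetical stand-in class, Shallow, which exposes its pointer and (unlike Basic) does not own the buffer, so that copying it is safe to run. The names Shallow, data, and demonstrateSharedBuffer are assumptions made for this example only.

```cpp
#include <cassert>

// Hypothetical stand-in for Basic: it exposes its pointer and does not own
// the buffer, so there is no destructor and no double delete.
class Shallow
{
public:
    explicit Shallow(char *buffer) : m_data(buffer) {}
    char *data() const { return m_data; }

private:
    char *m_data;
};

// Shows that a compiler-generated copy shares its buffer with the original.
void demonstrateSharedBuffer()
{
    char buffer[16] = "original";

    Shallow obj1(buffer);
    Shallow obj2(obj1);                  // compiler-generated copy constructor

    assert(obj1.data() == obj2.data());  // both objects hold the same address

    obj2.data()[0] = 'O';                // write through the copy ...
    assert(obj1.data()[0] == 'O');       // ... and the original sees the change
}
```

The same sharing happens inside Basic; the difference is that Basic also deletes the buffer in its destructor.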

The problem with our compiler-generated copy constructor (or, the problem with the Basic class destructor, depending on how you look at it) manifests when the objects are destroyed.
int main(int argc, char **argv)
{
    Basic obj1;
    Basic obj2(obj1);
    return 0;
}
This program will most likely crash (formally, its behavior is undefined), and here is why:

  • Local objects are destroyed in reverse order of construction, so obj2 falls out of scope first, and the data pointed to by m_data is released (in the destructor). So far, so good.
  • Then obj1 falls out of scope. Since this object's m_data pointer is still pointing to the address of the memory we just released, the program will attempt to release the same memory again. Deleting the same memory twice is undefined behavior, which typically causes the program to crash.

Rule of three

Dr. Dobb's describes the problem in great detail in an article by Andrew Koenig and Barbara E. Moo. The observations are summarized in the following two rules, collectively referred to as "the rule of three":
  • If a class has a nonempty destructor, it almost always needs a copy constructor and an assignment operator.
  • If a class has a nontrivial copy constructor or assignment operator, it usually needs both of these members and a destructor as well.

Simply put, the rule states that if we define one of the three members, we should usually define all three. There are essentially four different approaches to doing so, each described in more detail below.
  • No copying
  • Deep copy
  • Reference counting
  • Copy-on-write
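
To make the second approach concrete before going into detail, here is a minimal sketch of a deep-copying variant of Basic. It is one possible shape, not a definitive implementation: the names DeepBasic, kSize, and data() are assumptions introduced for this example, and the fixed buffer size lets the assignment operator reuse the existing allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical deep-copying variant of Basic: every object owns its buffer.
class DeepBasic
{
public:
    DeepBasic()
        : m_data(new char[kSize]()) // () zero-initializes the buffer
    {
    }

    // Copy constructor: allocate a new buffer and copy the contents.
    DeepBasic(const DeepBasic &other)
        : m_data(new char[kSize])
    {
        std::copy(other.m_data, other.m_data + kSize, m_data);
    }

    // Assignment operator: both buffers have the same fixed size, so the
    // existing allocation is reused. The check guards against self-assignment.
    DeepBasic &operator=(const DeepBasic &other)
    {
        if (this != &other)
            std::copy(other.m_data, other.m_data + kSize, m_data);
        return *this;
    }

    // Destructor: each object releases only its own buffer.
    ~DeepBasic()
    {
        delete[] m_data;
    }

    char *data() { return m_data; }
    const char *data() const { return m_data; }

private:
    static const std::size_t kSize = 1024;
    char *m_data;
};

// Shows that copies no longer share storage with the original.
void demonstrateDeepCopy()
{
    DeepBasic obj1;
    obj1.data()[0] = 'a';

    DeepBasic obj2(obj1);  // copy constructor
    DeepBasic obj3;
    obj3 = obj1;           // assignment operator

    obj1.data()[0] = 'b';  // modify the original only

    assert(obj2.data() != obj1.data());
    assert(obj2.data()[0] == 'a');
    assert(obj3.data()[0] == 'a');
}
```

With all three members defined, the earlier main function that copies one object into another no longer releases the same memory twice, because each destructor deletes a distinct buffer.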

    Thursday, September 15, 2011

    The Evil Side of Returning a Member as const Reference

    Returning a class member by value (e.g., from a getter function) has a negative impact on performance when large objects have to be copied. In some code I have come across, a const reference is used as the return value instead of a copy. Here is an example:
    #include <string>

    class SomeClass
    {
    public:
        // ...
        // Returns a reference to the member itself rather than a copy.
        const std::string &someString() const { return m_someStr; }

    private:
        std::string m_someStr;
    };
    Before elaborating any further on this, let's recall the different ways of passing arguments to functions.