## Wednesday, November 9, 2011

### Reinventing the Wheel, Part I: String Reverse Algorithm

A string reverse algorithm takes a string of characters, and produces a string with the characters in reverse order.

In doing this, we could
• use a more high-level C++ approach,
```std::string reverse(const std::string &str);

// ...

std::string reversed = reverse("uncategorized");
printf(reversed);                                 // dezirogetacnu```
where the function returns a reversed copy of the original string. (Btw, to reverse a std::string, use the std::reverse algorithm, see bottom of this post.)

Alternatively, we can
• write a C-styled function, using a "raw" pointer and then modify the string, in-place:
```void reverseInPlace(char *str);

// ...

char s[] = "november";
reverseInPlace(s);
printf(s);                                // rebmevon```
Since we are going for minimal memory footprint and optimal performance, we will pursue the latter approach.

Here, we do not consider utf-8 encoding, or some other important aspects of string handling that should be taken into account in a real-life application scenario.

# Using a string buffer

The perhaps most trivial solution involves using a buffer. Depending on the programming language, this could also be the only option. It is, however, not ideal since it requires additional memory to accommodate for the temporary string. So technically, it is not really in-place.

Here is a simple C implementation. We write the string in reverse order to a separate, stack-allocated char array, and then copy the data back to the memory location of the original string using memcpy. Again, this is not the most efficient solution, but it works.
```#include <stdio.h>
#include <memory.h>

void reverse(char *str)
{
int n = 0;
while (str && str[n++]);
if (--n < 2) return;
char r[n--];
int x = 0;
while (x <= n)
r[x] = str[n - x++];
memcpy(str, r, x);
}

int main(int argc, char *argv[])
{
if (argc > 1) {
char *str = argv;
reverse(str);
printf("%s\n", str);
}
return 0;
}```
We save this as main.c, compile using,
`gcc main.c -o reverse -std=c99`
and run:
`./reverse '!hserf'`

# Haskell

Before we examine how this algorithm can be improved, let's take a look at a similar function in Haskell:
```r [] = []
r (x:xs) = r xs ++ [x]```
What!? Where is the program?

Let's see what happens. Keep in mind that declarative programming is really more about what things are (i.e., pure functions), as opposed to what they do (control flow). Also note that a string in Haskell is a list of characters.
```Prelude> let str = ['h', 'e', 'l', 'l', 'o']
Prelude> str
"hello"```
Back to the example, this is a recursive function definition — a common pattern in many functional programming languages. The first line describes the edge condition ([] is an empty list), and x:xs binds the head of the list to x, and the remaining list to xs.

Evaluation of the expression can be broken down into the following steps (in pseudo-code):
```r "hello"      = (r "ello") ++ ['h']
= (r "llo") ++ ['e'] ++ ['h']
= (r "lo") ++ ['l'] ++ ['e'] ++ ['h']
= (r "o") ++ ['l'] ++ ['l'] ++ ['e'] ++ ['h']
= ['o'] ++ ['l'] ++ ['l'] ++ ['e'] ++ ['h']```
Just for fun, let's compress this program even more (jQuery style), and we end up with this one-liner:
`r[]=[];r(x:y)=r y++[x]`
We could also have defined r as:
`r=foldl(flip(:))[]`
Ok, to be fair to C, this program doesn't have a main entry point, and it doesn't write anything to the output stream. Here is a complete program that also reads the input string from the command line arguments:
```module Main where
import System.Environment

r [] = []
r (x:xs) = r xs ++ [x]

main = do args <- getArgs; print (r (head args))```
Save this file as reverse.hs, compile with
`ghc reverse.hs -o reverse`
and then (on Linux):
`./reverse '!!cigam'`

# In-place swapping

Back to the C program. Using pointer arithmetic, we can get rid of the redundant string buffer and instead, truly, modify the string in-place. This illustration explains the idea:

Here is a C program:
```#include <stdio.h>

void reverse(char *str)
{
unsigned int n = 0;
while (str && str[n++]);
if (--n < 2) return;
int x = 0;
unsigned int nn = n >> 1;
while (n-- > nn)
{
char c = str[x];
str[x++] = str[n];
str[n] = c;
}
}

int main(int argc, char *argv[])
{
if (argc > 1) {
char *str = argv;
reverse(str);
printf("%s\n", str);
}
return 0;
}```
We could also have used std::swap here, with the same result.
```    while (n > nn)
std::swap(str[x++], str[--n]);```

# Ye olde XOR swap trick

There is an old assembly programming trick to swap the contents of two memory addresses using three consecutive XOR instructions. Remembering the old mantra, "premature optimization is the root of all evil", these kinds of "hacks" rarely give much (if any) performance benefit and are usually best avoided. Nevertheless…  let's try it. The diagram shows how the trick works:

Here is the C program:
```#include <stdio.h>

void reverse(char *str)
{
unsigned int n = 0;
while (str && str[n++]);
if (--n < 2) return;
int x = 0;
unsigned int nn = (n - 1) >> 1;
while (--n > nn) {
str[n] ^= str[x];
str[x] ^= str[n];
str[n] ^= str[x++];
}
}

int main(int argc, char *argv[])
{
if (argc > 1) {
char *str = argv;
reverse(str);
printf("%s\n", str);
}
return 0;
}```
Save as main3.c, and as usual, compile with
`gcc main3.c -o revhack -std=c99`
and run:
`./revhack '!skcor rox'`

# Inline assembly

To see if we can get even better performance, here is an assembly routine written in x86-64 assembly using the inline GAS (AT&T) syntax supported by gcc. (I didn't use the XOR trick here.) This program will only work on compatible hardware.
```#include <stdio.h>

void reverse(char *str)
{
__asm__(
"movq %%rax, %%rdi"             ";"
"movq %%rdi, %%rsi"             ";"
"count:"
"incq %%rdi"                    ";"
"cmpb \$0, (%%rdi)"              ";"
"jne count"                     ";"
"movq %%rdi, %%rcx"             ";"
"subq %%rsi, %%rcx"             ";"
"cmpq \$2, %%rcx"                ";"
"jl exit"                       ";"
"leaq (%%rsi, %%rcx), %%rdi"    ";"
"decq %%rdi"                    ";"
"loop:"
"movb (%%rsi), %%al"            ";"
"movb (%%rdi), %%bl"            ";"
"movb %%al, (%%rdi)"            ";"
"movb %%bl, (%%rsi)"            ";"
"incq %%rsi"                    ";"
"decq %%rdi"                    ";"
"cmpq %%rsi, %%rdi"             ";"
"jg loop"                       ";"
"exit:"
:
: "a" (str)
: "%rsi", "%rdi", "%rcx", "%rbx"
);
}

int main(int argc, char *argv[])
{
if (argc > 1) {
char *str = argv;
reverse(str);
printf("%s\n", str);
}
return 0;
}```
Compile and run the same way as the other examples.

# Comparison

Here are the results when I benchmarked the various algorithms with some different size strings. (The Haskell version is not included in the test.)

Some observations:
• The XOR hack turned out to be something of a fiasco. This is not surprising, clever optimization tricks usually turn out to be not so clever. For more reasonably sized strings, it even run longer than the first algorithm.
• The first (buffer/copy) algorithm is asymptotically slower (O(N log2 N)) than the others, who all have linear time complexity. It also, as noted earlier, uses more memory.
• The assembly routine is just slightly faster than the C version. It is amazing how optimized the code generated by gcc is, and it shows that we should really just trust the compiler most of the time.

# Conclusion

My choice is (2), the simple swapping C algorithm, for a good balance of
• readability,
• portability,
• performance, and
• memory efficiency.

The functional programming approach also looks very attractive. Haskell combines elegance, performance and portability and offers all the advantages of referential transparency. Haskell also has great support for concurrency.

Assembly language is a candidate when targeting specific hardware, if performance and memory use are of critical importance.

# std::string and std::reverse

Finally, to reverse a std::string, we only need to:
```std::string str("eromsissel");
std::reverse(str.begin(), str.end());

std::cout << str;```
The reverse function is part of STL algorithm. This is the solution I would use in real-life C++ code.