MeeGo 1.2 Harmattan Developer Documentation Develop for the Nokia N9

Coding for performance

This section includes best practices for optimising your code. For coding hints and tips with Qt containers, see Qt documentation online.

Selecting the correct compiler optimisation options

You can significantly increase the execution speed of the compiled binary by selecting the correct compiler and optimisation options. In some cases the difference can be several hundred percent on the same target device.

By default, -O2 is a good optimisation level. Using the GCC -O3 level instead of -O2 can increase the amount of generated code, especially for C++, and may actually make it slower. It is therefore recommended that you measure the effect of compiler options.

For a summary of the available GCC command options, see Invoking GCC.

Avoiding linked lists in the data structure

Linked lists and other data structures that are not laid out in an ordered, coherent way in memory decrease CPU cache utilisation. The reason for this is that successive data items are not located in a compact memory area that can be loaded into the cache at once. Thus, the main memory must be accessed each time a new data item is dereferenced.

Instead of linked lists, tables (vectors) provide a much more coherent structure and thus enable you to better utilise the CPU cache.

However, linked lists have the following benefit over tables: adding and removing items in the middle of a linked list can be done in constant time, regardless of the number of items in the list. With tables, the same operations become slower as the size of the table increases, because the items after the insertion or removal point must be moved. In practically all other use cases, however, linked lists are significantly slower than tables.

Note: If there are a lot of data items and the data structure is dynamic in nature, consider more sophisticated data structures, such as binary trees, instead of tables. Even though such data structures commonly use pointers in the same way as linked lists, the algorithms behind them are much more powerful, which justifies the poorer cache utilisation.
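As an illustrative sketch of the trade-off in standard C++ (the container names are the STL ones, not Qt's), both containers below can hold the same data, but the vector stores it contiguously and therefore traverses far more cache-efficiently:

```cpp
#include <list>
#include <numeric>
#include <vector>

// Sums all elements of any container. With std::vector the elements
// are contiguous, so each cache line fetched from memory holds
// several of them. With std::list every node is a separate heap
// allocation, so each step may touch a cold cache line.
template <typename Container>
long long Sum( const Container& c )
{
    return std::accumulate( c.begin(), c.end(), 0LL );
}
```

Both Sum( std::vector<int>( n, v ) ) and Sum( std::list<int>( n, v ) ) return the same value; on cache-limited hardware the vector version is typically much faster.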

Avoiding floating points

Do not use floating point numbers instead of integers unless you know that they are really needed. Floating point arithmetic is much slower than integer arithmetic, so especially avoid floating point variables. The speed difference between floats and integers is particularly high on current ARM Cortex-A8 based devices, and on ARM, double precision is even slower than single precision.

If you need floats, do not combine integers and floats unless the computations are independent of each other.

If calculating with integers is not enough:

  • Do not use fixed point arithmetic or simulate floating point with integers; simply use floating point instead.
  • Try to avoid double precision (64-bit) floating point (the double type in C/C++), because it causes a significant performance hit (slower, and usually more code generated). Use the f suffix in your floating point constants in C/C++ code. In Qt, use constructions such as qreal(0.5) to express a constant that is single precision on the ARM platform and double precision on others.
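A minimal sketch of the constant-suffix advice in plain C++ (the function names are illustrative; qreal itself requires Qt):

```cpp
// With the f suffix the literal is a float, so the multiplication
// stays in single precision throughout.
float HalfOf( float value )
{
    return value * 0.5f;
}

// Without the suffix, 0.5 is a double literal: the argument is
// promoted to double precision and the result converted back,
// which is slower on ARM.
float SlowerHalfOf( float value )
{
    return static_cast<float>( value * 0.5 );
}
```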

Avoiding static C++ objects

Static C++ objects are constructed when the library containing them is loaded: at application startup for normally linked libraries, or at the time of the dlopen call for dynamically opened ones. Thus, especially avoid static C++ objects in libraries. Since you cannot control their lifetimes, problems are difficult to troubleshoot.

If you need to have static C++ objects, at least make sure that their constructors do not allocate memory or create sockets.
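One common alternative (a general C++ sketch, not Harmattan-specific) is a function-local static object, which is constructed lazily on first use rather than at load time:

```cpp
#include <string>

// The object is constructed the first time Configuration() is
// called, not when the library is loaded, so library startup
// stays cheap and the construction point is under your control.
std::string& Configuration()
{
    static std::string config( "default" );
    return config;
}
```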

Using initialisation instead of assignment

In C++, always initialise objects when you define them. If you first define an object and only assign a value to it later, both the default constructor and the assignment operator are invoked. If you initialise the object instead, only the copy constructor is invoked. Why take two separate steps when one can do the same?

The following example illustrates how the MyClass default constructor is first called when myObject is created. After that, all values of myObject are overwritten by assigning anotherObject to it:

MyClass myObject;
myObject = anotherObject;

To avoid this unnecessary step, you can instantiate the object with the copy constructor in either of the following ways:

  • MyClass myObject = anotherObject;
  • MyClass myObject( anotherObject );

In both cases, myObject is instantiated with the proper values when it is created, because the copy constructor is used instead of the default constructor. This increases speed, although it may also add some complexity to the object's class.

Note: Default copy constructors (implicit copy constructors) only make shallow copies. If a class encapsulates pointers or complex data structures, or holds exclusive access to some resource, deep copies are needed: declare the copy constructor, the destructor, and operator= explicitly. See the C++ rule of three for more information. An alternative is to use smart pointers, although for some solutions this approach may be too heavy.
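A sketch of the rule of three for a class that owns a raw heap buffer (the class and member names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>

// Owns a heap buffer, so the implicit (shallow) copy operations
// would make two objects share -- and eventually double-delete --
// the same memory. The rule of three: define the copy constructor,
// the copy assignment operator, and the destructor together.
class Buffer
{
public:
    explicit Buffer( std::size_t size )
        : size_( size ), data_( new int[ size ]() ) {}

    Buffer( const Buffer& other )                        // deep copy
        : size_( other.size_ ), data_( new int[ other.size_ ] )
    {
        std::copy( other.data_, other.data_ + size_, data_ );
    }

    Buffer& operator=( const Buffer& other )
    {
        if( this != &other )
        {
            int* copy = new int[ other.size_ ];
            std::copy( other.data_, other.data_ + other.size_, copy );
            delete[] data_;
            data_ = copy;
            size_ = other.size_;
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }

    int& operator[]( std::size_t i ) { return data_[ i ]; }
    std::size_t Size() const { return size_; }

private:
    std::size_t size_;
    int*        data_;
};
```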

Using references to pass large parameters

Memory performance of OMAP3-based devices is fairly limited compared to modern workstation hardware. Since memory copying has a very high impact on performance, avoid it when possible.

The following example shows an instance of the MyClass class passed as value to the DoSomething function:

int DoSomething( MyClass myObject )
{
    // Use myObject for something.
    int myReturnValue = myObject.GenerateValue();

    return myReturnValue;
}

When this function is called, a new instance of the MyClass class is created through its copy constructor, which is implicitly used upon the function call. This means that all data of the passed object is copied to a newly created object of the same class, which is used only inside the scope of function. When the function returns, the destructor of MyClass is also implicitly called.

You can implement the same functionality, without the implicit constructor and destructor calls and thus without the data copying, in a very simple way: use a C++ reference instead, by adding '&' in front of the parameter name in the function declaration. The following example shows the same function when you pass a reference instead of a value:

int DoSomething( const MyClass& myObject )
{
    // Use myObject for something.
    int myReturnValue = myObject.GenerateValue(); 

    return myReturnValue;
}

When the DoSomething function is called after this modification, myObject is passed as a reference, which means that no copy constructor or destructor is called, and no unnecessary memory copying occurs. In fact, myObject is the very object that was passed. The larger the passed object, the more significant the performance improvement, but even with small objects, there is always some benefit in passing references instead of values.

Using references to avoid complex return values

If you use a reference, you can also make changes to the passed parameters inside the function. This means that there is no need to create complex data types inside functions to be used as function return values. Instead, you can give the values as reference parameters.

The following example illustrates how the DoSomething function creates a new object (myReturnValue) of the MyClass class when it is called. The new object is used as the return value of the function. After it has been copied in the main function, the myReturnValue object is destroyed. This means one additional constructor call, destructor call, and copy.

MyClass DoSomething( int i )
{
    MyClass myReturnValue( i );

    return myReturnValue;
}

int main( int argc, char** argv )
{
    MyClass myObject = DoSomething( 42 );

    return 0;
}

In the following example, the same result is achieved by using a reference parameter instead of a return value:

void DoSomething( int i, MyClass& objectReference )
{
    objectReference.SetValue( i );
}

int main( int argc, char** argv )
{
    MyClass myObject; 

    DoSomething( 42, myObject );

    return 0;
}

Now, the DoSomething function changes the value of the passed reference object, objectReference. This saves the additional constructor and destructor calls. Instead of the class-specific assignment operator, it uses the SetValue method of the MyClass class.

Avoiding deep copies in Qt

Qt provides implicit sharing for all of its containers and other classes, such as QByteArray, QBrush, QFont, QImage, QPixmap, and QString. This makes these classes very efficient to pass by value, both as function parameters and as return values. Implicit sharing guarantees that the data is not copied if you do not modify it. In classes that do not provide implicit sharing, pass references instead.

When you pass a container by value, Qt's implicit sharing ensures that no unnecessary memory copying occurs: copying a Qt container is almost as fast as copying a single pointer. The data is actually copied only when one of the copies is modified. All this is handled automatically within Qt containers. For this reason, implicit sharing is sometimes called 'copy on write'.

Implicit sharing encourages you to adopt a clean programming style where objects are returned by value, as in the following example:

QVector<float> SineTable()
{
    QVector<float> vect( 360 );
    for( int i = 0; i < 360; i++ )
        vect[ i ] = std::sin( i * M_PI / 180.0 );  // degrees to radians
    return vect;
}

In this case, the call to the function is:

QVector<float> table = SineTable();

This is as fast as passing a reference, thanks to Qt's implicit sharing. With classes that do not provide implicit sharing, pass a reference to the function to avoid the implicit call to the copy constructor. The following example illustrates the same functionality with a reference:

void SineTable( std::vector<float>& vect )
{
    vect.resize( 360 );
    for( int i = 0; i < 360; i++ )
        vect[ i ] = std::sin( i * M_PI / 180.0 );  // degrees to radians
}

In this case, the call to the function is:

std::vector<float> table;
SineTable( table );

To get the best out of implicit sharing, use the at() function rather than the [] operator for read-only access to a (non-const) vector or list. Since Qt's containers cannot tell whether [] appears on the left side of an assignment or not, they assume the worst and force a deep copy to occur. The at() function cannot appear on the left side of an assignment, and thus it does not cause a deep copy.

A similar issue occurs when you iterate over a container with Qt's STL-style iterators. Whenever begin() or end() is called on a non-const container, Qt forces a deep copy to occur if the data is shared. To prevent this, use const_iterator, constBegin(), and constEnd() whenever possible.
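The detach behaviour described above can be sketched with a minimal hand-rolled copy-on-write wrapper (illustrative only; Qt's real containers are more sophisticated, and the names CowVector, At, and SharesDataWith are invented for this sketch):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal copy-on-write vector: copies share the data until one of
// them needs write access, at which point that copy detaches.
class CowVector
{
public:
    explicit CowVector( std::size_t n )
        : data_( std::make_shared<std::vector<int> >( n ) ) {}

    // Read-only access: never detaches (like Qt's at()).
    int At( std::size_t i ) const { return (*data_)[ i ]; }

    // Write access: detaches if the data is shared (like Qt's
    // non-const operator[], which must assume a write follows).
    int& operator[]( std::size_t i )
    {
        if( data_.use_count() > 1 )
            data_ = std::make_shared<std::vector<int> >( *data_ );  // deep copy
        return (*data_)[ i ];
    }

    bool SharesDataWith( const CowVector& other ) const
    {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<int> > data_;
};
```

Copying a CowVector copies only the shared pointer. Read access through At() keeps the copies sharing their data, while the first write through operator[] forces a deep copy, just as a non-const [] or begin() does on a shared Qt container.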

Using references instead of pointers

Consider using references instead of pointers. The following example illustrates both functions:

int x;
void PointerFunction( const int* p ) { x += *p; }
void ReferenceFunction( const int& p ) { x += p; }

Both functions generate exactly the same machine language, but ReferenceFunction has some advantages:

  • There is no need to check that the reference is not NULL. References are never NULL.
  • References do not require a dereference operator (*). Less typing leads to cleaner code.
  • There is an 'opportunity' for greater efficiency with references:
    • A major challenge for compiler writers is producing high performance code with pointers. Pointers make it extremely difficult for a compiler to know when different variables refer to the same location, which prevents the compiler from generating the fastest possible code.
    • Since a reference always points to the same location during its entire lifetime, the compiler can often do a better job with reference-based code than with pointer-based code.

Avoiding constructors and destructors in loops

In the following example, in each iteration of the loop, a new instance of MyClass is allocated on the stack, its constructor and destructor are called, and finally the stack space is released.

for( int i = 0; i < bigNum; i++ )
{
    MyClass myObject;
    GetDataForMyObject( i, myObject );
    DoSomething( myObject );
}

The following example illustrates how an instance is created outside the scope of a loop:

MyClass myObject;
for( int i = 0; i < bigNum; i++ )
{
    GetDataForMyObject( i, myObject );
    DoSomething( myObject );
}

This is far more efficient, since the constructor is only called once for the loop (and destructor is called when the execution moves out of the current scope).

Avoiding pointer dereferencing in loops

Avoid pointer dereferencing because it is quite an expensive operation. The following example illustrates how multiple unnecessary pointer dereferences are evaluated inside a loop.

for( int i = 0; i < BigNum; i++ )
{
    database->Data->OldData->Values[ i ] = someValue;
}

The following examples illustrate how you can implement the same functionality more efficiently:

ClassOfOldData* oldData = database->Data->OldData;
for( int i = 0; i < BigNum; i++ )
{
    oldData->Values[ i ] = someValue;
}

TypeOfValue* values = database->Data->OldData->Values;
for( int i = 0; i < BigNum; i++ )
{
    *values++ = someValue;
}

Note: Compiler optimisation options can often achieve the same effect and move the dereferencing outside the loop. However, this is not done at all optimisation levels and may not help in every situation.

Avoiding division inside a loop

Integer division always generates a function call on the ARMv7-A Cortex-A8 CPUs used in Harmattan devices, because these CPUs lack native support for integer division.

In practice, when integer division is used in code, GCC has to create a call to an external function, __aeabi_idiv, which implements the division. This function is part of the libgcc library.

In time-critical loops, avoid the code illustrated by the following example:

for( int i = 0; i < BigNum; i++ )
{
    array[ i ] = array[ i ] / someValue;
}

Instead, use the code illustrated by the following example:

int divisionFactor = (1 << shiftFactor) / someValue; 

for ( int i = 0; i < BigNum; i++ )
{
    array[ i ] = (array[ i ] * divisionFactor) >> shiftFactor;
}

Now, the division inside the loop has been replaced by a multiplication and a bitwise shift, which are faster operations than division. However, this works only if the result of (1 << shiftFactor) is large enough in comparison to someValue. Also, the result may differ slightly from the division operation due to integer rounding, depending on the divisor used.

In general, if you want to divide by 2, 4, 8, ..., 2^n, consider a bitwise right shift instead of division. For example, for non-negative a, a/8 is equal to a >> 3.

Note: Left shifting (<<) may cause an overflow if very large numbers and/or shift amounts are used.
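The trick above can be made concrete with a small runnable sketch (someValue and shiftFactor are arbitrary example constants). Note how the truncated factor errs low, which is the rounding caveat mentioned above; rounding the factor up fixes this case, at the cost of occasionally erring high for large inputs:

```cpp
const int someValue   = 7;
const int shiftFactor = 16;

// Precomputed once, outside any loop. The truncated factor is
// 65536 / 7 = 9362; the rounded-up variant is 9363.
const int truncatedFactor = ( 1 << shiftFactor ) / someValue;
const int roundedFactor   = ( ( 1 << shiftFactor ) + someValue - 1 ) / someValue;

// Replaces x / someValue with a multiply and a shift. Because the
// factor only approximates 65536 / 7, the result can be off by one.
int FastDivide( int x, int factor )
{
    return ( x * factor ) >> shiftFactor;
}
```

For example, 350 / 7 is exactly 50, but FastDivide( 350, truncatedFactor ) yields 49, while FastDivide( 350, roundedFactor ) yields 50.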

Even though the following example may appear to be a good way to traverse a two-dimensional table, avoid it because of the division and modulus inside the loop:

for( i = 0; i < M * N; i++ )
{
    x = i / N;
    y = i % N;
    do_something_with( a[x][y] );
}

The following example illustrates how you can achieve the same functionality more efficiently:

for( x = 0; x < M; x++ )
{
    for( y = 0; y < N; y++ )
    {
        do_something_with( a[x][y] );
    }
}

Avoiding repeating the same expression

The following example shows how a heavy method call HeavyMethod is used in a conditional statement:

if( ( myObject->HeavyMethod() ) < 10 )
{
    // Do something.
} 
else if( ( myObject->HeavyMethod() ) > 30 )
{
    // Do something else.
}

In the above example, HeavyMethod appears in both conditions, so it is called twice whenever the first condition is false, even though it produces the same result each time. The following example illustrates how you can achieve the same functionality with a single call to HeavyMethod by moving the calculation before the if statement.

int resultOfHeavyMethod = myObject->HeavyMethod(); 

if( resultOfHeavyMethod < 10 )
{
    // Do something.
}
else if( resultOfHeavyMethod > 30 )
{
    // Do something else.
}

Now HeavyMethod is called only once, and its return value (resultOfHeavyMethod) can be used in conditions instead.

Using stdio instead of iostream (printf over cout)

C++ stream IO is a very flexible and safe method of creating input and output. It allows you to define specific output formats for your own objects, and the compiler notifies you if an object does not support output. On the other hand, printf is not very safe: if you specify the wrong number of parameters, or give them in the wrong order, your application can crash. However, GCC warns you about these mistakes if you have enabled warnings. You cannot define new output formats either. Still, printf does have a few benefits: it is fast, easy to use, and often easier to read than long lines of << operators.

The following examples display the same results, but printf does it more efficiently and the code is more readable. Use the <cstdio> family of functions instead of the <iostream> family when speed is critical.

// C++ stream IO
cout << 'a' << ' ' << 1234 << ' ' << setiosflags( ios::showpoint )
     << setiosflags( ios::fixed ) << 1234.5678 << ' '
     << setiosflags( ios::hex ) << &i << ' ' << "abcd" << '\n';

// stdio
printf( "%c %d %f %p %s\n", 'a', 1234, 1234.5678, &i, "abcd" );

Using inline functions cautiously

Inlining functions in C++ is a very simple way to optimise applications. When the inline keyword is used, the function is not called; instead, its body is copied to each place where it is used. Performance can increase considerably, since the function call, stack frame manipulation, and function return are avoided.

However, inline functions commonly increase the size of a compiled application (binary) and in some cases they can also increase the execution time. As the size of the binary increases, the "inner loop" of the application may no longer fit in the CPU's cache, which causes unnecessary cache misses and accesses to memory that could be avoided without inlining.

To increase performance:

  • Avoid inlining functions until profiling indicates which functions could benefit from inlining.
  • Use static where applicable. The compiler can optimise local functions much better than global ones.
  • Only inline functions where the function call overhead is large in relation to the function's own code.
  • Do not inline large functions or functions that call other (possibly inlined) functions.

Note: GCC -O3 optimisation level automatically performs more inlining.

For information on the GCC inline functions, see An Inline Function is As Fast As a Macro.
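As a sketch of the guideline, a tiny accessor-style function is the kind of candidate where the call overhead is large relative to the body (the names here are illustrative):

```cpp
// A one-line function: once inlined, the call overhead, stack
// frame setup, and return disappear entirely. 'static' keeps it
// local, which gives the compiler more freedom to optimise.
static inline int Square( int x )
{
    return x * x;
}

int SumOfSquares( int n )
{
    int sum = 0;
    for( int i = 1; i <= n; i++ )
        sum += Square( i );   // good inlining candidate in a hot loop
    return sum;
}
```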

Checking the parameter buffer usage of your application

Games that use OpenGL APIs directly, or applications that contain a large amount of geometry, may require a larger parameter buffer than the 1 MiB available by default. For these applications, do either of the following:

  • Fix the issue to ensure that the application meets the performance and resource usage requirements. This is the recommended solution in most cases.
  • Increase the size of the parameter buffer.

Using initialisation lists in constructors

Use initialisation lists rather than assignment in constructors:
Foo::Foo() : x_(expressions) { }

instead of:

Foo::Foo(){ x_ = expressions; }

The former is preferable: the member is constructed directly with the desired value, whereas with assignment a class-type member is first default-constructed and then assigned.