J.H.Work

19 Sep 2012

About OpenGL texture formats and performance

Useful information about OpenGL texture formats and their relative performance is really hard to find. Because of this I wrote a little tool to measure the performance of various texture formats and texture environment settings.

I mostly stuck to the formats that are supported and not deprecated in OpenGL 3.0 and above. You can find a list here and an article with more information on the various formats here.

Since mipmapping has a large impact on performance and produces highly view-dependent differences, I disabled it for most of the tests.
In my initial test I rendered a 4096x4096 texture in various formats, with the magnification filter set to GL_LINEAR and wrapping set to GL_REPEAT. I got the following results on my GTX 560:

[Results]

From the gathered data we learn a few things:

  • The more bits per pixel a format uses, the slower it renders
  • It doesn't matter whether the type is integer or float
  • There is no performance difference between 24-bit (RGB) and 32-bit (RGBA) formats
  • Compressed textures are much faster, even though you would expect them to add a lot of decoding overhead

While it is good to know these rules, it is actually better to know the 'why'. I don't have 100% accurate explanations on everything but here are my assumptions:

Texture cache and memory bandwidth are very important: the smaller the data, the faster the pixels are loaded into the cache. Modern GPUs have the decoding logic for the texture formats hardwired, including compressed formats, which means that there is practically no additional decoding cost when using one format over another. Using GL_COMPRESSED_RGB is therefore theoretically as fast as GL_RGB, and the speed difference comes only from the size of the pixel data. That also explains why it doesn't matter whether the format is signed or unsigned, or whether it is integer or float.
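To put rough numbers on the size argument, here is a small Python sketch that computes the base-level footprint of the 4096x4096 test texture for a handful of sized formats. The format selection and the `texture_bytes` helper are illustrative and not taken from the measurement tool. The S3TC entries assume the driver ends up using DXT1 or DXT5 when asked for a generic compressed format, which is not guaranteed.

```python
# Bits per texel for a few sized internal formats (base level only, no mipmaps).
# The S3TC values follow from the block layout: DXT1 stores a 4x4 texel block
# in 8 bytes (4 bits per texel), DXT5 stores it in 16 bytes (8 bits per texel).
BITS_PER_TEXEL = {
    "GL_R8": 8,
    "GL_RG8": 16,
    "GL_RGBA8": 32,
    "GL_RGBA16F": 64,
    "GL_RGBA32F": 128,
    "GL_COMPRESSED_RGB_S3TC_DXT1_EXT": 4,
    "GL_COMPRESSED_RGBA_S3TC_DXT5_EXT": 8,
}


def texture_bytes(width, height, internal_format):
    """Size in bytes of the base level of a 2D texture (illustrative helper)."""
    return width * height * BITS_PER_TEXEL[internal_format] // 8


if __name__ == "__main__":
    for name in BITS_PER_TEXEL:
        mib = texture_bytes(4096, 4096, name) / (1024 * 1024)
        print(f"{name:36s} {mib:6.1f} MiB")
```

Under these assumptions the same image takes 64 MiB as GL_RGBA8 but only 8 MiB as DXT1, so far less data has to travel through the texture cache for each frame.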

But if bandwidth is so important, why is GL_RGB not faster than GL_RGBA? That's because GPUs like 4-byte-aligned access. The 24-bit RGB data therefore gets padded to 32 bits.
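The effect of that padding can be shown on the CPU side with a short Python sketch. The `pad_rgb_to_rgba` function is a hypothetical helper, and the value written into the fourth byte (255 here) is an assumption; what a driver actually stores there is implementation-defined.

```python
def pad_rgb_to_rgba(rgb, alpha=255):
    """Expand tightly packed 8-bit RGB texels to 4-byte RGBA texels.

    Hypothetical helper: it mimics the padding described above so that
    every texel starts on a 4-byte boundary.
    """
    if len(rgb) % 3 != 0:
        raise ValueError("RGB data length must be a multiple of 3")
    texel_count = len(rgb) // 3
    out = bytearray(texel_count * 4)
    out[0::4] = rgb[0::3]  # red
    out[1::4] = rgb[1::3]  # green
    out[2::4] = rgb[2::3]  # blue
    out[3::4] = bytes([alpha]) * texel_count  # padding byte
    return bytes(out)


if __name__ == "__main__":
    two_texels = bytes([10, 20, 30, 40, 50, 60])
    padded = pad_rgb_to_rgba(two_texels)
    print(len(two_texels), "bytes in,", len(padded), "bytes out")
```

After padding, an RGB texture occupies exactly as many bytes as an RGBA texture of the same dimensions, which fits the measurement that the two perform the same.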

I also tested the speed difference when using certain texture parameters. The results are as follows:
With (bi-)linear texture filtering, performance is about 5% to 10% worse than with plain GL_NEAREST.
The various wrap modes (GL_CLAMP, GL_CLAMP_TO_EDGE, GL_REPEAT, GL_MIRRORED_REPEAT) have hardly any impact on performance. When I tried the different modes, the difference was less than 1%.
Using mipmaps can give a huge performance boost, often making rendering 10 to 100 times faster. But the gain drops quickly if you start touching states like GL_TEXTURE_MIN_LOD, GL_TEXTURE_MAX_LOD and GL_TEXTURE_MAX_LEVEL.
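A plausible reason for the mipmap speedup is the same size argument as before: when the texture is minified, the GPU samples from a much smaller level that fits the cache better. The Python sketch below lists the levels of a full mipmap chain and what they cost in memory. `mip_chain` and `mip_chain_overhead` are illustrative names, and the halving rule with rounding down is the standard one for complete chains.

```python
def mip_chain(width, height):
    """Return the (width, height) of every level of a full mipmap chain."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        # Each level halves both dimensions, rounding down, but never below 1.
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def mip_chain_overhead(width, height):
    """Ratio of texels in the full chain to texels in the base level."""
    total = sum(w * h for w, h in mip_chain(width, height))
    return total / (width * height)


if __name__ == "__main__":
    chain = mip_chain(4096, 4096)
    print(len(chain), "levels, smallest is", chain[-1])
    print("memory relative to base level:", round(mip_chain_overhead(4096, 4096), 3))
```

For the 4096x4096 test texture this gives 13 levels and about one third of extra memory, which is a small price if the measured speedup holds for your scene.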

In addition, if you want to dive deeper, I can recommend this article. If you don't want to read through all the text, page 10 covers the texture cache in more detail. It explains a lot of my findings.

So far everything has been mostly NVIDIA-specific, as I tested only on NVIDIA hardware. But what about AMD GPUs or even mobile hardware? For recent GPUs that support at least OpenGL 4.0 or OpenGL ES 2.0 it will presumably be very similar. With uncompressed formats, smaller pixel data will always be faster. With compressed formats it can differ. For example, I have a 4-year-old netbook with OpenGL 2.0 support (Intel GMA 950). When I tried using compressed textures there, performance dropped by at least 80%. It is slow as hell and just not worth the memory savings.

The solution: if unsure, it is probably better to use an uncompressed format with the smallest number of bits per pixel possible. If you are targeting OpenGL 4.0+ hardware, you can safely assume that compressed textures will be fine too.

If you really want to go for the highest speed in any case, then it is probably a good idea to write a small performance test that evaluates the speed and compatibility of each format and then dynamically decides which formats to use. In this case it also becomes important to consider the quality and intended use of the textures. For example, normal maps should never be compressed using standard DXT1.
I might implement this in my engine soon and if I do, I will provide more info and code about it.
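One way such a test could be structured is sketched below in Python. This is only the selection logic, not an actual engine implementation: `is_supported` and `draw_with_format` are hypothetical callbacks that the engine would have to provide, and `draw_with_format` is assumed to block until the GPU has finished (for example by calling glFinish), because OpenGL calls are otherwise asynchronous and the timings would be meaningless.

```python
import time


def pick_fastest_format(candidates, is_supported, draw_with_format,
                        frames=100, clock=time.perf_counter):
    """Time each supported candidate format and return (best, timings).

    candidates       -- format names in order of preference
    is_supported     -- hypothetical callback: format -> bool
    draw_with_format -- hypothetical callback that renders one test frame
                        with the given format and waits for the GPU
    clock            -- time source, replaceable for testing
    """
    timings = {}
    for fmt in candidates:
        if not is_supported(fmt):
            continue  # skip formats the driver rejects or the asset forbids
        start = clock()
        for _ in range(frames):
            draw_with_format(fmt)
        timings[fmt] = (clock() - start) / frames  # average seconds per frame
    if not timings:
        raise RuntimeError("no candidate format is supported")
    best = min(timings, key=timings.get)
    return best, timings
```

The `is_supported` callback is also a natural place for quality rules, such as refusing DXT1 for normal maps, so that the benchmark only compares formats that are acceptable for the texture in question.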
