
Sharing a method for loading textures on iOS devices

Published: 2012-10-23 16:20:52

Author: Joe Woynillowicz


This article was compiled by 游戏邦/gamerboom.com. Reposting without the copyright notice is not permitted; to repost, please contact 游戏邦.

Efficient game texture loading on iOS devices

by Joe Woynillowicz

What problem does this solve?

The main motivation was the need to stream textures during gameplay because of memory limitations. This could be due to having a complex 3D world and needing to stream objects into the level, or, in our case, having to stream a lot of textures for use in 2D rendering (we pre-render our world, objects, and character animations to texture atlases and UV-map them onto billboards to achieve high-quality visuals, lighting, and complex environments/characters).

If you simply need a few milliseconds back while loading textures on the fly in a 3D world, the problem is much less pronounced because you would be using compressed PVR textures, but this method still offers a significant performance benefit.

If, like me, you are in the latter case and need to load textures for billboards, you are likely loading them from a lossless format such as PNG (the most common I've seen people using), BMP, TGA, or uncompressed PVR. Using the techniques outlined in this article you can achieve massive performance benefits by changing the format you use and how you load it.

Step One: File Format

If you are using your textures on objects in a 3D world then you will definitely want to use the PVRTC format, as it is a lossy, fixed-rate texture compression format that supports both 4bpp and 2bpp ARGB data.

For lossless textures your best option is to use the uncompressed PVR file format. This consists of the PVR file header along with your data in an allowed format such as RGBA4444, RGBA8888, BGRA8888, etc. If you are using 32-bit data (like we do for the most part) there is really no reason at all to use RGBA8888: the driver has to process it CPU-side, whereas you can easily pre-process your data to BGRA8888 for a straight upload to memory.
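To put rough numbers on the trade-off between these formats, the sketch below computes how many bytes one mip level occupies in each. The enum and function names are hypothetical and exist only for this illustration. The PVRTC sizes assume the usual rule that a level is padded to at least 8x8 pixels at 4bpp and 16x8 pixels at 2bpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tags used only by this sketch. */
typedef enum {
    TEX_PVRTC_2BPP,
    TEX_PVRTC_4BPP,
    TEX_RGBA4444,
    TEX_BGRA8888
} TexFormat;

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Bytes needed to store one mip level of a width x height texture. */
size_t texture_level_bytes(TexFormat format, uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    switch (format) {
    case TEX_PVRTC_2BPP:
        /* Assumed rule: padded to at least 16x8 pixels, 2 bits per pixel. */
        return ((size_t)max_u32(width, 16) * max_u32(height, 8) * 2 + 7) / 8;
    case TEX_PVRTC_4BPP:
        /* Assumed rule: padded to at least 8x8 pixels, 4 bits per pixel. */
        return ((size_t)max_u32(width, 8) * max_u32(height, 8) * 4 + 7) / 8;
    case TEX_RGBA4444:
        return (size_t)width * height * 2;   /* 16 bits per pixel */
    case TEX_BGRA8888:
        return (size_t)width * height * 4;   /* 32 bits per pixel */
    }
    return 0;
}
```

For a 1024x1024 texture this works out to 4 MiB as BGRA8888, 512 KiB as 4bpp PVRTC, and 256 KiB as 2bpp PVRTC, which is why the uncompressed route is only worth it where lossless quality matters.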

(Notes of Interest)

- If you are using an RGB8 data format you are making the CPU do even more unnecessary work, as it will have to add a byte of padding to every pixel, so make sure to always use RGBA/BGRA even if you don't need the alpha channel.

- Using an uncompressed texture format obviously requires more space on disk. This won't affect the download size of your application (the IPA/APK zips your content and therefore compresses it), but the installed application will take more space on disk. If you find yourself using more disk space than you would like, you can always ship your textures in .zip packages, decompress what you need at initial load time or in a threaded job, and delete the decompressed files when your application exits/terminates.
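The pre-processing mentioned above is just a byte reorder, so it can run offline in an asset pipeline. Below is a minimal sketch of both conversions, assuming tightly packed 8-bit channels; the function names are hypothetical and not taken from any PVR tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the red and blue bytes of each 4-byte pixel in place: RGBA -> BGRA. */
void rgba8888_to_bgra8888(uint8_t *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = pixels + i * 4;
        uint8_t red = p[0];
        p[0] = p[2];   /* blue moves to byte 0 */
        p[2] = red;    /* red moves to byte 2 */
    }
}

/* Expand 3-byte RGB pixels into 4-byte BGRA pixels with opaque alpha.
   dst must have room for pixel_count * 4 bytes. */
void rgb888_to_bgra8888(const uint8_t *src, uint8_t *dst, size_t pixel_count)
{
    assert((src != NULL && dst != NULL) || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[i * 4 + 0] = src[i * 3 + 2];   /* blue */
        dst[i * 4 + 1] = src[i * 3 + 1];   /* green */
        dst[i * 4 + 2] = src[i * 3 + 0];   /* red */
        dst[i * 4 + 3] = 0xFF;             /* padding byte doubles as alpha */
    }
}
```

Doing this once at build time means the runtime never has to touch individual pixels before the upload.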

Step Two: File I/O

In most articles I've read online and code I've seen, people load their texture data (either on their own or using a library like stb_image) by opening a file, reading the data, decompressing it if needed, and then calling glTexImage2D(…) with the RGBA format. This approach works fine if you're loading all your texture data up front, but as soon as you need to stream textures while the game is running you'll hit some serious bottlenecks. It involves unnecessary allocation and copy operations (and possibly decompression, depending on the format) that you can eliminate very easily, with a noticeable impact on the time taken during the frame.

The way to avoid this is to use memory-mapped file I/O. The file contents are not read up front into a buffer you allocate; instead, the OS maps the file into your address space, caches its pages in kernel memory, and pages them in and out as needed. This can actually add a little latency to file access: say your average is roughly 0.012ms for the kernel to load a page on access (there are times I have seen up to 0.135ms). But since it removes an alloc/memcpy (which costs well above the kernel page load time), the performance gains are well worth it (not to mention that you won't need to worry about the memory fragmentation that platform malloc/free calls can cause).

To achieve this you would do the following (to simplify the example I pulled out error handling):

#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

// Open the texture file read-only and query its size.
int file = open("my_texture_file.pvr", O_RDONLY);
struct stat file_status;
fstat(file, &file_status);
size_t file_size = (size_t)file_status.st_size;
// Map the whole file read-only; pages are loaded when they are first accessed.
void* data = mmap(0, file_size, PROT_READ, MAP_PRIVATE, file, 0);
// Note this will not tear down the mapping right now, as it will be held until unmapped.
close(file);

// When finished with this data you call this to unmap.
munmap(data, file_size);

We have now created a virtual mapping and promised the kernel that we will only use it for read-only access, which can give us optimization benefits.
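Since the listing above leaves out error handling, here is one way the same calls might be wrapped with checks. This is only a sketch: the MappedFile struct and both function names are hypothetical, and treating an empty file as an error is a choice made here because mmap rejects a zero length.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle describing one read-only mapping. */
typedef struct {
    void  *data;   /* start of the mapping, or NULL when nothing is mapped */
    size_t size;   /* length of the mapping in bytes */
} MappedFile;

/* Map a whole file read-only. Returns 0 on success, -1 on any failure. */
int map_file_readonly(const char *path, MappedFile *out)
{
    assert(path != NULL && out != NULL);
    out->data = NULL;
    out->size = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat status;
    if (fstat(fd, &status) != 0 || status.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *data = mmap(NULL, (size_t)status.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    out->data = data;
    out->size = (size_t)status.st_size;
    return 0;
}

/* Release the mapping and reset the handle so it cannot be reused by mistake. */
void unmap_file(MappedFile *file)
{
    assert(file != NULL);
    if (file->data != NULL)
        munmap(file->data, file->size);
    file->data = NULL;
    file->size = 0;
}
```

A caller would check the return value of map_file_readonly before touching data, and call unmap_file once the texture is no longer needed.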

(Note of interest)

In most cases memory-mapped files are only effective with large files whose size is a multiple of the page size (e.g. a multiple of 4096 bytes), in order to avoid wasting page space. Obviously textures will not always adhere to this, but for our use case it has never been an issue and we have always achieved a net performance gain.
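To see how much page space a given texture file would waste, you can compute the unused tail of its last page. The helper names below are hypothetical; the page size is queried with sysconf rather than hard-coded, and the 4096-byte fallback is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Bytes left unused in the final page when file_size bytes are mapped. */
size_t mapping_slack_bytes(size_t file_size, size_t page_size)
{
    assert(page_size > 0);
    size_t remainder = file_size % page_size;
    return remainder == 0 ? 0 : page_size - remainder;
}

/* Page size reported by the running system; falls back to 4096 (an assumption) on error. */
size_t system_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? (size_t)size : 4096;
}
```

For example, a 64x64 BGRA8888 texture with a 52-byte header is 16,436 bytes, which leaves 4,044 bytes unused with 4096-byte pages. For textures of several megabytes the same slack is negligible.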

Step Three: Texture Upload

The first thing you want to do is get the PVR header struct from the file, which contains all the needed info. This way you can verify the format using the magic 4CC and have all the needed metadata. Once you've achieved this you can upload your texture data to the GPU for use.

Below is a simple way of doing this (to simplify the example I pulled out error handling, made assumptions on constants, etc.):

#include <stdint.h>

struct PvrHeader
{
    uint32_t    header_length;
    uint32_t    height;
    uint32_t    width;
    uint32_t    mipmap_count;
    uint32_t    flags;
    uint32_t    data_length;
    uint32_t    bpp;
    uint32_t    bitmask_red;
    uint32_t    bitmask_green;
    uint32_t    bitmask_blue;
    uint32_t    bitmask_alpha;
    uint32_t    pvr_tag;
    uint32_t    surface_count;
};

static const uint32_t kPVRTC2 = 24;
static const uint32_t kPVRTC4 = 25;
static const uint32_t kBGRA8888 = 26;

const struct PvrHeader* header = (const struct PvrHeader*)data;    // data being your mapped file from step two
uint32_t pvr_tag = header->pvr_tag;
// Here you would check the pvr_tag against the 4CC "PVR!" to verify

// The low byte of the flags field holds the pixel format.
uint32_t flags = header->flags;
uint32_t format_flag = flags & 0xFF;

// The pixel data starts immediately after the header.
const uint8_t* data_start = (const uint8_t*)data + sizeof(struct PvrHeader);
if(format_flag == kBGRA8888)
{
    // Note: I am assuming that you have already generated, bound, and set texture parameters.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, header->width, header->height, 0, GL_BGRA, GL_UNSIGNED_BYTE, data_start);
}
else if(format_flag == kPVRTC4 || format_flag == kPVRTC2)
{
    // You would do the same as above but using glCompressedTexImage2D(…);
}
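The 4CC check and the other sanity checks that the listing skips could look roughly like the sketch below. It makes no OpenGL calls, so it can be tested on its own. The function name pvr_validate is hypothetical, the struct repeats the 13-field layout from above so that the block is self-contained, and a little-endian host is assumed (as on iOS devices).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same 13-field, 52-byte layout as the header struct in the listing above. */
typedef struct {
    uint32_t header_length, height, width, mipmap_count, flags, data_length, bpp;
    uint32_t bitmask_red, bitmask_green, bitmask_blue, bitmask_alpha;
    uint32_t pvr_tag, surface_count;
} PvrHeader;

/* Copies the header out of a mapped file and checks it.
   Returns a pointer to the pixel data, or NULL if the bytes do not look like
   a PVR file of this layout. */
const uint8_t *pvr_validate(const void *data, size_t size, PvrHeader *out_header)
{
    assert(data != NULL && out_header != NULL);
    if (size < sizeof(PvrHeader))
        return NULL;

    /* Copying avoids any alignment assumptions about the mapping. */
    memcpy(out_header, data, sizeof(PvrHeader));

    /* Compare the tag byte by byte against the 4CC "PVR!". */
    if (memcmp(&out_header->pvr_tag, "PVR!", 4) != 0)
        return NULL;
    if (out_header->header_length != sizeof(PvrHeader))
        return NULL;
    /* The declared payload must fit inside the mapped file. */
    if (out_header->data_length > size - sizeof(PvrHeader))
        return NULL;

    return (const uint8_t *)data + sizeof(PvrHeader);
}
```

On success, the format is available as out_header->flags & 0xFF and the returned pointer can be passed to the upload call in place of data_start.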

Step Four: Effective Use

At this point you are using the most efficient texture format for your needs, you have the file mapped, and you can upload the data to the GPU by simply getting the kernel to load the pages and copy. It should be obvious, but these are not steps you want to repeat each time you load a texture. For efficient usage you would build a cache of mapped texture files (storing the header and a pointer to data_start) by mapping them all when you initially load the game (John Carmack found that on iOS, for whatever reason, you only have about 700MB available, so if you need more you will have to manage your cache more carefully by using mmap/munmap with your job pool). When a texture is needed you call glTexImage2D, the kernel loads the pages, the data is uploaded to the GPU in its native format, and you are ready to go. On termination you destroy your cache by unmapping all of the files.
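A cache along these lines might be sketched as follows. All of the names, the fixed capacity of 64 entries, and the linear lookup by path are assumptions made for illustration; the 52-byte header size matches the struct from step three. It does not implement the mmap/munmap budget management mentioned above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum { kMaxCachedTextures = 64, kPvrHeaderBytes = 52 };

typedef struct {
    const char          *path;        /* caller-owned string, used as the lookup key */
    void                *mapping;     /* start of the mapped file (the PVR header) */
    size_t               size;        /* length of the mapping in bytes */
    const unsigned char *data_start;  /* first byte of pixel data after the header */
} CachedTexture;

typedef struct {
    CachedTexture entries[kMaxCachedTextures];
    size_t        count;
} TextureCache;

/* Map one texture file and remember it. Returns 0 on success, -1 on failure. */
int texture_cache_add(TextureCache *cache, const char *path)
{
    assert(cache != NULL && path != NULL);
    if (cache->count >= kMaxCachedTextures)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat status;
    if (fstat(fd, &status) != 0 || status.st_size <= kPvrHeaderBytes) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)status.st_size;
    void *mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (mapping == MAP_FAILED)
        return -1;

    CachedTexture *entry = &cache->entries[cache->count++];
    entry->path = path;
    entry->mapping = mapping;
    entry->size = size;
    entry->data_start = (const unsigned char *)mapping + kPvrHeaderBytes;
    return 0;
}

/* Look up a previously mapped texture by path; returns NULL if it is not cached. */
const CachedTexture *texture_cache_find(const TextureCache *cache, const char *path)
{
    for (size_t i = 0; i < cache->count; ++i)
        if (strcmp(cache->entries[i].path, path) == 0)
            return &cache->entries[i];
    return NULL;
}

/* Unmap every cached file, for use on termination. */
void texture_cache_destroy(TextureCache *cache)
{
    for (size_t i = 0; i < cache->count; ++i)
        munmap(cache->entries[i].mapping, cache->entries[i].size);
    cache->count = 0;
}
```

At load time you would call texture_cache_add for each texture file, then hand the data_start of a found entry to the upload code whenever that texture is needed.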

What Next?

Depending on how many textures you were streaming in and their size, you should already see a really great net performance gain from implementing the above solution, although with iOS 6 you can go even further. I'll save this as a topic for a future journal post, but for anyone who wants to implement it, the addition I'm speaking about allows you to re-upload texture data without having to go through the usual binding process. This lets the driver manage memory much more efficiently and avoid allocs, fragmentation, etc. You can build a much more efficient caching system, essentially doing your GPU allocs up front and never having to do them again. If you are interested you can find a video regarding this (and other additions in iOS 6 such as cheap, nearly free programmable blending, woohoo!) here (you need an Apple ID) in the video Advances in OpenGL and OpenGL ES. (source:GAMASUTRA)

