The Fastest Code Is Code That Doesn’t Run

Hello and welcome back to the Blightmare dev blog! Recently, we’ve been working on some performance improvements, and today we’ll take a look at a very common technique called culling. The basic idea is that if you can’t see something, then there’s no sense in drawing or animating it. Let’s dig into how this works.

Unity provides the CullingGroup API to help out with this problem. The general idea is to register a bounding sphere which is a location and a radius and then receive callbacks about when the sphere is visible or not visible from a specified camera. The details are a little bit more complex.

Registering multiple bounding spheres in a culling group requires the caller to supply the array of spheres. The caller must also specify how many entries from the start of the array should be processed. This allows the caller to pre-allocate storage during startup to ensure there’s no costly reallocation operations during runtime. A consequence of this design is that the active bounding spheres must be tightly packed in the array for maximum performance, and to avoid bogus results being reported. This property is easy to maintain when we’re adding additional spheres to check because we can just add them to the end, and the whole set remains packed.

Removing a sphere is a bit harder. The slot that the sphere we want to remove will create a hole in our data which violates the tightly packed property that we wanted to preserve. One option is to just handle the potentially bogus result by ignoring it, or set the sphere at infinity so that it’s never visible, and then keep track of where the holes are to be used preferentially by new entries. This is a free list scheme, and it means that we will always have active entries equal to the most entries ever active. In Blightmare, we’re much more likely to destroy an object than to spawn an object, and we start a level with close to the maximum objects possible. This means that we’re unlikely to ever fill the holes left behind by the destroyed objects and instead we’ll just pay the price for the holes every frame through the culling test.

Another option to ensure a tightly packed array after removal is to shift the entries in the array to fill the hole. The most obvious way to do this is to just shift all entries past the hole down 1, and then subtract 1 from the total size. This also preserves the order of the array. Shifting entries like this can be an expensive operation. In the worst case – removing the first element – it means shifting all the remaining elements. All of this cost is because we’re preserving the order of the elements. For our culling purposes, that property is not required. Instead, we can fill the hole with any entry. In particular, if we pick the last entry, then we’re just swapping the last entry into the hole, and shrinking our active set by 1. Now removal is a constant time operation – meaning the cost to performance does not depend on how many entries there are in the array. We’ve also gotten rid of any holes in our active set, so the typical culling operation will be as efficient as possible. Yay.

We’re not done yet, however. The CullingGroup API reports the index of the bounding sphere that changed visibility. This means that if we’re going to be able to correlate a sphere with an object, the index needs to remain stable. Our hole-filling algorithm does not preserve stable indices, so we have a problem. What we need is to have a stable index that we can return to calling objects so they can un-register themselves and otherwise interact with the data they have provided to the culling system. We can introduce a layer of indirection between the number we return to callers, and the indices we get from the CullingGroup API. In this scheme, the number reported to callers is called a handle. The way it works is, we maintain a table that maps the stable handle values to the changing index values. This table is updated every time an object is removed to reflect the swap that we’ve done. When a calling object wants to interact with the system, we can do the translation from handle to live index internally, and everything Just Works. In this case, because we don’t destroy objects very frequently, we actually store the object callbacks in a parallel array that is indexed the same as the bounding spheres. This means we can do a direct lookup when the culling state changes instead of having to look through a reverse (index -> handle) table. The cost for this choice is the maintenance of the parallel array during removal, but it’s almost always the case that operations which happen every frame should be optimized even if it increases the cost of operations which happen less frequently.

Have a look at the video below to see it all in action. Note that the camera view that you see is a debug camera that’s significantly zoomed out so that you can see the objects come in and out of visibility:

Thanks for reading! Please head over to our steam page and add Blightmare to your wishlist and give us a follow on Twitter. Have a great week!