A Tale of Two Refactors: Part 5

Hello and welcome to another installment of our ongoing series covering the two major code refactors that have happened (so far!) during the development of Blightmare. Last week in Part 4, we made the jump from traditional Unity MonoBehaviour components into fancy new Entities API components. Today I’m going to discuss our experience with this new API and prerelease features in general.

Things that went well

I think it’s appropriate to start with the things that went well because for the most part they happened first, and for whatever reason, the expression is “pros and cons” not “cons and pros.” Anyway, the immediate concern that precipitated refactoring all the code was how difficult it was getting to make a change and be confident that it wasn’t going to break something else. Data-oriented programming paradigms are largely structured to help isolate logic into functional units with very explicit boundaries for observable change, and an ECS falls under that umbrella, so we expected to come out ahead in this category.

As hoped and predicted, converting all our MonoBehaviours into stateless “systems” and logicless “components” forced some of the more problematic areas to be re-thought into more explicit, observable forms. Instead of relying on certain internal flags or nullable options, divergent logic was promoted to its own system, and the game’s data was normalized in general, which helped to simplify individual features and functions. This made it much less scary to change something, because I was no longer worried that something unrelated would break as a result. That in turn made it more efficient to iterate, which is absolutely critical to making a good game. Things were good.
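
To make the split concrete, here is a minimal sketch of the idea in plain Python rather than Unity C#. Every name in it (Position, Velocity, Frozen, movement_system) is hypothetical and none of it is the Entities API. It only shows components as data without logic, a system as a function without state, and a tag component standing in for what would otherwise be an internal flag.

```python
from dataclasses import dataclass


@dataclass
class Position:  # component: data only, no methods
    x: float
    y: float


@dataclass
class Velocity:  # component: data only
    dx: float
    dy: float


@dataclass
class Frozen:  # tag component that replaces a hidden "is_frozen" flag
    pass


def movement_system(entities, dt):
    """Stateless system: moves entities that have Position and Velocity and are not Frozen."""
    for components in entities.values():
        if Position not in components or Velocity not in components:
            continue  # this system does not apply to the entity
        if Frozen in components:
            continue  # the divergent case is visible in the data, not buried in a flag
        pos, vel = components[Position], components[Velocity]
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


# Hypothetical world: entity id -> {component type: component instance}
entities = {
    1: {Position: Position(0.0, 0.0), Velocity: Velocity(2.0, 0.0)},
    2: {Position: Position(5.0, 5.0)},
    3: {Position: Position(1.0, 1.0), Velocity: Velocity(1.0, 1.0), Frozen: Frozen()},
}
movement_system(entities, dt=0.5)
```

Only entity 1 moves here. Entity 2 has no Velocity and entity 3 carries the Frozen tag, so which logic applies to an entity can be read directly from its set of components.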

One other major benefit that materialized with the ECS version of Blightmare is that the execution order of the game’s logic became a lot easier to control and observe. It is possible to specify the order of MonoBehaviour executions within Unity, but that setting is hidden away in an editor panel and the UI is difficult at best. Furthermore, there’s no great way to disable an entire class of component or change the ordering at runtime, both of which are not only possible but easy with an ECS. Being able to control the logic order however I want allows me to implement mechanics entirely in the frame they should take place instead of having to queue up events or defer things for processing on the next frame. Either approach can totally work, but being able to think about a frame as the definitive logical unit helps me keep things straight.

That’s quite a few very impactful benefits, and they were largely evident right from the start, so things were excellent immediately following the refactor. I had been quite nervous about spending the time to rebuild everything, but it looked like the cost was more than worth the long-term benefit of fewer bugs, faster iterations, and simpler code. Unfortunately, the problems took a while to show themselves.

Things that went badly

With the benefit of hindsight, it’s hard to even know where to begin with the problems we had. Let’s start with controlling the order of logic. I just got done explaining how this was so easy to use and solved some problems before they even happened. As it turns out, that isn’t the whole story. Unity (and many other software frameworks) likes to rely on a lot of functionality that happens automatically behind the scenes to make things “easy” for the user. An example of this is the mechanism that actually invokes a function like Update in a MonoBehaviour. As a general rule, I hate things like this – I typically refer to them as “magic.” The problem that I have is that it’s impossible for me as the programmer to change how such a system works. My problem has to fit into their solution, even though there is no way that the author of the framework had ever actually thought about my specific problem.

Returning to the problem of order-of-execution, the “magic” in the Entities API that allowed a System to participate in execution was a series of C# Attributes that told Unity how and when it should execute the code. This was a constraint-based system where you expressed dependencies between various pieces of logic and Unity figured out an order to run the code in. I was interested in a specific ordering, so I chained dependencies together one by one to create my game loop. Other than being verbose and tedious, it worked out okay at the start. Once we had many Systems – maybe 50+ – we started to see very slow startup times for the game. We’re talking 15-20 seconds or more, every time you tried to run the game. Add that time to the relatively long compile times, and there’s a significant amount of delay between making a change and testing it. This killed my iteration speed, so I had to do something else. What I did was wrap my logic inside container Systems that then just called things in the order I wanted. I had a “container” for each phase of Unity’s update loop where I needed logic, and we were back to being fast. This cost me several days of debugging, googling, and then just code transformations, all of which were purely to work around a “feature” of the engine.
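
The container approach can be sketched in a few lines. This is an engine-agnostic illustration in Python, not the actual Blightmare code and not the Entities API: System, SystemGroup, LoggingSystem and the system names are all hypothetical. The point is that the order is simply the order of a list, so nothing has to be solved from constraints at startup.

```python
class System:
    """Hypothetical base class: one unit of game logic."""
    enabled = True

    def update(self, world):
        raise NotImplementedError


class LoggingSystem(System):
    """Stand-in for real game logic: it only records that it ran."""

    def __init__(self, name):
        self.name = name

    def update(self, world):
        world["log"].append(self.name)


class SystemGroup(System):
    """Container system: runs its children in exactly the order they were given."""

    def __init__(self, *systems):
        self.systems = list(systems)

    def update(self, world):
        for system in self.systems:
            if system.enabled:
                system.update(world)


read_input = LoggingSystem("read_input")
move = LoggingSystem("move")
collide = LoggingSystem("collide")

# One container per engine phase; only the container would be registered with the engine.
simulation = SystemGroup(read_input, move, collide)

world = {"log": []}
simulation.update(world)
first_frame = list(world["log"])

move.enabled = False  # disabling a system at runtime is a single assignment
world["log"].clear()
simulation.update(world)
second_frame = list(world["log"])
```

The trade-off is that the ordering now lives in hand-written code instead of declarative attributes, which suits a game loop where one specific order is wanted anyway.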

The next problem should have been identified as critical as soon as we saw it, but at least we won’t make the mistake again. The Entities API was such a fundamental change from how Unity had been implementing things internally that it didn’t really work with the rest of the engine. This was especially a problem when trying to initialize objects from a scene. Typically you would set up some prefabs and either just have instances of them in a scene, or spawn them at some point during play. The original mechanism to translate from Prefab to Entity was actually the best one if you ask me. Essentially there was a function that you had to implement which was given the prefab, the new Entity and an EntityManager, and you then had to do whatever was required to set up the Entity to be an “instance” of the prefab. This is the kind of API I like because it gives me as much control as I want. Building utilities on top of this would be great because there are many types of conversion that are common and it’s tedious to do them myself. However, because components are structs in the Entities implementation, they cannot contain “non-blittable” types. Essentially that means that any complex type cannot be in a component. This includes booleans, strings, containers, references to assets, etc. I had worked around this in various ways in my custom conversion functions where required, and life was fine. However, Unity decided that they would make things easier for their users and do the conversions “automatically.”
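
For readers who have not seen this style of API, the shape of such a hand-written conversion function might look roughly like the sketch below. It is written in Python with a toy EntityManager, so every name here (EntityManager, convert_spike_trap, SPRITE_IDS, the prefab fields) is an assumption made for illustration and not Unity’s actual signature. Storing an integer id in place of an asset reference is shown as one possible workaround for the non-blittable restriction, not necessarily the one used in Blightmare.

```python
class EntityManager:
    """Toy stand-in for an ECS entity manager (hypothetical, not Unity's class)."""

    def __init__(self):
        self._next_id = 0
        self.components = {}  # entity id -> {component name: value}

    def create_entity(self):
        entity = self._next_id
        self._next_id += 1
        self.components[entity] = {}
        return entity

    def add_component(self, entity, name, value):
        self.components[entity][name] = value


# Assumed lookup table: asset name -> plain integer that is safe to store in a component.
SPRITE_IDS = {"spike_trap.png": 7}


def convert_spike_trap(prefab, entity, manager):
    """Explicit conversion: the caller decides exactly which components get created."""
    manager.add_component(entity, "Position", (prefab["x"], prefab["y"]))
    manager.add_component(entity, "Damage", prefab["damage"])
    # The sprite is an asset reference, so store an integer handle instead.
    manager.add_component(entity, "SpriteId", SPRITE_IDS[prefab["sprite"]])


# Hypothetical prefab data standing in for a Unity prefab.
prefab = {"x": 3.0, "y": 4.0, "damage": 10, "sprite": "spike_trap.png"}
manager = EntityManager()
entity = manager.create_entity()
convert_spike_trap(prefab, entity, manager)
```

Nothing happens here that the function does not spell out, which is the property being praised above: the cost of a conversion is exactly the cost of the code you wrote.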

You may be able to guess from my previous statements about magic, but this didn’t work well. There were two problems: the functionality was hidden from me, and the default implementation was incredibly inefficient. Combined, these two factors eventually led to the strongest argument to refactor again, but I’m getting ahead of myself. What we observed was upwards of 20ms to instantiate a single prefab in our game. The steps that the “magic” went through to do this involved creating a new temporary World, creating an Entity in that world, traversing the properties of the original prefab to initialize various components on the new Entity and then finally making a copy of the World data and migrating it into the main game World. This was a fully generic process, which is to say it technically worked on all prefabs, but it was incredibly wasteful and made the game unplayable. I tried a couple of workarounds to this process in an effort to trick Unity into doing something intelligent, but when you’re fighting your tools, it may be time to change tools.

Continuing on the topic of performance, there’s one last major issue we faced that I will cover before wrapping up this post. The implementation that Unity went with for their ECS automatically handled component storage and organization for you. This was not configurable or observable in any way to a user. The general way that it worked is that it looked at the set of components attached to an entity and called this the “archetype” of the entity. Entities with the same archetype were stored together in chunks of memory. This had the nice property of keeping components for sequential entities of a given archetype tightly packed in memory, which results in excellent cache coherency. This should be great, because it’s actually quite difficult to achieve good cache coherency in complex systems such as a game. However, the side-effect of implementing the system this way, without any option for a user to customize it, is that whenever an entity changed archetype – by adding or removing a component – it had to be relocated into different storage, and the gap that it left had to be filled by moving the last entity in the old storage into its place. This results in 2 complete entity relocations for each component change. The documentation made it clear that this was a known characteristic of the design, but “because memcpy is very fast” it wouldn’t be a problem.
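
The relocation cost is easier to see in a toy model. The sketch below is a simplified Python illustration of archetype-grouped storage with swap-remove. It is not Unity’s implementation, it ignores fixed-size chunks entirely, and all names (ArchetypeStore, moves, the component names) are hypothetical.

```python
class ArchetypeStore:
    """Toy archetype storage: entities with the same component set share one packed list."""

    def __init__(self):
        self.rows = {}      # archetype (frozenset of names) -> list of (entity, data)
        self.location = {}  # entity -> (archetype, index in that list)
        self.moves = 0      # number of entity relocations performed

    def _append(self, entity, data):
        archetype = frozenset(data)
        rows = self.rows.setdefault(archetype, [])
        rows.append((entity, data))
        self.location[entity] = (archetype, len(rows) - 1)

    def create(self, entity, data):
        self._append(entity, dict(data))

    def _swap_remove(self, entity):
        archetype, index = self.location.pop(entity)
        rows = self.rows[archetype]
        _, data = rows[index]
        last_entity, last_data = rows.pop()
        if index < len(rows):
            # The removed entity was not last: move the last row into the gap.
            rows[index] = (last_entity, last_data)
            self.location[last_entity] = (archetype, index)
            self.moves += 1
        return data

    def add_component(self, entity, name, value):
        data = self._swap_remove(entity)  # leave the old archetype, filling the gap
        data[name] = value
        self._append(entity, data)        # copy into the new archetype's storage
        self.moves += 1


store = ArchetypeStore()
for entity in (1, 2, 3):
    store.create(entity, {"Position": (0.0, 0.0)})

store.add_component(1, "Stunned", True)
position_only = frozenset({"Position"})
order_after = [entity for entity, _ in store.rows[position_only]]
```

Adding one component to entity 1 relocates two entities: entity 1 moves to the new archetype, and entity 3 is moved into the slot it vacated. Logic that adds or removes components frequently pays this cost on every change, which is the restructuring pressure described in the following paragraph.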

Spoilers: It’s a problem. This design brought me to a decision where I could restructure my logic to fit this highly rigid system, or I could not use the system. It’s a perfect example of a prescribed solution failing to fit my problem. The most frustrating part of all is that I know how to solve this problem for my needs, but I’m simply not allowed to do that within the provided API. In the end, this meant that we would have to give up the advantages of using an ECS because the problems were insurmountable. I want to stress that this is not an inherent deficiency in the ECS paradigm; rather, the problems lie in Unity’s implementation.

Okay, that’s probably enough ranting for this post. Full disclosure: the Entities API was marked experimental when we tried to use it, and I haven’t been back to see if things are better. I will accept some blame for expecting something “experimental” to be more stable and to generally work better than I should have. I suppose some lessons have to be learned the hard way. At any rate, next week I’ll walk through the second refactor, which had the benefit of hindsight, and explain some key decisions that were made differently.

If you’re enjoying this blog or want to find out more about the game, please head over to Steam and throw us on your wishlist. You can also follow us on Twitter to get all the latest updates. Thanks for reading and see you next week!