The speed fallacy of unity builds

Recently, I’ve been asked this question several times: “why do you have such a problem with unity builds?” It comes up often enough that I figured it was worth writing up. I’m not the only person with this opinion, and there is an excellent post from 2009 covering most of the problems with unity builds:

What I do want to touch on is the mistaken perception that unity builds actually improve build times and thereby increase productivity. At first glance you see a large improvement when doing a full build or a rebuild, and that is true: they do improve full build times. The problem is that I, as a programmer, don’t do full builds that often. I tend to do a full build only when I get a newer version of a code base, and even then it’s often just an incremental build based on the delta between my version of the code base and the latest version. I very rarely need to do a “rebuild” or full build of a code base as part of my day-to-day work.

What I actually do on a regular basis is: edit some code, compile, test it and repeat. That is the core workflow of most programmers, independent of their industry. That is the workflow we need to be optimizing, and unfortunately unity builds are a huge de-optimization of it. Why do I say that? Well, let’s look at a simple example: we have a project with 50 code files and as part of our task we need to change 2 of those files. In a non-unity build we would change those files, recompile just those 2 code files and be done. This is illustrated below, where the red blocks are the recompiled code files.

In a unity build scenario, we are required to rebuild not only the code files we changed but also all the code files that are unified with them. This is illustrated below, where we have taken a very conservative approach to unifying the code files by only grouping 10 files together per unit.

As you can see, for the two changed files we now have to compile two full units, which include all the code from the included code files as well as a lot of headers and templates. This is not efficient, especially in cases where the code you are editing is relatively decoupled from the rest of the project, as unity builds will break those decouplings. And the above scenario is very optimistic, since in reality, to see the huge “benefits” of unity builds you need to unify much larger numbers of files into the units. In previous experience, we had 100+ files lumped together into units. This meant each time I edited a file I recompiled the code for 99+ other files. THIS IS NOT FAST! In fact it is the exact opposite of fast; it is extremely frustrating and a massive waste of programming time.
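To make the mechanics concrete: a unity build typically works by generating “unit” files that just #include a batch of your source files, so the unit, not the individual file, becomes the translation unit the compiler sees. The file names here are purely hypothetical:

```cpp
// unity_unit_0.cpp -- generated by the build system (names made up for
// illustration). Editing ANY one of these files forces this entire unit to
// recompile, including every header and template the other nine pull in.
#include "animation.cpp"
#include "audio.cpp"
#include "camera.cpp"
#include "collision.cpp"
#include "input.cpp"
#include "navigation.cpp"
#include "particles.cpp"
#include "physics.cpp"
#include "rendering.cpp"
#include "streaming.cpp"
```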

An aside: unity builds also tend to break a specific workflow of mine that improves my productivity: they prevent the individual compilation of code files. I tend to compile each file individually once I’ve made enough changes to it that I am happy and feel I’ve reached the end of that iteration. The individual compile helps me verify that my includes are correct, as well as any template invocations I might have in the code. I do this via the ctrl-F7 shortcut in Visual Studio. So basically I work on a file, compile it once I am done, then move on to editing the next file. I do this so often that it has become a nervous tic, alongside my almost OCD-level habit of hitting ctrl-s without noticing. The end result of this workflow is that I manually distribute the compilation of a given change over the course of my editing time. This means that when I’m finally done with my changes to the project, I often don’t need to recompile much, as I have already compiled everything (unless of course I’ve changed a header). This works really well since a lot of day-to-day changes don’t touch public interfaces much. Unity builds completely break that for me. Since the code files are now just references that get pulled into a unit when you build the actual project, you can’t compile them individually anymore, and in cases where a build engineer has developed a workaround to make that possible, the compiled objects aren’t used for the final build, so it’s just extra useless work.

The unnecessary recompilation is a known problem with unity builds, and one that even unity build proponents will admit to. They do have a “solution” that “solves” the issue: once you edit a file, it is extracted out of its original unit and into a “working” unit. This means that subsequent edits to that file only require a rebuild of the work unit. This does improve the situation slightly, but it comes with its own set of problems. The first is that it only reduces the cost of subsequent edits. I don’t know in advance all the files that I will need to edit, so each time I edit a new file, I have to recompile not only the work unit but also the original unit that contained the file.

So yes, while this does offer an improvement to iteration times over the previous approach, we still have to pay a significant cost each time we edit a new file. And I don’t know about you, but I often don’t know in advance exactly which files I will have to edit, and so I end up paying these costs. I will say that this approach does have one additional benefit over the previous unity build approach: in moving the files into a smaller work unit, you can validate that you are not missing any includes that might have gone unnoticed in the larger unit (note: this only kinda works, since the order of the cpp includes in the work unit might once again hide an inclusion issue).

This brings us to the second problem: the code you are compiling locally is not the same as what the build machine is compiling, and this can result in frustrating issues, since you will be generating a different set of compiled objects than the build machine. Obviously, you can then create a mechanism so that once the code is tested, you can locally generate the same units as the build machine and test those, but this is practically a recompile of the project, since potentially a lot of units will be affected, and it adds a completely unnecessary extra compile and testing phase (and cost) to your development. And god forbid you find an issue that only exists when the units are merged, cause then you might as well join the circus with all the experience of jumping through hoops you’ll be getting.

So these are the relatively obvious issues with unity builds; how about we cover a less obvious one, a real iteration killer: header files. Imagine the project setup highlighted below: we have a set of header files that are included by several code files across two projects. This is a pretty standard setup.

In a normal setup, if we change a header file in project A, then we need to recompile the dependent code files in project A and in project B. So the change of the one header only results in the recompilation of 5 files. Now imagine the unity build scenario: we don’t really have control over how our code files are grouped into units, so we may end up with the following setup:

Now instead of having to recompile only the five files that depend on that header, we have a massive amount of work to do. And this problem only grows the lower down the header is in your project hierarchy. Even if you were extremely careful with your include patterns and limited the inclusion of a header to the bare minimum number of files, it doesn’t matter: unity builds will throw away all your hard work. In fact, you might as well not care about your inclusion patterns, since that header will get included in a ton of places anyway. In previous experience, changing a lower-level header meant literally recompiling more than half of the entire codebase, which is simply insane. I almost see unity builds as a tool for lazy programmers who don’t want to think about their code architectures and include patterns. As far as I know there is no “solution” to this problem. I can imagine trying to build a database of all the code files in the code base with a list of their include files, then doing some sort of topological sort (assuming no cyclic dependencies between headers exist – LOL) to try to group the code files into units that minimize the unnecessary includes, and so on. Realistically though, that’s not feasible, because to get the coarse granularity of code files needed for an effective unity build, you will have to accept a massive number of redundant includes.

I was previously working on a project where I had created two new projects (one dependent on the other) for my systems. I avoided using unity builds for them and enjoyed good iteration times, especially compared to my colleagues who were working in the rest of the codebase. Then an over-zealous build engineer moved my projects over to unity builds, and I went from 5-10s build times for a change to 45s-1m per change, since I now pretty much ended up rebuilding the dependent project every time I changed a header in the base project.

When I complained about this, the build engineer just looked at me and said that yes, it’s true that my iteration times are much worse now, but look at how much faster the projects compile on the build machine. I had no idea how to (politely) respond to that. From my point of view, the cost of hamstringing your entire development team’s iteration times greatly outweighs the cost of a slightly slower build on a build server.

And this is one of the biggest problems I’ve experienced in dealing with the proponents of unity builds: they tend to only look at the rebuild times for the whole solution and completely forget that people actually work in that solution and don’t constantly rebuild it. Yes, unity builds really help the overall build time, but they also really fuck up iteration times (not to mention that they actually break language features, e.g. translation-unit locals and anonymous namespaces). I really hope that people will stop and think about the cost in terms of productivity before blindly moving over to some sort of unity build system.
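To illustrate that last parenthetical with a hypothetical pair of files: each source file may legally define its own translation-unit-local helpers, right up until a unity build glues them into one translation unit:

```cpp
// a.cpp
namespace {
    int instanceCount = 0;           // TU-local: fine on its own
    void resetState() { instanceCount = 0; }
}

// b.cpp
namespace {
    int instanceCount = 0;           // same names, different TU: also fine
    void resetState() { instanceCount = 0; }
}

// unity_unit_0.cpp (generated by the build system)
#include "a.cpp"
#include "b.cpp"
// Both anonymous namespaces now live in ONE translation unit, so this unit
// fails to compile with redefinition errors for instanceCount/resetState.
```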


3D DirectX10 Free Look Camera (Timer based)


Okay, so I promised I’d write a tutorial on writing a simple free look vector-based camera. This tutorial doesn’t only apply to DirectX10 but to pretty much any graphics API. We want to keep things simple initially, so the simplest possible camera we can implement, short of a first person style camera, is a free look camera (without roll): basically only two degrees of freedom, left/right and up/down. We are also going to implement some basic movement controls: forwards/backwards and strafe left/right. Continue reading “3D DirectX10 Free Look Camera (Timer based)”

Multiple High Resolution Timer Class for Windows – C++

Earlier I posted a tutorial on doing high resolution timing on Windows, and I gave a very basic class to act as a timer. When doing my graphics work I found I often needed multiple timers, so I extended my baby timer class a little to support multiple timers.

So you can create as many timers as you need on the fly (the controller class acts as a big timer vector) and also get a pointer back for each timer if you wish to create an easily readable shortcut, i.e.

HRTimer* animationTimer = timers[5];
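The header itself isn’t inlined in this post, but the rough shape of the thing is easy to sketch. Note this sketch uses std::chrono rather than the Windows QueryPerformanceCounter plumbing the real class uses, and a deque so the handed-out pointers stay valid as timers are added; the names are my approximations, not the actual MHRTimer.h API:

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// A single high resolution timer.
class HRTimer {
public:
    void start() { begin_ = clock::now(); end_ = begin_; running_ = true; }
    void stop()  { end_ = clock::now(); running_ = false; }
    double elapsedSeconds() const {
        // while running, report time since start(); afterwards, start-to-stop
        return std::chrono::duration<double>(
            (running_ ? clock::now() : end_) - begin_).count();
    }
private:
    using clock = std::chrono::steady_clock;
    clock::time_point begin_{}, end_{};
    bool running_ = false;
};

// The controller: little more than a growable collection of timers. A deque
// (rather than a vector) keeps the handed-out pointers valid as it grows.
class TimerController {
public:
    std::size_t createTimer() { timers_.emplace_back(); return timers_.size() - 1; }
    HRTimer* operator[](std::size_t i) { return &timers_[i]; }
private:
    std::deque<HRTimer> timers_;
};
```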

Hope you find it useful: MHRTimer.h

Discrete Pulse Transform!

OH MY GOD! It’s finally done! You have no idea how happy I am right now… (and pretty drained)

I’ve spent the last three days debugging the conversion I did from MATLAB to C++. The problem was that the program would complete but the image wouldn’t have been totally cleared (i.e. reduced to a fixed value throughout). Okay, I’m getting ahead of myself. Basically the DPT algorithm (which we still have to name) separates an image according to its contrast groupings. It’s basically an intense smoother.

It starts out by identifying all the local minimum and maximum pixel groupings. It does this by recursively finding all pixel groupings and each grouping’s adjacent (border) pixels. It then compares each pixel group’s color value to its adjacent pixels; if it is larger or smaller than all of those values, the group is a maximum or minimum respectively.

This seems to be the most intensive operation in the algorithm, though I haven’t had time to slap it through a profiler yet, so I can’t be sure.

Once this initial “search” is complete, we start extracting “pulses” (pixel groups) of a certain size. We start at size 1 and continue until there is only one group left (i.e. the entire image has been smoothed down to a single color). When we extract a group, we check if it is a max or a min group; if so, we set the value of the pixels in that group to the max or min value of its adjacent pixels, and then we “grow” the group to see if it now expands into the neighboring pixels. It may even overlap with another grouping; if this occurs we simply merge the two groups. By setting the group’s pixels to a neighbouring value we guarantee that the group will grow, so we are always guaranteed to fully smooth the image.

Max groups are called “up pulses” and min groups “down pulses”. Once the image has been cleared, we can use these extracted pulses to reconstruct the image, either fully or partially. I’ve provided sample output of the program below: I fed in an image and then did several partial reconstructions with the extracted pulses.

Original Image

Partial Reconstruction: Grouping size 200 – 350

Partial Reconstruction: Grouping size 350 – 800

Partial Reconstruction: Grouping size 1000 – 2000

Partial Reconstruction: Grouping size 2000 – 35000

Partial Reconstruction: Grouping size 35000 – 60000

Partial Reconstruction: Grouping size 35000 – 85000

Kinda cool, huh? Anyway, the really funny thing is that now that the algorithm is complete, we need to figure out an application for it. Kinda like developing a cure for a disease that hasn’t been discovered yet. Should prove fun…
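If you’d like to play with the idea, here is a heavily simplified sketch of the decomposition in 1D (the real algorithm works on 2D pixel groupings with neighbour merging; none of these names come from our actual code). Each extremal run of a given size gets flattened onto its nearest neighbouring value, the removed difference is recorded as a pulse, and summing the residual plus all pulses rebuilds the original signal:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A "pulse": a run of samples [start, start+size) raised/lowered by `height`
// relative to its surroundings at the moment it was extracted.
struct Pulse { std::size_t start, size; int height; };

namespace {

// Extract every extremal run of exactly `n` samples from `s`, flattening each
// onto its nearest neighbouring value so it merges into a larger run.
std::vector<Pulse> extractScale(std::vector<int>& s, std::size_t n) {
    std::vector<Pulse> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t j = i;
        while (j + 1 < s.size() && s[j + 1] == s[i]) ++j;   // run is [i..j]
        std::size_t len = j - i + 1;
        bool hasL = i > 0, hasR = j + 1 < s.size();
        int v = s[i];
        bool isMax = (!hasL || v > s[i - 1]) && (!hasR || v > s[j + 1]);
        bool isMin = (!hasL || v < s[i - 1]) && (!hasR || v < s[j + 1]);
        if (len == n && (isMax || isMin) && (hasL || hasR)) {
            int tgt;                        // nearest neighbouring value
            if (hasL && hasR)
                tgt = isMax ? std::max(s[i - 1], s[j + 1])
                            : std::min(s[i - 1], s[j + 1]);
            else
                tgt = hasL ? s[i - 1] : s[j + 1];
            out.push_back({i, len, v - tgt});
            std::fill(s.begin() + i, s.begin() + j + 1, tgt);
            continue;   // the run has merged with a neighbour; re-scan from i
        }
        i = j + 1;
    }
    return out;
}

} // namespace

// Full decomposition: peel off pulses of size 1, 2, 3, ... until the whole
// signal is one flat run. Returns all pulses plus the remaining constant.
std::pair<std::vector<Pulse>, int> dpt1d(std::vector<int> s) {
    std::vector<Pulse> all;
    for (std::size_t n = 1; !s.empty() && n < s.size(); ++n) {
        auto p = extractScale(s, n);
        all.insert(all.end(), p.begin(), p.end());
    }
    return {all, s.empty() ? 0 : s.front()};
}

// Rebuild the signal, optionally from only a band of pulse sizes (mirroring
// the partial reconstructions shown above).
std::vector<int> reconstruct(const std::pair<std::vector<Pulse>, int>& d,
                             std::size_t len,
                             std::size_t minSize = 0,
                             std::size_t maxSize = SIZE_MAX) {
    std::vector<int> s(len, d.second);
    for (const Pulse& p : d.first)
        if (p.size >= minSize && p.size <= maxSize)
            for (std::size_t k = p.start; k < p.start + p.size; ++k)
                s[k] += p.height;
    return s;
}
```

Reconstructing with the full size band returns the original signal exactly, which is the property the partial reconstructions above exploit.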

The next step for me now is to develop a Win32 application that will function as a workbench for this algorithm so that we can easily see the results. I honestly have no idea how complicated it will be to embed my ANSI C++ code into a Visual C++ program, so that will be my next challenge. Keep an eye out.

Also, since this is now complete (I was envisioning debugging code for most of the weekend), I can now spend a large chunk of the weekend working on the design document for our mod. That’s gonna be a lot of work, but work that I’m looking forward to.


Radon Transform

Wow, it’s been a rough couple of weeks: I had to hand in my graphics project, study for a statistics test, fight off my allergies (I hate spring) and then study for my finals. At least I have my degree now.


Anyways, I promised I’d write about the Radon transform I used to convert the extracted images into a numerical format suitable for input into our neural network. This technique is extremely effective and is already used in industry for just such purposes. We tested it on demo day with very minimal data and it worked remarkably well.

Before I get knee-deep in the technical aspects of the system, I need to mention this: due to the preprocessing done on the detected motion, there is no need for a complicated AI system; the Radon transform and the recursive feature extractor together remove a lot of the noise and problems that would otherwise be present. The Radon transform especially helps, as it has scaling built in, so scale does not have to be taken into account later on. Also, objects similar in shape have extremely similar Radon transforms, so the training time of the neural network was reduced, as was the number of hidden neurons necessary.

In the final demo we used a neural network with 408 inputs and only 4 hidden neurons. Scary, isn’t it? 😛


Now back to the nitty gritty: the Radon transform. If you Google “Radon transform” you’ll probably get the Wikipedia page with a scary looking equation. I also got a fright the first time I saw it, but after some research it’s really simple.

The basic idea of the Radon transform (or my modified version thereof) is simple: if you look at your 2D image in the XY plane, you simply flatten the image onto the X axis (figure 1), then divide the X axis into several beams and work out the number of pixels within each beam. Your output is the pixel contribution of the object to each beam. Then you rotate the object and flatten it again. Doing this for multiple angles gives you a very good representation of the object’s shape.


Figure 1

The most basic (and unfortunately most commonly used) technique for image classification is to simply get the centroid of the object and then trace its outline, giving you a silhouette. Now this doesn’t sound so bad, does it? Well, it is: firstly, it doesn’t handle broken-up images well (not without major preprocessing or modification), and it also loses a lot of detail and can provide false matches. In figure 2 below we have the Radon transform of a solid circle and a hollow circle. A standard outline trace would provide the exact same result for these obviously different shapes, while as you can see the Radon transform (in just one projection) provides completely different results. Again, this preprocessing takes the strain off the neural network (or whatever other AI technique we use for classification).



Figure 2

Okay, now for the technical details. As you remember, we flatten the image according to some projection; figure 2 shows some of these projections. If you look at figure 2 you might notice that the flattened image’s top border can be seen as the graph of some function, so the number of pixels in a beam is the approximate area under the graph between the left and right end points of the beam.

Now, that picture is misleading: you might think it is a square object that we’ve rotated and flattened, but it is in fact a single pixel. The algorithm works on a per-pixel basis. Instead of actually flattening the object, we simply work out the equation of the graph for a single rotated pixel and then use that to run through all the pixels in the object, work out the leftmost and rightmost sub-beams for each, and add the contributions to their respective beams.

Now some of you are screaming that if we just rotate the pixels it will be wrong, as we aren’t rotating the entire object, but that is taken into account later on.

Now, how do we calculate the area under the graph for each pixel, and how do we figure out which beam to add it to, given that a beam will have lots and lots of pixels in it? Also, the beam widths will differ per object.


Figure 3

What we do is simply divide the beams into lots of sub-beams, so that multiple sub-beams pass through each pixel. Then for each pixel we work out the leftmost and rightmost sub-beams that pass through it. This becomes the domain for the equation of the graph we had earlier, and we loop through each sub-beam, calculate the pixel’s contribution to it (the area under the graph) and add it to the sub-beam total. This is shown in figure 3. What you also notice from figure 3 is that there is a small degree of approximation to reduce the calculations required for the area, but remember that we’re talking about fractions of a pixel here, so the total approximation error can easily be ignored.

Now, for each projection we run through each pixel and add it to the appropriate sub-beams. Once this is complete we sum the sub-beams up into the initial number of beams and then divide each beam by the scaling factor; the scaling factor is simply the total number of pixels over the beam width, which reduces the total area of the beams to 1. So every object gets reduced to an n-beam representation where the sum of all beams is equal to 1.
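A stripped-down version of one projection might look like the following sketch. To keep it short, each pixel is dumped whole into a single beam rather than spread over sub-beams by area, and the names are mine, not the project’s:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One projection of a binary image: project every foreground pixel's centre
// onto an axis rotated by `angleRad`, bin the projections into `numBeams`
// beams spanning the object's own extent, and normalise so the beams sum
// to 1 (this is the built-in scale invariance mentioned above).
std::vector<double> radonProjection(const std::vector<std::vector<int>>& img,
                                    double angleRad, int numBeams) {
    const double c = std::cos(angleRad), s = std::sin(angleRad);
    std::vector<double> proj;
    double minP = 0, maxP = 0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) {
                double p = x * c + y * s;   // pixel centre on the rotated axis
                if (proj.empty()) { minP = maxP = p; }
                minP = std::min(minP, p);
                maxP = std::max(maxP, p);
                proj.push_back(p);
            }
    std::vector<double> beams(numBeams, 0.0);
    if (proj.empty()) return beams;
    const double width = (maxP - minP) + 1e-9;  // pad to avoid a zero range
    for (double p : proj) {
        int b = static_cast<int>((p - minP) / width * numBeams);
        beams[b] += 1.0;                        // whole pixel into one beam
    }
    for (double& b : beams) b /= proj.size();   // scaling: beams now sum to 1
    return beams;
}
```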

Okay, my explanation is very basic and I’m sure mathematicians would point out various mistakes and so on, but I’m trying to make this easy to understand and follow; it is not meant as a 100% mathematically accurate explanation. Obviously, if you wish to implement something like this, you wouldn’t use my guide as your only reference. I’ve also left out some details, but they should become apparent from the explanations below.

I’m struggling to find a good way to structure this guide, so I’m just going to run through the algorithm simply, to finish off.


The first steps we need to take before we can process the object are to get the total number of pixels, work out the centroid and work out the approximate radius of the object. Using the radius, we work out the number of beams and sub-beams we need for the transformation. Remember that we want several sub-beams to pass through each pixel.


Now we run each of our projection functions to calculate the sub-beam totals. The basic procedure for a projection is:

1. Work out the center of the pixel on the new axis (this is where the rotation of the object comes into play).
2. Work out the leftmost and rightmost sub-beams that pass through the pixel.
3. For each sub-beam, add the pixel’s contribution to it.

Note: for some projections there is an incline to the graph, and this needs to be calculated too and processed separately, i.e. work out the leftmost and rightmost sub-beams for the increasing incline and the decreasing incline, and then, using those, work out each section separately.

Combine Sub-beams:

Now, once all the sub-beams have been calculated, we work out the scaling factor, which is beamWidth / numPixels. We then sum all the sub-beams into beams (per projection) and multiply each one by the scaling factor. And that’s it: we have our complete numerical representation of our image.

Note: I used only 8 projections, as I had very little CPU time left at this stage of the project and had to limit the amount of processing done. Obviously more projections would be better, but then again too many would be worse; a fine balance needs to be found. I personally think 8 projections are more than sufficient for my needs. Again, GPGPU programming would be so useful here!

C++ Source Code:

Feature Extractor – Initial Results

Okay, it’s been a while since the last update. I coded the outline extractor as discussed in my previous post. It works for small objects, but once most of the screen is covered or lots of little objects occur, it crashes. I think it’s a memory allocation problem related to my use of STL vectors. Anyway, it turns out it might not be the best method to use, especially due to the image reconstruction problem: if I just get a list of little outlines (possibly hundreds of them), doing a comparison between all of them will be extremely expensive, not to mention quite complex, as a simple vertical/horizontal distance comparison won’t work.

What I am now using is a recursive extraction method that extracts the entire object, even if it is broken up, by using a neighborhood check in each step. Simply put, it searches for a white pixel in the motion frame; once one is found, it creates a new object and adds the pixel’s position to it. Then it finds that pixel’s neighbors which are white and adds them, and for each of those neighbors it finds their white neighbors and adds them, and so on. The trick in this algorithm is in the finding of the neighbors…

This is done by checking all the pixels around a pixel within a radius of r pixels. So for a radius of 1, it will check the 8 surrounding pixels (the neighborhood, if you will 😛). A radius of 2 increases this neighborhood to 24 pixels, 3 makes it 48 pixels, and so on. By modifying this radius I can now extract whole objects which are made up of lots of closely spaced sections. The image below illustrates this technique; note that the blue dot is the starting point.

Recursive Extractor Radius Effect

This technique extracts multiple objects reasonably quickly, assuming they are not too large. The cost of the algorithm increases dramatically as the number of separate objects or the size of each object increases. It also randomly crashes when the entire screen is white; I’m assuming it’s a memory allocation error and will look into it.

Pseudo code for the algorithm is as follows (assuming you’ve already found an initial white pixel):

using radius: 
    calculate the range (rowStart, rowEnd, colStart, colEnd) to check

for ( r = rowStart; r < rowEnd; r++ ) 
    for ( c = colStart; c < colEnd; c++ ) 
        if ( pixel(r, c) == white ) 
            add it to the object 
            set pixel(r, c) to gray 

if ( neighbors were found ) 
    foreach ( neighbor found ) 
        // recursion occurs here 
        add its neighbors to the object 
else 
    return the empty neighbor list 
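The pseudocode above can be fleshed out into a compilable sketch. Note this version uses an explicit work list rather than true recursion (and hypothetical names with a plain int matrix standing in for the frame buffer):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Extract one connected object starting from a seed white pixel. `radius`
// controls the neighborhood check: radius 1 looks at the 8 surrounding
// pixels, radius 2 at 24, radius 3 at 48, and so on, which lets the fill
// jump the small gaps left by the motion detector. Visited pixels are set
// to gray (127) so they are never claimed twice.
std::vector<std::pair<int, int>>
extractObject(std::vector<std::vector<int>>& img,
              int seedR, int seedC, int radius) {
    const int white = 255, gray = 127;
    const int rows = static_cast<int>(img.size());
    const int cols = static_cast<int>(img[0].size());
    std::vector<std::pair<int, int>> object;
    // an explicit work list instead of real recursion, so a frame that is
    // almost entirely white cannot blow the call stack
    std::vector<std::pair<int, int>> pending{{seedR, seedC}};
    img[seedR][seedC] = gray;
    while (!pending.empty()) {
        auto [r, c] = pending.back();
        pending.pop_back();
        object.push_back({r, c});
        for (int dr = -radius; dr <= radius; ++dr)
            for (int dc = -radius; dc <= radius; ++dc) {
                int nr = r + dr, nc = c + dc;
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                if (img[nr][nc] == white) {
                    img[nr][nc] = gray;   // claim before queuing
                    pending.push_back({nr, nc});
                }
            }
    }
    return object;
}
```

With a radius of 2, a row like `255, 0, 255, 0, 255` comes out as one three-pixel object; with a radius of 1, the gaps split it apart.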

I tried optimizing this entire procedure by avoiding the calculation of the neighborhood on each step: I created a massive lookup table with each position and its neighbors, and just looked the neighbors up instead of calculating them. But much to my surprise, once I profiled the code, it was significantly slower to look up the neighbors than to calculate them?! I have a feeling it’s got something to do with internal copying of vectors when I return the neighbors. I guess this weekend is gonna be spent staring at a profiler again.

Now, once the extractor is complete, I have to transform the object into a format suitable for input into my neural network. I was thinking of the centroid method I described in my previous post, but my dad feels it would be much better to run the object through a Radon transform and use the resulting data for the neural network. I’m still a bit away from that right now, but once I get there I’ll let you know what I find out.

Update: C++ source code – recursiveExtractor.cpp

Feature Extractor Intro

I’m currently busy working on a feature extractor for my final year project: an outdoor surveillance system with shape classification. Currently I’m using a sigma-delta background estimation algorithm (which prevents areas of constant motion from being picked up) to create a background frame and a variance frame; then, using those two frames and a difference frame (current frame – background frame), I get a resultant motion frame: the upper right frame in the image below.

Feature Extractor - Background
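I won’t post the full motion-detection code, but the per-pixel sigma-delta update is simple enough to sketch. This follows the published sigma-delta formulation rather than necessarily my exact code, and treats a frame as a flat int array; `N` is the variance amplification factor:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// One sigma-delta step: the background creeps toward the current frame by
// one grey level per frame, the variance estimate chases N times the
// current difference the same way, and a pixel is flagged as motion when
// its difference exceeds the variance estimate.
void sigmaDeltaStep(const std::vector<int>& frame,
                    std::vector<int>& background,
                    std::vector<int>& variance,
                    std::vector<int>& motion,
                    int N = 2) {
    for (std::size_t i = 0; i < frame.size(); ++i) {
        if (background[i] < frame[i]) ++background[i];
        else if (background[i] > frame[i]) --background[i];
        int diff = std::abs(frame[i] - background[i]);
        if (diff != 0) {
            if (variance[i] < N * diff) ++variance[i];
            else if (variance[i] > N * diff) --variance[i];
        }
        motion[i] = (diff > variance[i]) ? 255 : 0;
    }
}
```

Because both estimates only move one grey level per frame, anything that sits still long enough (an area of constant motion included) is slowly absorbed into the background.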

What I need to do at this stage is “simply” extract the resultant silhouette and convert it into a numeric format. I intend on doing this by finding the centroid of the silhouette and getting the lengths of the lines from the centroid to the edge at intervals of 2 degrees. Then I’ll take the 180 lengths, normalize them to remove any difference in scaling, and run them through a back-propagation neural network (BPN), first to train it and later for the actual classification.
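As a sketch of what that conversion could look like (remember this is the planned method, so treat every detail as provisional and the names as mine; dividing by the largest length is just one way to normalize):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// From the silhouette's centroid, march outwards along a ray every
// 2 degrees and record how far the silhouette extends, then divide all
// 180 lengths by the largest so scale differences drop out.
std::vector<double> radialSignature(const std::vector<std::vector<int>>& img) {
    // centroid of the white (non-zero) pixels
    double cx = 0, cy = 0;
    int n = 0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) { cx += x; cy += y; ++n; }
    if (n == 0) return {};
    cx /= n; cy /= n;

    std::vector<double> lens(180, 0.0);
    double maxLen = 0.0;
    for (int a = 0; a < 180; ++a) {
        double rad = a * 2.0 * 3.14159265358979 / 180.0;   // 2-degree steps
        double r = 0.0;
        for (;;) {                 // march until we fall off the silhouette
            int x = static_cast<int>(std::lround(cx + r * std::cos(rad)));
            int y = static_cast<int>(std::lround(cy + r * std::sin(rad)));
            if (y < 0 || y >= static_cast<int>(img.size()) ||
                x < 0 || x >= static_cast<int>(img[y].size()) || !img[y][x])
                break;
            r += 0.5;              // half-pixel steps so edges aren't skipped
        }
        lens[a] = r;
        maxLen = std::max(maxLen, r);
    }
    if (maxLen > 0)
        for (double& l : lens) l /= maxLen;                // scale-normalize
    return lens;
}
```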

Now the problem is the extraction of the silhouettes. If I assume only a single object will be present, it’s easy; but as you can see, even a single object isn’t guaranteed to be represented by a single silhouette: due to the nature of current grayscale motion detection algorithms, you tend to get a lot of breakage in the detected motion. So I have to extract and reconstruct these silhouettes. Sounds simple? It’s not…

I’m running short on time, and sitting and reading through stacks of journals for ready-made algorithms which I’d still have to modify isn’t going to help. So, with a bit of brainstorming with my dad, we came up with a silly, simple solution. Anyone remember the classic computer science problem of traversing a maze? The solution: the right-hand method; stick your right hand against the right wall and walk until you reach the end. I’m going to use this method to trace the silhouette outlines, then do a horizontal and vertical comparison on the separate silhouette outlines and try an uninformed reconstruction. This is all just an idea right now; I’m not even sure it will work, and if it does, whether it will be good enough to use in the final project.

I should actually get back to working on it. Hopefully by the end of the weekend I’ll have more details and source code for my results. The worst part is that I’m not sure I have the processor time available for this technique; the current motion detection is expensive enough as it is. On my home X2 5400, it takes around 60-70% of one core.