I knew it would happen eventually. I put in some code that broke our software, and it wasn’t discovered until nearly a month later, on the day the final build was scheduled. This meant that the final build had to be delayed for a few days, which is kind of a big deal because it can affect ship date. So lots of e-mails were circulated which featured my name—often in a red, boldface font—in various lists of actions.
Posted below is a paraphrased version of the code in question. I’ve renamed or taken out anything that might refer to our internal codebase, and I’ve simplified a little, but not to the point that I look like a complete idiot for missing this. The QueryInterface() and Release() stuff might look a little weird if you’re not familiar with COM+. Or all of this will look weird if you’re not a programmer. But odds are you’re about to stop reading if you aren’t a programmer.
LIST(IUnknown) listObjs;
Session->GetModifiedObjects(listObjs);
const int nbObjs = listObjs.Size();
if (nbObjs > 0)
{
ObjectID** listObjIds = new ObjectID*[nbObjs];
for (int i = 1; i <= nbObjs; i++)
{
ObjectID* pObjId = NULL;
IUnknown* pUnknown = listObjs[i];
IPart* pPart = NULL;
if (pUnknown != NULL)
{
RC = pUnknown->QueryInterface(IID_IPart, (void **) &pPart);
pUnknown->Release(); pUnknown = NULL;
}
if (SUCCEEDED(RC) && pPart != NULL)
{
RC = pPart->get_ObjectID(pObjId);
pPart->Release(); pPart = NULL;
}
listObjIds[i] = pObjId;
}
...
//process listObjIds
...
for (int i = 1; i <= nbObjs; i++)
{
if (listObjIds[i] != NULL) { delete listObjIds[i]; listObjIds[i] = NULL; }
}
delete [] listObjIds; listObjIds = NULL;
}
For those of you still with me, maybe you already see the problem. The LIST() macro in our infrastructure behaves pretty similarly to a Vector in Java: it will resize itself dynamically, check array bounds, and automatically free memory when it is destroyed. However, because this was written long ago by guys with a Fortran background, the items in the list start at 1, whereas a C++ array starts at 0. Also, it only works with components implementing IUnknown; plain-old-C++ objects must be handled with plain-old-C++ arrays. In the code above, this meant I could not declare listObjIds as an object of type LIST(ObjectID*). So I had a LIST(IUnknown) and an array of ObjectID*s, but I treated both as LISTs! In fact, I have gotten so used to using LISTs in C++ that I completely forgot that listObjIds was an array (I guess a better variable name would have helped too).
The line listObjIds[i] = pObjId; should instead be written listObjIds[i-1] = pObjId;, since i loops from 1 to n, rather than 0 to n-1. (Note that the line IUnknown* pUnknown = listObjs[i]; is still correct.) So I was writing beyond the memory allocated to the listObjIds array. And amazingly, it worked just fine in all my testing. Most of the time, the next sizeof(void*) bytes on the heap aren’t going to belong to anyone. But there is a chance that they are used for some other variable, whose value you would be overwriting. This is especially more likely if memory has become very fragmented.
We run unit tests on all four operating systems we support, but only one operating system (HP-UX) was affected by this. And since we don’t currently have any customers using that OS, it was a while before anyone looked at the traces very closely. Unfortunately, it happens that this code was implemented in a listener that is called every single time the user saves. So when it was discovered, it was something that had to be fixed before the final build. We could have delivered it as a patch, but some customers are reluctant to deploy patches because that can mean shutting down production for a few hours. Plus it doesn’t instill confidence to say we shipped broken code. So delaying the final build for a day or two was the best option.
The worst part of it all is that it happened just before year-end performance reviews. Doh!
February 15, 9:31 pm
Granted, I’ve been out of the C++ world for a few years now, but why is your company:
1) Not using smart pointers
2) Not using STL containers
February 16, 12:12 pm
1. We do. Nearly everything I work with on a regular basis inherits from COM+, so we get smart pointers for free. I’m not sure why the guys who wrote the database-layer code (where ObjectID comes from) didn’t use them, but my guess is they wanted to make the code as lean as possible.
I used COM+ pointers (which you could make the case are also smart pointers) in my original code because I had to call an API that took a pointer. But since I took that call out in the simplified code, I could have just as easily written the for-loop like this:
for (int i = 1; i <= nbObjs; i++)
{
ObjectID* pObjId = NULL;
IPart_var hPart(listObjs[i]);
if (hPart != NULL_var)
RC = hPart->get_ObjectID(pObjId);
listObjIds[i-1] = pObjId;
}
But managing/freeing pointers wasn’t a problem here.
2. Well a LIST(DataType) is pretty similar to an STL vector. When all the rest of the code uses LISTs, using an STL vector wouldn’t be a good idea. That’d just lead to more bugs exactly like this one, when the next guy comes along and assumes the vector starts counting at 1 like everything else they’re used to working with. (That was the problem here, I was so used to base-1 lists that I forgot I was dealing with a plain-old C++ array.) If the LIST had been written to start at 0 to begin with this wouldn’t be a problem, but it’s far too late to change that now.