Posts tagged “programming”
Kip Free code giveaway

I added some code to the site to improve the way source code is displayed, both in snippets in my blog and when viewing an entire file directly.  I’m adding line numbers on the server-side, but they are in a separate div so you won’t end up copying the line numbers if you copy and paste the code.  The syntax highlighting is all done in JavaScript by prettify, an open-source library provided by Google.  After getting around two annoying jQuery bugs (one of which is only present on IE6, and another which was fixed by upgrading to the latest jQuery release) I got it up and running, and it looks like this:

  /**
   * Returns 1/this.
   * 
   * @throws ArithmeticException if this == 0.
   */
  public BigFraction reciprocal()
  {
    if(numerator.equals(BigInteger.ZERO))
      throw new ArithmeticException("Divide by zero: reciprocal of zero.");
    
    return new BigFraction(denominator, numerator, true);
  }

Anyway, I figured I’d show off the full-file view with some source code you’re free to use if you want to.  You can find the BigFraction class I wrote in Java when working on some Project Euler problems.  The class represents a fraction as a ratio of two BigIntegers, so it will never overflow, and there will never be roundoff error.  It was fun to write, because it involved lots of math and manipulation of base-2 numbers (especially for the constructors which took one or two double variables), and I’m one of those weirdos who really likes those things.  This works in all the Project Euler problems I’ve used it in, and I’ve done some pretty exhaustive ad-hoc testing, but there’s still a good chance I’ve missed a bug or two.  Especially since I didn’t bother writing any unit tests or anything (hey, this is spare-time coding!).  If you use this and happen to find any bugs, let me know.  And of course, I need to close with:

Disclaimer: the BigFraction class is provided as-is with no warranty expressed or implied.  If you use it to calculate missile trajectories and end up nuking the wrong city, you shouldn’t come whining to me.
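
To give a rough idea of what “a ratio of two BigIntegers” means in practice, here is a minimal sketch.  It is not the actual BigFraction source: the class name ExactFraction and everything in it are made up for illustration, and it skips the double-based constructors entirely.

```java
import java.math.BigInteger;

/** Hypothetical minimal exact fraction; NOT the BigFraction class from the post. */
final class ExactFraction
{
  final BigInteger numerator;
  final BigInteger denominator;

  ExactFraction(BigInteger num, BigInteger den)
  {
    if(den.signum() == 0)
      throw new ArithmeticException("Divide by zero: fraction with zero denominator.");

    // Keep the denominator positive, then reduce to lowest terms.
    if(den.signum() < 0)
    {
      num = num.negate();
      den = den.negate();
    }
    BigInteger gcd = num.gcd(den);
    numerator = num.divide(gcd);
    denominator = den.divide(gcd);
  }

  static ExactFraction of(long num, long den)
  {
    return new ExactFraction(BigInteger.valueOf(num), BigInteger.valueOf(den));
  }

  ExactFraction add(ExactFraction other)
  {
    // a/b + c/d = (a*d + c*b) / (b*d)
    return new ExactFraction(
        numerator.multiply(other.denominator).add(other.numerator.multiply(denominator)),
        denominator.multiply(other.denominator));
  }

  ExactFraction reciprocal()
  {
    // The constructor rejects a zero denominator, so the reciprocal of zero throws.
    return new ExactFraction(denominator, numerator);
  }

  @Override
  public String toString()
  {
    return numerator + "/" + denominator;
  }
}
```

With this sketch, ExactFraction.of(1, 10).add(ExactFraction.of(2, 10)) comes out as exactly 3/10, whereas 0.1 + 0.2 with doubles does not equal 0.3.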

Kip New site feature: searching

I finally got around to implementing a search feature on this site.  If you search for something in the bar over on the right side of the page, you’ll get a search over all the blog posts, photos, and photo albums on this site.

I did all this with the Zend Framework’s PHP implementation of Lucene.  It seems to give good results very quickly, although there are some issues I’m having that are either misunderstandings or outright bugs.  Or maybe it’s because I’ve been up till 4am fiddling with this thing.  Tomorrow I’ll try to simplify my scenario and see if I still get the same results... if not, then I guess I need to figure out where I’ve missed something.  I’ve found that the way you design a search index is quite a bit different from the way you work with a relational database, mainly because you intentionally denormalize data for the sake of faster searching, when you wouldn’t do that for a relational database (at least not until you found that some particular JOIN or something was a huge bottleneck).

Anyway, I need to get some sleep.  If you try out my search box and notice something particularly odd, let me know.  (Well, something other than the styling, which isn’t fully presentable yet.)  Most of the problems I’m having are with more advanced queries that aren’t working the way the documentation claims they should, but it shouldn’t be anything most users would ever see.

Update:  I figured out what was wrong: I was using the default query parser when I needed to construct the query through the API.  After writing my own query processor, all is well.  More info about the specific problem I was having can be found on this Stack Overflow question, which pointed me in the right direction.

Kip Some thoughts on programming style

Time to discuss something controversial that leads many geeks to commit acts of heinous violence.  I’m talking about programming style!

So there are three basic ways to write if/elseif/else statements in most languages with C-like syntax:

Style A

if(condition1)
  statement1;
else if (condition2)
  statement2;
else
  statement3;

Style B

if(condition1) {
  statement1;
} else if (condition2) {
  statement2;
} else {
  statement3;
}

Style C

if(condition1)
{
  statement1;
}
else if (condition2)
{
  statement2;
}
else
{
  statement3;
}

Personally, I go for the most readable and maintainable code, so I use Style A iff all conditions and statements are trivial (by which I mean, they are short and fit on one line).  Otherwise I use Style C.  (Which you can see, for example, in that gradient-generator source I posted a few weeks ago.)

I’ve posted on my preference for Style C over Style B before, and I’m not going to focus too much on that here today.  However, I continue to see Style B promoted as the universal, be-all end-all solution.  It is recommended by Sun for Java, and by Zend for PHP.

The argument for preferring Style B over Style A usually goes something like this:

What if you come along and add a new line?

if(condition1)
  statement1;
else
  System.out.println("Condition1 failed!");
  statement2;

Now statement2 gets executed every time!

The argument makes sense from an academic standpoint, but I am pretty sure this almost never ever happens in practice.  The reason for this is that anyone who has been programming for more than a month will immediately see that this code won’t work as designed.  It is a glaring bug that jumps out at you.  It is nearly impossible to overlook!  (Again, this is all assuming that the conditions and statements are all trivial.)

Now, for the last four and a half years my job has been primarily to fix bugs in a huge body of code, very little of which was written by me.  For the last two and a half years, in particular, I’ve been the guy who looks at build traces and unit test results from the previous night.  And when there are build or test errors, I have to look at recently changed code and decide who is responsible.  This means I have seen most of the errors made by a group of about fifty or so programmers.  That is to say, real-world errors made by real programmers, not hypothetical errors that might be made by a theoretical programmer.  So I think I am reasonably well-qualified to have a strong opinion on the matter.

With that in mind, I’ve never seen an error that was due to the use of Style A for trivial statements.  I have seen some downright ugly code that used Style A inappropriately, with conditions that were ten lines long.  And I’ve seen ugly code that has braces on the if block but not the else block (or vice-versa).  And I have seen several errors in Style B or Style C that result from intermingling of tabs and spaces.

This happens because simple editors like vi and Notepad render tabs that are up to 8 characters wide (per spec, I might add), but advanced editors usually break spec in favor of user-friendliness, rendering tabs at 2 or 4 characters wide (or whatever the user sets them to).  So if code was written using 4-character tabs in Visual Studio, then new code is added by someone using spaces in vi, and this happens back and forth a few times, you get indentation that jumps all over the place.

This is where I think Style C is better than Style B: the opening and closing braces are usually written by the same developer at the same indentation level, so it is easy to find the matching open-brace even if the code between the braces jumps all around.  With Style B, on the other hand, the indentation looks wacky at a glance, since you have to actually read the previous line (or scan to its right end) to find out if the developer actually intended the indentation to increase there, or if someone just increased indentation because they were using a different editor.

You can tell me why you disagree with me, and I am fully aware that there will never be agreement on this point; however, I’ll continue to avoid Style B until my pay is docked for it.  Someone has to stand up for what is right.

Kip Gradient update

If anyone decided to make use of the gradient generator code I posted yesterday, be advised that I just added alpha channel (i.e. transparency) support.  I’ve updated the source code to include these new changes.

Something strange I found out is that PHP only supports a 7-bit alpha channel, even though PNG (and really any image format supporting transparency) uses the same number of bits for the alpha channel as for the red, green, and blue¹.  I’m assuming this is because PHP uses 32-bit signed integers, and if they let the alpha channel use all 8 remaining bits they would use the sign bit.  And heaven forbid people need to know a little bit about two’s complement.  Oh well.
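
To show what the sign-bit issue looks like, here is a small sketch in Java rather than PHP, since Java’s int is always a 32-bit signed integer.  It only illustrates the guess above; it says nothing about how PHP actually stores colors, and the pack helper is made up for the example.

```java
class AlphaBits
{
  /** Packs four 0-255 channels into one int, alpha in the top byte (hypothetical helper). */
  static int pack(int alpha, int red, int green, int blue)
  {
    return (alpha << 24) | (red << 16) | (green << 8) | blue;
  }

  public static void main(String[] args)
  {
    int sevenBit = pack(127, 255, 0, 0);  // largest 7-bit alpha
    int eightBit = pack(255, 255, 0, 0);  // largest 8-bit alpha

    System.out.println(sevenBit);         // 2147418112: sign bit untouched
    System.out.println(eightBit);         // -65536: sign bit set, so the value is negative
    System.out.println(eightBit >>> 24);  // 255: unsigned shift recovers the alpha
    System.out.println(eightBit >> 24);   // -1: signed shift smears the sign bit
  }
}
```

So an 8-bit alpha works fine as long as everyone remembers to use the unsigned shift; limiting alpha to 7 bits avoids negative color values altogether.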

And of course, if you use transparent PNGs, you should know they are not supported by IE6.  But I’m guessing any IE6 users out there are becoming increasingly aware of the fact that a lot of sites look strange for them.

¹ Unless it is like GIF and only uses a 1-bit alpha channel.
Kip Little things

This post is to let you know about several small tweaks to this site that I’ve been working on lately, even though you probably don’t care at all. :)

One of the cooler things is that I’ve written some PHP code to programmatically generate gradient images.  If you’ve looked around the web you know that gradients are essential to modern web design, and I figure there’s no need to fire up Photoshop every time I need one.  (Now if I can just write a glossy floor generator I’ll be totally web 2.0 compliant.)  You can view the gradient generator source code, if you’d like.  Of course, this kind of thing is so easy to do with PHP and a bit of 7th grade math that it’s almost not worth posting.  But I figured I’d share anyway.
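
The “7th grade math” is just linear interpolation on each color channel.  Here is a rough sketch of that idea, written in Java rather than PHP and not taken from the generator source; the class and method names are invented for the example, and it only computes the colors without writing an image.

```java
class GradientSketch
{
  /** Linearly interpolates one 0-255 channel; t runs from 0.0 to 1.0. */
  static int lerp(int from, int to, double t)
  {
    return (int) Math.round(from + (to - from) * t);
  }

  /** Returns `steps` colors (0xRRGGBB) fading from start to end; assumes steps >= 2. */
  static int[] gradient(int start, int end, int steps)
  {
    int[] colors = new int[steps];
    for(int i = 0; i < steps; i++)
    {
      double t = (double) i / (steps - 1);
      int r = lerp((start >> 16) & 0xFF, (end >> 16) & 0xFF, t);
      int g = lerp((start >> 8) & 0xFF, (end >> 8) & 0xFF, t);
      int b = lerp(start & 0xFF, end & 0xFF, t);
      colors[i] = (r << 16) | (g << 8) | b;
    }
    return colors;
  }
}
```

For example, gradient(0x000000, 0xFFFFFF, 3) gives black, mid-gray (0x808080), and white; each entry would become one row or column of pixels in the finished image.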

Thus far I have put these gradients into action in two places on this site: as sexy new comment headers (as seen here, for example); and in the background of any picture in our photo album.

I can’t remember if I ever posted about this, but a while back I wrote some JavaScript, currently in use on the photos page, that scales the photo to fill your browser.  jQuery is awesome.  (And every modern browser¹ can scale images without making them look grainy.)

I’ve also fixed the bug with the stored name/e-mail from adding a comment.  I have to apologize for the accidental breach of privacy, which would have exposed your e-mail address to other visitors to the site for up to an hour after your visit.  Now the name/e-mail fields are filled in by JavaScript, so they are not cached server-side.

Another small change is that timestamps on posts and comments are now converted to your local timezone.  You can still hover over the timestamp to see an ISO-8601 timestamp, part of the datetime microformat I adopted when I added hAtom support.  Speaking of which, I finally found a way to validate hAtom: there is a site, transformr.co.uk, which will take a URL to a page supporting hAtom, and it will generate a true atom feed for it.  Here is mine.  I’m still not sure who would benefit from that though.  If you know enough to use the hAtom feed, then you probably know enough to click the little feed icon in the address bar too.  Oh well, it’s there if you want it.

¹ I don’t consider IE6 to be a modern browser.
Kip What’s wrong with special characters?

Here is a message I got after logging into a website recently:

** NOTE ** Using a colon (“:”) in your password can create problems when logging in to Banner Self Service. If your password includes a colon, please change it using the PWManager link below.

Protip: If you are designing any kind of login/authentication system and you find that you need to give users a warning similar to this, you are doing something wrong.

On a much more nitpicky side note, why not just make “PWManager” or “using the PWManager” link to PWManager?  To their credit, at least they didn’t say “by clicking the PWManager link below.”

Kip Code excerpt

Do programmers still need to understand pointers in this day and age?  Is Java a good programming language?  Why is there something as opposed to nothing?

Please do not indulge in heated debate pertaining to any of the above unanswerable philosophical questions.  Instead, just let me show you what can happen when someone has to program in Java without understanding two things about the Java language: 1) all parameters are always passed by value; 2) object variables are basically pointers (and which object they point to can only be changed by assignment).  Without this knowledge, you might write code like this¹:

         //ABC-defect#123456 - SetChildren update currSel so we need to take bakup of
             //currSel so that after returning we can set it back to original currSel
        TreeNode tmpCurrSel = currSel;
    int result = setChildren(currSel,fullTree,0,tmpLevels);
       if (tmpCurrSel != null)
             currSel=tmpCurrSel; //ABC-defect#123456

Hint: it does not work as the developer expected it to (and obviously he never tested it after he coded it).

¹ Code has been anonymized, but indentation and grammar are preserved for EnhancedRealism™.
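
For anyone who wants the two rules spelled out, here is a small self-contained sketch.  The Node class and the method names are made up; it assumes a method that either reassigns its parameter or modifies the object it was handed, which are the only two things a method can do with an object parameter.

```java
class PassByValueDemo
{
  static class Node
  {
    String label;
    Node(String label) { this.label = label; }
  }

  /** Reassigning the parameter only changes the callee's local copy of the reference. */
  static void reassign(Node node)
  {
    node = new Node("replaced");
  }

  /** Mutating through the parameter changes the one object both sides share. */
  static void mutate(Node node)
  {
    node.label = "mutated";
  }

  public static void main(String[] args)
  {
    Node current = new Node("original");
    Node backup = current;                  // copies the reference, not the object

    reassign(current);
    System.out.println(current.label);      // original: the caller's variable is untouched

    mutate(current);
    System.out.println(backup.label);       // mutated: the "backup" is the same object
    System.out.println(backup == current);  // true
  }
}
```

Either way the backup accomplishes nothing: a reassignment inside the method never reaches the caller’s variable, and a modification of the object shows up in the “backup” too, because both variables point to the same object.  A real backup would need a copy of the object itself.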
Kip Macrolicious

I recently came across a clever way of writing preprocessor macros, and I figured that I would share.

Let’s say that for some reason you need to write a macro: MACRO(X,Y)¹.  You want this macro to emulate a function call in every way².

Example 1: This should work as expected.

if (x > y)
  MACRO(x, y);
do_something();

Example 2: This should not result in a compiler error.

if (x > y)
  MACRO(x, y);
else
  MACRO(y - x, x - y);

Example 3: This should not compile.

do_something();
MACRO(x, y)
do_something();

The naïve way to write the macro is like this:

#define MACRO(X,Y)                       \
cout << "1st arg is:" << (X) << endl;    \
cout << "2nd arg is:" << (Y) << endl;    \
cout << "Sum is:" << ((X)+(Y)) << endl;

This is a very bad solution which fails all three examples, and I shouldn’t need to explain why.

Now, the way I most often see macros written is to enclose them in curly braces, like this:

#define MACRO(X,Y)                         \
{                                          \
  cout << "1st arg is:" << (X) << endl;    \
  cout << "2nd arg is:" << (Y) << endl;    \
  cout << "Sum is:" << ((X)+(Y)) << endl;  \
}

This solves example 1, because the macro is in one statement block.  But example 2 is broken because we put a semicolon after the call to the macro.  This makes the compiler think the semicolon is a statement by itself, which means the else statement doesn’t correspond to any if statement!  And lastly, example 3 compiles OK, even though there is no semicolon, because a code block doesn’t need a semicolon.

The solution is kind of clever, I thought:

#define MACRO(X,Y)                         \
do {                                       \
  cout << "1st arg is:" << (X) << endl;    \
  cout << "2nd arg is:" << (Y) << endl;    \
  cout << "Sum is:" << ((X)+(Y)) << endl;  \
} while (0)

Now you have a single block-level statement, which must be followed by a semicolon.  This behaves as expected and desired in all three examples.  I have noticed this macro pattern before, but I never really thought about why it was written this way.  Mainly because I don’t often write macros to begin with.

¹ You should first ask yourself why you can’t just write a regular function and declare it inline, so that the compiler will do the work for you.  I’m going to assume there is some good reason why you must use a macro.
² Every way, that is, except that it can’t return a value.  That gets much trickier and involves heavy abuse of the ?: operator, if it is even possible at all.
Kip Doh!

I knew it would happen eventually.  I put in some code that broke our software, and it wasn’t discovered until nearly a month later, on the day the final build was scheduled.  This meant that the final build had to be delayed for a few days, which is kind of a big deal because it can affect ship date.  So lots of e-mails were circulated which featured my name—often in a red, boldface font—in various lists of actions.

Posted below is a paraphrased version of the code in question.  I’ve renamed or taken out anything that might refer to our internal codebase, and I’ve simplified a little, but not to the point that I look like a complete idiot for missing this.  The QueryInterface() and Release() stuff might look a little weird if you’re not familiar with COM+.  Or all of this will look weird if you’re not a programmer.  But odds are you’re about to stop reading if you aren’t a programmer.

LIST(IUnknown) listObjs;
Session->GetModifiedObjects(listObjs);

const int nbObjs = listObjs.Size();

if (nbObjs > 0)
{
   ObjectID** listObjIds = new ObjectID*[nbObjs];
   for (int i = 1; i <= nbObjs; i++)
   {
      ObjectID* pObjId = NULL;
      
      IUnknown* pUnknown = listObjs[i];
      IPart* pPart = NULL;
      if (pUnknown != NULL)
      {
        RC = pUnknown->QueryInterface(IID_IPart, (void **) &pPart);
        pUnknown->Release(); pUnknown = NULL;
      }
      if (SUCCEEDED(RC) && pPart != NULL)
      {
        RC = pPart->get_ObjectID(pObjId);
        pPart->Release(); pPart = NULL;
      }
      
      listObjIds[i] = pObjId;
   }
   ...
   //process listObjIds
   ...
   for (int i = 1; i <= nbObjs; i++)
   {
      if (listObjIds[i] != NULL) { delete listObjIds[i];  listObjIds[i] = NULL; }
   }
   delete [] listObjIds;  listObjIds = NULL;
}

For those of you still with me, maybe you already see the problem.  The LIST() macro in our infrastructure behaves pretty similarly to a Vector in Java: it will resize itself dynamically, check array bounds, and automatically free memory when it is destroyed.  However, because this was written long ago by guys with a Fortran background, the items in the list start at 1, whereas a C++ array starts at 0.  Also, it only works with components implementing IUnknown; plain-old-C++ objects must be handled with plain-old-C++ arrays.  In the code above, this meant I could not declare listObjIds as an object of type LIST(ObjectID*).  So I had a LIST(IUnknown) and an array of ObjectID*s, but I treated both as LISTs!  In fact, I have gotten so used to using LISTs in C++ that I completely forgot that listObjIds was an array (I guess a better variable name would have helped too).

The line listObjIds[i] = pObjId; should instead be written listObjIds[i-1] = pObjId;, since i loops from 1 to n, rather than 0 to n-1.  (Note that the line IUnknown* pUnknown = listObjs[i]; is still correct.)  So I was writing beyond the memory allocated to the listObjIds array.  And amazingly, it worked just fine in all my testing.  Most of the time, the next sizeof(void*) bytes on the heap aren’t going to belong to anyone.  But there is a chance that they are used for some other variable, whose value you would be overwriting.  This is even more likely if memory has become very fragmented.

We run unit tests on all four operating systems we support, but only one operating system (HP-UX) was affected by this.  And since we don’t currently have any customers using that OS, it was a while before anyone looked at the traces very closely.  Unfortunately, it happens that this code was implemented in a listener that is called every single time the user saves.  So when it was discovered, it was something that had to be fixed before the final build.  We could have delivered it as a patch, but some customers are reluctant to deploy patches because that can mean shutting down production for a few hours.  Plus it doesn’t instill confidence to say we shipped broken code.  So delaying the final build for a day or two was the best option.

The worst part of it all is that it happened just before year-end performance reviews.  Doh!

Kip Array-casting in Java

Since I haven’t posted anything this week, I figured I’d share something annoying I discovered in Java: you can’t assume that you can put an object of type T into a T array (unless you happen to know that T is declared as a final class).

Take for example this code, which tries to put an Integer (which is an Object) into an array of Objects:

public static void main(String[] args)
{
  Object[] objects = new String[2];
  objects[0] = "ABC";
  objects[1] = new Integer(5);
}

This code compiles with no problem, but when run it throws an ArrayStoreException on the objects[1] = line.  But if the array were declared as new Object[2]; it would run with no complaints.

The problem is that you’re allowed to cast an array of type T to an array of a super-type of T, but you don’t really have an array of the super-type.  I imagine they decided to allow this because of the usefulness of casting arrays to super-types for reading the data.  But it opens up a whole new set of bugs that most of the time you wouldn’t even think to check for (especially if the array is declared in someone else’s code).
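
Here is a small sketch of both sides of that trade-off: reading through the super-type is always safe, while writing is checked at run time against the array’s real element type.  The class and method names are invented for the example.

```java
class ArrayCovarianceDemo
{
  /** Reading through the super-type view is always safe. */
  static int totalLength(Object[] items)
  {
    int total = 0;
    for(Object item : items)
      total += item.toString().length();
    return total;
  }

  /** Returns true if the store succeeded, false if the runtime check rejected it. */
  static boolean tryStore(Object[] items, int index, Object value)
  {
    try
    {
      items[index] = value;
      return true;
    }
    catch(ArrayStoreException e)
    {
      return false;
    }
  }

  public static void main(String[] args)
  {
    Object[] objects = new String[] { "ABC", "DE" };

    System.out.println(totalLength(objects));                            // 5
    System.out.println(tryStore(objects, 1, Integer.valueOf(5)));        // false
    System.out.println(tryStore(new Object[2], 1, Integer.valueOf(5)));  // true

    // The runtime component type reveals what the array will really accept.
    System.out.println(objects.getClass().getComponentType());           // class java.lang.String
  }
}
```

If you are handed an array from someone else’s code and need to write to it, checking getClass().getComponentType() (or catching ArrayStoreException) is about the only defense.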

Apparently C# has controversially included the same feature.
