Wednesday, August 25, 2010

Continuous Integration

We used continuous integration in my department for a year, and I wanted to share some feedback. First, just to summarize: I absolutely loved it. I tend to think of the time before we were using it as the dark ages, and now we are in the renaissance of development. We used Hudson for our continuous integration server, the version prior to the fork with Jenkins, and I'm not sure which I'd opt for now. I'd probably go with Jenkins if the community moved there. I may also refer to it as our build server, although it really does a lot more than building.

Benefits:
1) Automatic feedback when somebody breaks the build.

It is really important to get immediate feedback so somebody can go in and fix things right away. No more waiting for days and then discovering somebody made a change and everything is broken. This was immensely helpful with our developers in India, who regularly forgot to check in files. If they don't leave things in a stable state, they get an email that the build is broken.

2) Automatic running of unit tests.

Some developers are better at writing unit tests than others, and some don't bother to run them after they make changes. Then, when the application moves from development to maintenance, who knows whether the tests were run or updated.
With continuous integration, the tests run automatically every time a build runs, so you get an email if you've introduced something that goes against the behavior outlined in the unit tests.
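The idea can be sketched with a tiny, self-contained example. Our real tests used a test framework run by the build, but this standalone version shows the shape of a check the build server would execute; the discount calculation is a hypothetical stand-in, not code from our applications.

```java
// A minimal sketch of the kind of check a CI build runs on every commit.
// The Discount logic here is hypothetical; real projects used a test framework.
public class DiscountTest {

    // The "production" method under test.
    static double applyDiscount(double price, double percent) {
        return price - price * percent / 100.0;
    }

    public static void main(String[] args) {
        // The CI job fails the build when any check throws.
        if (applyDiscount(200.0, 10.0) != 180.0) {
            throw new AssertionError("expected 180.0");
        }
        if (applyDiscount(100.0, 0.0) != 100.0) {
            throw new AssertionError("expected 100.0");
        }
        System.out.println("all tests passed");
    }
}
```

If a change breaks the expected behavior, the thrown error fails the build and the committer gets the email described above.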

3) Standardized project configuration and setup.

Every application we created was basically left up to the person creating it as to the folder structure and conventions used. Once you start putting things on the build server, you see the need for standardized folders and conventions. User libraries get standardized. This helped us in that once you have a job set up for a project, you have a standard application setup to refer to. Fewer issues with it working on one person's machine and not on another's.

4) Automatic deployments

We had the capability to automatically deploy the EAR files directly to our development and staging environments. This is a huge time saver and reduces errors. There were always people exporting EAR files from our IDE and deploying them, only to have them not work. Just the repeated exporting and deploying can eat up time. It's nice that if the build is successful and passes the unit tests, it is deployed automatically. This also helped people working remotely from home, as deploying was slow when connected over VPN.


5) Metrics

The reports you get out of Hudson with plugins are great: unit test pass/fail, code coverage, bugs.
We utilized FindBugs, PMD, Task scanner, and Checkstyle plugins. We also created a couple custom ones to suit our specific needs.
The FindBugs and PMD reports are great in that you can automatically find common coding mistakes and critical errors like thread-safety problems. They let you navigate to the line of code with the issue and then give you a description of it and an example of how to fix it. You can also check for duplicated code with the copy-and-paste detection feature.
We are in the process of investigating enabling Crap4j and JDepend to measure complexity and dependencies.
These reports are very helpful in determining when coding is complete and ready for integration testing. For example: when the unit tests reach 70% code coverage, they all pass, there are no outstanding critical/high-level bugs, and there are no open FIXME or TODO issues, then you are ready to start integration testing.
If someone ever asked me how confident I was that what we wrote is right, I'd go to the conditional coverage in the code coverage report and say, well, I'm 30% sure.
The only report I didn't really find was lines of code, though there is probably a plugin or option for it. With a lines-of-code report you can tell when the line count has hit its peak and the developers are refactoring code instead of adding it, which indicates they will be ready for integration testing soon.
Some of the complexity warnings from PMD and FindBugs are helpful in detecting overly complex code, which should be rewritten so that it is easier to read and maintain and has fewer bugs and issues.
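To give a flavor of the "common coding mistakes" these tools catch automatically, here is a hypothetical example of the reference-comparison bug that FindBugs flags (comparing boxed values with == instead of equals). The class and method names are made up for illustration.

```java
// Hypothetical example of a bug class FindBugs reports automatically:
// comparing boxed Integers by reference instead of by value.
public class BugExamples {

    // Buggy: == compares object references. new Integer(1000) always creates
    // a distinct object, so this returns false even for equal values.
    static boolean sameIdBuggy(Integer a, Integer b) {
        return a == b;
    }

    // Fixed: compare values with equals().
    static boolean sameIdFixed(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer x = new Integer(1000);
        Integer y = new Integer(1000);
        System.out.println(sameIdBuggy(x, y)); // false - looks equal, isn't
        System.out.println(sameIdFixed(x, y)); // true
    }
}
```

Bugs like this compile cleanly and often pass casual testing, which is exactly why having the build server scan every build is so valuable.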

6) Traceability and Meta-Information

Every JAR built with the build server was given a fingerprint. If the JAR is used in multiple builds, you can trace it back to the build it came from. In addition, we wrote meta-information (the build job name and build number) into the MANIFEST.MF in the META-INF folder of our EARs, JARs, and WARs. This way we could look at our environments and see where an artifact actually came from. We also utilized this a bit for our QA testers: when they reported bugs they would write down this information, so we could tell whether it was an old bug that potentially needed to be retested or a new one.
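Reading that meta-information back out of an artifact is straightforward with the standard java.util.jar API. The attribute names below (Hudson-Job, Hudson-Build-Number) are hypothetical; ours were custom entries written by the build script, and yours would be whatever your script emits.

```java
import java.io.ByteArrayInputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class BuildInfo {

    // Pull the (hypothetical) build-job attributes out of a parsed manifest.
    public static String describe(Manifest mf) {
        Attributes main = mf.getMainAttributes();
        return main.getValue("Hudson-Job") + " #" + main.getValue("Hudson-Build-Number");
    }

    public static void main(String[] args) throws Exception {
        // In practice you'd read META-INF/MANIFEST.MF from the JAR itself;
        // an inline string keeps this sketch self-contained.
        String text = "Manifest-Version: 1.0\n"
                + "Hudson-Job: billing-app\n"
                + "Hudson-Build-Number: 142\n\n";
        Manifest mf = new Manifest(new ByteArrayInputStream(text.getBytes("UTF-8")));
        System.out.println(describe(mf)); // billing-app #142
    }
}
```

A QA tester reporting "billing-app #142" against a bug is enough to tell exactly which build is deployed in which environment.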

7) History

You can keep a history of builds for as long as you want. This was helpful when we wanted to restore old code to back out changes quickly. You can also see who introduced which changes and when.

8) Fun

We enabled the Continuous Integration Game plugin, which helps build the good development habits of committing small sets of code changes and running the unit tests before committing. Basically, if you break the build or break the unit tests, points get subtracted; if you have a successful build, points get added. It keeps track of a leader board.

We also enabled the ChuckNorris plugin, which was loads of fun, with funny pictures and programmer sayings when the build is broken or successful.


Drawbacks:
1) Setup of projects.

For new applications it was easy: we just used a generator to create them according to our standards. Legacy applications, however, were all built in non-standard ways. We had to either add options to our build scripts to override the defaults or change the legacy applications.

Once we got an example of each particular application type onto the build server, adding another took hardly any effort at all: you just copy an existing job and change the parameters.

2) Abandoned unit tests

Some of our older applications had either no unit tests or tests in such bad shape that most no longer passed. We had to set a different standard for legacy or maintenance applications versus newly developed applications.

3) Clearcase integration

ClearCase did not seem to integrate well with Hudson, as views would randomly get lost from the repository. While this was an occasional hiccup, it certainly was not enough to stop us from continuing. I'm not sure we ever figured out the issue, but we had the notion that the ClearCase command-line interface was never tested with concurrent commands.
We later moved to using Subversion with Hudson, and it works wonderfully.

4) Ant

While Ant is a wonderful tool, I believe we could have managed build inter-dependencies better if we had used Maven with an artifact repository. A lot of the time we manually copied JARs from the build server and checked them into the consuming application's code. With Maven, changing a Java project would cause the build to create a new JAR and publish it to the Maven repository, which in turn would be detected and kick off builds for all the applications that use that JAR.

Saturday, August 21, 2010

Collection Initialization

Early in my career, fresh out of college, I inherited the code for one of our tools that ran on a nightly timer. I was in charge of enhancing part of the application to add some functionality.

I remembered from school that initializing a hash map to just over the desired size leads to optimal performance. Without proper sizing, the map repeatedly runs out of room and has to resize, rehashing every entry into a new, larger table each time, which defeats much of the performance benefit of using a hash map.

This tool used to take around 8 hours and was scheduled to run at night.
So you might think the following:
1) So what if it is more efficient
2) It re-sizes itself so why bother doing it myself

After I initialized the hash map to the appropriate size, the tool went from running in 8 hours to running in less than an hour. That's a 7-hour gain.
After this I realized that:
1) Collections take time to dynamically re-size.
2) Collections will constantly be resizing as they run out of room.

Now that the tool took less than an hour, it could be run a couple of times during the day ad hoc. When changes were needed to the tool, you could verify the results in a manageable time period. When the job failed for whatever reason, the whole task could be restarted and do two days of work in 2 hours instead of 16.
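The fix itself is one constructor argument. A sketch of the idea, assuming java.util.HashMap with its default load factor of 0.75: sizing the capacity to about n / 0.75 + 1 means the table never has to resize and rehash while the n entries are loaded.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    // Pre-size the map so it never resizes during loading. With the default
    // load factor of 0.75, a capacity of expected / 0.75 + 1 is sufficient.
    static Map<String, String> load(int expectedEntries) {
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        Map<String, String> map = new HashMap<String, String>(capacity);
        // Stand-in for the real loading loop; the tool loaded far more data.
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, "value" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = load(100000);
        System.out.println(map.size()); // 100000
    }
}
```

Compare this with `new HashMap<String, String>()`, which starts at 16 buckets and must grow (and rehash every entry) roughly every time the entry count doubles.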

Suggestions as to when you should initialize collections:
1) If you are going to put more than 100 objects in it.
2) If you are a cautious/defensive developer like me.

I'm not as worried about getting the optimal size, as long as you are relatively close.