Wednesday, August 25, 2010

Continuous Integration

We used continuous integration in my department for a year and I wanted to share some feedback. First, just to summarize: I absolutely loved it. I tend to think of the time before we were using it as the dark ages, and now we are in the renaissance of development. We used Hudson for our continuous integration server. We used the version prior to the fork with Jenkins, and I'm not sure which I'd opt for now; I'd probably go with Jenkins if the community moved there. I may also refer to it as our build server, although it really does a lot more than building.

Benefits:
1) Automatic feedback when somebody breaks the build.

That immediate feedback is really important so somebody can go in and fix things right away. No more waiting for days and then discovering somebody made a change and everything is broken. This was immensely helpful with our developers in India, as they regularly forgot to check in files. If they don't leave things in a stable state, they get an email that the build is broken.

2) Automatic running of unit tests.

Some developers are better at writing unit tests than others. Some don't bother to run them after they make changes. Then, when the application goes from development to maintenance, who knows if the tests were run or updated.
With continuous integration the tests run automatically every time a build runs, so you get an email if you've introduced something that goes against the behavior set out in the unit tests.
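To make that concrete, here's a made-up JUnit 4 test of the sort the build server runs on every commit; it pins down behavior we expect so anyone who changes it hears about it right away:

    import static org.junit.Assert.assertEquals;

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    import org.junit.Test;

    // Made-up example: the test pins down the rounding behavior we expect, so
    // anyone who changes it gets an email from the build server right away.
    public class RoundingTest {

        @Test
        public void roundsCurrencyToTwoPlacesHalfUp() {
            BigDecimal amount = new BigDecimal("10.125");
            assertEquals(new BigDecimal("10.13"), amount.setScale(2, RoundingMode.HALF_UP));
        }
    }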

3) Standardized project configuration and setup.

The folder structure and conventions of every application we created were basically up to the person creating it. Once you start putting things on the build server, you see the need for standardized folders and conventions. User libraries get standardized. This helped us in that once you have a job set up for a project, you have a standard application setup to refer to. Fewer issues with things working on one person's machine and not on another's.

4) Automatic deployments

We had the capability to automatically deploy the EAR files directly to our development and staging environments. This is a huge time saver and reduces errors. There were always people exporting EAR files from the IDE and deploying them, only to find they wouldn't work. Just the repeated exporting and deploying can eat up time. It's just nice that if the build is successful and passes the unit tests, it gets deployed automatically. This also helped people working remotely from home, as deploying over VPN was slow.


5) Metrics

The reports you get out of Hudson with plugins are great. Unit tests pass/fail, code coverage, bugs.
We utilized FindBugs, PMD, Task scanner, and Checkstyle plugins. We also created a couple custom ones to suit our specific needs.
The FindBugs and PMD reports are great in that they find common coding mistakes and critical errors like thread safety problems automatically. They let you navigate to the line of code with the issue and then give you a description of the issue and an example of how to fix it. You can also check for duplicate code with the copy/paste detection feature.
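To give a feel for it, here is a made-up class (not code from our build) with the kind of mistakes these reports flag automatically:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Made-up illustration of the sort of thing FindBugs and PMD catch.
    public class ConfigLoader {

        public boolean isProduction(String env) {
            return env == "PROD"; // flagged: compares Strings with == instead of equals()
        }

        public String firstLine(String path) throws IOException {
            BufferedReader reader = new BufferedReader(new FileReader(path));
            return reader.readLine(); // flagged: the reader is never closed
        }
    }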
We are in the process of investigating enabling Crap4j and JDepend to measure complexity and dependencies.
These reports are very helpful for determining when coding is complete and ready for integration testing. For example: when the unit tests reach 70% code coverage, they all pass, there are no outstanding critical/high-level bugs, and there are no open FIXME or TODO issues, then you are ready to start integration testing.
If someone ever asked me how confident I was that what we wrote is right, I'd go to the conditional coverage in the code coverage report and say, well, I'm 30% sure.
The only report I didn't really find was lines of code, though there is probably a plugin or option for it. With a lines-of-code report you can tell when the line count has hit its peak and the developers are refactoring code instead of adding it, which indicates they will be ready for integration testing soon.
Some of the complexity warnings from PMD and FindBugs are helpful in detecting overly complex code that should be rewritten so it is easier to read and maintain and has fewer bugs and issues.

6) Traceability and Meta-Information

Every JAR built with the build server was given a fingerprint. If the JAR is used in multiple builds, you can trace it back to the build it came from. In addition, we wrote meta-information (the build job name and build number) into the MANIFEST.MF in the META-INF folder of our EARs, JARs, and WARs. This way we could look at our environments and see where an artifact actually came from. Our QA testers also made some use of this when reporting bugs: they would write down this information so we could tell whether it was an old bug that potentially needed retesting or a new one.
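Reading that information back out at runtime only takes the standard java.util.jar classes. A rough sketch (the Build-Job and Build-Number attribute names are just placeholders for whatever your build writes):

    import java.io.InputStream;
    import java.util.jar.Attributes;
    import java.util.jar.Manifest;

    // Rough sketch of pulling the build job name and number back out of the
    // manifest. "Build-Job" and "Build-Number" are placeholder attribute names;
    // in a container you may need to locate the specific archive's manifest
    // rather than the first MANIFEST.MF on the classpath.
    public class BuildInfo {

        public static String describe() throws Exception {
            InputStream in = BuildInfo.class.getResourceAsStream("/META-INF/MANIFEST.MF");
            if (in == null) {
                return "unknown build";
            }
            try {
                Attributes attrs = new Manifest(in).getMainAttributes();
                return attrs.getValue("Build-Job") + " #" + attrs.getValue("Build-Number");
            } finally {
                in.close();
            }
        }
    }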

7) History

You can keep a history of builds for however long you want. This was helpful if we wanted to restore old code to back out changes quickly. You can also see who introduced what changes and when.

8) Fun

We enabled the Continuous Integration Game plugin, which helps build the good development habits of committing small sets of code changes and running unit tests before committing. Basically, if you break the build or break unit tests you get points subtracted, and if you have a successful build you get points added. It keeps track of a leaderboard.

We also enabled the ChuckNorris Plugin, which was loads of fun, with funny pictures and programmer sayings when the build is broken or successful.


Drawbacks:
1) Setup of projects.

For new applications it was nice: we just used a generator to create them according to our standards. However, legacy applications were all built in a non-standard way. We had to either add options to our build scripts to override the defaults or change the legacy applications.

Once we got an example of each particular application type on the build server it was not really any effort at all to add another. You just copy an existing job and change the parameters.

2) Abandoned unit tests

Some of our older applications either had no unit tests or had tests in such bad shape that most didn't pass anymore. We had to set a different standard for legacy or maintenance applications versus newly developed applications.

3) Clearcase integration

ClearCase did not seem to integrate well with Hudson, as views would randomly get lost from the repository. While this was an occasional hiccup, it certainly was not enough to prevent us from continuing. I'm not sure we ever figured out the issue, but we had the notion that the ClearCase command-line API was never tested with any concurrency of commands.
We later moved to using Subversion with Hudson and it works wonderfully.

4) Ant

While Ant is a wonderful tool, I believe we could have managed build inter-dependencies better if we had used Maven with an artifact repository. A lot of times we manually copied JARs from the build server and checked them into the consuming application's code. It would be nice if, with Maven, changing a Java project caused the build to create a new JAR and publish it to the Maven repository, which in turn would be detected and kick off builds for all the applications that use that JAR.

Saturday, August 21, 2010

Collection Initialization

Early in my career, fresh out of college, I inherited the code for one of our tools that ran on a nightly timer. I was in charge of enhancing part of the application to add some functionality.

I remembered from school that initializing a hash map to just over the desired size leads to optimal performance. Without proper sizing, the map keeps resizing as it grows and rehashes every entry each time, which eats into the benefit of using a hash map in the first place.

This tool used to take around 8 hours to run, which is why it was scheduled to run at night.
So you might think the following:
1) So what if it is more efficient
2) It re-sizes itself so why bother doing it myself

After I initialized the hash map to the appropriate size, the tool went from running 8 hours to running in less than an hour. That's a 7-hour gain.
After this I realized that:
1) Collections take time to dynamically re-size.
2) Collections will constantly be resizing as they run out of room.

Now that the tool took less than an hour, it could be run a couple of times during the day ad hoc. When changes were needed to the tool, you could verify the results in a manageable time period. When the job failed for whatever reason, the whole task could be restarted and do two days of work in 2 hours instead of 16 hours.

Suggestions as to when you should initialize collections:
1) If you are going to put more than 100 objects in it.
2) If you are a cautious/defensive developer like me.

I'm not as worried about getting the optimal size as long as it's relatively close.
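For anyone curious, here is a minimal sketch of the kind of change this was (the types and the entry count are made up, not the actual tool's code):

    import java.util.HashMap;
    import java.util.Map;

    public class PreSizedMapExample {
        public static void main(String[] args) {
            int expectedEntries = 2000000; // made-up figure for how many records the job loads

            // Before: starts at the default capacity of 16 and doubles (rehashing
            // every entry it holds) each time it passes its load factor.
            Map<String, String> unsized = new HashMap<String, String>();

            // After: capacity set just above expectedEntries / 0.75 (the default
            // load factor), so the map never needs to resize while the job fills it.
            Map<String, String> sized =
                    new HashMap<String, String>((int) (expectedEntries / 0.75f) + 1);
        }
    }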

Saturday, July 3, 2010

OSGi and JEE. Is it possible?

I am curious about others' experiences using OSGi with JEE. OSGi seems like a solution to dependency collisions, and it allows for versioned modules and hot deployment. I've seen several different variations on using OSGi with JEE. It seems application servers are starting to pick up on it, which is a great thing. However, I was wondering if anyone has used it for local native services. I see people use it for Servlets and serving simple JSPs. I was hoping to use it for local native Java services that expose interfaces the way web services do, but without dragging extra dependencies into the picture the way normal POJO Java services do. Unlike web services, OSGi services should not have the overhead of transferring things over the wire. I've seen forum postings where people have trouble using JEE resources that are exposed through JNDI. Is this really an issue to be concerned about? Also, if people have any tools or tips for working with OSGi bundles, it would be much appreciated.
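To be concrete about what I'm after, here's a rough sketch using the plain OSGi framework API (the GreetingService names are made up, and each type would normally live in its own file):

    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.ServiceRegistration;

    // Made-up service interface; only its package gets exported from the bundle.
    interface GreetingService {
        String greet(String name);
    }

    // The implementation and its dependencies stay hidden inside the bundle.
    class GreetingServiceImpl implements GreetingService {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    public class Activator implements BundleActivator {

        private ServiceRegistration registration;

        public void start(BundleContext context) {
            // publish the implementation under the interface name
            registration = context.registerService(
                    GreetingService.class.getName(), new GreetingServiceImpl(), null);
        }

        public void stop(BundleContext context) {
            registration.unregister();
        }
    }

A consuming bundle could then look the service up through its own BundleContext (getServiceReference/getService) or have it injected by something like Declarative Services or Spring DM, and the call never leaves the JVM.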

Saturday, June 19, 2010

Thread Safe Code and Stress Testing

It just amazes me how much code makes it past code reviews that is not thread safe but should be.
Developers often know that certain code has trouble if it is not thread safe, and what thread safe means. However, they are not able to effectively discern what needs to be thread safe and what does not. I had an individual who was very aware of these issues help mentor me on how to detect these types of problems. I had seen and fixed problems prior to his mentoring, but he really helped me to be proactive and see threading issues while authoring code.

The biggest issue is a lack of understanding that beans defined in Spring are singletons. Even if you inject a prototype into a singleton, it effectively becomes a singleton. Then there are the problems where developers improperly implement the singleton pattern, randomly synchronize things that don't need to be synchronized, or create static variables that are not thread safe. I've seen so many times where developers create a static SimpleDateFormat instance. Recently I saw a DAO with instance variables that were not thread safe. I also recently saw a persistence layer where they tried to manually implement the singleton pattern but didn't do it correctly, so its instance variables were not initialized correctly, leading to NullPointerExceptions.
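The SimpleDateFormat case comes up so often it's worth a sketch. This hypothetical ReportService shows the shared-formatter mistake and one safe alternative:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Hypothetical example. SimpleDateFormat keeps mutable state while formatting
    // and parsing, so one instance shared by every thread (the default for a
    // Spring singleton bean) can silently produce corrupted dates under load.
    public class ReportService {

        // BROKEN: one stateful formatter shared by every caller
        private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

        public String formatBroken(Date date) {
            return SHARED.format(date); // not thread safe
        }

        // SAFER: give each thread its own formatter (creating one per call is fine too)
        private static final ThreadLocal<SimpleDateFormat> PER_THREAD =
                new ThreadLocal<SimpleDateFormat>() {
                    protected SimpleDateFormat initialValue() {
                        return new SimpleDateFormat("yyyy-MM-dd");
                    }
                };

        public String formatSafe(Date date) {
            return PER_THREAD.get().format(date);
        }
    }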

I tend to see threading issues revealed when load testing is performed. However, to my amazement, load testing is not always performed. I've also seen some threading issues that have snuck past load testing and never presented themselves as a problem.

We use a continuous integration build server but it only lists thread safety problems as warnings and then only catches half of them because it doesn't know what should or shouldn't be a singleton and thread safe any more than the developers do.

I've recently initiated an internal developer training effort.
There are relatively few developers who can detect these errors and even fewer who can fix them. So it is really an issue, as these developers cannot code review every line of code.

Are there any ways to catch these kind of problems earlier or some good resources to aid in developer training? Do people have similar experiences?

It seems most developers don't care about code quality beyond whether it appears to work correctly. Am I being a little anal retentive? Should I just go with the flow, realize that code will not be perfect, and let these problems exist and get resolved later?

Tuesday, June 15, 2010

First Groovy Script

A couple days ago I wrote my first Groovy script. Of course it had to be loaded into the Spring IoC container to wire it up to the beans using it.
Coming from a Java background and working from Groovy's website, I found it very hard to define a pretty simple class with a method that implements a Java interface.
First it made me use the def keyword in the method definition, whereas the documentation didn't even have an example of how to write a method.
Then it made me define the return type of the method, because it was complaining that it didn't match the interface's return type, whereas the documentation seemed to indicate these did not have to be statically defined/typed.
The method took no parameters, returned a List, and was returning [instanceofMyObject1, instanceofMyObject2].
The documentation showed defining typeless variables without using the def keyword.
It kept looking for member variables in the class, so I had to define each one with the def keyword to get it to work.

Is the Groovy documentation really lacking good basic examples like writing a class with a method and using local variables?

The documentation outlined some of the power features but did very little to help people learn the basics, other than going over some command-line Groovy examples.

Can people recommend some good resources for a Java developer to learn and effectively take full advantage of the capabilities of Groovy?