Diagnosing Java code: Design for easy code maintenance
Avoid unnecessary mutation and access to make code robust and easier to maintain

Level: Introductory

Eric E. Allen (eallen@cs.rice.edu)
Ph.D. candidate, Java programming languages team, Rice University
1 January 2003

This month, Eric Allen explains how avoiding and controlling gratuitous mutation is key to retaining code robustness while making the code easier to maintain. He focuses on such concepts as functional style code crafting and ways of marking fields, methods, and classes to handle and prevent mutability. Also, Eric explains the role of unit testing and refactoring in this task, and offers two tools to aid in refactoring efforts.

Effective debugging begins with good programming. Designing a program to be easy to maintain is one of the most difficult challenges a programmer faces, in part because programs are often maintained by programmers other than those who originated the code. To maintain such programs effectively, new programmers have to be able to quickly learn how the program works, a task that's done most easily if small parts of the program can be understood in isolation from the whole.

We'll outline some of the ways that programs can be written to help make them more easily understood and maintained, looking at the issues of mutability, decipherability, private methods, final methods, final classes, local code, unit tests, and refactoring.

Mutability and decipherability
First up is the issue of mutability. Parts of a program are most easily understood in isolation if the data that each part works on is not altered by other, remote parts of a program during a computation.

Too much information
For example, consider a program that uses an instance of a container class whose constituent links can be modified. Every time the container is passed as an argument from one part of the program to another, an opportunity exists for the container to be altered outside the control of the calling method.

We can't really be sure of our understanding of the calling method, much less our ability to diagnose a bug, until we first understand how each of the methods it calls modifies the container. If each of these called methods calls other modifying methods in turn, the amount of code the maintenance programmer must read to understand a single method can quickly balloon out of control.
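
As a rough illustration of this hazard, consider the sketch below. The class RemoteMutationDemo and the helper smallest() are hypothetical names invented for this example; the helper looks like a harmless query, yet it reorders the caller's container as a side effect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RemoteMutationDemo {
  // Hypothetical helper: reads like a query, but sorts its argument in place.
  static int smallest(List<Integer> values) {
    Collections.sort(values);
    return values.get(0);
  }

  public static void main(String[] args) {
    List<Integer> readings = new ArrayList<>();
    readings.add(30);
    readings.add(10);
    readings.add(20);

    int min = smallest(readings);
    System.out.println(min);      // prints "10"
    System.out.println(readings); // prints "[10, 20, 30]": the caller's ordering is gone
  }
}
```

Nothing in the signature of smallest() warns the caller about the reordering; the only way to discover it is to read the body of the method.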

For this reason, it can be highly advantageous to use different classes for mutable and immutable containers. In the immutable versions, the fields of the container can be marked as final.

Functional style to the rescue
Writing code so that it constructs new data as opposed to modifying old data is known as functional style because the methods of the program act like mathematical functions whose behavior is described solely in terms of the output returned for each input.

The often overlooked advantage of functional style is that the individual components of the program are far more easily understood in isolation. If the data manipulated by a method is never altered by any of the operations performed in its body, then all a programmer has to do to understand that method is understand the results returned by those operations. Compare this to the scenario above in which a method calls several other methods, each of which modifies the very data structures the method operates on.
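
The contrast can be sketched with two small classes. MutablePoint, Point, moveBy(), and movedBy() are hypothetical names chosen for illustration only: the first class alters its receiver in place, while the second constructs new data and leaves the old data alone.

```java
// Hypothetical example: a mutable point and a functional-style counterpart.
class MutablePoint {
  int x;
  int y;

  MutablePoint(int x, int y) {
    this.x = x;
    this.y = y;
  }

  // Imperative style: alters the receiver in place.
  void moveBy(int dx, int dy) {
    x += dx;
    y += dy;
  }
}

class Point {
  private final int x;
  private final int y;

  Point(int x, int y) {
    this.x = x;
    this.y = y;
  }

  int x() { return x; }
  int y() { return y; }

  // Functional style: leaves the receiver untouched and returns a new Point.
  Point movedBy(int dx, int dy) {
    return new Point(x + dx, y + dy);
  }
}

class FunctionalStyleDemo {
  public static void main(String[] args) {
    Point origin = new Point(0, 0);
    Point shifted = origin.movedBy(3, 4);
    System.out.println(origin.x() + "," + origin.y());   // prints "0,0"
    System.out.println(shifted.x() + "," + shifted.y()); // prints "3,4"
  }
}
```

A reader of code that holds a Point never has to ask who else might have moved it; with MutablePoint, every method that receives the object is a suspect.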

One nice feature of the Java language is that it allows us to use the final keyword, as a directive to the type checker, to declare when we want certain data to be immutable.

Avoiding mutation with the final keyword is a good way to nail down the behavior of a class's methods. Every time a field is modified, it has the potential to alter the behavior of the methods that refer to it. Additionally, marking a field as final lets other programmers who read the program know instantly that the field is never modified, no matter how large the whole program is. For example, consider the following class hierarchy for representing immutable lists.


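abstract class List {...}

class Empty extends List {...}

class Cons extends List {
  private final Object first;
  private final List rest;

  // Final fields must be assigned exactly once, here in the constructor.
  Cons(Object first, List rest) {
    this.first = first;
    this.rest = rest;
  }
}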

All fields in these classes are marked as final. Is that enough to ensure that instances of these classes are immutable? Not quite. Even when a field is marked as final, it's important to remember that the components of the object it refers to may not be final. Any part of the program that refers to those components may be affected when they are altered, regardless of whether the field itself is altered. In the example above, although the fields of a list can't be reassigned, we have to check that the elements themselves don't contain non-final fields that may be modified.
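
A minimal sketch of this caveat follows, using a hypothetical class ShallowFinalDemo whose final field refers to a mutable StringBuilder. The field can never be reassigned, but the object it refers to can still change.

```java
class ShallowFinalDemo {
  // The reference held in this field can never be reassigned...
  private final StringBuilder label = new StringBuilder("draft");

  StringBuilder label() { return label; }

  public static void main(String[] args) {
    ShallowFinalDemo doc = new ShallowFinalDemo();
    // ...but the object it refers to is mutable, so its contents can still change.
    doc.label().append("-v2");
    System.out.println(doc.label()); // prints "draft-v2"
    // doc.label = new StringBuilder(); // would not compile: label is final
  }
}
```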

In this case, although a list may contain mutable elements, we can see that the sequence of elements stored in a given list is immutable by reasoning as follows: instances of Empty lists (that is, lists of length zero) contain no elements at all; therefore, they can't be modified. Instances of Cons (non-empty lists) contain two fields, both final. The first field contains the first element of the list and can't be modified; the second contains a list containing all remaining elements. If the contents of this list are immutable, then so is the containing list.

But the list contained in this second field has a length one less than the length of the containing list, so if we knew that all lists of length n were immutable, we'd know that lists of length n + 1 were also immutable. Since we already know that zero-length lists are immutable, we also know that lists of length 1, 2, 3, and so on are also immutable.

Tracing through the connections of a data structure like this can be tedious, but it pays off when you can determine global properties of such a structure, such as immutability.
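
The inductive shape of that argument shows up directly in code written over the structure. The sketch below fills in the list hierarchy under some assumptions: the constructor, the accessors first() and rest(), and the length() method are additions made for this illustration and are not part of the original listing.

```java
// Sketch of the list hierarchy; length(), the accessors, and the constructor are assumed additions.
abstract class List {
  abstract int length();
}

class Empty extends List {
  // Base case: a list of length zero has nothing that could be modified.
  int length() { return 0; }
}

class Cons extends List {
  private final Object first;
  private final List rest;

  Cons(Object first, List rest) {
    this.first = first;
    this.rest = rest;
  }

  Object first() { return first; }
  List rest() { return rest; }

  // Inductive case: a list of length n + 1 is built on a list of length n.
  int length() { return 1 + rest.length(); }
}

class ImmutableListDemo {
  public static void main(String[] args) {
    List tail = new Cons("b", new Empty());
    List longer = new Cons("a", tail); // builds a new list; tail is shared, not altered
    System.out.println(tail.length());   // prints "1"
    System.out.println(longer.length()); // prints "2"
  }
}
```

Because no method can reassign first or rest, two lists can safely share a tail, as longer and tail do here.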

Controlling mutation
The best strategy to defend against unexpected mutation is to simply avoid all mutation whenever possible. Only when there is a compelling reason to mutate (such as when it vastly simplifies the structure of the code) should we make use of it. When mutation can be avoided, the payoff can be enormous (in terms of lower maintenance costs and increased robustness).

Even when there is a compelling reason to mutate data, it's best to try to control that mutation, to limit the potential damage as much as possible. Iterators and Streams are great examples of data structures explicitly designed to control mutation by allowing us to walk over a series of elements in a regular and well-defined fashion, rather than explicitly modifying some handle on the elements.
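
As a small sketch of controlled mutation, the hypothetical method sum() below walks a sequence through java.util.Iterator. The traversal state lives inside the iterator and changes only through hasNext() and next(); the underlying collection is never touched.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorDemo {
  // Sums the elements by walking an Iterator. The current position is hidden
  // inside the iterator and advances only through next().
  static int sum(Iterable<Integer> values) {
    int total = 0;
    Iterator<Integer> it = values.iterator();
    while (it.hasNext()) {
      total += it.next();
    }
    return total;
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3);
    System.out.println(sum(values)); // prints "6"
    System.out.println(values);      // prints "[1, 2, 3]": the list itself is untouched
  }
}
```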

Private methods
Just as setting fields to final helps limit outside influences on the value of a field, setting them to private helps to limit the influence they have on other parts of the program. If a field is private, we can be certain that no other parts of the program depend on it directly. If we eliminate the field and replace the internal representation of the class data, we need only worry about fixing the methods inside of the class to access the new data properly.

In the earlier example, notice that the fields of class Cons are private. That way, we can control how those elements are accessed through getters and the like. If a future maintainer of our Lists ever wanted to modify our internal representation of Lists (for example, perhaps it turns out that on certain platforms, array-based lists are more efficient), that maintainer could do so without modifying or even looking at any of the clients of those lists. The maintainer would simply have to rewrite the getters to take the appropriate action with the new data.
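
The same idea can be sketched on a smaller scale with a hypothetical Range class (the class and its method names are invented for this example). Clients see only start(), end(), and length(); whether the class stores an end point or a length is a private decision that can change without touching any caller.

```java
// Hypothetical class: clients see only start(), end(), and length().
class Range {
  // Private representation: a start and a length. A version that stored
  // start and end instead would need changes only inside this class.
  private final int start;
  private final int length;

  Range(int start, int end) {
    this.start = start;
    this.length = end - start;
  }

  int start() { return start; }
  int end() { return start + length; }
  int length() { return length; }
}

class PrivateFieldDemo {
  public static void main(String[] args) {
    Range r = new Range(2, 7);
    System.out.println(r.start() + ".." + r.end()); // prints "2..7"
    System.out.println(r.length());                 // prints "5"
  }
}
```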

Final methods, final classes, and understanding code locally
In contrast to marking fields as final, marking a method as final is often claimed to be at odds with OO design goals because it inhibits inheritance polymorphism. But when trying to understand the behavior of a large program, it helps to know what methods are not overridden.

Now it's absolutely true that good OO design involves using a great deal of inheritance. In fact, inheritance is central to many OO design patterns. But that doesn't mean that we should allow every method we write to be overridden. Often a program will implicitly rely on certain key methods not being overridden. By marking such methods as final, we will allow other programmers to better understand the behavior of expressions that call the method.
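
One way this can look in practice is sketched below, with hypothetical classes Report and StatusReport. Subclasses supply the varying pieces through abstract methods, while the final method render() fixes how those pieces are combined, so a reader of any call to render() needs to look at only one definition.

```java
abstract class Report {
  // final: every subclass renders in the same "[title] body" format, so callers
  // of render() can reason about it without reading the subclasses.
  final String render() {
    return "[" + title() + "] " + body();
  }

  abstract String title();
  abstract String body();
}

class StatusReport extends Report {
  String title() { return "status"; }
  String body() { return "all systems nominal"; }
  // String render() { return ""; } // would not compile: render() is final in Report
}

class FinalMethodDemo {
  public static void main(String[] args) {
    System.out.println(new StatusReport().render()); // prints "[status] all systems nominal"
  }
}
```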

Additionally, marking classes as final can be a great boost to decipherability. It can really help to know at a glance which classes are never subclassed in a program. In fact, I would argue that the only classes that shouldn't be marked as final are classes that are actually subclassed in a program and classes that, as an inherent part of the program design, are intended to be subclassed from outside components.

Some may say that this concept will straitjacket future maintainers of the code, keeping them from being able to extend the code. I say it will certainly not restrict them. If future maintainers of a program need to extend it to include a subclass where none existed before, it's not hard to delete the final keyword on the corresponding class and recompile it, provided they have access to the source code (and if they don't have access to it, in what sense are they "maintainers" of that code?).

Meanwhile, that added keyword serves as a form of automatically verified documentation of an important invariant about the program ("automatically verified" because the program won't even compile if the documentation is violated). By forcing developers to consciously choose when they want to eliminate such an invariant, we can help to reduce the introduction of errors.
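
A brief sketch of a final class follows. Money is a hypothetical class invented for this example; the commented-out subclass shows the kind of declaration the compiler rejects, and the reflective check simply confirms that the modifier is recorded on the class.

```java
import java.lang.reflect.Modifier;

// Hypothetical value class; final documents that it is never subclassed.
final class Money {
  private final long cents;

  Money(long cents) { this.cents = cents; }

  long cents() { return cents; }

  Money plus(Money other) { return new Money(cents + other.cents); }
}

// class DiscountedMoney extends Money {} // would not compile: Money is final

class FinalClassDemo {
  public static void main(String[] args) {
    Money total = new Money(250).plus(new Money(175));
    System.out.println(total.cents()); // prints "425"
    System.out.println(Modifier.isFinal(Money.class.getModifiers())); // prints "true"
  }
}
```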

Unit tests and mutation
As always, unit tests can help in understanding side-effecting code. If a suite of unit tests adequately documents the effects of the methods in a program, then a programmer can understand each method more quickly just by reading its unit tests. Of course, the big question is whether the unit tests really do cover the effects adequately. Coverage analysis tools like Clover can help here to some degree.

Notice, however, that unit tests themselves are much easier to write for strictly functional methods. To test strictly functional methods, all that's involved is to call these methods with various representative inputs and check their outputs (and make sure they throw exceptions when they should).

When testing methods that modify the state of data structures, one must first perform the operations necessary to put the input data into the state expected by the method and then, after calling the method, check that every modification of the data expected by clients was performed correctly.
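
The difference in testing effort can be sketched as follows. The methods clamp() and dropNegatives() are hypothetical, and the checks use a small hand-written check() helper rather than a test framework, purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class TestingStylesDemo {
  // Strictly functional: the result depends only on the arguments.
  static int clamp(int value, int low, int high) {
    return Math.max(low, Math.min(high, value));
  }

  // Side-effecting: removes every negative entry from the given list.
  static void dropNegatives(List<Integer> values) {
    values.removeIf(v -> v < 0);
  }

  static void check(boolean condition, String message) {
    if (!condition) throw new AssertionError(message);
  }

  public static void main(String[] args) {
    // Testing the functional method: call it and compare outputs.
    check(clamp(5, 0, 10) == 5, "value inside the range");
    check(clamp(-3, 0, 10) == 0, "value below the range");
    check(clamp(42, 0, 10) == 10, "value above the range");

    // Testing the mutating method: build the input state, call the method,
    // then inspect every change a client relies on (size, contents, order).
    List<Integer> values = new ArrayList<>();
    values.add(4);
    values.add(-1);
    values.add(7);
    dropNegatives(values);
    check(values.size() == 2, "negative entry removed");
    check(values.get(0) == 4 && values.get(1) == 7, "remaining order preserved");

    System.out.println("all checks passed");
  }
}
```

The functional checks are one line each, while the mutating method needs setup before the call and several inspections after it.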

Wrapping up with refactoring tools
These tips can be great when writing new code, but what about when you have to maintain old code that is barely decipherable? Refactor, refactor, refactor.

Although refactoring old code takes time, that time is well spent, especially with all the tool support for refactoring now available for Java code. There are now many powerful tools for automatically refactoring Java code, tools that preserve key invariants automatically.

One of the most full-featured tools for refactoring Java code is the IDEA development environment. This environment provides automatic support for a significant chunk of Martin Fowler's refactoring patterns. Another tool I have found to be very useful is CodeGuide, a German IDE. Although its list of automatic refactorings is small compared to IDEA's, it showcases an extraordinarily powerful feature: continuous compilation. While you're typing new code, CodeGuide analyzes it and tells you if anything in the project has broken (of course, there is a short delay to prevent it from signaling errors on every keystroke).

Although continuous compilation negatively affects responsiveness, it can be well worth the wait in certain contexts. For example, you can type final in front of a field and instantly see if anything in the project breaks. If not, you know that the field isn't modified anywhere in the program. Similarly, you can type private in front of a field and instantly get a list of all outside accesses to the field (in the form of errors).

Another great feature of CodeGuide is that it provides seamless support for the JSR-14 experimental extension with generic types (scheduled for official addition in Java 1.5).

Although writing code for decipherability can take a lot more time and effort, it can help to increase the lifetime and the robustness of your code, and it can significantly enhance the quality of life for those who face the task of maintaining it. Finally, refactoring old code to be more maintainable takes time but pays for itself the next time you have to fix a bug.

About the author
Eric Allen possesses a broad range of hands-on knowledge of technology and the computer industry. With a B.S. in computer science and mathematics from Cornell University and an M.S. in computer science from Rice University, Eric is currently a Ph.D. candidate in the Java programming languages team at Rice. Eric's research concerns the development of semantic models and static analysis tools for the Java language at the source and bytecode levels. He is also concerned with the verification of security protocols through semantic formalisms and type checking.
Eric is a project manager for and a founding member of the DrJava project, an open-source Java IDE designed for beginners; he is also the lead developer of the university's experimental compiler for the NextGen programming language, an extension of the Java language with added experimental features. Eric has moderated several Java forums for the online magazine JavaWorld. In addition to these activities, Eric teaches software engineering to Rice University's computer science undergraduates. You can contact Eric at eallen@cs.rice.edu.

