Locale-Independent String Sorting in Java

In the Java programming language, the Arrays.sort and Collections.sort methods from the class library are used to sort an array or a list of strings. By default, these methods rely on the compareTo method of the Comparable interface. The String class's compareTo implementation compares the Unicode (UTF-16) values of the characters, which works well for plain English text.

When you need to sort strings in languages other than English, or more commonly need to write software for global use, you can use the Collator class.

Below is the Turkish alphabet in order.

abcçdefgğhıijklmnoöprsştuüvyz

Let's sort the characters of the alphabet with the Arrays.sort method.

String[] alphabet = "abcçdefgğhıijklmnoöprsştuüvyz".split("");

Arrays.sort(alphabet);

As expected, the result is not in Turkish alphabetical order: the letters outside the ASCII range all end up after z.

abcdefghijklmnoprstuvyzçöüğış

Let's try the Collator class.

Arrays.sort(alphabet, Collator.getInstance(Locale.forLanguageTag("tr")));

Unsurprisingly, this sorts correctly only for Turkish. To make your code work in other languages too, use Locale.getDefault() instead of Locale.forLanguageTag, so that the collator follows the default locale of the JVM the code runs on.

Arrays.sort(alphabet, Collator.getInstance(Locale.getDefault()));
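For reference, here is a minimal, self-contained sketch that puts the two sorts above side by side. The class and method names (TurkishSortDemo, sortByCodeUnits, sortWithCollator) are made up for this illustration; only Arrays.sort, Collator.getInstance and Locale.forLanguageTag come from the class library. The exact collated output depends on the collation rules shipped with your JDK, so it is not shown here.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; the names are illustrative only.
class TurkishSortDemo {
    static final String ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz";

    // Sorts single-character strings with String.compareTo (UTF-16 code unit order).
    static String sortByCodeUnits(String letters) {
        String[] parts = letters.split("");
        Arrays.sort(parts);
        return String.join("", parts);
    }

    // Sorts single-character strings with the collation rules of the given locale.
    static String sortWithCollator(String letters, Locale locale) {
        String[] parts = letters.split("");
        Arrays.sort(parts, Collator.getInstance(locale));
        return String.join("", parts);
    }

    public static void main(String[] args) {
        // prints "abcdefghijklmnoprstuvyzçöüğış"
        System.out.println(sortByCodeUnits(ALPHABET));
        // Collated order for Turkish; depends on the JDK's collation data.
        System.out.println(sortWithCollator(ALPHABET, Locale.forLanguageTag("tr")));
    }
}
```

Keeping the locale as a parameter makes it easy to pass Locale.getDefault() in production code and a fixed locale in tests.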

I will write about internationalization in detail, but until then, have a look at "What's wrong with Turkey?" by Jeff Atwood.

Null Object Pattern

A couple of months ago I needed to write a logging mechanism in which logs did not always have to be persisted, and log objects had to be passed between methods so that each method could set some specific attributes on the log object before it was persisted. I know the latter seems like a terrible design, but not everybody gets to work on a state-of-the-art system like some lucky guys do!

You can get a feeling of the log class below:

public class AuditLog {
    private String id;
    ...
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    ...
    public void save() {
        //Persist to database
    }
}

The standard procedure is, of course, to pass a null reference to the methods when there is no need to log, and then to check that the object is not null before invoking its methods, so that you don't get a NullPointerException.

public class TransactionHandler {
    public void handle(Transaction transaction, AuditLog log) {
        if (log != null)
            log.setId(id);
        ...
    }
    ...
    public void log(AuditLog log) {
        if (log != null)
            log.save();
    }
}

If you are an Objective-C programmer, you are lucky: sending a message to nil does nothing, so you don't need to check whether the object is null. But if, like me, you are using a mainstream object-oriented language such as Java, you need to keep in mind that object references can be null.

When you need to check for a null reference multiple times in more than a few methods, your code starts to smell. You should start thinking about an alternative, and the Null Object pattern can be your way out.

Null Object Pattern

The Null Object pattern is actually a special form of the Special Case pattern. It gives your object a meaningful "null" version and prevents null reference errors. The best part is that invoking its methods does nothing, so there is no need to check whether it is null. It will cost you a method call instead of an if statement, but you will get cleaner code.

Let's get back to the design. We can transform the AuditLog class to use the Null Object pattern like below.

[Figure: null object pattern]

The code will simply be changed to:

public interface AuditLog {
    public String getId();
    public void setId(String id);
    ...
    public void save();
}

public class AuditLogImpl implements AuditLog {
    private String id;
    ...
    @Override
    public String getId() {
        return id;
    }
    @Override
    public void setId(String id) {
        this.id = id;
    }
    ...
    @Override
    public void save() {
        //Persist to database
    }
}

public class NullAuditLog implements AuditLog {
    @Override
    public String getId() {
        return null;
    }
    @Override
    public void setId(String id) {
        //Do nothing
    }
    ...
    @Override
    public void save() {
        //Do nothing
    }
}

Invoking the save method with a NullAuditLog instance will do nothing and you don't need to add a null reference check anymore.

public class TransactionHandler {
    public void handle(Transaction transaction, AuditLog log) {
        log.setId(id);
        ...
    }
    ...
    public void log(AuditLog log) {
        log.save();
    }
}

Using an interface is actually the optimal approach. In my design, however, I simply extended the AuditLog class as NullAuditLog and made it a private static class inside the AuditLog class. This way, the null instance can be retrieved from the NULL constant defined in the AuditLog class. You can have a look at it on GitHub.

This approach has some downsides: for example, when you add new methods to the class, you have to remember to override them in your null class as well. With an interface, it is clearer to add the method declarations to the interface and implement them in the implementing classes.
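To make the nested-class variant concrete, here is a rough sketch of how it could look. This is a reconstruction from the description, not the code from the repository; the saved flag and the isSaved method are hypothetical stand-ins for a real database write so that the example runs on its own.

```java
// Sketch of the nested null-object variant; not the original repository code.
class AuditLog {
    // Shared do-nothing instance; callers pass AuditLog.NULL instead of null.
    public static final AuditLog NULL = new NullAuditLog();

    private String id;
    private boolean saved; // hypothetical flag standing in for a database write

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public void save() {
        saved = true; // a real implementation would persist to the database here
    }

    public boolean isSaved() {
        return saved;
    }

    // Hidden from callers: the only way to reach it is through AuditLog.NULL.
    private static final class NullAuditLog extends AuditLog {
        @Override
        public String getId() {
            return null;
        }

        @Override
        public void setId(String id) {
            // Do nothing
        }

        @Override
        public void save() {
            // Do nothing
        }
    }
}
```

A caller that does not want logging would then invoke something like handler.handle(transaction, AuditLog.NULL), and the handler code stays free of null checks.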

Conclusion

Even though it emerged because of a billion-dollar mistake, the Null Object pattern is very handy and keeps your code clean. This doesn't mean that you should apply the Null Object pattern to every single class in your design; that can make things even more complicated and error-prone. It may be used when:

  • It is possible to have a null reference, and assigning a null reference has a meaning in the design. For example, we used it to avoid persisting a log to the database.
  • There are many null comparisons for the object across the system.

Take a look at the Special Case pattern if you haven't before. I'm sure you will find it very useful.

Source Code for MS-DOS and Word for Windows

This week Microsoft, together with the Computer History Museum, released the source code for MS-DOS and Word for Windows.

On Tuesday, we dusted off the source code for early versions of MS-DOS and Word for Windows. With the help of the Computer History Museum, we are making this code available to the public for the first time.

Without reading the whole press release, you might think of it as Microsoft's new move towards getting into the open source community. Actually the intent is good, but it's not making them open source:

Thanks to the Computer History Museum, these important pieces of source code will be preserved and made available to the community for historical and technical scholarship.

As I said, the intent is good, but this effort would have been more meaningful maybe 15 years ago. People could have learned many things from this work and used them in their own fields. Who knows, maybe we would have had better word processors by now.

Of course, this is about their marketing strategy, and I don't blame them for this. I don't really think they are evil. They have some good efforts. Besides Microsoft didn't kill my pappy, not yet.

You can grab the source code of MS-DOS and Word for Windows from the Computer History Museum's website.

The Good Enough Software Design

Designing software is a challenge, especially for a perfectionist programmer. There are tons of decisions to make and lots of problems with many solutions. It is very likely that some of them will be hard. You can find yourself searching for best practices online and asking questions at stackoverflow.com. Even if you find the practices and get the answers you want, it is quite possible that you will end up with a design that you are not satisfied with. You doubt whether it is a perfect design or not. Guess what, there is no perfect design!

Mount of Design

It is impossible to solve the design puzzle as a whole. Every requirement will pop up in your brain and you will see the biggest challenges in the design, like the choices about the data or the presentation layer. Do I need to use MVC or MVVM? Which ORM tool do I need to use? How am I supposed to handle multiple database connections? Every question will make you feel more overwhelmed and you will start to see the design as an ever-growing mountain. Every step you take will seem worthless compared to the giant mountain in front of you. Feeling overwhelmed can make you want to escape from the problem, and that is where procrastination begins. But procrastination won't lead you to the solution.

The simplest solution to this problem is to Divide & Conquer. You have to stop thinking about the entire system in your head. You should divide the design into smaller blocks and see what you can do with those simple blocks one by one. If possible, choose an iterative development approach like XP. This way you can focus on design obstacles one iteration at a time, and you will start to see multiple small hills in your head instead of one mountain.

Also, don't spend too much time choosing tools or APIs, such as a database or a rule engine. These are just tools to make our lives easier. You should spend more time on general design principles like Separation of Concerns. For example, SoC will allow you to change any layer or any tool in your software without affecting the others. Apply TDD; it will pump up your courage and let you make radical changes in your software.
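As a small illustration of how Separation of Concerns keeps a tool choice replaceable, consider the sketch below. All names in it (CustomerStore, InMemoryCustomerStore, GreetingService) are hypothetical and exist only for this example. The business class depends on an interface, so the storage tool behind it can be swapped without touching the business code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage boundary: business code depends only on this interface.
interface CustomerStore {
    void save(String id, String name);
    Optional<String> findName(String id);
}

// One possible implementation; a database-backed one could replace it later.
class InMemoryCustomerStore implements CustomerStore {
    private final Map<String, String> names = new HashMap<>();

    @Override
    public void save(String id, String name) {
        names.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(names.get(id));
    }
}

// Business logic that knows nothing about how customers are stored.
class GreetingService {
    private final CustomerStore store;

    GreetingService(CustomerStore store) {
        this.store = store;
    }

    String greet(String id) {
        return store.findName(id)
                .map(name -> "Hello, " + name + "!")
                .orElse("Hello, guest!");
    }

    public static void main(String[] args) {
        CustomerStore store = new InMemoryCustomerStore();
        store.save("42", "Ada");
        System.out.println(new GreetingService(store).greet("42")); // prints "Hello, Ada!"
    }
}
```

The in-memory store also doubles as a test double, which is one reason SoC and TDD tend to work well together.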

Don't forget: the time spent on design is never enough. As Steve McConnell said in his masterpiece, Code Complete:

When are you done [designing]? Since design is open-ended, the most common answer to that question is “When you're out of time.”

Requirements Change

Your client or your company has to adapt to changes in the business to survive. This means that requirements will change. You can't write software that covers all the possible requirements that could come up in the future. There will always be some requirements that didn't even pop up in your brain. So don't try to cover future requirements, because You Aren't Gonna Need It. It is like reserving and decorating a room for a baby when you aren't even expecting one. Try to write software that covers the current requirements.

Don't Overengineer

If you work for days on a specific feature that will save your client just 5 minutes a year, then your work is actually useless. I know sometimes we can't stop ourselves from making that particular change, because it might make us feel like better programmers or we might do it just for fun, but you need to consider the cost/benefit ratio. Your time is precious; spend it on something better.

[Figure: overengineering problem]

Simplify

When you just need to store the state of an object in the file system, with no time-critical work or anything else special involved, it is pointless to write a custom object serializer. I have seen some designs that look like masterpieces from an engineering point of view, but they made everything overcomplicated. It takes weeks for newcomers to adapt to such a system. You shouldn't make your design overcomplicated. Sometimes the best solution is the simplest one. So keep the KISS principle in mind.
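To give an idea of what "simple" can look like here, the sketch below stores a couple of fields with java.util.Properties from the standard library instead of a hand-written serializer. The WindowState class and its fields are invented for this illustration, and it assumes the state is just a handful of simple values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical object whose state we want to keep between runs.
class WindowState {
    final int width;
    final int height;

    WindowState(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Writes the fields as key=value lines using the standard Properties format.
    void saveTo(Path file) throws IOException {
        Properties props = new Properties();
        props.setProperty("width", Integer.toString(width));
        props.setProperty("height", Integer.toString(height));
        try (Writer out = Files.newBufferedWriter(file)) {
            props.store(out, "WindowState");
        }
    }

    // Reads the fields back from a file written by saveTo.
    static WindowState loadFrom(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            props.load(in);
        }
        return new WindowState(
                Integer.parseInt(props.getProperty("width")),
                Integer.parseInt(props.getProperty("height")));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("window-state", ".properties");
        new WindowState(800, 600).saveTo(file);
        WindowState restored = loadFrom(file);
        System.out.println(restored.width + "x" + restored.height); // prints "800x600"
        Files.delete(file);
    }
}
```

If the state later grows into nested objects or large collections, that is the moment to reconsider the format, not before.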

Courage

The design is actually the outcome of the programmer's experience. It is quite likely that different programmers will come up with different design ideas. Since there is no single solution to a design problem, this might send you searching for the best possible way to design a feature. Don't do that. It doesn't mean you should stop learning, listening to or reading other people's ideas; of course you should. But the design phase is not the time to learn all the best practices in the world that could possibly be applied to your problem. Have some courage, do your best design and start implementing your design ideas.

In the design, you might give a class more responsibility than it should have, or you might not see the inheritance relationship between two classes, or you might miss a clearly visible pattern. When you start coding, you will see some similar flaws in your design. Refactoring is a really good practice to make your design better. Don't try to make a perfect design, start coding and refactor when your code smells. You will learn from your mistakes and they will be helpful for your future designs.

No Perfect Software

Whatever you do, you will end up with a piece of software that won't satisfy you. As Hunt and Thomas wrote in The Pragmatic Programmer:

... perfect software doesn't exist. No one in the brief history of computing has ever written a piece of perfect software. It's unlikely that you'll be the first. And unless you accept this as a fact, you'll end up wasting time and energy chasing an impossible dream.

I'm sorry, but they are right. Instead of chasing an impossible dream, try to make your design "good enough" and start coding!

Developing Hadoop Projects in Eclipse

I’ve recently started learning the Apache Hadoop platform with a great book: Hadoop: The Definitive Guide. I had written some examples in the terminal, but the projects were getting bigger and bigger. Eventually, I decided to switch to an IDE and started using Eclipse. Below I will explain two different ways to start your Hadoop Eclipse project in standalone mode.

Importing Libraries

Create a new project and add these libraries as references to your project.

  • commons-configuration-1.6.jar
  • commons-httpclient-3.0.1.jar
  • commons-lang-2.4.jar
  • commons-logging-1.1.1.jar
  • hadoop-core-1.0.0.jar
  • jackson-core-asl-1.0.1.jar
  • jackson-mapper-asl-1.0.1.jar

Add your source folder and you are ready to start your MapReduce application!

[Figure: Hadoop project]

Using Maven

First you need to create a pom.xml file. You can get it from my repository on GitHub.

After getting the pom.xml, you can create your Eclipse project with the following command.

mvn eclipse:eclipse

Alternatively, you can start a Java project in Eclipse, add the pom.xml file to your project, right-click your pom.xml and select Run as > Maven Build. Then enter eclipse:eclipse in the Goals field and hit Run.

[Figure: Hadoop Maven build]

After the build is completed, right-click your project and select Refresh. You should see the added libraries in the Referenced Libraries section.

[Figure: Hadoop project with Maven]

Now, you can run your application.