Notes on Java

Introduction

I resisted learning Java for quite some time for multiple reasons. Most of the code I write is in C because I do a lot of low-level networking code and support libraries that I want to be fast, and I'm used to both the speed and the ability to easily generate native executables. I've supported Java code as a systems administrator and have always been unimpressed by the awkwardness and frustrations of its build system and system integration. For a long time, it wasn't free software. And other languages seemed a more interesting place to invest my time.

In March of 2010, a combination of factors changed my mind. First, several people in the excellent book Coders at Work clearly love the language and think that it's going interesting places for concurrency. Second, my group at work is picking up several in-house applications written entirely in Java, and I want to understand them well enough to maintain them in that language if need be. It's also used by Shibboleth and some other interesting areas of web authentication, which is one of my specialities. Finally, it comes up enough in various contexts that I feel like I should have a basic fluency.

I'm starting with Thinking in Java by Bruce Eckel as one of the better of the various Java introductory books and will expand my knowledge from there. I'm planning on learning the Eclipse editor at the same time (I may as well get the whole native Java development experience). These are my notes on the process.

Non-religious comments are welcome. Please don't send me advocacy.

2010-03-25

The first day in, I covered the basic syntax, wrote a few "hello, world" programs, and became familiar with the landscape. I already know the general principles of object-oriented programming and I'm an extremely experienced C programmer, so I was mostly looking for the places where Java is different and picking up on the early parts of the design philosophy.

The most amusing part of the first day, to me, was the set of things from C that Java didn't fix:

Java still apparently has the notorious parser ambiguity with else in nested if blocks. This isn't a serious flaw, but I'm surprised the language designers didn't fix it while they had a chance. Even Perl fixed this wart by requiring braces around blocks. Although I suppose Perl relieves the pressure on compact expression by adding the statement-modifier if.
Even more surprising, Java still has fall-through in switch statements, one of the nastiest sources of syntactic C bugs. There are ways to associate a code block with multiple cases without using fall-through, and I'm a bit amazed that Java didn't take one of them.
The special syntax for for (;;) is just a hack, clearly designed to try to not break the brains of C++ programmers. It's apparently the only place in the language that you can use the comma operator, and the ability to declare variables in the first portion of the for statement only if they're all the same type is quite weird. It's fairly obvious from the start that one wants to avoid the for (;;) syntax whenever possible and use the iterator syntax instead, for the same reasons as in Perl.
Having a native boolean type that's used properly by the language (it's the return status from logical operators, for instance) is lovely, but it's ridiculous that if (a = b) is still lurking in the language to bite the programmer. The native boolean type means that it only lurks if both a and b are booleans, but it's still there. Just define = as not returning the value being assigned. It's really not going to hurt you to write those assignments as multiple statements.
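The three holdovers above that still compile can be seen in a few lines. This is only an illustrative sketch: the class CHoldovers and its method names are invented for the example and don't come from Eckel's book.

```java
// Hypothetical demo class; all names are invented for illustration.
class CHoldovers {
    // Dangling else: the else binds to the nearest if, whatever the
    // indentation suggests.
    static String danglingElse(boolean outer, boolean inner) {
        String result = "untouched";
        if (outer)
            if (inner)
                result = "both";
        else
            result = "else branch";  // belongs to if (inner), not if (outer)
        return result;
    }

    // Switch fall-through: without a break, case 1 runs on into case 2.
    static int fallThrough(int n) {
        int total = 0;
        switch (n) {
            case 1:
                total += 1;  // no break, so execution continues below
            case 2:
                total += 2;
                break;
            default:
                total = -1;
        }
        return total;
    }

    // Assignment as a condition still compiles when both sides are boolean.
    static boolean assignInCondition(boolean a, boolean b) {
        if (a = b) {  // assigns b to a, then tests the assigned value
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(danglingElse(false, true));       // untouched
        System.out.println(danglingElse(true, false));       // else branch
        System.out.println(fallThrough(1));                  // 3
        System.out.println(assignInCondition(false, true));  // true
    }
}
```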

Like everyone else, I think the distinction between objects and primitive types is an ugly wart, but it looks like "autoboxing" (automatic promotion to a real object in many situations) means that you can mostly not care about those special cases. Although I see that data type sizes, arithmetic conversions, and surprises in how conversions happen are still with us, something that languages like Perl and Python have gotten rid of.
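A small sketch of both halves of that observation: autoboxing hiding the primitive/object split, and the fixed-size arithmetic that is still there underneath. The class BoxingDemo and its methods are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are invented for illustration.
class BoxingDemo {
    // Collections hold objects, so each int is boxed on the way in and
    // unboxed on the way out without any explicit conversion.
    static int sumBoxed(int... values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v);        // autoboxing: int -> Integer
        }
        int total = 0;
        for (int v : boxed) {    // unboxing: Integer -> int
            total += v;
        }
        return total;
    }

    // Fixed-size integers still wrap around silently.
    static int overflow() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    // The division is done on ints before the result is widened to double.
    static double intDivision() {
        int one = 1;
        int two = 2;
        return one / two;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1, 2, 3));  // 6
        System.out.println(overflow());         // -2147483648
        System.out.println(intDivision());      // 0.0
    }
}
```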

None of these are problems for me, of course, since I know C, where you already have to put up with all of this. I'm just surprised they're still there when other modern programming languages, such as the scripting languages, have gotten rid of them.

The half-assed operator overloading that's only available for addition of strings is just odd. I can see how that syntactic sugar saves a lot of time, but I'm not sure if the language warts were really worth it. (For instance, having s + a + b and s + (a + b), where s is a string and a and b are integers, produce entirely different strings is very surprising.) I suspect I'll change my mind once I start writing any significant quantity of Java code, since strings are legitimately everywhere.
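The surprise with grouping is easy to reproduce. In this sketch (ConcatDemo is a made-up name), + is left-associative, so once the left operand is a string everything after it is concatenated unless parentheses force the addition first.

```java
// Hypothetical demo class; all names are invented for illustration.
class ConcatDemo {
    // Evaluated as (s + a) + b: both integers are appended as text.
    static String leftToRight(String s, int a, int b) {
        return s + a + b;
    }

    // The parentheses make a + b an integer addition before concatenation.
    static String grouped(String s, int a, int b) {
        return s + (a + b);
    }

    public static void main(String[] args) {
        System.out.println(leftToRight("n=", 1, 2));  // n=12
        System.out.println(grouped("n=", 1, 2));      // n=3
    }
}
```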

So far, so good. Everything seems quite straightforward. I'm a bit puzzled in some cases why some of the core utilities around manipulating numbers are only static methods and aren't also instance methods, but I suspect this will become clearer later on. Having done lots of object-oriented programming in Perl, I already know how nice having a garbage collector makes things compared to what one has to do in C++.

2010-03-31

Inheritance in Java predictably works about the same as I'd expect and the way that it works in most any object-oriented language, but the visibility restrictions have an interesting twist in the form of package visibility. That looks like a neat hack from my perspective as a C programmer, since it's very akin to what one does in C with static functions except more useful because Java has a useful package concept that provides a better division of responsibility than a single source file.

I dislike the naming structure that Java's package system forces on all of the files in a source base, but it's just an aesthetic dislike of deeply nested directory structures. I'll get over it. Having a consistent standard that everyone follows is useful.

If I'm going to learn Java, I may as well learn the whole environment as recommended by most Java programmers, so I also installed Eclipse and started playing with that. It's very impressive, I must admit. The things that you can do with real-time code parsing are quite nice. The auto-completion of code is odd, but I think I'll get used to it. I don't like that it makes it hard to see where the 80 column mark is, since I'd still like to retain code readability for people in a regular-sized xterm, but I suspect that's a configuration issue that I can work out.

I can see why people using Eclipse don't mind the forced file naming standard, since Eclipse hides the disk files from you almost entirely and shows you the Java perspective. That's a big advantage to a standardized file naming system.

Eclipse does take forever to start, but it's not really worse than OpenOffice and I cope with that. Unfortunately, its on-line help is non-functional in the current Debian package at the time of this writing (Debian Bug#576106), but the tutorial worked and showed me most of what I needed to know. It's huge, with large and intimidating nested menus, which means I'm having a bit of the Gimp problem of being overwhelmed with too many options when I try to use it. If I use it at some length, I'm sure I'll get used to the keyboard shortcuts.

2010-04-14

I've now learned (at least at a basic familiarity level) what Java interfaces are, which is one of those language features that I'd heard about but didn't know anything specific about. It's both much simpler and much clearer than I expected. I'd thought that interfaces were about information hiding: sort of a more complete way of separating public and private interface. But they're more about multiple inheritance. Objects in Java can only inherit from one base class, but they can implement as many interfaces as they wish, and a lot of other Java objects rely only on the interface. So in essence interfaces become property sets that objects can fulfill, which allows them to be used in contexts that require that set of properties.
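A minimal sketch of interfaces as property sets, with one base class and two interfaces. Every name here (InterfaceDemo, Named, Priced, Vehicle, Bicycle, Coffee) is hypothetical and exists only for the example.

```java
// Hypothetical demo class; all names are invented for illustration.
class InterfaceDemo {
    interface Named { String name(); }
    interface Priced { int cents(); }

    static class Vehicle {
        final int wheels;
        Vehicle(int wheels) { this.wheels = wheels; }
    }

    // Exactly one base class, but as many interfaces as needed.
    static class Bicycle extends Vehicle implements Named, Priced {
        Bicycle() { super(2); }
        @Override public String name() { return "bicycle"; }
        @Override public int cents() { return 25000; }
    }

    // An unrelated class can satisfy the same two property sets.
    static class Coffee implements Named, Priced {
        @Override public String name() { return "coffee"; }
        @Override public int cents() { return 350; }
    }

    // These methods rely only on the interface, never on the concrete type.
    static String label(Named item) {
        return "item: " + item.name();
    }

    static int total(Priced... items) {
        int sum = 0;
        for (Priced item : items) {
            sum += item.cents();
        }
        return sum;
    }

    public static void main(String[] args) {
        Bicycle bike = new Bicycle();
        Coffee coffee = new Coffee();
        System.out.println(label(bike));          // item: bicycle
        System.out.println(total(bike, coffee));  // 25350
    }
}
```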

The more I think about this concept, the more I like it. The reading about abstract base classes, interfaces, and inner classes really impressed me with how much Java has thought hard about information hiding and abstraction and taken the object-oriented approach considerably farther than C++ or Perl. I'm starting to like this language quite a bit, and am itching to write larger projects in it.

Factories make considerably more sense now that I understand interfaces. It was always obvious what they did, but I previously didn't understand why it was useful to structure the code that way. But if the factory method satisfies an interface, it gives you another level of abstraction and hiding of the object. You can pass in any factory satisfying that interface, and it in turn can create any object satisfying what might be a very light-weight interface, without the calling code needing to have any idea what type the objects are. Without a factory you can't hide the constructor, so you can't delegate creation of objects to code that doesn't know exactly what type of objects you want to create.
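A rough sketch of that arrangement, assuming a hypothetical Shape interface and ShapeFactory interface (neither is from the book). The concrete classes are private, so the calling code in totalArea cannot name the types it is creating.

```java
// Hypothetical demo class; all names are invented for illustration.
class FactoryDemo {
    interface Shape { double area(); }
    interface ShapeFactory { Shape create(double size); }

    // The concrete types are private: callers only ever see Shape.
    private static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    private static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    static final ShapeFactory SQUARES = new ShapeFactory() {
        @Override public Shape create(double size) { return new Square(size); }
    };

    static final ShapeFactory CIRCLES = new ShapeFactory() {
        @Override public Shape create(double size) { return new Circle(size); }
    };

    // Knows neither which factory it was handed nor which class it builds.
    static double totalArea(ShapeFactory factory, double... sizes) {
        double total = 0.0;
        for (double size : sizes) {
            total += factory.create(size).area();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalArea(SQUARES, 1.0, 2.0));  // 5.0
        System.out.println(totalArea(CIRCLES, 1.0));
    }
}
```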

Next is the beginnings of the standard library, starting with container objects. I'm expecting learning the standard library to be the hardest part of learning Java well enough to be productive in it, not because each part in isolation is that complex, but because it's so huge. It's going to be akin to the vocabulary phase of learning a language.

2010-04-15

Java's collection classes are very impressive. I hear that there are some strangenesses around using them, and of course they don't have the syntactic sugar that most scripting languages have, but the CS major in me really appreciates the ability to choose implementations for different needs. I can think of a lot of things I've written where having a hash with a defined key sort order would be quite useful. Eckel's dislike of the Stack class due to its apparently incorrect inheritance model seems a bit too strong, but then I can't think of a lot of cases where I've specifically wanted a stack, and it's probably easiest to just use LinkedList or Queue.
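To make the choice of implementations concrete, here is a small sketch (CollectionsDemo and its methods are invented names). The same Map code yields sorted keys with a TreeMap and insertion-ordered keys with a LinkedHashMap, and a LinkedList used through the Deque interface stands in for Stack.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; all names are invented for illustration.
class CollectionsDemo {
    // Fills whatever Map implementation it is given and reports key order.
    static List<String> keysOf(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return new ArrayList<String>(map.keySet());
    }

    // A LinkedList behind the Deque interface used as a stack.
    static String reversed(String text) {
        Deque<Character> stack = new LinkedList<Character>();
        for (char c : text.toCharArray()) {
            stack.push(c);
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted: [apple, fig, pear]
        System.out.println(keysOf(new TreeMap<String, Integer>()));
        // LinkedHashMap keeps insertion order: [pear, apple, fig]
        System.out.println(keysOf(new LinkedHashMap<String, Integer>()));
        System.out.println(reversed("stack"));  // kcats
    }
}
```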

The basic exception functionality is mostly as expected from what I saw of other languages, although I was rather annoyed at Eckel's preaching of only using the exception name to convey information and not bothering with string contents. This confirms one of the things I hate as a system administrator about debugging Java problems: no one ever bothers to throw a meaningful exception with any actually useful information.

I see that Java fixed the C++ throws declaration problem, and I like the way it was fixed. Having a set of unchecked exceptions for things that can be thrown everywhere is a nice way out of the trap of needing to call library interfaces that add new exceptions. However, I think I also agree with Eckel's point that declaring and catching all exceptions is less useful in large code bases than it appears, since often you'd have to catch the exception at a point where you don't know what to do about it. For the sorts of applications I'm most likely to work on (primarily web applications), the main utility of catching exceptions is to do something at the top level of the application to catch all exceptions and send them somewhere useful while providing the user with a reasonable error message instead of a backtrace in code they don't have access to.
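A sketch of the two habits discussed here: a checked exception that carries a useful message and its cause, and a single top-level handler that logs the detail while showing the user something reasonable. ConfigException, parsePort, and handleRequest are hypothetical names, and writing to System.err stands in for whatever logging a real application would use.

```java
// Hypothetical demo class; all names are invented for illustration.
class ExceptionDemo {
    // A checked exception: callers must declare it or catch it.
    static class ConfigException extends Exception {
        private static final long serialVersionUID = 1L;
        ConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Low-level code: says what went wrong and keeps the original cause.
    static int parsePort(String key, String raw) throws ConfigException {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new ConfigException(
                "setting " + key + " has non-numeric value \"" + raw + "\"", e);
        }
    }

    // Top level: the one place that catches everything, records the
    // details, and gives the user a message instead of a backtrace.
    static String handleRequest(String rawPort) {
        try {
            return "listening on port " + parsePort("listen.port", rawPort);
        } catch (Exception e) {
            System.err.println("request failed: " + e.getMessage());
            return "Sorry, something went wrong.";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("8080"));  // listening on port 8080
        System.out.println(handleRequest("http"));  // Sorry, something went wrong.
    }
}
```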

The real revelation of the exceptions chapter is the way Java uses finally clauses (which are run unconditionally upon exit from a try block), and specifically the way that they catch not only exceptional exit but also return from the middle of the block. This is quietly brilliant. It provides a common method for cleaning up any form of allocated resources no matter the exit path taken from a method, even if there are no exceptions involved. I can think of innumerable places where I would have loved to have this feature in C, even when error handling isn't involved. In many ways, it duplicates the standard C goto error handling method, in which both success and failure are routed (via goto when necessary) to a cleanup block at the end of the function.
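The behavior is easy to check with a trace. In this sketch (FinallyDemo and firstPositive are made-up names, and the StringBuilder stands in for a real resource), the cleanup runs on an early return, on the normal return, and when an exception escapes.

```java
// Hypothetical demo class; all names are invented for illustration.
class FinallyDemo {
    // Records "open;" on entry and "close;" on every exit path.
    static String firstPositive(String[] items, StringBuilder trace) {
        trace.append("open;");
        try {
            for (String item : items) {
                int value = Integer.parseInt(item);  // may throw
                if (value > 0) {
                    return "found " + value;         // return from mid-block
                }
            }
            return "none";
        } finally {
            trace.append("close;");                  // runs in all three cases
        }
    }

    public static void main(String[] args) {
        StringBuilder trace = new StringBuilder();
        System.out.println(firstPositive(new String[] {"-1", "7", "9"}, trace));  // found 7
        System.out.println(trace);  // open;close;

        trace.setLength(0);
        try {
            firstPositive(new String[] {"oops"}, trace);
        } catch (NumberFormatException e) {
            System.out.println(trace);  // open;close;
        }
    }
}
```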

Strings are predictably somewhat awkward, but not as awkward as one might fear and certainly less awkward than C. As always, the regular expression handling in any language other than Perl looks ridiculously tedious and annoying, but at least there are mechanisms for doing most of what you can do in Perl (just requiring about ten times as many lines of code). I see that Java has largely picked up Perl-compatible regular expressions, although the bizarre intersection syntax in character classes is mystifying and the union syntax seems to be completely unnecessary. But I was pleased to see that lookahead and lookbehind assertions are both supported. (Eckel is of course partly incorrect when he says that possessive quantifiers are Java-specific; that expression is Java-specific, but suppression of backtracking is supported in Perl via a different syntax that Java also supports. This is just a feature that you almost never actually want to use.)
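A short sketch of the features mentioned, using java.util.regex. RegexDemo and firstMatch are hypothetical names; the patterns themselves are ordinary Pattern syntax. The possessive form a++ and the atomic group (?>a+) both refuse to give back characters, so both fail where the plain greedy a+ succeeds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; all names are invented for illustration.
class RegexDemo {
    // Returns the first substring matching the regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Lookahead: digits that are followed by "px", without consuming it.
        System.out.println(firstMatch("\\d+(?=px)", "width: 120px"));  // 120
        // Lookbehind: digits that are preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $45"));    // 45
        // Intersection: lowercase letters that are not vowels.
        System.out.println(firstMatch("[a-z&&[^aeiou]]+", "aeixyzou")); // xyz
        // Greedy a+ backtracks one character so the trailing "ab" can match.
        System.out.println(firstMatch("a+ab", "aaab"));                // aaab
        // Possessive a++ keeps every a, so the following "a" never matches.
        System.out.println(firstMatch("a++ab", "aaab"));               // null
        // The atomic group is the other syntax for the same behavior.
        System.out.println(firstMatch("(?>a+)ab", "aaab"));            // null
    }
}
```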

I was amused by Eckel's footnoted rant about the supposedly incomprehensible naming of the lookingAt() method. He was surprisingly nasty, and also surprisingly ignorant; that method name is immediately familiar to anyone who's written an Emacs mode (Emacs Lisp has the same function), and is an intuitive name when one is writing a scanner using regular expressions.
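For anyone who hasn't met the name before, here is a sketch of why lookingAt() suits a scanner: it asks whether the pattern matches starting exactly at the current position, without requiring the match to extend to the end of the input. LookingAtDemo, its token labels, and its patterns are all invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; all names are invented for illustration.
class LookingAtDemo {
    private static final Pattern SPACE = Pattern.compile("\\s+");
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Splits the input into NUM: and WORD: tokens, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher space = SPACE.matcher(input);
        Matcher number = NUMBER.matcher(input);
        Matcher word = WORD.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            if (startsHere(space, pos)) {
                pos = space.end();
            } else if (startsHere(number, pos)) {
                tokens.add("NUM:" + number.group());
                pos = number.end();
            } else if (startsHere(word, pos)) {
                tokens.add("WORD:" + word.group());
                pos = word.end();
            } else {
                throw new IllegalArgumentException(
                    "unexpected character at offset " + pos);
            }
        }
        return tokens;
    }

    // True if the pattern matches beginning exactly at pos.
    private static boolean startsHere(Matcher m, int pos) {
        m.region(pos, m.regionEnd());
        return m.lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc 42 x"));  // [WORD:abc, NUM:42, WORD:x]
        Matcher m = NUMBER.matcher("42abc");
        System.out.println(m.lookingAt());  // true: a match starts at offset 0
        System.out.println(m.matches());    // false: it does not cover the input
    }
}
```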

2010-04-28

It took me two weeks to get through the chapters on type information and generics, which I reliably find among the most boring language topics. I suspect that one of the reasons why I'm not a functional language hacker is that I find discussions of type systems almost unbearably boring. It doesn't help that generics make them both boring and complicated, with a lot of the examples in the book exploring edge cases that are difficult to wrap one's brain around.

On a more practical level, I'm not sure why these topics were introduced in this much depth at this point in the book. Both run-time type discovery and the complex corner cases of generics seem like advanced topics, and it's odd to have them introduced before basic I/O. This may make more sense when I get to basic I/O — maybe, like with exceptions, the concepts are required for understanding — but I'm struggling with learning about complicated type system trickery in the absence of enough information to write a practical Java program that does something. So much of the type handling and generics discussion is too theoretical.

But enough complaining about the presentation method. What about the language? The type handling was interesting, and while I think I'll need to write some code that needs those features before I really understand them, the methods of interrogating classes seem fairly clean and straightforward. Every language seems to use radically different mechanisms for doing this, but I think I like the Java approach of using a singleton object for each class and building the type reflection handling fairly deeply into the object mechanism.
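A minimal sketch of that per-class object in use. ReflectDemo and Greeter are hypothetical names; getClass(), getSuperclass(), getMethod(), and invoke() are the standard reflection calls.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; all names are invented for illustration.
class ReflectDemo {
    public static class Greeter {
        public String greet(String who) {
            return "hello, " + who;
        }
    }

    // Asks the object's Class for its own name and its superclass.
    static String describe(Object o) {
        Class<?> type = o.getClass();
        return type.getSimpleName() + " extends "
            + type.getSuperclass().getSimpleName();
    }

    // Looks up a public one-String-argument method by name and calls it.
    static Object call(Object target, String methodName, String arg) {
        try {
            Method m = target.getClass().getMethod(methodName, String.class);
            return m.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName
                + " on " + target.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        Greeter g = new Greeter();
        // Every instance of a class shares the same Class object.
        System.out.println(g.getClass() == Greeter.class);  // true
        System.out.println(describe(g));                    // Greeter extends Object
        System.out.println(call(g, "greet", "world"));      // hello, world
    }
}
```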

A lot of the generics chapter is a discussion of the problems with Java's approach to generics compared to C++, namely that they were bolted on later and use erasure of the specialized type rather than keeping the type information the way that C++ does. I can see the flaws, but I also feel a bit of relief that the whole system, while not as complete, is somewhat simpler than C++'s template system. In general, being as complex as C++ isn't a feature for me. But I can see how some of the inconsistencies will be strange and difficult to remember until they suddenly bite you.
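Erasure and one of the resulting surprises can be shown briefly. In this sketch (ErasureDemo is a made-up name), two differently parameterized lists share one runtime class, and a value of the wrong type slips in through a raw reference and only fails later, at the point of use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are invented for illustration.
class ErasureDemo {
    // The type parameter is gone at run time: both are plain ArrayList.
    static boolean sameRuntimeClass() {
        Class<?> strings = new ArrayList<String>().getClass();
        Class<?> integers = new ArrayList<Integer>().getClass();
        return strings == integers;
    }

    // The raw reference lets an Integer into a List<String>. Nothing
    // complains until the element is read back as a String.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean failsOnlyWhenRead() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;
        raw.add(Integer.valueOf(42));  // compiles, with an unchecked warning
        try {
            String first = strings.get(0);  // compiler-inserted cast fails here
            return first == null;           // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());   // true
        System.out.println(failsOnlyWhenRead());  // true
    }
}
```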

One of the things I like a lot about Eckel's presentation is that he tries to show one, as the title says, how to think in Java and make full use of its capabilities rather than just writing a familiar style of code in a different syntax. One of the things that's frustrating is that, in a laudable desire to avoid just reproducing the Javadoc for the standard library, he teaches this with a lot of toy examples, which makes it hard for me to grasp the practical impact. I need to tackle some real projects in order to understand how often I'm really going to want to use things like generics and type information. My guess is that they're going to be rare events, but I could be completely wrong. I've been holding off on writing real code until I/O is covered, since what I can do without I/O is fairly limited, but that's unfortunately very late in the book. It would have been nice if he'd covered I/O and (ideally for me) JDBC earlier in the book so that I could be applying this to something concrete.

2010-05-20

Vacation and then other work led me to skip a week and then make a combined update for two weeks of actual study. Since the last entry, I've gone through arrays, advanced collections, I/O, enums, and annotations.

It's a bit odd that there's so much material on arrays given that apparently you almost never want to use an array unless you have a significant speed concern and instead want to just use a collection. But okay. Most of the information on both arrays and on collections is about typing and all the various ways in which you have to be careful with typing and generics when using arrays and collections. Java has a very nice set of collections, and from a CS major perspective it's also nice to see the underlying data structure documented. I have a good intuitive feel for the varying performance characteristics of the different collection implementations because of that.

I found it rather startling that Java essentially asks you to implement your own hash functions for any classes you write, at least if you want them to be used as keys in hash tables (and since you have no idea what users will want to do with your objects, that seems like a good idea to support). There's even a discussion of a simple hash algorithm to apply to arbitrary data in the object. Given the level of abstraction provided by Java elsewhere, I'm startled that Java developers are expected to know this. I wonder how many just bail and use the hex address of the object as the hash code (the default implementation). I suspect the answer is "nearly all of them."
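A sketch of what is being asked of the class author, using the widely taught 17-and-31 recipe (the assumption here is that the book's algorithm is of this general kind). Point and RawPoint are hypothetical classes; RawPoint keeps the defaults from Object, so two equal-looking instances are different keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; all names are invented for illustration.
class HashDemo {
    static final class Point {
        private final int x;
        private final int y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        // Combine every field that equals() looks at.
        @Override public int hashCode() {
            int result = 17;
            result = 31 * result + x;
            result = 31 * result + y;
            return result;
        }
    }

    // Same fields, but equals() and hashCode() are inherited from Object,
    // so they are based on object identity.
    static final class RawPoint {
        final int x;
        final int y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) {
        Map<Point, String> good = new HashMap<Point, String>();
        good.put(new Point(1, 2), "found");
        System.out.println(good.get(new Point(1, 2)));    // found

        Map<RawPoint, String> bad = new HashMap<RawPoint, String>();
        bad.put(new RawPoint(1, 2), "found");
        System.out.println(bad.get(new RawPoint(1, 2)));  // null
    }
}
```

On Java 7 and later, java.util.Objects.hash(x, y) is a shorter way to write the same kind of function.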

Java seems to have a good handle on sorting and searching and some nice ways of handling equality comparisons in a way that puts the code where it can be best maintained. This is more "oh, look, a real object-oriented programming language" fun, which I admit is a lot of the enjoyment I'm getting out of Java. It's been since college that I've used a language that has these facilities built into the language in a clean and straightforward way.
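As a rough illustration of where the comparison code ends up: a class's natural ordering goes in the class itself via Comparable, and an ad hoc ordering goes in a separate Comparator. SortDemo and Version are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; all names are invented for illustration.
class SortDemo {
    // Natural ordering lives with the class: major first, then minor.
    static final class Version implements Comparable<Version> {
        final int major;
        final int minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }

        @Override public int compareTo(Version other) {
            int byMajor = Integer.compare(major, other.major);
            return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
        }

        @Override public String toString() { return major + "." + minor; }
    }

    static String sorted(Version... versions) {
        List<Version> copy = new ArrayList<Version>(Arrays.asList(versions));
        Collections.sort(copy);  // uses compareTo
        return copy.toString();
    }

    // A one-off ordering supplied from outside the class being sorted.
    static List<String> byLength(List<String> words) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, new Comparator<String>() {
            @Override public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(new Version(2, 0), new Version(1, 10),
                                  new Version(1, 2)));       // [1.2, 1.10, 2.0]
        System.out.println(byLength(Arrays.asList("ccc", "a", "bb")));  // [a, bb, ccc]
    }
}
```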

I/O is a mess. It seems like everyone at least realizes that I/O is a mess, but that of course doesn't change that basic fact. Java uses decorators extensively, which means that to perform even basic I/O operations you wrap objects inside of objects inside of objects in a particularly tedious way. It gives you full control over things like buffering, but it's one of the places where I prefer how C does it. Yes, if you need to mess with the buffering in C, it's a bit obscure, but since 99% of the time you don't want to, that puts the pain on the unusual case. In Java, you have to explicitly buffer every stream you create when you create it, which is going to get painfully tedious. And it doesn't help that there are four completely different I/O subsystems (the old byte-oriented mode, the new character-oriented mode, random-access files, and the really new nio classes that use a semi-bizarre buffer construction).
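The wrapping looks like this in practice. A sketch only: IoDemo and its methods are made-up names, in-memory byte arrays stand in for files so the example is self-contained, and the try-with-resources form assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; all names are invented for illustration.
class IoDemo {
    // Three layers: raw bytes, then character decoding, then buffering
    // (which is also what provides readLine).
    static List<String> readLines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // The same layering in the other direction, plus formatted printing.
    static byte[] writeLines(List<String> lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(new BufferedWriter(
                 new OutputStreamWriter(sink, StandardCharsets.UTF_8)))) {
            for (String line : lines) {
                out.print(line + "\n");
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeLines(Arrays.asList("first", "second"));
        System.out.println(readLines(new ByteArrayInputStream(data)));  // [first, second]
    }
}
```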

As a side note, Eckel's constant use of the term "Unicode" for Java's character handling is driving me nuts. That's like saying that your computer architecture is based on integers. Yes, that's probably true, but it's completely unhelpful without specifying what the native word size is. In this case, "Unicode" appears to mean UCS-2, which is a particularly dumb implementation of Unicode that's incapable of representing many of the Unicode characters. I wonder if that's actually the case or if Java actually uses UTF-16 (or, more likely, UTF-16BE). Eckel is completely unhelpful.
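The question can be probed directly. A minimal sketch, assuming Java 5 or later, where char is documented as a UTF-16 code unit and characters outside the Basic Multilingual Plane are stored as surrogate pairs; Utf16Demo is a made-up name.

```java
// Hypothetical demo class; all names are invented for illustration.
class Utf16Demo {
    // Builds a String holding exactly one Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String accented = fromCodePoint(0x00E9);   // inside the BMP
        String emoji = fromCodePoint(0x1F600);     // outside the BMP

        System.out.println(accented.length());     // 1
        // length() counts 16-bit chars, not characters.
        System.out.println(emoji.length());        // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(emoji.charAt(1)));   // true
    }
}
```

So a string can hold any Unicode character, but code that indexes by char sees the two halves of a surrogate pair separately.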

Java data streams are intriguing in that they have a native serialization format that is documented to let any Java program talk to any other Java program over the network, sending binary data, and be able to recover the types on the other side. However, for my purposes this is entirely unuseful unless the exact encoding that's used is documented, and here again Eckel says nothing. It looks like here, though, the JDK documentation does say exactly what it's doing.
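The encoding is simple enough to inspect by hand. This sketch assumes the data streams in question are DataOutputStream and DataInputStream; DataStreamDemo and its methods are invented names. An int is written as four bytes, high byte first, and writeUTF writes a two-byte length followed by the string in modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; all names are invented for illustration.
class DataStreamDemo {
    // Writes one int and one string and returns the raw bytes.
    static byte[] encode(int number, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(number);
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static String decode(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            int number = in.readInt();
            String text = in.readUTF();
            return number + ":" + text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(258, "hi");
        // 258 is 0x00000102; "hi" is length 2, then 'h' (104) and 'i' (105).
        System.out.println(Arrays.toString(wire));  // [0, 0, 1, 2, 0, 2, 104, 105]
        System.out.println(decode(wire));           // 258:hi
    }
}
```

Note that the reader has to know the field order in advance; nothing in these bytes says that an int comes first.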

nio uses basically the same buffer model that INN has been using for many years: you have a buffer of a particular allocated length and store two pointers inside that buffer, one that shows how much got put into it and one that you use to work your way through that data. This model works fairly well, and I'm happy since I'm already familiar with how to use it from using those buffers extensively inside INN. I suspect this gets fairly confusing for a lot of users, though. I wish Eckel had given a feel for how much code in the wild uses nio versus using the various stream or reader/writer APIs. I also wonder what the mmap API in nio does if the local platform doesn't support mmap, and whether it suffers from the various problems that mmap does on some platforms (readers not seeing writes until the writer flushes, readers not seeing the results of the write syscall, etc.).
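A small walk through that buffer model with java.nio.ByteBuffer, where the two markers are called position and limit. BufferDemo and its helper methods are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; all names are invented for illustration.
class BufferDemo {
    static String state(ByteBuffer buf) {
        return "pos=" + buf.position() + " lim=" + buf.limit();
    }

    // Fill, flip to read, read one byte, then compact to keep filling.
    static String[] walkThrough() {
        String[] states = new String[4];
        ByteBuffer buf = ByteBuffer.allocate(8);

        buf.put((byte) 10).put((byte) 20).put((byte) 30);
        states[0] = state(buf);  // writing: position marks the end of the data

        buf.flip();
        states[1] = state(buf);  // reading: limit marks the end, position restarts

        buf.get();
        states[2] = state(buf);  // one byte consumed

        buf.compact();
        states[3] = state(buf);  // unread bytes moved to the front, ready for more
        return states;
    }

    public static void main(String[] args) {
        for (String s : walkThrough()) {
            System.out.println(s);
        }
        // pos=3 lim=8
        // pos=0 lim=3
        // pos=1 lim=3
        // pos=2 lim=8
    }
}
```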

The object persistence supported natively by Java is rather impressive. I'm not sure when I'd ever use it, but it looks like an exceptional piece of engineering that hides a lot of complexity under a simple interface. Property files I suspect I'll use much more, as they've already been a bane of my existence as a sysadmin (mostly because the ones I've dealt with have mixed secure and insecure data willy-nilly). Having a standard configuration system built into the language is an excellent choice; even if it isn't quite everything one might want, having one there by default and used by everything that can use it makes a huge difference.
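Both facilities fit in one short sketch. PersistenceDemo, Session, and the property keys are invented for illustration; java.util.Properties and the object streams are the standard classes. Marking a field transient keeps it out of the serialized form, which is one way to avoid writing secrets to disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; all names are invented for illustration.
class PersistenceDemo {
    // Parses property-file text; a string stands in for a file here.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String token;  // deliberately not persisted
        Session(String user, String token) {
            this.user = user;
            this.token = token;
        }
    }

    // Serializes to bytes and back again.
    static Session roundTrip(Session original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Properties props =
            parse("# comment\nserver.host = example.org\nserver.port=8443\n");
        System.out.println(props.getProperty("server.host"));    // example.org
        System.out.println(props.getProperty("timeout", "30"));  // 30

        Session copy = roundTrip(new Session("alice", "secret"));
        System.out.println(copy.user);   // alice
        System.out.println(copy.token);  // null
    }
}
```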

Java enums are rather odd. There's way more of an object there than I expected, and being able to write methods specific to particular values in the enum is mind-bending and strikes me as horribly confusing. It is very nice, however, to see that not only can you use enums for protocol constants that require known values, you can even attach descriptions or other metadata to an enum. I can see lots of uses for this in simplifying presentation and error handling code.
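A sketch covering each of those points: fixed protocol values, attached descriptions, and a method overridden for one particular constant. EnumDemo, Status, and the specific codes are hypothetical choices for the example.

```java
// Hypothetical demo class; all names are invented for illustration.
class EnumDemo {
    enum Status {
        OK(200, "request succeeded"),
        // A constant-specific body: only this value overrides isError().
        NOT_FOUND(404, "no such resource") {
            @Override boolean isError() { return true; }
        };

        private final int code;            // the known protocol value
        private final String description;  // metadata for messages

        Status(int code, String description) {
            this.code = code;
            this.description = description;
        }

        int code() { return code; }
        String description() { return description; }
        boolean isError() { return false; }

        // Maps a wire value back to its constant.
        static Status fromCode(int code) {
            for (Status s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("unknown status code " + code);
        }
    }

    public static void main(String[] args) {
        Status s = Status.fromCode(404);
        System.out.println(s);                   // NOT_FOUND
        System.out.println(s.description());     // no such resource
        System.out.println(s.isError());         // true
        System.out.println(Status.OK.isError()); // false
    }
}
```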

Java annotations are a great idea. I was somewhat familiar with this already from decorators in Python, but it's nice to see the full documentation and various examples of how to use them. Eckel goes very in-depth here, showing how to write an annotation system that can generate database schema automatically from an object hierarchy, which is a truly lovely trick and exactly the kind of thing that annotations are good for. It was also interesting to see some of the introspection bits that Java supports used for that practical of an application.

Also included in this chapter is a basic unit testing framework built using annotations, which makes it clear how useful annotations are for supporting unit testing and natural co-location of test code and object code. I see that JUnit has evolved in a direction similar to Eckel's basic test framework and now uses annotations in a similar way, plus has considerably more annotations for other interesting features (such as marking a test as ignored). JUnit is one of the things that I'm looking forward to in writing Java.
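The idea is compact enough to sketch. This is not Eckel's framework or JUnit's API: the @Check annotation, the Samples class, and the run method are all invented here, and the convention that a test is a static method returning boolean is an assumption made to keep the example short.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; all names are invented for illustration.
class AnnotationDemo {
    // RUNTIME retention is what makes the marker visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Check {}

    public static class Samples {
        @Check public static boolean additionWorks() { return 1 + 1 == 2; }
        @Check public static boolean deliberatelyFails() { return "a".isEmpty(); }
        public static boolean notATest() { return false; }  // no annotation
    }

    // Finds every @Check method on the class, runs it, and reports.
    static List<String> run(Class<?> cls) {
        List<String> results = new ArrayList<String>();
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(Check.class)) {
                continue;
            }
            try {
                boolean ok = (Boolean) m.invoke(null);
                results.add(m.getName() + (ok ? " passed" : " failed"));
            } catch (ReflectiveOperationException e) {
                results.add(m.getName() + " errored");
            }
        }
        Collections.sort(results);  // reflection order is unspecified
        return results;
    }

    public static void main(String[] args) {
        // [additionWorks passed, deliberatelyFails failed]
        System.out.println(run(Samples.class));
    }
}
```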

Now that I have I/O, I think I'm ready to start writing something significant, and I'm getting a bit tired of just absorbing information without using it. I may take a break from book study next time and start working on code, probably the Java remctl implementation (which will require that I learn about network I/O, something Eckel doesn't cover anywhere).

Last spun 2022-02-06 from thread modified 2014-01-06