Software Sin #2: Writing Large Subroutines

If your functions and methods do more than one thing, start repenting now.

I had two friends in college who became infamous for a 350 line Java ActionListener method that ran their entire Software Engineering and Design final project. Ironic, huh?

It was hideous to look at: Java code stuffed to the brim with nested try-catch blocks. However, it mostly worked, and technically, you can write an efficient, fully-featured application in any code block. That doesn't mean it's a good idea.

We encapsulate objects and their data to keep a system separate and logical. Those properties are crucial for extendability and maintainability, especially since you're not usually the only person working on an application.

Plus, how often do you do go back to code that you've written and have completely forgotten about? Hopefully you're pleasantly surprised...

If you write a function or method that's longer than 15-20 lines and/or has more than 3 levels of nesting, you should take some time to rethink your code.

The longer and more complex your subroutines are, the more you risk coupling unrelated data and operations. You also increase the potential for side effects.

Methods should be compositional. They should do exactly what they say they do at the highest level of abstraction possible and nothing more. If the problem that you're trying to solve is comprised of multiple subtasks, a separate method should be written to accomplish each subtask. Simple.

For example, let's say that you're trying to replace a file's extension. You could write this code in the middle of an unrelated method that requires it as a subtask:

...
new_path = old_path.rsplit('.', maxsplit=1)[0] + '.rtf'
...

But please, prefer seperate, contextual operations like this:

...
new_path = replace_file_extension(old_path, 'rtf')
...

def replace_file_extension(file_path, new_extension):
    return f'{trim_file_extension(file_path)}.{new_extension}'

def trim_file_extension(file_path):
    return file_path.rsplit('.', maxsplit=1)[0]

For the two extra minutes to it takes write, the benefits are staggering.

First, the code is easier to read since the operation's been given context. It's also easier to unit test since it's an independent function.

Furthermore, it's reusable, which prevents you from committing the sin of code duplication.

Imagine that our theoretical application replaces file extensions at multiple places in the code path.

If you go with approach #1 (using rsplit and concatinating the new extension directly in-line), you'd not only be writing the same code in different modules, but you'd be duplicating its bugs everywhere.

When someone notices it has a bug, the person responsible for fixing it would probably only fix a single occurrence since the bad behavior is self-contained in multiple modules. You're also cluttering the method with more unrelated logic.

With approach #2, you can address the bugs in trim_file_extension without having to make changes anywhere else. (And since it was unit testable, you probably caught the bug as you were writing the tests.)


Once you begin coding compositionally, your mind will become more apt to code reuse.

At my previous job at 128 Technology, we created tons of awesome Python libraries from functions and patterns that we found ourselves using over and over again.

Whenever we noticed similar patterns, we'd split the code out, make the behavior more generic, and use it all over the codebase. Not only was our code easier to maintain, but we had fantastic peace of mind because all of it was thoroughly tested.

Good times...

26. From Pennsylvania, USA. Software engineer at Amazon.com, travel enthusiast, scuba divemaster, amateur photographer. A bit restless.