Saturday 15 September 2012

Does TDD lead to explosion of unnecessary abstractions and bad readability?

Thanks go to Grzegorz Sławecki for inspiration.

Exemplary problem

Imagine we're in a different reality, where tests are written after the code. we have written the following code (by the way, it's not perfect as an example of using WCF, but it's out of scope of this post):

public class SendingFromBuffer
{
  Buffer _buffer;

  public SendingThroughBuffer(Buffer buffer)
  {
    _buffer = buffer;
  }
    
  public void Perform()
  {
    var destination = CreateDestination ();
    while (_buffer.HasPackets()) 
    {
      destination.Send (_buffer.GetPacket ());
    }
  }
    
  public Destination CreateDestination()
  {
    var binding = new BasicHttpBinding();
    var endpoint = new EndpointAddress(
      "http://localhost/MyService/Something");
    var channelFactory = new ChannelFactory<Destination>(
      binding, endpoint);
    return channelFactory.CreateChannel();
  }
}

And now we would like to write a unit test for it. In particular - for the sending logic.

Actually, we could do it easily if not for the result of CreateDestination() method. It's a WCF channel, so sending anything through it without actually connecting to the server yields an exception. To execute the code as is, we would need set up a server and use the operating system networking capabilities and (remember!) we want to be isolated from the operating system as much as we can during unit test. Also setting up a WCF server impacts unit test performance, making it slower.

As the code wasn't written test-first, it's already untestable, so we'll have to refactor it (wow, and imagine it was created just about an hour ago!) to isolate ourselves from the WCF channel. Thankfully, we have the channel creation encapsulated in its own method, so it's very easy to extract it to a separate class and use dependency injection to inject either a real object or a mock into the SendingFromBuffer class.

After the refactoring, the code looks like this:

public class SendingFromBuffer
{
  Buffer _buffer;
  DestinationFactory _destinationFactory;

  public SendingThroughBuffer(
    Buffer buffer, DestinationFactory factory)
  {
    _buffer = buffer;
    _destinationFactory = factory;
  }
    
  public void Perform()
  {
    var destination = _destinationFactory.Create();
    while (_buffer.HasPackets()) 
    {
      destination.Send(_buffer.GetPacket ());
    }
  }
}

public interface DestinationFactory
{
  Destination Create();
}
  
public class WcfDestinationFactory : DestinationFactory
{
  public Destination Create()
  {
    var binding = new BasicHttpBinding();
    var endpoint = new EndpointAddress(
      "http://localhost/MyService/Something");
    var channelFactory = new ChannelFactory<Destination>(
      binding, endpoint);
    return channelFactory.CreateChannel();
  }
}

Note that we have introduced a factory object with one method that has just a few lines. this yields the two following questions:

  1. Aren't we introducing an unnecessary abstraction that bloats the design?
  2. Aren't we hurting readability with this additional object?

Let's take those questions on one by one.

1. Aren't we introducing an unnecessary abstraction that bloats the design?

Ok, think with me - is this factory really necessary? I mean, it's just a few lines of code! Why the heck would we want to delegate that to a separate class?

Well, we do and I'll tell you why. It's because the reason for creating classes is not to move lines of code. You see, the goal of introducing all this object oriented stuff into programming languages in the first place was to enable us to do the following:

  1. encapsulate something
  2. loosen coupling
  3. attain better cohesion
  4. get rid of redundancy
  5. make the class easier for testing (ok, this one came later, but it's equally important)
  6. enhance readability
  7. provide a better chance for reuse

Do you see "lines of code" anywhere on this list? No? Good, because me neither :-). Well, maybe the readability part, but we'll deal with that later. Sadly, this is a common myth that many junior programmers repeat endlessly - that the amount of lines of code is a criteria for something to be sealed inside a class or a method. The truth is, one can even introduce a class that doesn't have ANY executable code (many Null Objects are implemented this way).

I mean, if we really don't care about the things outlined above, we can just as well write everything in procedural style - who needs all these objects, classes, interfaces and stuff? No one, because they're a design tool, not an implementation tool. And as "lines of code" is not a design quality, it has nothing to do with design.

So, let's evaluate the decision to introduce a factory in the context of real design qualities described above, not those made up, like "lines of code":

Encapsulation

Introduction of a factory encapsulates the knowledge on what library are we actually using for communication. By doing this, we can get rid of another "tiny bit of code", that impacts this class significantly:

using System.ServiceModel;

you see, not having this line inside our class file frees us from dragging a heavy dependency around every time we want to reuse the class. This is a great benefit, since getting rid of such heavy dependencies enhances reuse.

Coupling

By decoupling the class from WCF, we open it for extensions - we can put in another factory that creates different implementation using another remote procedure call mechanism. We gain reuse and extensibility in return.

Cohesion

By getting the knowledge about creating communication channel away from the SendingFromBuffer class, we make it less prone to change, because any change in the way communication occurs does not impact this code. Cohesive classes make finding defects and issues very easy. E.g. When the app fails with WCF connection, we've got a very limited area to search for problems. We don't have to wonder at all whether a code fulfilling another responsibility in the same class is affected by the WCF channel creation somehow or not.

Why do I think I have the right to say that the WCF part is a "separate responsibility"? Because there's a strict boundary between this code and the rest of the class. Just look at the table below:

Question Wcf part The rest of the class
Need to unit test? No * Yes
Part of application domain logic? No Yes
Can be changed to allow easier unit testing? No Yes

* - well, I'd probably write a unit test to specify what parameters should be set on the channel after creation and that the created instance is not null, but I wouldn't try to send anything through it.

So as you see, these two aspects of the class are different and hence should be separated.

Redundancy

Let's assume that this isn't the only place where we're opening up a connection and sending a message through it. In such case, without the factory, we'd have to introduce a method for creating the channel in every class that needs it. Every time the logic for creating a channel changes, we have to make corrections in all of these methods. This is asking for trouble.

Testability

Using dependency injection and a factory opens the class for unit testing - which we wouldn't be able to do otherwise. Of course, we could make the creation method virtual and override it in a subclass with a stub implementation, but is really more straightforward or less effort than the factory solution? By the way, TDD IS about design and analysis, so when a test tells you that you need an additional abstraction, then you do. That's what you write the tests for - to discover them.

Readability

we'll take care of it in a second.

Reuse

This is a result of all the other points. A class that is easy to reuse encapsulates its responsibility well, dooes not have accidential coupling, is very cohesive, does not include redundant aspects, is documented with set of simple and well-written unit tests that describe all its behaviors and is readable.

Ok, that pretty much represents my thinking on the issue, the only thing left is question number 2:

2. Aren't we hurting readability with this additional object?

How do you define readability? Many people mistakenly think that readability is about looking at the code and being able to tell everything that happens from it - such code would make everything it does explicit. If so, then it would be best not to partition the code in any way, but just use a single function or a method that does all the stuff. Thinking this eay leads us to the conclusion that each time we're introducing a method or a variable, we hurt readability, because things become less explicit. e.g. this:

int a = pow(2,2);

is less explicit than:

var a = 2*2;

And yet, we'd rather use the first form rather than the second, because it's more intention-revealing (although the second does a better job in telling what actually happens).

So as you might have already guessed - for me, the main trait of readability is not explicitness. It's revealing intentions.

Let's take a look at the two following pieces of code:

public void Process(Frame frame)
{
  validation.PerformFor(frame);
  compression.PerformOn(frame);
  encryption.PerformOn(frame);
}
...and:
public void Process(Frame frame)
{
  if(frame.Destination == null)
  {
    throw new Exception();
  }
  
  for(int i = 0 ; i i < frame.NumberOfWords ; ++i)
  {
    var word = frame.Words[i];
    frame.Words[i] = word.Trim();
  }
  
  for(int i = 0 ; i i < frame.NumberOfWords ; ++i)
  {
    var word = frame.Words[i];
    frame.Words[i] = word.Reverse();
  }
}

Which one is more explicit? The second one, of course. Which is more readable? I'd argue that the first one - not only does it make clear what steps are performed, it also makes a point about their order. This is crucial, because encrypting a compressed content would give you a different result than compressing an encrypted content. If we're sending the frame to any other system, it can probably interpret only one of these two options. That's why the order is probably something that is enforced by requirements and it's great that it's reflected in the code.

By the way, one could argue, that we could just do this:

public void Process(Frame frame)
{
  validate(frame);
  compress(frame);
  encrypt(frame);
}

That's true, although, when reading this code, I'd most probably jump into each of these methods to see what they do - after all, it's the same class and I'm trying to grasp it. This forces me to get my head around more details than I need to, because everytime I'm reading this class, there is always "that other code", that's doing some fancy things and may modify some fields that may be used somewhere else. For example, how do I know that compression does not set a flag somewhere to trigger different encryption in the encrypt() method depending on the compression ratio?

My brain tends to work differently when I see other responsibilities pushed into separate objects. Then, I focus only on the behavior of the class I'm reading, since the objects I'm delegating to cannot affect anything more than what I pass to them. I can quickly grasp the behaviors my class expects from its collaborators and instantly tell whether the responsibility of the class is fulfilled well or not. For example, I don't care what validation is - the only thing I know is that I need to call it and it will just "handle stuff".

As a side note - there are some people that hate design patterns and say that they hurt readability. While I agree that "pattern religion" is bad, I have actually seen many, many, many situations, when using design patterns enhanced readability, because they would just "fit in" into the solution. E.g. when I want to add tracing of every method call made against an object, a decorator is an obvious choice. Then, looking at the decorator, one can instantly see that the intention of the code is to provide tracing over a method call - it doesn't have to be digged for.

Ok, that's everything I have for today. Feel free to comment with your thoughts!

2 comments:

JF said...

The problem is, abstractions CAN be taken too far. And not only is this often the case, but I imagine that every developer has asked himself if an abstraction is one too many. IMO for every word written on abstractions, there should be one on concretions close by.

Grzegorz Gałęzowski said...

JF, thanks for the comment and sorry for late reply.

Agree, abstractions can be taken too far. Actually, I was guilty of this myself few times.

It is not sufficient to cut software into decoupled pieces - it also matters what these pieces are and how they communicate.

You have a point when you write that each word on abstractions should be accompanied by a word on concretions. The reason I dealt with abstractions in this post was that this was the argument I received.

There are different teams and contexts - some may have trouble with too much abstraction and creating a framework for everything. Some other maye suffer from not abstracting enough and not decoupling enough. My context is usually the latter.