Sunday, December 29, 2013

Me & Open Source - a New Year's Resolution

I am working as a software developer full-time for several years now. I've already written many thousands of lines of code, but nearly none of it is publicly available. Most of it remains a property of my employers for who the code was created, but there are always some pieces that are already open source or side projects that don't need to adhere to strict ownership rules and can be generalized enough to be shared with others.

I always wanted to share and I had hundreds of ideas for full-size projects that I could do when there's enough time. But in the reality, there was never enough time and motivation for anything big. Full-time job takes a lot of time, the rest goes to family and other duties. But there is always a few late evening hours each week that can be saved.

All the things I've shared with the community up to date were mentioned here on my blog. These were mostly short gists or snippets. The only larger thing was an HTML building API based on the "loquacious" interface idea, published on GitHub as NOtherHtml. I still love the design! But I admit, it was rather one-timer, not a major contribution to the open source world.

So, here is my New Year's Resolution for 2014: try spending those few late evening hours on contributing to the open source projects. Yeah, I know no one cares, but I'm sharing it as posting it publicily will motivate me more than leaving it just as my internal thought. There will be probably no biggies for the reasons mentioned, I think I'll rather focus on bug fixes or small features for the projects I use. It's easiest, it's beneficial, it's fair.

As a start, yesterday I've submitted my first pull request to the "real" open source project I use - JP. Boodhoo's developwithpassion.specifications. I've fixed a bug in the extension method that helps assert collections elements equality, but failed when the collections were of different lengths. I feel I'm already over the hump!

Sunday, October 6, 2013

ASP.NET MVC: Current route values implicitly used by Url.Action

I really don't like when I work with statically-typed language like C# and still need to rely on strings or other loosely-typed constructs for things that may be strongly-typed. ASP.NET MVC went its way towards strongly-typing and now offers things like strongly-typed view models, but still lacks official built-in support for building internal links and URLs without using plain controller and action strings.

Fortunately, here is where MVC Futures comes in (available on NuGet for MVC 3 or 4). It offers a nice expression-based extension methods to build links (on HtmlHelper):

Html.ActionLink<TheController>(c => c.TheAction(theParam))

and URLs (on UrlHelper):

Url.Action<TheController>(c => c.TheAction(theParam))

It's working fine and it protects me from working with those ugly string-based methods and it is also a very convenient way to specify the target action parameters - in definitely nicer way then manually building the RouteValuesCollection required by those traditional methods.

But there is one not well-known feature of MVC (at least I haven't heard a lot about it) that for me seems to be disturbing here - MVC is reusing the route values from the current request when building URLs and no other value specified. This maybe makes some sense for the string-based methods (you don't need to build those RouteValuesCollections over and over again), but it makes no sense for the expression-based approach as the compiler enforces specifying all the parameters explicitly in order to build a lambda expression that compiles.

And here comes the nasty bug I've recently spent some hours on - passing null value is like not passing a value at all. MVC puts null under the appropriate key in the RouteValuesCollection, it then goes down to Route.GetVirtualPath method, which also takes the current route values from the RequestContext. And then the evil happens - the low-level System.Web.Routing's method ParsedRoute.Bind ignores the null value and - bang - it takes the value from the current request, if any accidentally matches by the parameter name.

It means that when trying to build an URL that passes parameter param equal to null and the current request accidentally have parameter named also param (regardless of its type or existence of any logical connection), the request's param value will be passed instead our explicitly demanded null value. And I can see no way to pass null in this case.

Actually, the bug (or feature?) exists in case of string-based URL builder methods, too. But here it is much more visible and obviously wrong.

The only way to fix that strange implicit parameter inheritance by name I know is to work around it - either by removing the name collision by renaming one of the parameters (yuck!) or by using own extension method.

I've created my own extension method to generate URLs/links that has the same signature as the buggy ones and placed it in the namespace more visible than those replaced (you'd better check your usings carefully). Here is the UrlHelper extension I have - HtmlHelper implementation generally calls UrlHelper and wraps its result in a link, so I'll omit it here. The method calls the same methods as the original method being replaced, but it tweaks the RequestContext instance: instead of passing the instance available in UrlHelper (which contains those conflicting route values from the current request), I'm creating new RequestContext reusing Route and IRouteHandler instances from the current request, leaving the actual route values empty. This way there's no possibility for the current values to "infect" our URL building process anymore. Interesting is that RouteData constructor doesn't enforce the values being set, anyway.

public static string Action<TController>(this UrlHelper helper, Expression<Action<TController>> action)
    where TController : Controller
{
    // we need to recreate RequestContext without values, to override the MVC "feature"
    // that replaces null specified in action with value inherited from current request
    var currentRouteData = helper.RequestContext.RouteData;
    var fixedRouteData = new RouteData(currentRouteData.Route, currentRouteData.RouteHandler);
    var fixedRequestContext = new RequestContext(helper.RequestContext.HttpContext, fixedRouteData);

    var valuesFromExpr = Microsoft.Web.Mvc.Internal.ExpressionHelper.GetRouteValuesFromExpression(action);
    var vpd = helper.RouteCollection.GetVirtualPathForArea(fixedRequestContext, valuesFromExpr);
    return vpd == null ? null : vpd.VirtualPath;
}

Well, it seems to be yet another example of the fact that null is rather vague and problematic thing. It can represent many different notions, depending on the context and implementation - like no value, empty value, inherited value etc. Althouth ASP.NET MVC treats null as "inherit the value" not only in this case, I'd always prefer being explicit about that kind of behaviors.

Wednesday, October 2, 2013

On Abstractions in Requirements

We, as a software developers, are used to work with abstractions and to strive to work with abstractions rather than with specific implementation details. We always try to build the abstraction layers on top of complicated rules and design our systems so that the details are hidden at low levels. The system, looking from some distance, is built from nice generic bricks.

But we're much less used to have our abstractions in line what our business requirements are and how the stakeholders look at it. We're often thinking about abstractions only on technical, source code level. But ideally, if something is really a well-designed abstraction, it should serve everyone equally well.

Know the abstractions the customers use

When building the software for a particular group of users, they probably have some ideas how things should work and how things relate to each other. Ask your customers not only for the details you need to have to complete your concrete implementation, but also ensure you know the bigger picture. If your customers define some rules, before implementing it, get the knowledge what these rules are about and what they represent. These rules probably result from some customers' business concept and it will appear in multiple requirements or ideas over and over again. Knowing the concept upfront will save some time finding duplicated rules and/or inventing the abstractions on your own. If not identified early enough, introducing it to the codebase later will probably mean conflicts with existing, made up abstractions.

Agree with customers on the abstractions

Don't introduce high-level abstraction for things that sounds similiar if your customers don't use those things together. When the product evolves, this kind of abstraction will probably end up as a gordian knot tying two things that are meant to go different ways. Eventually you'll have to remove the abstraction on the way (which may not be easy) or some dirty hacks will grow around it.

Talk with everyone about naming

Naming is one of the most difficult thing in programming, right? So why not rely on our customers. They probably know better how to name the things they are familiar with. Discuss the synonyms and agree on the naming before introducing the abstraction to the codebase. It may be a good idea to have a glossary of terms the customers use and how it's represented in the codebase so that different developers (but also managers, technical writers, customer support, etc. - essentially all the stakeholders) don't use another synonym for the same thing. This will avoid confusion and misunderstanding. Moreover, mapping of "documents" to "articles" through "news stories" will be weird, anyway.

Don't use details when there's an abstraction agreed

Whoever create the requirements for your project, they should be aware you know the concepts they use, so that they will use it in the communication, for everyone's convenience. Subsequent requirements should not repeat the definition of the abstraction in detail, but rely on the fact you already know it. Otherwise it will probably end up inconsistent or implemented locally, separately from the previously introduced abstractions. When the existing concept's definition turns out to be an obstacle and it needs to be clarified with details later in the project's development, it probably means that either there is no real abstraction there or the abstraction is not defined or understood correctly by all parties.

Make sure you have proper number of well understood abstractions

Abstractions should be - well - abstract enough. It means that it should be easy to grasp it and explain it to a new team member or new customer representative. If it's not - it's probably not a high-level abstraction. There are also practical limits how many high-level abstractions make sense in the project. If there are too many, so that they can't be learned easily, either the project is overcomplicated or - again - these are not really high-level abstractions.


This all may sound obvious, but I've recently experienced it isn't. In our project, there was a high-level decision made that we need to have our content categorized. The idea of the categories went through several layers of business analysts and subject-matter experts and before it get to the development team, it became fuzzy and the general concept flittered away. We get nothing about categories, but a bunch of seemingly unrelated yet not trivial requirements to "move things around here and there". We implemented it as so rather thoughlessly. And some time later there was a business decision that things should work together as they are related. But that "related" thing doesn't exist for us at all - we have several own concepts of "relations" instead, grown on the developers' need for abstractions. And now we need to go few big steps back to align. Oops!

Wednesday, September 25, 2013

The Alternative Development Methodologies Glossary

Have you ever wondered what all those TDDs, BDDs, DDDs, ATDDs mean? Are you sure your project is following all these methodologies that form the software development mainstream? Do you know what to do to stay up to date?

Here is the comprehensive and exhaustive list of various established development techniques that you can meet (or have a chance to introduce) in your project. Feel free to be inspired and use as many -DDs as you can!

ADD: Anticipation Driven Development
When the requirements are carefully hidden in stakeholders' minds and not shared with the development team until the deadline. In that methodology developers need to be ready for everything.
BDD: Bug Driven Development
Delivering anything as quick as possible and explaining all the missing features as "just bugs". Then fixing it using a mix of magic and dirty hacks.
CDD: Copy-Paste Driven Development
Introducing new feature to the system is easy as long as the feature may be put together from the borrowed parts of other features. Low coupling, sort of.
DDD: Development Driven Development
Have you ever had an impression that the project's code just live its own life?
EDD: Email Driven Development
As a productivity improvement technique, developers are expected to reduce wasting time in those ugly black IDEs and keep focused on the real stuff being discussed via email. 60 times a day.
FDD: Frustration Driven Development
After delivering the final requirements, implementing and making the code well-factored and production-ready, the new final (finaler) requirements arrive, without changing the deadline. And then, two weeks later, there are the finalest requirements.
GDD: Google Driven Development
Let's google it up. All the code is already there, for sure.
HDD: Hardcoding Driven Development
KISS and YAGNI plus being extremely explicit everywhere. So explicit that any business logic or calculations makes a smell. May also be the Test Driven Development that went too far (the code works only for cases used in the tests, because all the answers are hardcoded).
IDD: Idiot Driven Development
Self-explaining. But don't use that name when talking to your boss.
JDD: Job Security Driven Development
The primary goal each developer seems to have is to become indispensable. It's easy to do when adhering to several simple rules. That technique encourages great productivity of developers (when counted in lines of code written).
KDD: Knot Driven Development
From the Gordian Knot - when the cyclomatic complexity is higher than the number of lines. And when all the special cases are covered with enough attention and care.
LDD: Layering Driven Development
The technique focused on proper code encapsulation and isolation - each developer creates his/her own layer of abstraction. Who wants to work with that legacy code, anyway?
MDD: Meeting Driven Development
Methodology that relies on the premise that each development problem may be solved by discussing it over and over regularly in the adequately large group for the adequately long time. And if it still doesn't succeed, invite even more managers.
NDD: Not Driven Development
IDD with the principle of team self-organization.
ODD: Outlook Driven Development
See EDD. Syntax coloring plugin for Outlook would be nice. And also source control plugin, to be able to commit the code by email, finally.
PDD: Patch Driven Development
When the development is time-constrained, with fixed date, anything should be always deployed to production on time, regardless on its development status. Then there's a patching phase, that is trying to understand what was deployed and replacing it with working bits by some shotgun surgeries.
QDD: Quick-Fix Driven Development
Mix of Patch Driven Development, TODO Driven Development and some hope.
RDD: Refukctoring Driven Development (officially: Refactoring Driven Development)
When it's not possible to isolate the crappy code written last year (see FDD), comment it out and leave with "TODO" flag. Then write it once again.
SDD: Showcase Driven Development
Don't do anything unless there's a customer showcase tomorrow. Then rigorously apply QDD and TDD.
TDD: TODO Driven Development
Project planning methodology that assumes the existence of some time in the project's future when there will be no new feature requests or bugs and the developers will have a lot of time to reduce the technical debt.
UDD: Utopia Driven Development
Pragmatics' name for TODO Driven Development.
VDD: Versatility Driven Development
The methodology about developing a solution for every single customer need (including those not yet discovered) at once, by a single project. Also the way of writing code so that it can be re-configured to do other things in the future, if the need arises. Also known as Soft Coding.
WDD: Wheel-Reinventing Driven Development
The concept based on natural and inborn lack of trust for others' code, including guys next room. Instead of taking a risk of using others' code, let's write it from the scratch again (and use both versions in the same project in the same time). Sort of loose coupling, again.
XDD: Xero Driven Development
Mutation of CDD. The only features possible to implement are those like already implemented. We need to have some strong foundations to copy from, right?
YDD: Yesterday Driven Development
Pretty broadly used time management methodology in software development. All tasks are due two days before tomorrow.
ZDD: Zen Driven Development
Software development and mind training at once. Occurs when the codebase requires "the attainment of enlightenment and the personal expression of direct insight" to work with.

Let me know if I forgot anything from the mainstream.

Thursday, July 11, 2013

Liskov Substitution Principle vs. immutability

Today I had a very interesting discussion about Liskov Substitution Principle (LSP) - what it really means and how to avoid breaking it, of course using an overused example of squares and rectangles. Liskov's principle tells us to design our inheritance trees or interface implementations in such a way that whenever the code uses the base class (or interface), any of the derived classes (implementations) can be used without breaking existing behavior or invariants. In wider understanding, it also means that whenever we use an abstraction like base class or interface, we should not expect any particular implementation provided in the runtime nor should we know anything about that implementations. This also forbids any constructs like conditions based on implementation's concrete type - if we need it, we're working with leaky abstraction that need to be rethought and not hacked around.
void MethodThatGetsAbstraction(IAbstraction abstraction)
{
    if (abstraction is SomethingReal)
        YouReDoingItWrong();
}
Let me remind the classical example of shapes I've mentioned before. As we know, squares are rectangles, and we often represent "is a" relationships like this in code using an inheritance - Square derives from Rectangle (I don't want to discuss here whether it makes sense or not, but the fact is this IS the most often quoted example when talking about LSP). A rectangle, mathematically speaking, is defined by its side sizes. We can represent these as properties in our Rectangle class. We also have a method to calculate an area of our rectangle:
public class Rectangle
{
    public int Height { get; set; }
    public int Width { get; set; }

    public int CalculateArea()
    {
        return Height * Width;
    }
}
Now enter the Square. It is a special case of rectangle that has both sides of equal lengths. We implement it by setting the Height to equal Width whenever we set Width and opposite. Now consider the following test:
class When_calculating_rectangle_area
{
    [Test]
    public void It_should_calculate_area_properly()
    {
        // arrange
        var sut = new Rectangle();
        sut.Width = 2;
        sut.Height = 10;
        
        // act
        var result = sut.CalculateArea();

        // assert
        Assert.Equal(result, 20);
}
It works perfectly fine until, according to the ability given us by the Liskov's principle, we change Rectangle into its derived class, Square (and leave the rest unchanged, again according to the LSP). Note that in real-life scenarios we rarely create the Rectangle a line above, we got it created somewhere and we probably don't even know its specific runtime type.
class When_calculating_square_area
{
    [Test]
    public void It_should_calculate_area_properly()
    {
        // arrange
        var sut = new Square();
        sut.Width = 2; // also sets Height behind the scenes
        sut.Height = 10; // also sets Width behind the scenes, overriding the previous value
        
        // act
        var result = sut.CalculateArea();

        // assert
        Assert.Equal(result, 20); // naah, we have 100 here
}
Square inheriting from Rectangle breaks the LSP badly by strenghtening the base class' preconditions in a subtype - base class doesn't need an equal sides, it's the implementation-specific whim we don't even know about if we're only providing the base class to be derived in the wild. But what is the root cause of that failure? It's the fact that Square is not able to enforce its invariants (both sides equal) properly. It couldn't yell by throwing an exception whenever one tries to set the height differently than width, because one may set width earlier than height - and we definitely don't want to enforce particular property setting order, right?. Setting width and height atomically would solve that problem - consider the following code:
public class Rectangle
{
    private int _height;
    private int _width;

    public virtual int SetDimensions(int height, int weight)
    {
        _height = height;
        _width = width;
    }

    public int CalculateArea()
    {
        return _height * _width;
    }
}
public class Square : Rectangle { public override int SetDimensions(int height, int weight) { if (height != width) throw new InvalidOperationException("That's a weird square, sir!"); _height = _width = height; } } But now we've replaced one LSP abuse with another. How the poor users of Rectangle can be prepared for SetDimensions throwing an exception? Again, it's an implementation-specific weirdness, we don't want our abstraction to know about it, as it will become a pretty leaky one. But is width and height really a state? I'd rather say they are shape's identity - if they change, we are talking about another shape than before. So why expose setting possibility at all? This leads us to the simple but extremely powerful concept of immutability - a concept in which the object's data once set cannot be changed and are read-only. The only way to populate an immutable object with data is to do it when creating the object - in its constructor. Note that constructors are not inherited and when new-ing a Rectangle we are 100% sure it's not Square or any other unknown beast, and conversely, when constructing Square we can enforce our invariants without an influence on base class behaviours - as all properly constructed Squares will be valid Rectangles.
public class Rectangle
{
    private int _height;
    private int _width;

    public Rectangle(int height, int weight)
    {
        _height = height;
        _width = width;
    }

    public int CalculateArea()
    {
        return _height * _width;
    }
}

public class Square : Rectangle
{
    public Square(int size)
        : base(size, size)
    {
    }
}
LSP satisfied, code clean, nasty bugs and leaky abstractions removed - immutability is king!

Thursday, July 4, 2013

Code Smells: Rigidity

I was recently preparing a small talk about several code smells, as discussed by Robert C. Martin in his famous 'Clean Code' book. I've decided to share my notes - it's nothing revolutionary, a bit of copy-paste from well-known resources and a bit of my personal comments and understandings.

The one for today is rigidity. Rigidity literally means "the inability to change or be changed to fit changed circumstances". And that nicely fits what the smell is about: "The software is difficult to change. A small change causes a cascade of subsequent changes". We all know it - some tasks that sounded easy at the beginning became a few weeks of bug hunting or turned out to be too difficult to finish without destroying everything around.

When it happens?

  • When the code is written in procedural style - a.k.a. "everything in one place" - with a lot of very narrow and specific, possibly unrelated conditions and special cases handling included directly in the "main" method,
  • When there is a lack of abstractions - meaning that the code operates on the low level of technical details instead of focusing on more real-life concepts, like bit flags fiddling to check for object's characteristics instead of having it available as a method or a property,
  • When the code implements some general concepts, but defines it with details specific to particular use case - like a code to render an HTML table from the general matrix that also decides the table header has dark background,
  • When a single responsibility is spread between many classes or code layers - like checking user's permissions on every layer from the data access layer up to the UI layer,
  • When components need to know a lot of details about each other to communicate properly (leaky abstractions).

How to avoid it?

  • Thinking in general terms, wrapping the implementation details into abstractions that are understood on less technical levels,
  • Creating code with satisfying the requirements first in mind, not what API's we're using or the constraints we have around,
  • Using object-oriented techniques and SOLID - with its small classes, single responsibilities and open design,
  • Encapsulating the code in the logical pieces, defining a clear boundaries and interfaces between them - if you can't tell what is the border of responsibilities, it's probably done wrong,
  • Depending upon abstraction, not implementation.

Thursday, May 30, 2013

What's planned for NHibernate 4?

Yesterday Ricardo Peres wrote an article about the state of NHibernate development, comparing it to Entity Framework and noticing that NHibernate's future doesn't look bright. It's pretty much in line with what I meant asking if NHibernate is already dead few months ago. But, as I'm still actively using NHibernate in my project and there are no prospects of changing that, it would be nice if that's not the case.

Although there's no clear roadmap available, it seems that NHibernate's community is planning a major release, bumping the version number to 4. The reason for that is pretty simple - the product is going to switch from .NET 3.5 to 4, mostly to retire Iesi.Collections dependency that provided set implementations that are now partially available in the BCL. This is a major breaking change and that justifies changing the major version number, for sure. Some obsoletes will be cleaned up in the same time.

Are there any new features planned for 4.0? Well, there are some, but I can't say the list is impressive. SQL Server 2012 will be supported better (sequences, limit feature), Ingres9 will be also supported out of the box and few other dialects will have some improvements. But, let's admit it, this is not a big deal for existing users. There are of couse several bugfixes and minor improvements done and planned for that version, i.e. in Linq provider (NH-3186, NH-2852, NH-3056, NH-2915, NH-3256), in mapping-by-code (NH-3280, NH-3313). There is also one possibly significant performance improvement with Linq queries done.

The draft version of release notes is available as a reference for all JIRA issues already resolved. Unfortunately, I can see no "flag ship" feature that might interest, majority of the list is rather maintenance of existing features or cleanup. Definitely not enough to encourage more interest in NHibernate, leaving the framework targeting breaking change the only thing that really justifies releasing this as 4.0 and not as 3.4.

And to make things clear, I'm not complaining that current contributors are lazy. I think Hazzik and Oskar Berggren are doing fantastic job keeping things up and running. The problem is that they are the only really active members of the community. And the size and complexity of the project is big enough to make gaining new contributors really hard, for sure. I've tried to fix one bug once - and I failed.

Anyway, my fingers are crossed in hope of new great features in NHibernate, but, looking objectively, I don't expect anything like that to happen.

Monday, April 15, 2013

NHibernate's LINQ GroupBy capabilities

Recently in the project that is using NHibernate 3.2, I needed to use some aggregations in my database queries. The use case was pretty typical - aggregate some pre-filtered set of invoices by the product sold, count how many sales were there for each product, order the data by total sales value and take top 10 results. It is pretty easy to accomplish in SQL:

SELECT TOP 10 Product, COUNT(*) AS SaleCount, SUM(Value) AS TotalValue
     FROM Invoices
     WHERE Cancelled = 0
     GROUP BY Product
     ORDER BY TotalValue DESC

It is also pretty easy to express in LINQ syntax:

Invoices.Where(i => i.Cancelled == false)
     .GroupBy(i => i.Product)
     .Select(g => new TopSellingProduct 
                  {
                      Product = g.Key, 
                      SaleCount = g.Count(), 
                      TotalValue = g.Sum(i => i.Value) 
                  })
     .OrderByDescending(g => g.TotalValue)
     .Take(10);

I knew that NHibernate's LINQ provider offers limited support for GroupBy operator. Taking into consideration that all the lambda expressions in the query are in fact expression trees that need to be parsed and expressed in SQL, what I expected to be the most problematic, was the Select clause that creates new TopSellingProduct instances (which is not a NHibernate-managed entity) and sets its properties, in case of Sum even using nested lambdas. Actually, this was not a problem at all, even when using anonymous types inside - impressive! NHibernate somehow gets the list of fields and aggregation functions that needs to be fetched and turns it into SELECT clause correctly.

But the query above couldn't be translated into SQL anyway. It turned out that the operators that seemed easier to implement - OrderBy, Take and Skip - were not supported. So with NHibernate 3.2, I could only create an aggregation and fetch all the aggregated values at once, without ordering or paging. In my case, it could mean fetching 50k rows just to show top 10. Not an option.

Fortunately, quick search through the NHibernate's JIRA dashboard gave me the hope that things look better with the newer NHibernate version - 3.3.1. I've upgraded seamlessly using NuGet, and here is the summary of my observations:

SQL featureLINQ syntax exampleNHibernate 3.2NHibernate 3.3.1
SELECT of simple aggregated value; COUNT() function.GroupBy(x => ...).Select(g => g.Count())OKOK
SELECT of anonymous class.GroupBy(x => ...)
.Select(g => new { g.Key, Count = g.Count() })
OKOK
SELECT of named class
.GroupBy(x => ...)
  .Select(g => new MyType 
               {
                 Key = g.Key, 
                 Count = g.Count()
               })
OKOK
SUM(), MIN(), MAX() functions
.GroupBy(x => ...)
  .Select(g => new 
               { 
                 Sum = g.Sum(x => ...), 
                 Min = g.Min(x => ...), 
                 Max = g.Max(x => ...)
               })
OKOK
AVG() function.GroupBy(x => ...).Select(g => g.Avg(x => ...))buggy, truncates value to int (NH-2429)OK
WHERE (condition applied before aggregation).Where(x => ...).GroupBy(x => ...)OKOK
HAVING (condition applied after aggregation).GroupBy(x => ...).Where(g => ...)silent failure, produces subquery instead of HAVING clause and returns wrong results (NH-2883)OK
ORDER BY (sorting).GroupBy(x => ...).OrderBy(g => ...)MismatchTreeNodeException (NH-2781)OK
TOP / LIMIT (number of results).GroupBy(x => ...).Take(10)NotImplementedExceptionOK
OFFSET (paging support).GroupBy(x => ...).Skip(10)NotImplementedExceptionOK

Things look MUCH better now - everything what I need (and a bit more) is correctly supported with the newest LINQ provider.

Monday, April 8, 2013

NHibernate Equals implementation with proxies vs. ReSharper - or yet another couple of hours lost

I'm already quite sensitive to missing Equals and GetHashCode implementations for NHibernate entities or value types that caused issues like re-inserting or duplicating items within collections hundred times.

A good rule for Equals and GetHashCode (that is clearly stated in the documentation) is to make it do comparisons based on the set of fields that create "business" (real-life) identification of an object whenever possible - and not on database-level identifiers. It works well also when comparing objects from different sessions (detached) or not yet persisted (transient) - without any "unproxying" magic.

Today, my personal counter of hours devoted into cursing and fighting NHibernate-related corner cases or strange issues increased once again. I have an interesting (two hours later - frustrating) case of entities being mixed up in spite of correct SQL queries issued. And I was quite confident that my ReSharper-generated Equals and GetHashCode methods were correct and the root cause was somewhere else. Just look how simple was the code of my entity:

public class City
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    public virtual bool Equals(City other)
    {
        if (ReferenceEquals(null , other))
            return false ;
        if (ReferenceEquals(this , other))
            return true ;
        return Equals(other.Name, Name);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null , obj))
            return false ;
        if (ReferenceEquals(this , obj))
            return true ;
        if (obj.GetType() != typeof (City))
            return false ;
        return Equals((City ) obj);
    }

    public override int GetHashCode()
    {
        return (Name != null ? Name.GetHashCode() : 0);
    }
}

I am comparing City instances using a natural key - its name. In the faulty code I was querying the database for different entities that reference City and later the referenced cities were compared for equality. Here is the simplified sketch of the test case (written in Machine.Specifications) with initialization that creates the object graph and two fetching scenarios as a separate tests.

public class EqualsTest : DatabaseTests
{
    Establish context = () => sut.WithinSessionAndTransaction(sess =>
    {
        var city = new City() { Name = "Llanfairpwllgwyngyll" };
        sess.Persist(city);
        sess.Persist( new Address () { City = city });
        sess.Persist( new District () { City = city });
    });

    It should_have_equal_cities = () => sut.WithinSessionAndTransaction(sess =>
    {
        var address = sess.Query<Address>().Single();
        var district = sess.Query<District >().Single();

        district.City.ShouldEqual(address.City);
    });

    It should_correctly_use_city_in_lookup = () => sut.WithinSessionAndTransaction(sess =>
    {
        var address = sess.Query<Address >().Single();
        var districts = sess.Query<District >().ToLookup(x => x.City);

        districts[address.City].ShouldNotBeEmpty();
    });
}

In the first test I'm doing the direct comparison of cities, in the second one I'm creating a lookup table with City instance as a key. In both cases lazy loading takes place so I'm working with NHibernate-generated City proxies. But it should not be a problem, as NHibernate guarantees object identity within a single session, right?

Well, uhm, the first test passes as expected, but the second test fails! It turned out that the City instance used as the key in districts (proxy instance) does not maintain identity with the proxy instance fetched for Address instance, even if they are both pointing at the same (single) city and are used within single session!

I'm not sure why this happens and I'm pretty confident it shouldn't, but fortunately the workaround is quite easy. As the proxies instances used in Address and District instances are now different references, they are compared using the Equals method we've provided. When Equals (or, more precisely, GetHashCode) is called on one of the objects to compare it with the second one, lazy fetch from the database is performed and it becomes the "real", unproxied object. But the second one doesn't - it is still CityProxy instance. And Equals offered by ReSharper, when checking objects types, unfortunately expects the type to exactly match:

        if (obj.GetType() != typeof (City))
            return false ;

But obj.GetType() is a CityProxy and we're exiting the comparison with the negative result here. The workaround is just to replace that exact check with more semantic one, checking only whether obj can be treated as a City instance:

        if (!(obj is City))
            return false ;

In this case, nor City neither CityProxy are eliminated, NHibernate can continue to compare city names and see that the proxy points to the same objects. This simple change done - and voilà - we have two tests passed!

Monday, February 18, 2013

[Obsolete] should be obsolete

[Obsolete] attribute is a pretty useful concept in the .NET Framework, designed to mark parts of our code that is planned to be phased out and its usage should be avoided. It makes a lot of sense to mark methods or types as obsolete, e.g. when we drop support for some parts of our API and the code is planned to be removed when all the customers are migrated, or when we've restructurized our code so that there's a newer approach for particular problem and we just had no time or possibility to upgrade all the usages, etc.

Recently, in the project I work in I've stumbled upon a pretty useful class, being part of our project-wide infrastructure and heavily used through the project, marked with an [Obsolete] attribute. Unfortunately, the parameterless overload, without a message, was used. I didn't know that part of code well, so I decided to consult with the author what does it mean for this class to be obsolete. Quick research through the revision history traced back to a developer who is no longer working in the project. Also, noone in the team could point me to an alternative approach I should use, most probably because there was none yet.

I've lost some time trying to figure out what's the reason of that spooky [Obsolete] attribute and I've only came to the conclusion that the developer who put it there must have been planning some redesign or was just unhappy with the class as it is. And even if I'm actually wrong and there is a valid reason for not using the class in question, I had no way to know that, just because someone was too lazy to leave a simple message in the attribute. Without the message, we're just having a pointless compiler warning whenever we're using that class.

To avoid such a confusing situations and a loss of time for our successors, teammates, or even for ourselves, I think we should never define the [Obsolete] attribute using its parameterless constructor. In fact, in my opinion, it should be marked as obsolete itself.

We have such an easy alternative. Wouldn't it be nicer if I had encountered the attribute with a message provided? Like [Obsolete("Use class Foo instead.")] or [Obsolete("Support for LoremIpsum dropped as of v. 3.2")]?

Thursday, February 14, 2013

NHibernate's test suite is a mess

I'm not doing it on purpose, I haven't converted into fighting NHibernate's foe, but it looks it will be another rather not encouraging post about NHibernate. But when I saw it's so called unit tests suite, I couldn't resist on sharing my thoughts.

NHibernate is using CodeBetter's TeamCity. We can see more than 200 tests ignored there. This doesn't bode well itself.

I've looked at NHibernate 3.2 source code I already had on the disk, but I've also compared it a bit with the current GitHub version and the situation haven't changed drastically since then.

Looking at the scope of most of the tests, they are more integration than unit tests - they run a complete top-down calls. This means that it requires a deep knowledge of the system to tell what went wrong when test fails and virtually no chance for anyone not being deep inside the codebase to know how to fix it.

It's nothing wrong with integration testing, of course, but not if the recurring pattern is that they are just a smoke tests - verifying only if there was an exception thrown or not. I've run the tool to find tests without assertions (that I'm going to publish some day) - and it turned out that there are 328 assertionless tests! More than 7% of the whole suite doesn't actually verify any outcome. And it is for sure possible to remove a whole lot of the code, even theoretically test-covered code, and cause no test to fail!

But way more interesting are the tests that do not increase the code coverage even incidentally! I like this one:

[TestFixture]
public class Fixture : BugTestCase
{
    [Test]
    public void Bug()
    {
        // Do nothing, we only want to check the configuration
    }
}

Note also the incredibly pure naming schema. There are 36 tests named Bug and 35 named just Test in the codebase and at least 80 test classes named Fixture. There is some kind of convention for namespaces, though, but with a lot of exceptions.

What else can we see in the NHibernate's "unit" test suite? Well, everything! We have a real file system access, real database access, time measurement, dependency on driver configuration etc. There is a lot of code which should be unit-tested itself.

Those tests, no matter if we name it unit-, integration- or whatever-tests, are poorly factored and contain multiple asserts. Look at the MasterDetail test here - 180 lines of code and 40 asserts in single test! The longest class test I've encountered was FooBarTest - it has 5729 lines (and has pretty nice name, hasn't it?).

And for the end, this is the line I liked the most:

catch(Exception e)
{
    Assert.IsNotNull(e); //getting ride of 'e' is never used compile warning
}

Wow, we have an assertion! But, as the comment suggests, it was added only to shut the compiler up, not to verify anything valuable. Not to mention that there are better ways to cope with compiler warnings...

Anyway, if you'd ever be looking for a good unit test suite to learn something from, at least you know where definitively not to look!

Monday, February 11, 2013

Extension for ordering an Enumerable with custom ranking

There are a lot of scenarios when we want to have the data sequence ordered according to some multi-level ordering rules that solve ties, but in the same time we do want to have the ties visible through the ordinal, ranking position. Let's take a sports league table as an example.

RankTeamPointsGoals lost
1Linq United62
2F.C. Async42
Enumerable Rangers44
4Generics Pitfalls07

In this example the teams are ordered by the highest number of points and in case of ties, from the lowest lost goals count. But the ranking cares only about the first criterion and in case of tie it places all the tied teams on the same position but later restarts numbering as if there were no ties - the ranking doesn't need to grow monotonically.

Recently I was to implement such a ranking and sorting feature with far more complicated, dynamic rules and I found the task far more complex than I initially thought, especially when taking performance into consideration. I wanted to avoid iterating the collection many times, but in the same time I wanted to keep things simple and rely on LINQ's ordering implementation which is very convenient to use even in multi-level sorting (OrderBy > ThenBy > ThenBy).

Before I mention anything about how LINQ sorting works - let me point an important resource in the topic - it's Jon Skeet's Edulinq series in which he reimplements LINQ on his own, with incredibly great insight. Read it all if you haven't already - it's definitely one of the most useful .NET blog posts series ever.

So, LINQ's OrderBy is backed by a pretty standard sort mechanics that uses our criteria to create the single comparer object used through the whole sort algorithm's work. However, my ranking rules are defined as a subset of ordering rules, so the comparer I'd like to use for ranking is most likely different than the sorting one. Given that I'm not going go reimplement sorting on my own, I'll need to iterate the data twice (but the second run will be on the sorted sequence - no issues with non-deterministic sequences - the original sequence will be run once only).

The implementation goes as follows: In the first iteration, just run the standard sorting. In the second run then, prepare the comparer according to the ranking rules, assign the rank 1 to the first element and then compare each pair of adjacent elements from the sequence, deciding whether the second one should get the same rank as the first one or just the next ordinal number (loop counter).

Let's define generic structures that will be used as input and output of our extension method on IEnumerable<T>. SortDefinition<T> is a class that holds the sorting field selector (the same as passed to the "normal" OrderBy/OrderByDescending methods) and a flag that decides about the sort order (I needed to have the ability to define complex and dynamic sorting criteria, it's much easier done with a flag than using LINQ's approach with different methods for ascending and descending sort). There is also Ranked<T> wrapper for our sequence elements type that carries the item together with it's ranking information:

public class Ranked<T>
{
    public Ranked(T item, int rank)
    {
        Item = item;
        Rank = rank;
    }

    public T Item { get; private set; }
    public int Rank { get; private set; }
}

public class SortDefinition<T>
{
    public SortDefinition(Func<T, object> selector, bool isDescending = false)
    {
        Selector = selector;
        IsDescending = isDescending;
    }

    public Func<T, object> Selector { get; private set; }
    public bool IsDescending { get; private set; }
}

Next, here is the public extension method and private method that encapsulates sorting itself. It's using only plain LINQ operators, so we have deferred execution here for free.

public static IEnumerable<Ranked>T>> OrderAndRankBy<T>(
    this IEnumerable<T> source,
    IEnumerable<SortDefinition<T>> sortAndRank,
    IEnumerable<SortDefinition<T>> sortOnly)
{
    if (source == null)
        throw new ArgumentNullException("source");

    var sortAndRankArray = sortAndRank.ToArray();
    var allSorters = sortAndRankArray.Concat(sortOnly).ToArray();
    if (allSorters.Length == 0)
        return CreateRanking(source, EqualityComparer<T>.Default);

    var ordered = SortSequence(source, allSorters);
    return CreateRanking(ordered, new SortersEqualityComparer<T>(sortAndRankArray));
}

private static IOrderedEnumerable<T> SortSequence<T>(IEnumerable<T> source, SortDefinition<T>[] sorters)
{
    var ordered = sorters[0].IsDescending
                    ? source.OrderByDescending(sorters[0].Selector)
                    : source.OrderBy(sorters[0].Selector);

    for (var i = 1; i < sorters.Length; ++i)
    {
        var sorter = sorters[i];
        ordered = sorters[i].IsDescending
            ? ordered.ThenByDescending(sorter.Selector)
            : ordered.ThenBy(sorter.Selector);
    }

    return ordered;
}

And the last part are SortersEqualityComparer and CreateRanking method, both of it are private part of the implementation. CreateRanking needs to enumerate the sorted collection, but thanks to yield keyword we still can have deferred execution, so that we're calculating the rank only when we actually want to read the data and only for as many elements as we requested. Here's the code:

private static IEnumerable<Ranked<T>> CreateRanking<T>(
    IEnumerable<T> ordered, IEqualityComparer<T> comparer)
{
    using (var it = ordered.GetEnumerator())
    {
        if (!it.MoveNext())
            yield break;

        // return first value with rank 1
        var current = it.Current;
        var ranked = new Ranked<T>(current, 1);
        yield return ranked;

        var previous = ranked;
        var nextInOrder = 2;

        while (it.MoveNext())
        {
            // and all the other values with either previous value for equal items or next rank in order
            current = it.Current;
            ranked = new Ranked<T>(current, comparer.Equals(previous.Item, current)
                                                          ? previous.Rank 
                                                          : nextInOrder);
            yield return ranked;

            previous = ranked;
            ++nextInOrder;
        }
    }
}

private class SortersEqualityComparer<T> : EqualityComparer<T>
{
    private readonly SortDefinition<T>[] _sorters;

    public SortersEqualityComparer(IEnumerable<SortDefinition<T>> sorters)
    {
        _sorters = sorters.ToArray();
    }

    public override bool Equals(T x, T y)
    {
        return _sorters.All(s => s.Selector(x).Equals(s.Selector(y)));
    }

    public override int GetHashCode(T obj)
    {
        return obj == null ? 0 : obj.GetHashCode();
    }
}

Here is the usage example:

var results = new[]
{
    new TeamResult() { Name = "Enumerable Rangers", Points = 4, GoalsLost = 4 },
    new TeamResult() { Name = "F.C. Async", Points = 4, GoalsLost = 2 },
    new TeamResult() { Name = "Generic Pitfalls", Points = 0, GoalsLost = 7 },
    new TeamResult() { Name = "Linq United", Points = 6, GoalsLost = 2 }
};

var rankAndSort = new[] { new SortDefinition<TeamResult>(x => x.Points, isDescending: true) };
var sortOnly = new[] { new SortDefinition<TeamResult>(x => x.GoalsLost) };

var orderedTeams = results.OrderAndRankBy(rankAndSort, sortOnly);

foreach (var team in orderedTeams)
{
    Console.WriteLine("{0}. {1} - {2} points, {3} goals lost", 
        team.Rank, team.Item.Name, team.Item.Points, team.Item.GoalsLost);
}

And on the console we can see:

1. Linq United - 6 points, 2 goals lost
2. F.C. Async - 4 points, 2 goals lost
2. Enumerable Rangers - 4 points, 4 goals lost
4. Generic Pitfalls - 0 points, 7 goals lost

Full code is available as a Gist - feel free to use it!

Monday, January 28, 2013

Top 5 Pragmatic TDD mistakes

Today, on Telerik's blog, Bradley Braithwaite published his post "Top 5 TDD Mistakes". I like to comment on "top N mistakes"-kind of posts, so read the Bradley's post first and let me add my three cents about this one, too.

I'm pretty satisfied to read someone who is passionate and pragmatic about Test-Driven Development in the same time. Let's just quote two important sentences from Bradley's article I agree with totally:

  1. It's better to have no unit tests than to have unit tests done badly.
  2. Unit Tests make me more productive and my code more reliable.

Let me comment on each of the mistake mentioned in the article.

Not using a Mocking Framework vs. Too Much Test Setup

One may say that these two are a bit contradictory as using a mocking framework may lead to too much test setup. This is of course a matter of good balance we need to find and maintain.

If we're testing a code that has some dependencies and not using mocking framework, we're not doing unit testing in its pure form, for sure. But let's take my favourite example of testing the code that talks with the database - is there really anyting better than testing against the real database, run in-memory? Call it integration test, half-arsed test or whatever, that's the only reliable method of testing the queries itself.

But in most cases, sure, let's use stubs, but avoid mocks - the difference is very important. Mocks as a behaviour testing tool are often overused to check if the implementation is exactly as defined in the test, written by the same person in the same time - it's like checking if the design we've just intentionally created was really intentional. Let's focus on verifying the actual outcome of the method (in which stubs may help a lot) instead of coupling to the implementation details.

And, agreed, there's nothing worse to maintain than a test suite with those spaghetti-flavoured mock setups.

Asserting Too Many Elements

Agreed. There's a good rule of having single logical assert per test, even if it physically means several Assert calls. And for me having any asserts at all is a rule of equal importance.

Writing Tests Retrospectively

Agreed again, the most of TDD's gain is from having to think about the implementation before it comes to existence. It can make developer think not only about happy paths, but also about negative cases and boundary values. It also strongly supports KISS and YAGNI rules which are vital for long-living codebases.

Also, I like using TDD to cope with reported bugs. Writing failing unit (or - again - probably more integration-like) test by recreating the failure's conditions makes analysis easier, helps isolating the root cause of the failure and is often much easier than reproducing the bug in real-life scenario. Retrospectively written test for a bug is only for regression testing.

Testing Too Much Code

Not an universal truth, but I feel quite the same. I see a lot of value in unit-testing my workhorse parts of code, I do create it more-less according to the TDD principles (especially having no production code without failing test).

But I don't think having 100% code coverage as an ultimate goal makes any sense. I think there's always quite a lot of code that is just not suitable for valuable unit testing, i.e. coordinating/organizing kind of code (let's call it composition nodes as a reference to composition root), that just takes some dependencies, calls several methods and pass things from here to there, not adding any logic and not really interfering with the data. Tests written for that kind of code are often much more complicated than the code itself, due to heavy mocks and stubs utilization. Bradley's rule of thumb is what works for me, too: write a separate test for every IF, And, Or, Case, For and While condition within a method. I'd consider the code to be fully covered when just all that branch/conditional statements are covered.

Wednesday, January 23, 2013

Does it make sense to AssertWasNotCalled?

In the recent post about the importance of assertions in the unit tests I've mentioned that we should not test our code to prove what it's not doing and focus on testing what it is actually doing. Regarding that point of view I was asked whether I find it useful to use AssertWasNotCalled or similiar assertion methods provided by the mocking frameworks, as generally they are to ensure no work was done and an empty method is passing that kind of tests. This is an interesting question, let's analyze it.

I don't blindly believe in TDD and all its rules in every scenario, but let's stick to that context. As an example, let's write and test a Work method which is taking a boolean parameter to decide whether the actual work should be done.

[Test]
public void Should_do_no_work_when_false_given()
{
    // arrange
    var sut = new Service(workerFake);

    // act
    sut.Work(false);

    // assert
    workerFake.AssertWasNotCalled(x => x.DoWork());
}

I've discouraged testing what the code is not doing in the context of the third test-driven development rule, which says "you may not write more production code than is sufficient to pass the current failing test". According to that rule, for the test above, we may only end up with an empty method as an implementation - it will be just enough for the test to pass.

But earlier, there is the first rule of TDD - "You may not write production code until you have written a failing unit test". We've never saw the test failing.

Let's than approach the problem differently - starting from the "positive" path, where we can actually verify any outcome, instead of verifying the lack of it.

[Test]
public void Should_do_the_work_when_true_given()
{
    // arrange
    var sut = new Service(workerFake);

    // act
    sut.Work(true);

    // assert
    workerFake.AssertWasCalled(x => x.DoWork());
}

We can easily see this test failing for the empty Work method. Let's then implement it according to TDD rule number three (no single line of code not enforced by the tests).

public void Work(bool really)
{
     worker.DoWork();
}

The test is now passing, rules are kept. We may now add back our "negative" test case - verifying no work is done when false is passed. This time the test will fail, rule number one is satisfied. We may now legally (according to the TDD) add the condition and finish our pretty sophisticated implementation:

public void Work(bool really)
{
     if (really)
         worker.DoWork();
}

The conclusion from this oversimplified example is that the AssertWasNotCalled test alone probably makes a little sense. But when considered as a part of the larger test suite for the method, and when implemented in correct order (outcomes verified first, lack of outcome last), it may even be the only TDD-compliant way of testing the intended behaviour.

The same rule applies to the validation method example I've mentioned in the previous post. Even if I'm against the unit tests that just expect no exception being thrown, if throwing an exception when needed is the method's single responsibility, there's no other way to test the "positive" (not throwing) behaviour in the unit tests other than verifying no exception was thrown.

As a side note - AssertWasNotCalled assertion method can actually be rewritten (in Rhino Mocks syntax) into:

workerFake.Stub(x => x.DoWork()).Throws(new Exception());

And when the production code actually calls the method it should not, the exception is thrown and the test fails. So the distinction between the test I consider assertionless and the valid one can be a bit blurry. But my personal opinion I'll stick with is to always have the assertions as explicit as possible, even if this is just a syntactic sugar or no-op like AssertDoesNotThrow. This allows us to express the intent of the test clearly, making it readable and maintainable. Smoke testing approach - "if it throws, it will fail anyway" doesn't.

Image borrowed from http://www.yesodweb.com.

Monday, January 21, 2013

Assertion is the key of unit testing

We all probably know about the importance of good unit test suite for our projects and about the Arrange-Act-Assert pattern we should be using in our tests. I think I'm lot more pragmatist than purist and I don't have anything against the tests not following the AAA form as long as I can read, understand and maintain them easily. But there's one thing I do consider a must in each and every unit test - an assertion.

Assertion not only decides whether the test passes or fails, it also makes the purpose of the test clear and ensures it is verifiable. Test without assertion is nothing more than a random snippet of code.

What I consider to be the assertion?

  • A direct call to a known asserting method, provided most often by the testing frameworks, assert or mocking libraries, etc., like the classic Assert.Equal(actual, expected) or Shouldy's extension method like actual.ShouldBe(expected). The naming convention of these constructs always makes it clear that they actually verify the outcome - "assert that...", "verify that...", "something should be...".
  • An indirect call to a method described above. When verification of the outcome is not just a comparison of a single object (or the comparison we need is not included in our favourite testing framework), we often create our own helper methods that play a role of a single logical assertion, despite multiple physical ones, i.e.
    private void ShouldBeAValidExternalUri(this string value)
    {
        value.StartsWith("http://our.domain.com/").ShouldBeFalse();
        Uri.IsWellFormedUriString(value, UriKind.Absolute).ShouldBeTrue();
    }
    That's still a great assertion, call to ShouldBeAValidExternalUri will eventually trigger a call to the known asserting method provided by the framework. It's nice if we maintain the naming schema that indicates we're going to verify something in our helper method.
  • Throwing an exception within the test or within the helper asserting method. It's technically the equivalent of calling Assert.Fail. It also provides a good way of describing what went wrong using the exception message as it will probably end up shown in the test runner.

What I do not consider to be the assertion?

Throwing an exception in the SUT itself! It is quite an often thing to see in some projects - tests that call a method on the tested object with an assumption that the tested method will throw if something goes wrong.

public class Sut
{
    public static void DoTheWork(int howManyTimes)
    {
        if (howManyTimes <= 0)
            throw new ArgumentOutOfRangeException("howManyTimes");

        for (var i = 0; i < howManyTimes; ++i)
            DoTheRealWork();
    }

    // ...
}

public class Test
{
    [Test]
    [ExpectedException(typeof(ArgumentOutOfRangeException))]
    public void SutThrowsWhenNegativeCounterPassedIn()
    {
        Sut.DoTheWork(-1);
    }

    [Test]
    public void SutDoesntThrowWhenPositiveCounterPassedIn()
    {
        Sut.DoTheWork(1);
    }
}

The Sut class is a perfectly valid piece of defensive code. And we have two tests, one checking if the exception is thrown when negative number is passed, and the second...? What is the second testing? Technically it works - if an exception is thrown in the tested code, the test would fail. If not - it would pass, proving that no exception was thrown.

But what is the value given by this test? It is not verifying whether the work was actually done. What more, according to the TDD rule that we should have only the minimal amount of code making the test pass, the correct implementation of the DoTheWork method required by SutDoesntThrowWhenPositiveCounterPassedIn test is a no-op! Both tests will still pass if we remove the whole loop. So was there anything really tested?

A good test should always make it clear and explicit what is tested and what is the expected outcome. "Expect no exception" approach doesn't follow that rules. "No exception" is not a good outcome if an empty method is enough for it to pass. And proving that your code doesn't do something is much more difficult than proving what it does.

Of course, as always, there are some cases when it's maybe acceptable to have tests without explicit assertion. Tests for validation methods that exist only to throw when certain condition is not met may be a good example. Another example is smoke testing, as a part of integration test suite. When we have the functionality well-covered with lower-level unit test suite and we just want to ensure that the code doesn't blow up when run together and verifying the outcome at this level is hard enough, we may go with "expect no exception" test.

But even in those cases I personally would use some kind of explicit construct to show the intent of the test. For example, NUnit offers assertion like this:

Assert.DoesNoThrow(() => Validator.Validate("valid string"));

It's not perfect, but at least more explicit than the sole SUT method call.

Tuesday, January 8, 2013

9th most common mistake - using "always" and "never"

The article I've found today in The Morning Brew is the great occasion to break the silence here on my blog. When I read an article that sounds like a definitive source of truth, I don't expect to read so subjective, single point of view arguments as I've found in Paweł Bejger's post "8 Most common mistakes C# developers make". I just couldn't resist to comment on it.

Paweł mixed some .NET Framework tips with performance advices and some architectural guidelines together. Let's briefly discuss his points.

1. String concatenation instead of StringBuilder

Advice here is to use StringBuilder as it doesn't allocate memory for each string part. But it adds its own overhead, it's often an overkill and actually may be a performance issue itself when concatenating few strings only. If StringBuilder was the only proper way to concatenate strings, the framework wouldn't expose such handy + operator for sure.

2. LINQ – ‘Where’ with ‘First’ instead of FirstOrDefault

Two different things were mentioned in this topic. The first one is that instead of chaining Where(x => sth) and First(), we should use First(x => sth) instead. Well, yes, but it's nothing more than one less method call. The number of elements actually fetched from the collection is equal to one in both cases, no performance difference. I would say this is up to personal preferences.

The second is that FirstOrDefault is better than First as it doesn't make an assumption that the collection is not empty. But I often do want to have that assumption. Otherwise, I need to be sure that my code handles null well. In many cases, when I'm sure that the collection is not empty, I just don't need that null handling code. And yes, First will throw for empty collections. I still find it better than some mighty NullReferenceException that may be thrown few lines later in case FirstOrDefault is used without proper attention.

3. Casting by means of ‘(T)’ instead of ‘as (T)’ when possibly not castable

Agreed. But again, not so big difference practically. NullReferenceException thrown unexpectedly later due to unproper handling of null returned by as operator is not a bit better than InvalidCastException thrown directly while casting. Anyway, the need for type casting itself can often indicate a code smell.

4. Not using mapping for rewriting properties

Paweł suggests here that we should use object-to-object mapping libraries like AutoMapper for rewriting properties from one object to another. Agreed, but only when we're doing A LOT of stuff like this, i.e. mapping between layers. When we just need to have several properties mapped, the cost of using the external library will never pay off. Moreover, in those cases it sounds like over-engineering and abuse of both KISS and YAGNI principles. And especially when some additional logic needs to be performed during the mapping, I would personally suggest being as explicit as possible, for readability and maintainability reasons.

5. Incorrect exceptions re-throwing

If we need to re-throw exceptions, we should be using throw alone to preserve the stack trace, agreed. Just a nitpick here - the example used in the article shows more harmful anti-pattern - catching general Exception itself. Wonder how we could ever handle OutOfMemoryException.

6. Not using ‘using’ for objects disposal

Well, can't disagree. Developers should know how to use the tools they were given.

7. Using ‘foreach’ instead of ‘for’ for anything else than collections

Don't disagree here, again, but I haven't really checked that, as for me this really sounds like a micro-optimization not needed in the vast majority of cases. And premature optimization, without known bottlenecks and metrics, even if not the root of all evil, is rather not a time well-spent.

8. Retrieving or saving data to DB in more than 1 call

Similiar to the one above, but on the different level. There's nothing wrong in reading data in two queries if our life becomes much easier then. Modern databases and ORMs offer features like multi-queries or batches (sending multiple commands to the database together to reduce round trips), so the optimization can be moved at infrastructural level, instead of unnecessarily complicating our query.

However, it's almost always wrong to have N+1 queries, but that's the separate story.

Conclusion

I do not negate that C# developers should know all the things mentioned in Paweł's article, but being aware of how things work, what are the pros and cons of each approach is something different than just blindly applying the rules like "don't do this, do that". I'm missing those pieces of advice in the original article.

There's no such thing as the single right answer for any question in our craft. There are always many options to consider and we can never be sure that something is always wrong or always right not knowing the whole context. Even classical design patterns proposed by the Gang of Four, always considered as the best practise we should not discussed with, were recently reasessed quite well by Ayende.

The answer to life, the universe and everything in programming is just always "it depends".