Saturday, July 23, 2011

Striking discoveries of the last couple of months.

Ok, this is old news, but it appears that I’m not able to learn from other people’s mistakes.

1. Leaky abstractions make you look stupid. If you are making heavy use of EF's lazy loading, don't pretend that you'll be able to abstract EF away.

2. Every abstraction should solve some problem (current or foreseeable). Be able to envision use cases for every extension point you created.

3. Queries should be made in the simplest way possible. Use plain SQL, Document DBs, XML, whatever. No need for your fancy Repository pattern here.

4. Validation is about ensuring that input is plausible. It is performed with no context in mind by the Command object (or something similar) itself. The decision about whether this input will actually hit the database is made a) after all preliminary checks have passed and b) after verifying the input against the most up-to-date data under the most restrictive transaction isolation level available (think double-checked locking, as in the Singleton pattern); see the sketch after this list.

5. The data you are showing on the UI is already stale. Other users have already updated the storage. Don't freak out about this. Allow inconsistencies. Know the minimum time that's required for changes to actually appear on the UI. Approach this time by scaling out reads.

6. CQRS examples are everywhere: RavenDB with its background index calculation, OLAP solutions with their data marts, etc.

7. Scale-in (staying on a single, beefier node) is a perfectly valid option for the transactional storage that you'll use for writing.

8. DDD is mostly a way you communicate with the product owner.
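
Here's the sketch promised in item 4. It's a minimal illustration under assumptions of mine: all type and member names are hypothetical, and the final check is represented by a serializable TransactionScope.

    using System;
    using System.Transactions;

    public class RenameUserCommand
    {
        public int UserId { get; set; }
        public string NewName { get; set; }

        //Context-free plausibility check: no database involved.
        public bool IsPlausible()
        {
            return UserId > 0 && !string.IsNullOrWhiteSpace(NewName);
        }
    }

    public class RenameUserHandler
    {
        public void Handle(RenameUserCommand command)
        {
            if (!command.IsPlausible())
                throw new ArgumentException("Implausible input.");

            //Double-check against the most up-to-date data right before
            //writing, under the most restrictive isolation level.
            var options = new TransactionOptions { IsolationLevel = IsolationLevel.Serializable };
            using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
            {
                //...re-read the current state, verify the new name is still
                //acceptable, then write...
                scope.Complete();
            }
        }
    }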

I wholeheartedly recommend the CQRS introduction by Udi Dahan – it's a real eye-opener.

Monday, January 31, 2011

Brief introduction to Dependency Injection (with MEF samples)

I've written this post with the sole intention of creating a reference that I can use during conversations on the topic. I'm by no means an expert, and I will hardly enlighten a developer with reasonable design experience. Anyway, here's a list of facts and definitions that can be used to introduce the concept to freshmen. I will shamelessly borrow various parts of it from people who are more successful at covering the topic, so be sure to follow these links if you are interested in more detailed explanations.

What is Dependency Injection?

Put as simply as possible, dependency injection is an object-oriented technique that, when applied, results in a design where every component requires its dependencies to be passed in from the outside. The component doesn't care about their origin; the only thing it absolutely demands (and, given the C# type system, this is pretty much guaranteed by the compiler) is that each dependency fulfills the specified contract.

Code Snippet

    public class BusinessService
    {
        private readonly ILogger _logger;
        private readonly IOrderProcessor _orderProcessor;
        private readonly IOrderRepository _orderRepository;

        public BusinessService(ILogger logger,
                               IOrderProcessor orderProcessor,
                               IOrderRepository orderRepository)
        {
            _logger = logger;
            _orderProcessor = orderProcessor;
            _orderRepository = orderRepository;
        }

        //Some useful operations here
    }

See how we require the creating code to pass (or inject) the various dependencies the service needs to successfully provide its business function. Obviously, the class is written against interfaces, which indicates the author's intention to make it work with whatever implementation is given at runtime.

What problems does it solve?

Well, everyone who has ever written anything more complicated than a calculator sample (and many of those who stopped at that level) has noticed how fast things tend to turn into an uncontrollable mess. You hear this all the time: facilitate code reuse, provide clear abstractions that hide internal complexity behind a simple interface, make your code readable, communicate your ideas in natural terms, and so on. In other words, it is much easier to deal with separate modules that serve a narrow, focused purpose. This explains why we need interfaces and classes, inheritance and encapsulation, but what's the point of moving the instantiation of these dependencies out of the class that will actually use them? The answer: the ability to replace implementations. Indeed, if all your classes receive loggers this way, you only have to specify the exact ILogger instance once at the entry point of the program, and it will simply propagate down to the leaves of the object graph. Replacing this implementation becomes a piece of cake, and if you can easily replace any part of your complicated system like this, you enable yourself to test any part of it in isolation; a sketch follows. This brings us to the interesting topic of mocking, which is brilliantly introduced here (by Karl Seguin).
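
To make the replacement point concrete, here is a hand-rolled sketch (a mocking library would normally generate the fake; I'm assuming ILogger exposes a Log(string) method, which the snippet above doesn't show, and fakeOrderProcessor and fakeOrderRepository stand for similar hand-rolled fakes):

    class FakeLogger : ILogger
    {
        public List<string> Messages = new List<string>();

        public void Log(string message)
        {
            //Record instead of writing to a real log sink.
            Messages.Add(message);
        }
    }

    //Somewhere in a unit test:
    var logger = new FakeLogger();
    var service = new BusinessService(logger, fakeOrderProcessor, fakeOrderRepository);
    //...exercise the service, then assert on logger.Messages...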

How do I get there from my messy code?

What I like about this approach is that refactoring towards DI almost always produces nicer code. Regardless of your design skills, extracting functionality into dependencies, defining interfaces and making them play nicely together improves your code. Writing unit tests reveals the interfaces' imperfections and poor abstractions, but even without tests and with little domain knowledge you end up with smaller and more focused classes.

I’m not claiming that using DI is necessary to organize your code this way. After all, these simple ideas were described long ago, but sometimes you need a mechanistic rule like this to enforce good practices.

Do I need to handle dependencies manually? I’ve heard there are things called DI Containers.

Too many DI introductions begin by showing a tool before explaining its purpose. A DI Container is a framework that does one simple thing: it lets you create a registry that maps your interface types to their concrete implementations. It also provides an interface to construct the dependency tree starting from its root. Here's a basic MEF sample.

Code Snippet

    class Program
    {
        static void Main(string[] args)
        {
            TypeCatalog typeCatalog = new TypeCatalog(typeof(Logger),
                                                      typeof(OrderProcessor),
                                                      typeof(OrderRepository),
                                                      typeof(BusinessService));

            CompositionContainer container = new CompositionContainer(typeCatalog);
            BusinessService service = container.GetExportedValue<BusinessService>();
        }
    }

    [Export(typeof(ILogger))]
    public class Logger : ILogger
    {
        //...
    }

    [Export(typeof(IOrderProcessor))]
    public class OrderProcessor : IOrderProcessor
    {
        //...
    }

    [Export(typeof(IOrderRepository))]
    public class OrderRepository : IOrderRepository
    {
        //...
    }

    [Export(typeof(BusinessService))]
    public class BusinessService
    {
        //...
        private readonly ILogger _logger;
        private readonly IOrderProcessor _orderProcessor;
        private readonly IOrderRepository _orderRepository;

        [ImportingConstructor]
        public BusinessService(ILogger logger,
                               IOrderProcessor orderProcessor,
                               IOrderRepository orderRepository)
        {
            _logger = logger;
            _orderProcessor = orderProcessor;
            _orderRepository = orderRepository;
        }

        //Some useful operations here
    }

Notice that we register our BusinessService as an Export against its own type, not an interface. MEF allows this kind of thing, although using interfaces is still preferable.

How do I choose the proper DI framework? This seems like an important decision that cannot be reverted later.

No, it’s not that terribly important (although I like MEF's attributed model and parts discovery services). The thing is, a properly designed application will have few (ideally only one) calls to the composition container. ‘Properly designed’ in the previous sentence means ‘follows the Hollywood Principle’ – don’t call the container, let it call you instead.

Are there any disadvantages?

Of course, every abstraction comes at a price. When you read your program, you can’t just ‘go to definition’ of a dependency and see the actual code that will be executed. There is also a performance overhead associated with interface resolution and reflection. Think twice before injecting many objects with short lifetimes; consider using an Abstract Factory instead, as sketched below.
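
A sketch of what I mean by the factory alternative (all names here are made up for illustration): instead of injecting many short-lived objects, inject a single factory and create them on demand.

    public interface IWorkItemFactory
    {
        IWorkItem Create(int orderId);
    }

    public class ReportGenerator
    {
        private readonly IWorkItemFactory _factory;

        public ReportGenerator(IWorkItemFactory factory)
        {
            _factory = factory;
        }

        public void Process(IEnumerable<int> orderIds)
        {
            foreach (int id in orderIds)
            {
                //Short-lived objects are created on demand instead of being
                //injected through the constructor.
                IWorkItem item = _factory.Create(id);
                //...use item and let it go out of scope...
            }
        }
    }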

What else shouldn’t I inject?

It hardly makes sense to inject DTOs (or data bags) that don't contain any logic. I'd also continue newing up various helper objects that don't require any dependencies themselves and are tightly coupled with the class that calls them (why is this a separate object, then?).

Wednesday, November 24, 2010

ORM, Unit of Work and desktop applications

ORMs have become the recommended way of talking to relational DBs. Yes, highly scalable server applications may use NoSQL solutions and key-value stores, and sample apps and coursework will rely on inline SQL and manual data mapping, but for the vast majority of us, who write textboxes-over-DB apps, ORM is the way to go.

Using an ORM in a data-driven desktop app is simpler than dealing with all the abstraction layers and deployment tiers that you'll have in a distributed scenario, but it raises certain typical problems.

Unit of Work lifetime.

Entity Framework’s ObjectContext and NHibernate’s Session are implementations of the Unit of Work pattern. Combined with an internal Identity Map, they form a low-level facade on top of your database that abstracts you away from the actual SQL and tables.

The first thing you’ll need to decide is when you are going to create your Session (or ObjectContext, in the Entity Framework case). The lifetime of these objects determines the scope of your business transaction.

Why is it important? Make your session object a process-wide singleton – and you’ll end up caching your entire database in memory. Create it for every DB call – and you’ll lose caching and change tracking. A new identity map every time is no identity map at all.

This topic is brilliantly covered by Ayende in his article Building a Desktop To-Do Application with NHibernate. I will not reproduce his thoughts here, but the summary is: the best way to do this is to create a single session per form.
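
For completeness, here's a minimal sketch of session-per-form with NHibernate (assuming an application-wide ISessionFactory exposed as SessionFactory; the form name is made up):

    using System;
    using System.Windows.Forms;
    using NHibernate;

    public partial class CustomerForm : Form
    {
        private ISession _session;

        protected override void OnLoad(EventArgs e)
        {
            base.OnLoad(e);
            //The form opens its own session: this is its unit of work.
            _session = SessionFactory.OpenSession();
        }

        protected override void OnFormClosed(FormClosedEventArgs e)
        {
            //The business transaction scope ends together with the form.
            _session.Dispose();
            base.OnFormClosed(e);
        }
    }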

Repositories and UnitOfWork

The Repository pattern is another convenient abstraction that, basically,

  • completely hides persistence mechanisms from the caller by exposing CRUD operations,
  • provides a natural querying API expressed with domain terms.

Every repository operation may or may not be part of your business transaction, which means that the repository should receive its UnitOfWork from the outside.

The Problem

All objects within a view should be retrieved using the same ObjectContext. Using repositories should be easy. Persistence details shouldn't leak into domain or UI code. And everything should stay nice and testable.

Code and DI container wireup.

This is the central interface. It knows nothing about our ORM or database of choice.

   interface IUnitOfWork
   {
       TRepository Create<TRepository>();
       void Commit();
       void Rollback();
   }

What I’m doing here is, strictly speaking, a violation of the SRP, because this interface both manages the business transaction and serves as a service locator for repositories.

Here’s the implementation that uses Entity Framework. We will create this object for every view.

  public class UnitOfWork : IUnitOfWork
  {
      private FPEntities _context;
      private IUnityContainer _container = Container.CreateChildContainer();

      //The ObjectContext is created lazily, on the first repository request.
      private void InitContext()
      {
          _context = new FPEntities();
          _container.RegisterInstance(_context.GetType(), _context);
      }

      #region IUnitOfWork Members

      public TRepository Create<TRepository>()
      {
          if (_context == null)
              InitContext();

          //Every repository resolved here receives the same ObjectContext.
          return _container.Resolve<TRepository>();
      }

      public void Commit()
      {
          _context.SaveChanges();
      }

      public void Rollback()
      {
          //http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/33e259c7-3cdc-4ec8-b135-085e981df1f0

          //Cancel pending inserts by deleting the entities added in this unit of work.
          foreach (var entry in _context.ObjectStateManager.GetObjectStateEntries(EntityState.Added))
          {
              if (entry.Entity != null)
                  _context.DeleteObject(entry.Entity);
          }

          //Restore modified entities to their store values.
          foreach (var entry in _context.ObjectStateManager.GetObjectStateEntries(EntityState.Modified))
          {
              if (entry.Entity != null)
                  _context.Refresh(System.Data.Objects.RefreshMode.StoreWins, entry.Entity);
          }

          //Cancel pending deletes the same way.
          foreach (var entry in _context.ObjectStateManager.GetObjectStateEntries(EntityState.Deleted))
          {
              if (entry.Entity != null)
                  _context.Refresh(System.Data.Objects.RefreshMode.StoreWins, entry.Entity);
          }

          _context.AcceptAllChanges();
      }

      #endregion
  }

If you look carefully at our IUnitOfWork implementation, you will notice that it works as a service locator for repositories. Service locator is quite commonly considered an anti-pattern because of its not-so-strict interface, but I'll stick with it for now.

When a ViewModel is initialized, it is passed a UnitOfWork instance. This instance will be used to provide repositories that share the same ObjectContext, as the sketch below shows.
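
Here's a hypothetical repository that fits this scheme (assuming the parent container maps IOrderRepository to it; ICustomerRepository below is equally made up): Unity resolves it from the child container and injects the live FPEntities instance that UnitOfWork registered.

    public class OrderRepository : IOrderRepository
    {
        private readonly FPEntities _context;

        public OrderRepository(FPEntities context)
        {
            //The same ObjectContext instance is shared by every repository
            //created from one UnitOfWork.
            _context = context;
        }

        //...queries and CRUD operations against _context...
    }

    //Inside a ViewModel:
    var orders = _unitOfWork.Create<IOrderRepository>();
    var customers = _unitOfWork.Create<ICustomerRepository>();
    //...work with both repositories, then:
    _unitOfWork.Commit();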

Sample code can be found here.

Thursday, October 14, 2010

Reducing memory footprint with string instances caching

.NET uses string interning to lower memory pressure by creating only a single object instance for every literal in your code. These strings are referenced by the intern pool and stay alive for the entire lifetime of the CLR instance. One can intern a string manually by calling string.Intern, which may be a good idea in some situations, but each case should be carefully considered because, as mentioned above, interned strings won't be garbage collected.
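
A quick way to see interning in action: the literal is interned by the compiler, the string built at runtime is a separate instance, and string.Intern maps it back to the pooled one.

    string literal = "foo";
    string runtime = new StringBuilder("f").Append("oo").ToString();

    Console.WriteLine(object.ReferenceEquals(literal, runtime));                //False
    Console.WriteLine(object.ReferenceEquals(literal, string.Intern(runtime))); //True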

In this post I will show a simple, yet frequently overlooked, technique that helps to share string instances without the fear of introducing a memory leak. I will build my sample around this trivial class:

    class StringPool
    {
        private Dictionary<string, string> _pool = new Dictionary<string, string>();

        public string GetString(string s)
        {
            string result;
            if (!_pool.TryGetValue(s, out result))
                _pool.Add(s, result = s);

            return result;
        }
    }

Here, GetString returns the pooled instance whenever one exists; on a miss, it adds the incoming string to the pool and returns it.

Let’s try to evaluate the memory consumption by running the test:

   class Program
   {
       private static List<string> _strings = new List<string>();
       private const int _stringsNum = 1000000;
       private static StringPool _pool = new StringPool();

       static void Main(string[] args)
       {
           //Generate every string twice without the pool: two million instances.
           GenerateStrings(false);
           GenerateStrings(false);

           Console.WriteLine(GC.GetTotalMemory(true));
           Console.ReadLine();

           //Now with the pool: duplicate values share a single instance.
           _strings = new List<string>();
           GenerateStrings(true);
           GenerateStrings(true);
           _pool = null;

           Console.WriteLine(GC.GetTotalMemory(true));
           Console.ReadLine();
       }

       private static void GenerateStrings(bool usePool)
       {
           for (int i = 0; i < _stringsNum; i++)
           {
               //StringBuilder ensures the strings are created at runtime
               //and not interned by the compiler.
               string s = new StringBuilder("Foo").Append(i.ToString())
                   .ToString();

               if (usePool)
                   _strings.Add(_pool.GetString(s));
               else
                   _strings.Add(s);
           }
       }
   }

Which outputs:

[image: console output showing the totals for the two runs]

Pretty impressive for such a simple trick.

Any real-world samples?

Imagine you have a DataReader enumerating through a huge data set to populate a list of personal data objects. There is a very high chance that among thousands of these people you'll find hundreds of repeated names, geographical locations and job titles. By passing every string through the StringPool you'll eliminate all the duplicate instances.
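
A hedged sketch of that scenario (the Person class, the command and the column layout are assumptions for illustration):

    var pool = new StringPool();
    var people = new List<Person>();

    using (IDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            people.Add(new Person
            {
                Name     = pool.GetString(reader.GetString(0)), //"John Smith" is stored once
                City     = pool.GetString(reader.GetString(1)), //"London" is stored once
                JobTitle = pool.GetString(reader.GetString(2))  //"Developer" is stored once
            });
        }
    }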

Monday, September 27, 2010

Applications for generic type parameter variance

Co- and contravariance in C# 4.0 introduce a new point of view on generics. We can now consider them mathematical operators that map an ordered set of types into some new type. On the other hand, it's a major change to the language because it redefines what "assignable" means when we talk about interfaces and delegates.

Here are widely known examples:

1. Type covariance allows us to do this:
    IEnumerable<Base> baseCollection;
    IEnumerable<Derived> derivedList = new List<Derived>();
    baseCollection = derivedList;
2. Type contravariance allows the opposite:
    Action<Derived> func = new Action<Base>((b) => { });
3. Applying both makes even the weirdest things a reality:
    static void Main(string[] args)
    {
        Func<Derived, Base> func = new Func<Base, Derived>(Foo);
    }

    public static Derived Foo(Base parameter)
    {
        return null;
    }

I have to admit that in the six months since the release of .NET 4.0 I haven't seen any good examples outside the .NET BCL of how this can be used to build a cleaner API. Honestly, I can't think of any good examples either. This is especially sad because I like to think of myself as a guy with a bit of a mathematical background, which should have made the mindset switch a little easier.

So what makes this language feature so hard to use?

For me, it was the realization that marking a generic parameter as in or out is not enough. You have to design the interface in a certain way, and that's one extra thing to remember. Eric Lippert has written a great post that explains the rules comprehensively; a minimal illustration follows.
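
Here's a minimal pair of interfaces that obey those rules: T occurs only in output positions in the first one (so out compiles) and only in input positions in the second (so in compiles). Base and Derived are the classes assumed by the snippets above.

    public interface IProducer<out T>
    {
        T Produce();
    }

    public interface IConsumer<in T>
    {
        void Consume(T item);
    }

    public class DerivedProducer : IProducer<Derived>
    {
        public Derived Produce() { return new Derived(); }
    }

    public class BaseConsumer : IConsumer<Base>
    {
        public void Consume(Base item) { }
    }

    //Covariance: a producer of Derived works wherever a producer of Base is expected.
    IProducer<Base> producer = new DerivedProducer();
    //Contravariance: a consumer of Base works wherever a consumer of Derived is expected.
    IConsumer<Derived> consumer = new BaseConsumer();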

Another thing is that, apparently, covariance and contravariance are not something you're likely to use in your domain code. They belong to situations where this operator paradigm is natural, like converting an Item type into IEnumerable<Item>. So, if CLR types themselves are not your domain, you are less likely to find an application for them.

P.S. Take a look at this humorous example of using generic parameters variance. Pay attention to the comments :)

Tuesday, September 21, 2010

Don’t store binaries in your repository

A lot has been said about storing binaries inside a project's repository. Does this mean I can't share my frustration (caused by my recent experience)? Definitely not.

Don't do this!

That's it, I'm done with the emotional part.

I'm not talking here about NHibernate, or NUnit, or the fancy P&P tools you downloaded from CodePlex to impress girls. Those are solid and (usually) well-tested libraries that don't change frequently. And even when they do, you are not forced to adopt the newest version.

I'm primarily talking about the so-called "frameworks" and "integration middleware" that your fellow co-workers next door, right behind the water cooler, are developing. Regardless of how qualified they are, deadlines and ever-changing requirements will force them to push (and force you to use) updates at random moments. In other words, their code is as unpredictable and crappy as yours.

So you'd better be prepared and react right after these changes are checked in.

A few obvious points about keeping dlls in your repository:

- You can’t get any meaningful change history from your “Binaries” folder, especially if you don’t bother writing good check-in comments.

- You can’t merge automatically.

- It’s impossible to increment in small steps; you have to check in the whole dll.

- You will multiply your problems if you reference another project that also uses your binary dependencies.

Here are the arguments that will probably be used to sneak THEIR libs into your precious repository:

We don't want our changes to break your code. Just grab a fresh version whenever we consider it “non-breaking” enough.

Well, if you are creating a framework, you must not introduce breaking changes in the first place. And if you just create remote proxies for your corporate distributed system and the interface changes in a non-backwards-compatible manner, how will an unbroken build help you?

We have multiple consumers within the organization. We cannot have them all in our solution.

You don't have to. Consumers can refer to your repository using something similar to SVN Externals. You will get notified by the CI server when you break someone's build.

But this way others will be able to access our code!

Give read-only access to everyone involved within the organization; you won't be able to keep the code secret anyway (at least not with .NET).

We will use separate branches to give each consumer its own special version, shipped as a set of dlls. You don’t have to bother, we’ll take care of everything.

You probably don’t know what you’re diving into. Even leaving aside the Open-Closed Principle, which strongly discourages that style, are you really going to maintain these branches for the whole lifetime of their corresponding projects?

Thursday, May 27, 2010

Unable to start debugging on the web server

"Unable to start debugging on the web server". I had this error when I tried to debug ASP.net application on IIS 7.0. Google brings a lot of links including links to Microsoft support site, but none of the solutions worked for me. Suddenly I realised that I had changed the password for application pool user account, which resulted in worker process silently refusing to start. Hope this saves someone a few minutes.