Going Deep Enough with Microservices

Moving from a monolith architecture to microservices is a widely debated process, with many recommendations and nuggets of advice available on the web in blogs like this. There are so many different opinions out there mainly because where an enterprise finds its main complexities lie depends on the skillsets of its technologists, the domain knowledge within the business and the existing code base. During the years I’ve spent as a contractor in a very wide range of enterprises, I’ve seen lots of monolith architectures, all of them causing slightly different headaches because those responsible for developing them let different aspects of the architecture slip. After all, the thing that is often forgotten is that a well maintained monolith can work. The reverse is also true: a microservice architecture left to evolve on its own can cause as many problems as a poorly maintained monolith.

Domains

One popular way to break things down is using Domain Driven Design. Two books which cover most concepts involved in this process are ‘Building Microservices’ by Sam Newman (http://shop.oreilly.com/product/0636920033158.do) and ‘Implementing Domain Driven Design’ by Vaughn Vernon (http://www.amazon.com/Implementing-Domain-Driven-Design-Vaughn-Vernon/dp/0321834577), which largely references ‘Domain Driven Design: Tackling Complexity in the Heart of Software’ by Eric Evans (http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215). I recommend Vaughn’s book over Evans’ as the latter is a little dry.

If you take on board even just half the content covered in these books, you’ll be on a reasonable footing to get started. You’ll make mistakes, but as Sam Newman points out (and I’ve seen for myself) that’s inevitable.

Something that seems to be left out of a lot of domain driven discussions is what happens beyond the basic CRUD processes and domain logic in the application layer. Attention sits primarily with the thin interaction between a web interface and the domain processing by the aggregate in question. When dismantling a monolith architecture into microservices, focusing only on the application layer can give the impression of fast progress, but in reality half the picture is missing. It’s likely that in a few months there will be several microservices, but instead of them operating solely in their sub-domains, they’ll still be tied to the database that the original monolith was using.

Context

It’s hugely important to pull the domain data out of the monolith store. This is for the very same reasons we segregate service responsibilities into sub-domains. Data pertaining to a given domain may exist in other domains as well, but changes will not necessarily be subject to the same domain rules, and individual records may have different properties. There may be a User record in several sub-domains, each with a Username property, but the logic around how duplicate Usernames are prevented should sit firmly in a single sub-domain. If a service in a different sub-domain needs to update the username, it should either call a public service from the Profile sub-domain or raise a ‘Username Updated’ event that the Profile sub-domain would handle, process and possibly respond to with a ‘Username Update Failed’ event of its own.
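To make this concrete, here’s a minimal sketch of what the event side of that interaction might look like. Every name here (IEventBus, IProfileRepository, the event classes) is hypothetical, invented for illustration rather than taken from any particular library:

using System;

// Hypothetical events exchanged between sub-domains.
public class UsernameUpdated
{
    public Guid UserId { get; set; }
    public string NewUsername { get; set; }
}

public class UsernameUpdateFailed
{
    public Guid UserId { get; set; }
    public string Reason { get; set; }
}

// Hypothetical abstractions the handler depends on.
public interface IEventBus { void Publish(object evt); }
public interface IProfileRepository
{
    bool UsernameExists(string username);
    void UpdateUsername(Guid userId, string username);
}

// Lives inside the Profile sub-domain, which owns the uniqueness rule.
public class UsernameUpdatedHandler
{
    private readonly IProfileRepository _profiles;
    private readonly IEventBus _bus;

    public UsernameUpdatedHandler(IProfileRepository profiles, IEventBus bus)
    {
        _profiles = profiles;
        _bus = bus;
    }

    public void Handle(UsernameUpdated evt)
    {
        if (_profiles.UsernameExists(evt.NewUsername))
        {
            // The duplicate check lives here and only here; other
            // sub-domains just react to the failure event.
            _bus.Publish(new UsernameUpdateFailed
            {
                UserId = evt.UserId,
                Reason = "Duplicate username"
            });
            return;
        }
        _profiles.UpdateUsername(evt.UserId, evt.NewUsername);
    }
}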

This example may be a little contrived – checking for duplicates could be something that’s implemented everywhere it’s needed. But consider what would happen if it became necessary to check for duplicates within another external system every time a Username is updated. That logic could easily be encapsulated behind the call to the Profile service but having to update every service that updates Usernames wouldn’t be good practice.

So if we are now happy that the same data represented in different sub-domains could at any one time be different (given the previous two paragraphs), then we shouldn’t store the data for both sub-domains in the same table.
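As a rough sketch of what that divergence looks like (the type names are hypothetical), two sub-domains might each hold their own representation of a user:

using System;

// The same real-world user, represented differently in two sub-domains.
namespace Profile
{
    public class User
    {
        public Guid Id { get; set; }
        public string Username { get; set; }    // uniqueness enforced here
        public string DisplayName { get; set; }
    }
}

namespace Billing
{
    public class User
    {
        public Guid Id { get; set; }
        public string Username { get; set; }    // a local copy, kept in sync via events
        public string VatNumber { get; set; }   // the Profile sub-domain knows nothing about this
    }
}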

Local Data

In fact, we’re now pretty well removed from needing a classic relational database for storing data that’s local to the sub-domain. We’re dealing with data that is limited in scope and is intended for use solely by the microservices built to sit in that sub-domain. NoSQL databases are ideal for this scenario, and no matter which platform you’ve chosen to build on there are excellent options available. One piece of advice I think is pretty sound is that if you are working in the cloud, you’ll usually get the best performance by using the data services provided by your cloud provider. Make sure you do your homework, though: some have idiosyncrasies that can impact performance if you don’t know about them.

So now we have data stored locally to the sub-domain, but this isn’t where the work stops. It’s likely there’s a team of DBAs jumping around wondering why their data warehouse isn’t getting any new data.

The problem is that the relational database backing the monolith wasn’t just acting as a data-store for the application. There were processes feeding other data-stores for things like customer reporting, machine learning platforms and BI warehouses. In fact, anything that requires a historical view of things will be reading from one or more stores that are loaded incrementally from the monolith’s relational database. Now that data is being stored in a manner best suiting each sub-domain, there isn’t a central source for these downstream stores to be loaded from.

Shift of Responsibility

Try asking a team of DBAs if they fancy writing CLR-based stored procedures to detect changes and pull new records into their warehouse by querying whatever data-store technologies have been decided on in each case. I doubt they’ll be too receptive. The responsibility for getting data out of each local data-store now has to move closer to the application services.

The data guys are interested in recording historical and aggregated records, which is convenient, as there is a useful, well-known tool for informing different systems that something has happened: an event.

It’s been argued that using events to communicate across sub-domains is misusing an event stream as a message bus. My argument in this case is that the back-end historical data-store is still within the original sub-domain. The data being stored belongs specifically to that sub-domain and still holds the same context as when it was saved. There has been a transition to a new medium of storage, but that’s all.

So we are now free to raise events from our application microservices into event streams, which are then handled by a service specifically designed to transfer data from events into whatever downstream stores were originally being fed from the monolith database. This gives us full extraction from the monolithic architecture and breaks the sub-domain’s dependency on the monolith database.
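A sketch of what such a transfer service might look like follows. The IEventStreamSubscriber and IWarehouseWriter abstractions, the stream name and the event shape are all hypothetical stand-ins for whatever event stream client and warehouse loader you actually use:

using System;

// Hypothetical abstractions over the event stream and the downstream store.
public interface IEventStreamSubscriber
{
    void Subscribe(string streamName, Action<UserEvent> onEvent);
}

public interface IWarehouseWriter
{
    void Append(string table, object row);
}

public class UserEvent
{
    public Guid UserId { get; set; }
    public string EventType { get; set; }
    public DateTime OccurredAt { get; set; }
    public string Payload { get; set; }
}

// Replaces the incremental load jobs that used to read the monolith database.
public class UserEventWarehouseLoader
{
    private readonly IWarehouseWriter _warehouse;

    public UserEventWarehouseLoader(IEventStreamSubscriber subscriber, IWarehouseWriter warehouse)
    {
        _warehouse = warehouse;
        subscriber.Subscribe("user-events", Handle);
    }

    private void Handle(UserEvent evt)
    {
        // Each event becomes a historical record in the warehouse.
        _warehouse.Append("UserHistory", new
        {
            evt.UserId,
            evt.EventType,
            evt.OccurredAt,
            evt.Payload
        });
    }
}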

There is also the possibility that we can now capture more fine-grained detail of changes than was being recorded previously.

Gaps in the Monolith Database

Of course, back-end data-stores aren’t the only consumers of the sub-domain’s data. Most likely there will be other application level queries that used to read the data you’re now saving outside of the monolith database. How you manage these dependencies will depend on whether the read requests are coming from the same sub-domain or another. If they’re from the same sub-domain, then it’s equally correct to pull the data either from an event stream or from microservices within that sub-domain; gradually, the sub-domain’s dependency on the monolith database will die. If the queries are coming from a different sub-domain, then it’s better to continue to update the monolith database as a consumer of the data stored locally to the sub-domain. The original table no longer contains data that is relevant to the sub-domain you’re working on; it’s simply kept up to date for the benefit of the other consumers.
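Keeping the monolith table updated as just another consumer can hang off the same event stream. Here’s a rough sketch (the connection string, table name and method signature are illustrative only):

using System;
using System.Data.SqlClient;

// Keeps the legacy monolith table in sync for consumers in other sub-domains.
public class MonolithUserTableUpdater
{
    private readonly string _connectionString;

    public MonolithUserTableUpdater(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Called by whatever event handling infrastructure you have in place.
    public void Handle(Guid userId, string newUsername)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "UPDATE dbo.[User] SET Username = @username WHERE UserId = @id", connection))
        {
            command.Parameters.AddWithValue("@username", newUsername);
            command.Parameters.AddWithValue("@id", userId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}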

Switching

Obviously we don’t want to have any gaps in the data being sent to our back-end stores, so as we pull functionality into microservices and add new data-stores local to the sub-domain, we also need to build the pipeline for our new back end processing of domain events into the warehouse. As this gets switched on, the loading processes from the original monolith can be switched off.

External Keys

Very few enterprise systems function in isolation. Most businesses make use of off-the-shelf packages or cloud-based services such as Salesforce. Mapping records into these systems usually means using the primary key of each record to create a reference. If this has happened, then the primary key from the monolith is most likely being relied on to hold things together. Moving away from the monolith database means that source of primary key generation has probably been lost.

There are two options here and I’d suggest going with whatever is the easiest – they both have their merits and problems.

  1. Continue to generate unique IDs in the same way as the monolith database did, and continue to use these IDs for reference across different systems. Don’t rely on the monolith for ID generation here; create a new process in the microservice that continues the same pattern (a rough sketch follows this list).
  2. Introduce a new ID generation scheme and copy the new keys out to the external systems for reference. The original keys can eventually be dropped.
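What ‘continuing the same pattern’ means depends entirely on how the monolith generated its keys. Purely as an illustration, if the monolith used a sequential integer identity column, a small generator seeded from the last monolith ID could take over (the class below is hypothetical, and a real version would need a durable, shared counter such as a database sequence rather than an in-memory field):

using System.Threading;

// Hands out sequential IDs, continuing from wherever the monolith's
// identity column left off.
public class SequentialIdGenerator
{
    private long _lastId;

    public SequentialIdGenerator(long lastMonolithId)
    {
        _lastId = lastMonolithId;
    }

    public long NextId()
    {
        // Interlocked keeps this safe across threads within a single process.
        return Interlocked.Increment(ref _lastId);
    }
}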

Deeper than Expected

When planning the transition from monolithic architecture to microservices, there may well be promises from the management team that time will be given to build each sub-domain out properly. Don’t take this at face value: Product Managers will still have their roadmaps to fulfil, and unfortunately perhaps only 30% of any given slice of functionality pulled out of a monolith is something an end user will ever see. Expect the process to be difficult no matter what promises are made.

What I really want to get across here is that extracting even a small amount of functionality into microservices carries with it a much deeper dive into the enterprise’s tech stack than just creating a couple of application services. It requires time and focus from more than just the Dev team, and before it can even be started there has to be an architectural plan spanning the full vertical slice of a sub-domain, from front end to warehoused historical data.

Consequences of Not Going Deep Enough

How difficult do you find it in your organisation to get approval for technical upgrade work, or for dealing with technical debt as a project (which I’m not advocating is a good strategy), or for doing anything which doesn’t have a directly measurable positive impact on new product? In my experience, it isn’t easy and I’m not sure it should be, but that’s for another post.

Imagine you’ve managed to extract maybe 70% of your application layer away from your monolith, but you’re still tied to the same data model. Have you achieved what you set out to do? You certainly don’t have loose coupling, because everything is tied together at the data level. You don’t have domain isolation. You are preventing your data team from getting access to juicy new events, because you don’t really need to raise them: the changed data is already available everywhere. You’ve turned a monolith into an abomination. It isn’t really microservices, it isn’t a classic monolith, and it isn’t any desired pattern at all. Even worse, the work you are missing is pretty big and may not directly carry any new features with it. Will you get agreement to remove the coupling with the database as a project in itself?

How are your developers doing? How many of them can see that the strategy is only going halfway? How many are moaning about paying lip service to the architecture? Wasn’t that one of the reasons you started with microservices in the first place?

Can you deploy the microservices without affecting other sub-domains? What if there are schema changes? What if there are schema changes in two sub-domains and one needs to be rolled back after release because it wasn’t quite right? Wasn’t this something microservices were supposed to prevent?

How many dodgy hacks or ‘surprises’ are there in your new code, where devs have managed to make domain isolated services work with a single relational data model? How many devs waste time hand-wringing when they know they’re building something that will be technical debt the moment it goes live?

Ok, so I’m painting a darker picture than you’ll probably experience, but each of these scenarios will almost certainly come up; you just might not get to hear about it.

The crux for me is thinking about the reasons for pursuing a microservice architecture: the flexibility, the loose coupling, the technology agnosticity (if that’s a real term), the speed of continuous delivery that you’re looking for. Unless you go deeper than the low-hanging fruit of the application layer, you’ll be cheating yourself out of these benefits. Sure, you’ll see improvements short term, but you are building something which is already technical debt. No matter what architecture you choose, if you don’t invest in maintaining it properly (or even building it properly in the first place) then it will ultimately become your albatross.

Code Libraries and Dependencies

Nuget has made it really straightforward to share libraries across multiple applications: just add a nuspec file and run ‘nuget pack’. But before you do that next time, spare a thought for the poor dev who’s trying to fit your library into their project among a dozen others, when any of them may make use of the same 3rd parties you’ve referenced as dependencies.

Breaking Changes

Breaking changes happen. Sometimes intentionally, sometimes not. A breaking change in a popular 3rd party library can be pretty tricky to deal with.

Not so long ago, RestSharp made a breaking change: they changed the way a value was set from being a property setter to a function call. The two are not the same, and no amount of assembly redirection will make one version work in place of the other.
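To illustrate the shape of that kind of change (this is a made-up example of the pattern, not RestSharp’s actual API): code compiled against the old version calls the property’s setter method under the hood, and that method simply doesn’t exist in the new version, so a binding redirect only moves the failure to runtime.

// Hypothetical v1 of a library: the value is set through a property.
public class ClientV1
{
    public string BaseUrl { get; set; }
}

// Hypothetical v2 of the same library: the property is gone, replaced by a
// method. Anything compiled against v1 fails at runtime with a
// MissingMethodException if redirected to v2.
public class ClientV2
{
    private string _baseUrl;

    public void SetBaseUrl(string url)
    {
        _baseUrl = url;
    }
}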

My client at the time had a large application with probably two dozen references to the old version of RestSharp. When people started to use the new version in their libraries (often without even realising that it was a new version), it took a while before someone hit the problem of referencing one of those libraries from code that used the older version.

Chain of Incompatibility

So some app uses two libraries, which both use different and incompatible versions of RestSharp. Ok, so let’s just upgrade the older version to the newer version in the library and everyone’s happy.

Then we find out that it wasn’t that library which referenced RestSharp; it was a further library which that library referenced. So we open that library, upgrade it, build a nuget package, re-reference it in our original library, and do the same again to get the change into our app. Great.

Then, a couple of days later, someone is having trouble after extending the library we just changed with some new functionality. Because it now references the updated version of RestSharp, it won’t work when it’s referenced from another app that still uses the old version.

And so the dance continues…

Just Avoid It

The best way to deal with this? Just avoid it.

A library should only provide the logic that it relates to. Trying to make a library responsible for everything is a mistake.

3rd party dependencies are also a bad thing. They can change and different versions are not always backwards compatible.

A library is not well encapsulated if it depends on 3rd parties. It means those 3rd parties have to be versioned carefully, and it’s possible for every developer in your enterprise to decide just to stick on the old version. Legacy code in real time, nice.

To avoid it, simply create an interface in your library that defines the contract you need. Then leave it up to the people using your library to decide how it should work. You can always provide some sample code in a readme.md, or even an additional nuget package with a preferred implementation. Give the consumer some options.

An example interface for RestSharp functionality could be as simple as:

public interface IRestClient
{
    // Set the root URL that subsequent calls will be made against.
    void SetBaseUrl(string url);

    // Send a request to the given path and return the response body.
    dynamic Send(string path, string verb, string payload);
}

Although it could be much better.
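As an example of the freedom this gives the consumer, here’s a rough implementation of that interface using nothing but HttpClient from the BCL, so the library itself never needs a 3rd party reference (a sketch only; error handling and response deserialisation are left out):

using System.Net.Http;
using System.Text;

// One possible consumer-side implementation of IRestClient. Another consumer
// could implement the same contract with RestSharp, or anything else, without
// the library ever referencing it.
public class HttpRestClient : IRestClient
{
    private static readonly HttpClient Client = new HttpClient();
    private string _baseUrl;

    public void SetBaseUrl(string url)
    {
        _baseUrl = url.TrimEnd('/');
    }

    public dynamic Send(string path, string verb, string payload)
    {
        var request = new HttpRequestMessage(
            new HttpMethod(verb),
            _baseUrl + "/" + path.TrimStart('/'));

        if (payload != null)
            request.Content = new StringContent(payload, Encoding.UTF8, "application/json");

        // Synchronous to match the interface; a real implementation
        // would probably expose an async method instead.
        var response = Client.SendAsync(request).Result;
        return response.Content.ReadAsStringAsync().Result;
    }
}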

A Helpful Circuit Breaker in C#

Introduction

With the increasing popularity of SOA in the guise of ‘microservices’, circuit breakers are now a must-have weapon in any developer’s arsenal. Services are rarely 100% reliable; outages happen, network connections get pulled, memory gets filled, routing tables get corrupted. In an environment where multiple services are each calling multiple other services, the result of an outage in a small, seemingly unimportant service can be a random slow down in response times in your web application that gradually leads to complete server lock up. (If you don’t believe me, read Release It! by Michael Nygard from the Pragmatic Bookshelf.)

The idea of a circuit breaker is to detect that a service is down and fail immediately for subsequent calls in an expected manner that your application can handle gracefully. Then, every so often, the breaker will attempt to close and allow a call through to the troubled service. If that call is successful then the breaker starts allowing calls through; if it fails then the breaker remains in an open state and continues to fail with an expected exception.

Helpful.CircuitBreaker is a simple implementation that allows a developer to be proactive about the way their code handles failures.

Usage

There are two primary ways that the circuit breaker can be used:

  1. Exceptions thrown from the code you wish to break on can trigger the breaker to open.
  2. A returned value from the code you wish to break on can trigger the breaker to open.

Here are some basic examples of each scenario.

In the following example, exceptions thrown from _client.Send(request) will cause the circuit breaker to react based on the injected configuration.

public class MakeProtectedCall
{
    private ICircuitBreaker _breaker;
    private ISomeServiceClient _client;

    public MakeProtectedCall(ICircuitBreaker breaker, ISomeServiceClient client)
    {
        _breaker = breaker;
        _client = client;
    }

    public Response ExecuteCall(Request request)
    {
        Response response = null;
        _breaker.Execute(() => response = _client.Send(request));
        return response;
    }
}

In the following example, exceptions thrown by _client.Send(request) will still trigger the exception handling logic of the breaker, but the lambda applies additional logic to examine the response and can trip the breaker without an exception ever being thrown. This is particularly useful when using an HTTP-based client that may return failures as error codes and strings instead of throwing exceptions.

public class MakeProtectedCall
{
    private ICircuitBreaker _breaker;
    private ISomeServiceClient _client;

    public MakeProtectedCall(ICircuitBreaker breaker, ISomeServiceClient client)
    {
        _breaker = breaker;
        _client = client;
    }

    public Response ExecuteCall(Request request)
    {
        Response response = null;
        _breaker.Execute(() =>
        {
            response = _client.Send(request);
            return response.Status == "OK" ? ActionResult.Good : ActionResult.Failure;
        });
        return response;
    }
}

Initialising

The scope of a circuit breaker must be considered first. When the breaker opens, subsequent calls will not succeed, but if your breaker is scoped to an HTTP request then there may never be a subsequent request hitting that open breaker; the next request would hit a newly built, closed breaker.

The following code will initialise a basic circuit breaker which, once open, will not try to close until 1 minute has passed (60 seconds is the default breaker open period, so there’s no need to specify it).

CircuitBreakerConfig config = new CircuitBreakerConfig
{
    BreakerId = "Some unique and constant identifier that indicates the running instance and executing process"
};
CircuitBreaker circuitBreaker = new CircuitBreaker(config);

To inject a circuit breaker into class TargetClass using Ninject, try code similar to this:

Bind<ICircuitBreaker>().ToMethod(c => new CircuitBreaker(new CircuitBreakerConfig
{
    BreakerId = string.Format("{0}-{1}-{2}", "Your breaker name", "TargetClass", Environment.MachineName)
})).WhenInjectedInto(typeof(TargetClass)).InSingletonScope();

The above code will reuse the same breaker for all instances of the given class, so the breaker reports a consistent state across different threads. When opened by one use, all instances of TargetClass will have an open breaker.

Tracking Circuit Breaker State

The suggested method for tracking the state of the circuit breaker is to handle the breaker events. These are defined on the CircuitBreaker class as:

/// <summary>
/// Raised when the circuit breaker enters the closed state
/// </summary>
public event EventHandler ClosedCircuitBreaker;

/// <summary>
/// Raised when the circuit breaker enters the opened state
/// </summary>
public event EventHandler OpenedCircuitBreaker;

/// <summary>
/// Raised when trying to close the circuit breaker
/// </summary>
public event EventHandler TryingToCloseCircuitBreaker;

/// <summary>
/// Raised when the breaker tries to open but remains closed due to tolerance
/// </summary>
public event EventHandler ToleratedOpenCircuitBreaker;

/// <summary>
/// Raised when the circuit breaker is disposed
/// </summary>
public event EventHandler UnregisterCircuitBreaker;

/// <summary>
/// Raised when a circuit breaker is first used
/// </summary>
public event EventHandler RegisterCircuitBreaker;

Attach handlers to these events to send information about the event to a logging or monitoring system. In this way, sending state to Zabbix or logging to log4net is trivial.
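For example, assuming the circuitBreaker and config variables from the initialisation example above, wiring a few of the events up to log4net might look something like this (a sketch only):

using log4net;

ILog log = LogManager.GetLogger(typeof(MakeProtectedCall));

circuitBreaker.OpenedCircuitBreaker += (sender, args) =>
    log.WarnFormat("Circuit breaker '{0}' opened", config.BreakerId);

circuitBreaker.ClosedCircuitBreaker += (sender, args) =>
    log.InfoFormat("Circuit breaker '{0}' closed", config.BreakerId);

circuitBreaker.ToleratedOpenCircuitBreaker += (sender, args) =>
    log.WarnFormat("Circuit breaker '{0}' tolerated a failure", config.BreakerId);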

Configuration Options

Make sure each circuit breaker has its own configuration, injected using the CircuitBreakerConfig class.

using System;
using System.Collections.Generic;
using Helpful.CircuitBreaker.Events;

namespace Helpful.CircuitBreaker.Config
{
    /// <summary>
    /// Configuration values controlling the behaviour of a circuit breaker.
    /// </summary>
    [Serializable]
    public class CircuitBreakerConfig : ICircuitBreakerDefinition
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="CircuitBreakerConfig"/> class.
        /// </summary>
        public CircuitBreakerConfig()
        {
            ExpectedExceptionList = new List<Type>();
            ExpectedExceptionListType = ExceptionListType.None;
            PermittedExceptionPassThrough = PermittedExceptionBehaviour.PassThrough;
            BreakerOpenPeriods = new[] { TimeSpan.FromSeconds(60) };
        }

        /// <summary>
        /// The number of times an exception can occur before the circuit breaker is opened
        /// </summary>
        /// <value>
        /// The open event tolerance.
        /// </value>
        public short OpenEventTolerance { get; set; }

        /// <summary>
        /// Gets or sets the list of periods the breaker should be kept open.
        /// The last value will be what is repeated until the breaker is successfully closed.
        /// If not set, a default of 60 seconds will be used for all breaker open periods.
        /// </summary>
        /// <value>
        /// The array of timespans representing the breaker open periods.
        /// </value>
        public TimeSpan[] BreakerOpenPeriods { get; set; }

        /// <summary>
        /// Gets or sets the expected type of the exception list. <see cref="ExceptionListType"/>
        /// </summary>
        /// <value>
        /// The expected type of the exception list.
        /// </value>
        public ExceptionListType ExpectedExceptionListType { get; set; }

        /// <summary>
        /// Gets or sets the expected exception list.
        /// </summary>
        /// <value>
        /// The expected exception list.
        /// </value>
        public List<Type> ExpectedExceptionList { get; set; }

        /// <summary>
        /// Gets or sets the timeout.
        /// </summary>
        /// <value>
        /// The timeout.
        /// </value>
        public TimeSpan Timeout { get; set; }

        /// <summary>
        /// Gets or sets a value indicating whether [use timeout].
        /// </summary>
        /// <value>
        ///   <c>true</c> if [use timeout]; otherwise, <c>false</c>.
        /// </value>
        public bool UseTimeout { get; set; }

        /// <summary>
        /// Gets or sets the breaker identifier.
        /// </summary>
        /// <value>
        /// The breaker identifier.
        /// </value>
        public string BreakerId { get; set; }

        /// <summary>
        /// Sets the behaviour for passing through exceptions that won't open the breaker
        /// </summary>
        public PermittedExceptionBehaviour PermittedExceptionPassThrough { get; set; }
    }
}

Conclusion

This library has helped me build resilient microservices that have remained stable when half the internet has been falling over. I hope it can help you as well.