Scale or Fail

I’ve heard a lot of people say something like “but we don’t need huge scalability” when pushed for a reason why their architecture is straight out of the ’90s. “We’re not big enough for DevOps” is another regular excuse. While it’s certainly true that many enterprises don’t need to worry so much about high loads and high availability, there are other, very real benefits to embracing early 21st-century architecture principles.

Scalable architecture is simple architecture

Keep it simple, stupid! It’s harder to do than it might seem. What initially appears to be the easy solution can quickly turn into a big, unmanageable ball of tightly coupled dependencies, where one bad line of code can affect a dozen different applications.

In order to scale easily, a system should be simple. When scaling, you could end up with dozens or even hundreds of instances, so any complexity is multiplied. Complexity is also a recipe for waste: if you scale a complex application, the chances are you’re scaling bits which simply don’t need to scale. Systems should be designed so hot functions can be scaled independently of those which are underutilised.

Simple architecture takes thought and consideration. It’s decoupled for good reason – small things are easier to keep ‘easy’ than big things. An array of small things, all built with the same basic rules and standards, can be easily managed if a little effort is put into working out an approach which works for you. Once you have a few small things all being managed in the same way, growing to lots of small things is easy, if it’s needed.

Simple architecture is also resilient, because simple things tend not to break. And even if you aren’t bothered about a few outages, it’s better to only have the outages you plan for.

Scalable architecture is decoupled

If you need to make changes in anything more than a reverse proxy in order to scale one service, then your architecture is coupled and shows signs of inelasticity. Beyond being scalable, decoupled architecture is much easier to maintain, and it keeps quality much higher because it’s easier to test.

Decoupled architecture is scoped to a specific few modules which can be deployed together repeatedly as a single stack with relative ease, once automated. Outages are easy to fix, as it’s just a case of hitting the redeploy button.

Your end users will find that your decoupled architecture is much nicer to use as well. Rather than making dozens of calls to load and save data across a myriad of different applications and databases, a decoupled application makes only one or two calls to load or save the data to a dedicated store, then raises events for other systems to handle. This is called eventual consistency, and it isn’t difficult to make work. In fact it’s almost impossible to avoid in an enterprise system, so embracing the principle wholeheartedly makes the required thought processes easier to adopt.

Scalable architecture is easier to test

If you are deploying a small, well understood stack with very well known behaviours and endpoints, then it’s a no-brainer to get some decent automated tests deployed. These can be triggered from a deployment platform with every deploy. As the data store is part of the stack and you’re following micro-architecture rules, the only records in the stack come from something in the stack. So setting up test data is simply a case of calling the APIs you’re testing, which in turn tests those APIs. You don’t have to test beyond the interface, as it shouldn’t matter (functionally) how the data is stored, only that the stack functions correctly.
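As a rough illustration, a stack-level test might look something like the sketch below. The base address, endpoint paths and payload shape are all assumptions made purely for the example (as are the xUnit and Newtonsoft.Json packages); the point is that the test arranges its own data through the same API it’s testing.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Xunit;

public class CustomersApiTests
{
    // Point this at the deployed stack under test - the address and routes are hypothetical.
    private static readonly HttpClient Client = new HttpClient { BaseAddress = new Uri("http://localhost:5000/") };

    [Fact]
    public async Task Created_customer_can_be_read_back()
    {
        // Arrange the test data by calling the API itself - no reaching into the data store.
        var payload = new StringContent(
            JsonConvert.SerializeObject(new { name = "Fred Bloggs", birthYear = 1977 }),
            Encoding.UTF8, "application/json");
        var createResponse = await Client.PostAsync("customers", payload);
        createResponse.EnsureSuccessStatusCode();
        dynamic created = JsonConvert.DeserializeObject(await createResponse.Content.ReadAsStringAsync());

        // Act and assert through the public interface only.
        var getResponse = await Client.GetAsync($"customers/{created.id}");
        getResponse.EnsureSuccessStatusCode();
    }
}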

Scalable architecture moves quicker to market

Given small, easily managed, scalable stacks of software, adding a new feature is a doddle. Automated tests reduce the manual test overhead. Some features can get into production in a single day, even when they require changes across several systems.

Scalable architecture leads to higher quality software

Given that in a scaling situation you would want to know your new instances are going to function, you need to attain a high standard of quality in what’s built. Fortunately, as it’s easier to test, quicker to deploy, and easier to understand, higher quality is something you get. Writing test-first code becomes second nature, even writing integration tests up front.

Scalable architecture reduces staff turnover

It really does! If you’re building software with the same practices which have been causing headaches and failures for the last several decades, then people aren’t going to want to work for you for very long. Your best people will eventually get frustrated and go elsewhere. You could find yourself in a position where you finally realise you have to change things, but everyone with the knowledge and skills to make the change has left.

Fringe benefits

I guess what I’m trying to point out is that I haven’t ever heard a good reason for not building something which can easily scale. Building for scale helps focus solutions on good architectural practices: decoupled, simple, easily testable micro-architectures. Are there any enterprises where these benefits are seen as undesirable? Yet, when faced with the decision of either continuing to build the same tightly coupled monoliths which require full weekends (or more!) just to deploy, or building something small, lightweight, easily deployed, easily maintained, and ultimately scalable, there are plenty of people claiming “Only in an ideal world!” or “We aren’t that big!”.

Bonkers!

Don’t Stream JSON Data (Part 2)

I’ve discussed the merits of JSON streaming in two prior posts: Large JSON Responses and Don’t Stream JSON Data. If you haven’t read these yet then take a quick look first – they’re not long reads.

I’m attracted to the proposition of scaling out the consumer, so that many individual requests can be made rather than returning one huge result from the service. It places the complexity with the consumer rather than with the service, which really shouldn’t be bothered about how it’s being used. On the service side, scaling happens with infrastructure, which is a pattern we should be embracing. Making multiple simultaneous requests from the consumer is reasonably straightforward in most languages.

But let’s say our service isn’t deployed somewhere which is easily scalable and that simultaneous requests at a high enough rate to finish in a reasonable time would impact the performance of the service for other consumers. What then?

In this situation, we need to make our service go as fast as possible. One way to do this would be to pull all the data in one huge SQL query and build our response objects from that. It would certainly be as fast as we can go, but there are some issues with this:

  1. Complexity in embedded SQL strings is hard to manage.
  2. From a service developer’s point of view, SQL is hard to test.
  3. We’re using completely new logic to generate our objects, which will need to be tested. In our example scenario (in Large JSON Responses) we already have tested, proven logic for building our objects, but it builds them one at a time.

Complexity and testability are pretty big issues, but I’m more interested in issue 3: ignoring and duplicating existing logic. APIs in front of legacy databases are often littered with crazy, unknowable logic tweaks: “if property A is between 3 and 10 then override property B with some constant, otherwise set property C to the value queried from some other table but only include the last 20 characters” – I’m sure you’ve seen the like. Getting this right the first time around was probably pretty tough going, so do you really want to go through that again?

We could use almost the same code as for our chunked response, but parallelise the querying of each record. Now our service method would look something like this:

public IEnumerable<Customer> GetByBirthYear(int birthYear)
{
    IEnumerable<int> customerIds = _customersRepository.GetIdsForBirthYear(birthYear);
    // ConcurrentBag (System.Collections.Concurrent) because Parallel.ForEach adds
    // from multiple threads - List<T>.Add is not thread safe.
    var customerList = new ConcurrentBag<Customer>();
    Parallel.ForEach(customerIds, id =>
    {
        Customer customer;
        try
        {
            customer = Get(id);
        }
        catch (Exception e)
        {
            customer = new Customer
            {
                Id = id,
                CannotRetrieveException = e
            };
        }
        customerList.Add(customer);
    });
    return customerList;
}

public Customer Get(int customerId)
{
    ...
}

Firstly, the loop through the customer IDs is no longer a plain foreach loop; we’ve switched to Parallel.ForEach. This method of parallelisation is particularly clever in that it gradually increases the degree of parallelism to a level determined by available resources – it’s one of the easiest ways to achieve parallel processing. Secondly, we’re now populating a full collection of customers (a thread-safe one, since multiple threads add to it) and returning the whole result in one go. This is because it’s simply not possible to yield inside the parallel lambda expression. It also means that responding with a chunked response is pretty redundant and probably adds a bit of unnecessary complexity.

This strategy will only work if all code called by the Get() method is thread safe. Something to be very careful with is the database connection, SqlConnection is not thread safe.

Don’t keep your SqlConnection objects hanging around; new up a fresh object every time you want to query the database, unless you need to continue the current transaction. No matter how many SqlConnection objects you create, the number of connections is limited by the server and by what’s configured in the connection string. A new connection will be requested from the pool, but will only be handed out when one is available.
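In practice that just means wrapping each query in its own using block and letting ADO.NET’s connection pooling do the heavy lifting. A minimal sketch of what one of the repository Get methods might look like under the hood follows – the _connectionString field, the Customers table and its columns are assumptions made for the example:

using System.Data.SqlClient;

public class CustomersRepository
{
    private readonly string _connectionString;

    public CustomersRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public Customer Get(int customerId)
    {
        // A fresh SqlConnection per call keeps each thread isolated; Open() just borrows
        // a pooled connection and Dispose() hands it back.
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT Id, Name, BirthYear FROM Customers WHERE Id = @id", connection))
        {
            command.Parameters.AddWithValue("@id", customerId);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                if (!reader.Read()) return null;
                return new Customer
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1),
                    BirthYear = reader.GetInt32(2)
                };
            }
        }
    }
}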

So now we have an n+1 scenario where we’re querying the database possibly thousands of times to build our response. Even though we may be making these queries on several threads and the processing time might be acceptable, given all the complexity is now in our service we can take advantage of the direct relationship with the database to make this even quicker.

Let’s say our Get() method needs to make 4 separate SQL queries to build a Customer record, each taking one integer value as an ID. It might look something like this:

public Customer Get(int customerId)
{
    var customer = _customerRepository.Get(customerId);
    customer.OrderHistory = _orderRepository.GetManyByCustomerId(customerId);
    customer.Address = _addressRepository.Get(customer.AddressId);
    customer.BankDetails = _bankDetailsRepository.Get(customer.BankDetailsId);
    return customer;
}

To stop each of these .Get() methods hitting the database, we can cache the data up front with one SQL query per repository class. This preserves our logic but presents a problem: assuming we are using Microsoft SQL Server, there is a practical limit to the number of items we can put in an ‘IN’ clause, so we can’t just stick thousands of customer IDs in there (https://docs.microsoft.com/en-us/sql/t-sql/language-elements/in-transact-sql). If we can select by multiple IDs, then we can turn our n+1 scenario into just 5 queries.

It turns out that we can specify thousands of IDs in an ‘IN’ clause by using a sub-query. So our problem shifts to how to create a temporary table holding all our customer IDs for that sub-query to select from. Unless you’re using a very old version of SQL Server, you can insert multiple rows with a single basic ‘INSERT’ statement. For example:

INSERT INTO #TempCustomerIDs (ID)
VALUES
(1),
(2),
(3),
(4),
(5),
(6)

This results in 6 rows in the table, with the values 1 through 6 in the ID column. However, we will once again hit a limit – it’s only possible to insert 1000 rows per ‘INSERT’ statement in this way.

Fortunately, we’re working one level above raw SQL, and we can work our way around this limitation. An example is in the code below.

public void LoadCache(IEnumerable<int> customerIds)
{
    string insertCustomerIdsQuery = string.Empty;
    foreach (IEnumerable<int> customerIdList in customerIds.ToPagedList(500))
    {
        insertCustomerIdsQuery +=
            $" INSERT INTO #TempCustomerIds (CustomerId) VALUES ('{string.Join("'),('", customerIdList)}');";
    }
    string queryByCustomerId =
        $@"IF OBJECT_ID('tempdb..#TempCustomerIds') IS NOT NULL DROP TABLE #TempCustomerIds;
CREATE TABLE #TempCustomerIds (CustomerId int);

{insertCustomerIdsQuery}

{CustomerQuery.SelectBase} WHERE c.CustomerId IN (SELECT CustomerId FROM #TempCustomerIds);

IF OBJECT_ID('tempdb..#TempCustomerIds') IS NOT NULL DROP TABLE #TempCustomerIds;";
    var customers = _repo.FindAll(queryByCustomerId);
    foreach (var customer in customers)
    {
        Cache.Add(customer.CustomerId, customer);
    }
}

A few things from the code snippet above:

  • ToPagedList() is an extension method that breaks a list down into multiple lists of at most the number of items passed in, so .ToPagedList(500) returns batches of 500. The idea is to use a number below the 1000-row limit for inserts. You could achieve the same thing in different ways – a possible implementation is sketched after this list.
  • The string insertCustomerIdsQuery is the result of concatenating all the insert statements together.
  • CustomerQuery.SelectBase is the select statement that would have had the ‘select by id’ predicate, with that predicate removed.
  • The main SQL statement first drops the temp table if it already exists, then creates it. We then insert all the IDs into that table, select all matching records where the IDs are in the temp table, and finally drop the temp table again.
  • Cache is a simple dictionary of customers by ID.
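The real ToPagedList() extension isn’t shown in this post, so treat the following as one possible way it might be written rather than the actual implementation:

using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Splits a sequence into batches of at most pageSize items.
    public static IEnumerable<IEnumerable<T>> ToPagedList<T>(this IEnumerable<T> source, int pageSize)
    {
        var page = new List<T>(pageSize);
        foreach (var item in source)
        {
            page.Add(item);
            if (page.Count == pageSize)
            {
                yield return page;
                page = new List<T>(pageSize);
            }
        }
        if (page.Count > 0)
        {
            yield return page;
        }
    }
}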

Using this method, each repository can have the data we expect to need loaded into it before the individual requests are made. It’s far more efficient to load these thousands of records in one go than to make thousands of individual queries.

In our example, we are retrieving addresses and bank details by the IDs held on the Customer objects. To support this, we need to pull the bank detail IDs and address IDs from the cache of Customers before loading those caches. Then all the existing logic runs as before, just blindingly fast, because it’s only touching memory rather than making calls to the database.
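Pulling that together, the cache-priming step might look roughly like this. The repository and Cache names follow the earlier snippets; the LoadCache overloads on the other repositories and the exposure of the Cache dictionary are assumptions made for the sake of the sketch:

using System.Collections.Generic;
using System.Linq;

public void PrimeCaches(IEnumerable<int> customerIds)
{
    // Customers first - their records carry the address and bank detail IDs.
    _customerRepository.LoadCache(customerIds);

    var cachedCustomers = _customerRepository.Cache.Values;

    // One bulk query per repository instead of thousands of individual hits.
    _orderRepository.LoadCache(cachedCustomers.Select(c => c.Id).Distinct());
    _addressRepository.LoadCache(cachedCustomers.Select(c => c.AddressId).Distinct());
    _bankDetailsRepository.LoadCache(cachedCustomers.Select(c => c.BankDetailsId).Distinct());

    // From here, the existing Get(customerId) logic only ever reads from memory.
}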

Summing Up

The strategy for the fastest response is probably to hit the database with one big query, but there are downsides to doing this. Specifically, we don’t want lots of logic in a SQL query, and we’d like to re-use the code we’ve already written and tested for building individual records.

Loading all the IDs from the database and iterating through the existing code one record at a time would work fine for small result sets where performance isn’t an issue, but if we’re expecting thousands of records and we want the whole thing to run in a few minutes then it’s not enough.

Caching the data using a few SQL queries is far more efficient and means we can easily re-use any existing logic; even most of the SQL is simply refactored out of the existing per-ID queries.

Running things in parallel will speed everything up even more. If you’re careful with your use of database connections, the largest improvement will probably come from parallelising the queries themselves, as these are likely to be your longest-running processes.

 

Don’t Stream JSON Data

I recently published a post about how to stream large JSON payloads from a web service using a chunked response; before reading this post it’s probably best to read that one here. Streaming is a fantastic method of sending large amounts of data with only a small memory overhead on the server, but for JSON data there could well be a better way. First of all, let’s recap the strategy for generating the JSON stream which was discussed in the earlier post:

  1. Query the IDs of all records that should be included in the response.
  2. Loop through the IDs and use the code which backs the ‘get by ID’ endpoint to generate each JSON payload for the stream.
  3. Return each JSON payload one at a time by yielding from an enumerator.

Seems straightforward enough, and I’ve seen a service sit there for nearly two hours ticking away returning objects. But is that really a good thing?

Let’s list some of the things which might go wrong.

  1. Network interruption.
  2. In these days of DevOps, someone re-deploying the service while it’s ‘mid stream’.
  3. A crash caused by a different endpoint, triggering a service restart.
  4. An exception in the consuming app killing the instance of whatever client is processing the stream.

These things might not seem too likely but the longer it takes to send the full stream, the greater the chance that something like this will happen.

Given a failure in a two-hour response, how does the client recover? It isn’t going to be a quick recovery, that’s for sure. Even if the client keeps track of each payload in the stream, in order to get back to where the problem occurred and continue processing, it has to sit through the entire response all over again!

Another Way

Remember that nifty bit of processing our streaming service is doing in order to loop through all the records it needs to send? If we move that to the consumer, then it can request each record in whatever order it likes, as many times as it needs to.

  1. The consumer requests the collection of IDs for all records it needs in a single request.
  2. The consumer saves this collection, so even if it gets turned off, re-deployed, or anything else happens, it still knows what it needs to do.
  3. The consumer requests each record from the service by ID, keeping track of which records it has already processed.

This strategy doesn’t actually require any more code than streaming; in fact, given that you don’t have to set up a handler to set the chunked encoding property on the response, it’s arguably less code. Not only that, but because there are now many discrete requests made over HTTP, these can be load balanced and shared between as many instances of our service as necessary. The consumer could even spin up multiple processes and make parallel requests, getting through the data transfer quicker than if it had to sit there and accept each record one at a time from a single instance.
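A minimal sketch of such a consumer is below. The endpoints, the progress file and the batch size are all made up for the example; the important parts are persisting the ID list, checkpointing progress, and fetching records in parallel batches so a restart only repeats a little work:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class CustomerImporter
{
    private static readonly HttpClient Client = new HttpClient();
    private const string ProgressFile = "processed-ids.txt";

    public async Task RunAsync()
    {
        // 1. One small request for the IDs, kept locally so a crash doesn't lose the list.
        var ids = JsonConvert.DeserializeObject<List<int>>(
            await Client.GetStringAsync("http://somedomain.com/customers/ids?birthYear=1977"));

        var processed = new HashSet<int>(
            File.Exists(ProgressFile)
                ? File.ReadAllLines(ProgressFile).Select(int.Parse)
                : Enumerable.Empty<int>());

        // 2. Request each outstanding record individually, a batch at a time, in parallel.
        var pending = ids.Where(id => !processed.Contains(id)).ToList();
        for (var i = 0; i < pending.Count; i += 20)
        {
            var batch = pending.Skip(i).Take(20).ToList();
            await Task.WhenAll(batch.Select(async id =>
            {
                var json = await Client.GetStringAsync($"http://somedomain.com/customers/{id}");
                var customer = JsonConvert.DeserializeObject<Customer>(json);
                // ... hand the customer off to whatever processing is needed ...
            }));

            // 3. Checkpoint after each batch so a restart repeats at most one batch.
            File.AppendAllLines(ProgressFile, batch.Select(id => id.ToString()));
        }
    }
}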

We can even go one step further. Our client is going to have to retrieve a collection of IDs which it will use to request each record in turn. Well, it’s not that difficult to give it not just a collection of IDs but a collection of URLs. It’s not a step that everyone wants to take, but it has a certain cleanliness to it.

And So

If you’re faced with a requirement for a large JSON response, then unless you’re running your applications on the most stable tech stack in the world and writing the most reliable, bug-free code in the world, you could probably build something much better by throwing out the idea of returning a single huge response – even a streamed one – in favour of multiple individual requests for each record.

Large JSON Responses

The long slog from a 15-year-old legacy monolith system to an agile, microservice-based system will almost inevitably include throwing some APIs in front of a big old database. Building a cleaner view of the domain allows for some cleaner lines to be drawn between concerns, each with their own service. But inside those services there’s usually a set of ridiculous SQL queries building the nice clean models being exposed. These ugly SQL queries add a bit of time to the responses and can lead to a bit of complexity, but this is the real world – often we can only do so much.

So there you are, after a few months of work, with a handful of services deployed to an ‘on premises’ production server. Most responses are pretty quick, never more than a couple of seconds. But now you’ve been asked to build a tool for generating an Excel document with several thousand rows in it. To get the data, a web app will make an HTTP request to the on-prem API. So far so good. But once you’ve written the API endpoint and requested some realistic datasets through it, you realise the response takes over half an hour. What’s more, while that response is being built, the API server runs pretty hot. If more than a couple of users request a new Excel doc at the same time then everything slows down.

Large responses from API calls are not always avoidable, but there are a couple of things we can do to lessen the impact they have on resources.

Chunked Response

Firstly, let’s send the response a bit at a time. In .NET Web API, this is pretty straightforward to implement. We start with a simple HttpMessageHandler:

public class ChunkedResponseHttpHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        response.Headers.TransferEncodingChunked = true;
        return response;
    }
}

Now we need to associate this handler with the controller action which returns our large response. We can’t do that with attribute-based routing, but we can be very specific with a routing template.

config.Routes.MapHttpRoute("CustomersLargeResponse",
                "customers",
                new { controller = "Customers", action = "GetByBirthYear" },
                new { httpMethod = new HttpMethodConstraint(HttpMethod.Get)  },
                new ChunkedResponseHttpHandler(config));

In this example, the URL ‘customers’ points specifically at CustomersController.GetByBirthYear() and will only accept GET requests. The handler is assigned as the last parameter passed to MapHttpRoute().

The slightly tricky part comes when writing the controller action. Returning everything in chunks won’t help if you wait until you’ve loaded the entire response into memory before sending it. Also, streaming results isn’t something that many database systems natively support. So you need to be a bit creative about how you get data for your response.

Let’s assume you’re querying a database, and that your endpoint returns a collection of resources for which you already have a pretty ugly ‘select by ID’ SQL query. The scenario is not as contrived as you might think. Dynamically modifying the ‘where’ clause of the ‘select by ID’ query so it returns all the results you want would probably give the fastest response time. It’s a valid approach, but if you know you’re going to have a lot of results then you’re risking memory issues which can impact other processes, and you’re likely to end up with some mashing of SQL strings to share the bulk of the select statement and add different predicates, which isn’t easily testable. The approach I’m outlining here is best achieved by breaking the processing into two steps. First, query for the IDs of the entities you’re going to return. Secondly, use your ‘select by ID’ code to retrieve them one at a time, returning them via an enumerator rather than a fully realised collection type. Let’s have a look at what a service method might look like for this.

public IEnumerator<Customer> GetByBirthYear(int birthYear)
{
    IEnumerable<int> customerIds = _customersRepository.GetIdsForBirthYear(birthYear);
    foreach (var id in customerIds)
    {
        Customer customer;
        try
        {
            customer = Get(id);
        }
        catch (Exception e)
        {
            customer = new Customer
            {
                Id = id,
                CannotRetrieveException = e
            };
        }
        yield return customer;
    }
}

public Customer Get(int customerId)
{
    ...
}

The important things to notice here are:

  1. The first call is to retrieve the customer IDs we’re interested in.
  2. Each customer is loaded from the same Get(int customerId) method that is used to return customers by ID.
  3. We don’t want to terminate the whole process just because one customer couldn’t be loaded. Equally, we need to do something to let the caller know there might be some missing data. In this example we simply return an empty customer record with the exception that was thrown while loading. You might not want to do this if your API is public, as you’re leaking internal details, but for this example let’s not worry.

The controller action which exposes this functionality might look a bit like this:

public IEnumerable<Customer> GetByBirthYear(int birthYear)
{
    IEnumerator<Customer> iterator = _customersServices.GetByBirthYear(birthYear);
    while (iterator.MoveNext())
    {
        yield return iterator.Current;
    }
}

Things to notice here are:

  1. There’s no attribute-based routing in use here. Because we need to assign our HttpHandler to the action, we have to use convention-based routing.
  2. At no point are the results loaded into a collection of any kind. We retrieve an enumerator and return one result at a time until there are no more results.

JSON Stream

Using this mechanism is enough to return the response in a chunked manner and start streaming it as soon as there’s a single result ready to return. But there’s still one more piece to the puzzle for our service. Depending on what language the calling client is written in, it can either be straightforward to consume the JSON response as we have it here, or it can be easier to consume what’s become known as a JSON stream. For a DotNet consumer, sending our stream as a comma-delimited array is sufficient. If we’re expecting calls from a Ruby client then we should definitely consider converting our response to a JSON stream.

For our customers response, we might send a response which looks like this (but hopefully much bigger):

{"id":123,"name":"Fred Bloggs","birthYear":1977}
{"id":133,"name":"Frank Bruno","birthYear":1961}
{"id":218,"name":"Ann Frank","birthYear":1929}

This response is in a format called Line-Delimited JSON (LDJSON). There’s no opening square bracket to say this is a collection, because it isn’t a collection. This is a stream of individual records which can be processed without having to wait for the entire response to be evaluated. Which makes a lot of sense; just as we don’t want to have to load the entire response on the server, we also don’t want to load the entire response on the client.

A chunked response is something that most HTTP client packages will handle transparently. Unless the client application is coded specifically to receive each parsed object in each chunk, then there’s no difference on the client side to receiving an unchunked response. LDJSON breaks this flexibility, because the response is not valid JSON – one client will consume it easily, but another would struggle. At the time of writing, DotNet wants only standard JSON whereas it’s probably easier in Ruby to process LDJSON. That’s not to say it’s impossible to consume the collection in Ruby or LDJSON in DotNet, it just requires a little more effort for no real reward. To allow different clients to still consume the endpoint, we can add a MediaTypeFormatter specifically for the ‘application/x-json-stream’ media type (this isn’t an official media type, but it has become widely used). So any consumer can either expect JSON or an LDJSON stream.

public class JsonStreamFormatter : MediaTypeFormatter
{
    public JsonStreamFormatter()
    {
        SupportedMediaTypes.Add(new MediaTypeHeaderValue("application/x-json-stream"));
    }

    public override async Task WriteToStreamAsync(Type type, object value, Stream writeStream, HttpContent content,
        TransportContext transportContext)
    {
        using (var writer = new StreamWriter(writeStream))
        {
            var response = value as IEnumerable;
            if (response == null)
            {
                throw new NotSupportedException($"Cannot format for type {type.FullName}.");
            }
            foreach (var item in response)
            {
                string itemString = JsonConvert.SerializeObject(item, Formatting.None);
                await writer.WriteLineAsync(itemString);
            }
        }
    }

    public override bool CanReadType(Type type)
    {
        return false;
    }

    public override bool CanWriteType(Type type)
    {
        Type enumerableType = typeof(IEnumerable);
        return enumerableType.IsAssignableFrom(type);
    }
}

This formatter only works for types implementing IEnumerable, and uses Newtonsoft’s JsonConvert object to serialise each object in turn before pushing it into the response stream. Enable the formatter by adding it to the Formatters collection:

config.Formatters.Add(new JsonStreamFormatter());

DotNet Consumer

Let’s take a look at a DotNet consumer coded to expect a normal JSON array of objects delivered in a chunked response.

public class Client
{
    public IEnumerator<Customer> Results()
    {
        var serializer = new JsonSerializer();
        using (var httpClient = new HttpClient())
        {
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            using (var stream = httpClient.GetStreamAsync("http://somedomain.com/customers?birthYear=1977").Result)
            {
                using (var jReader = new JsonTextReader(new StreamReader(stream)))
                {
                    while (jReader.Read())
                    {
                        if (jReader.TokenType != JsonToken.StartArray && jReader.TokenType != JsonToken.EndArray)
                            yield return serializer.Deserialize<Customer>(jReader);
                    }
                }
            }
        }
    }
}

Here we’re using the HttpClient class and we’re requesting a response as “application/json”. Instead of using a string version of the content, we’re working with a stream. The really cool part is that we don’t have to do much more than just throw that stream into a JsonTextReader (part of Newtonsoft.Json). We yield one deserialized object per element, as long as we ignore the first and last tokens, which are the opening and closing square brackets of the JSON array. Each call to serializer.Deserialize() reads the whole of the current object from the reader, which is one full record in the JSON response.

This method allows each object to be returned for processing while the stream is still being received. The client can save on memory usage just as well as the service.

Ruby Consumer

I have only a year or so of experience with Ruby on Rails, but I’ve found it to be an incredibly quick way to build services. In my opinion, there’s a trade-off between speed of development and complexity – because the language is dynamic, the developer has to write more tests, which can add quite an overhead as a service grows.

To consume our service from a Ruby client, we might write some code such as this:

def fetch_customers(birth_year)
  uri  = "http://somedomain.com/customers"
  opts = {
    query: { birthYear: birth_year },
    headers: { 'Accept' => 'application/x-json-stream' },
    stream_body: true,
    timeout: 20000
  }

  parser.on_parse_complete = lambda do |customer|
      yield customer.deep_transform_keys(&:underscore).with_indifferent_access
  end

  HTTParty.get(uri, opts) do |chunk|
     parser << chunk
  end
end

private
def parser
  @parser ||= Yajl::Parser.new
end

Here we’re using HTTParty to manage the request, passing ‘stream_body: true’ in the options and setting the ‘Accept’ header to ‘application/x-json-stream’. The option tells HTTParty that the response will be chunked; the header tells our service to respond with LDJSON.

From the HTTParty.get block, we see that each chunk is being passed to a JSON parser, Yajl::Parser, which understands LDJSON. Each chunk may contain one full JSON object, several, or partial objects. The parser recognises when it has enough JSON for a full object to be deserialized and passes it to the lambda assigned to parser.on_parse_complete, where we simply yield the object as a hash with indifferent access.

The Result

Returning responses in a chunked fashion is more than just a neat trick: the amount of memory used by a service streaming data in this fashion is tiny compared to loading the entire result set into memory before responding. This means more consumers can request these large result sets without other processes on the server being impacted.

From my own experience, the Yajl library seems to become unstable after handling a response which streams for more than half an hour or so. I haven’t seen anyone else having the same issue, but on one project I’ve ended up removing Yajl and just receiving the entire response with HTTParty and parsing the collection fully in memory. It isn’t ideal, but it works. It also doesn’t stop the service from streaming the response, it’s just the client that waits for the response to load completely before parsing it.

It’s a nice strategy to understand and is useful in the right place, but in an upcoming post I’ll be explaining why I think it’s often better to avoid large JSON responses altogether and giving my preferred strategy and reasoning.

Going Deep Enough with Microservices

Moving from a monolith architecture to microservices is a widely debated process, with many recommendations and nuggets of advice available on the web in blogs like this. There are so many different opinions out there mainly because where an enterprise finds its main complexities lie depends on the skillsets of its technologists, the domain knowledge within the business, and the existing code base. During the years I’ve spent as a contractor in a very wide range of enterprises, I’ve seen lots of monolith architectures – all of them causing slightly different headaches because those responsible for developing them let different aspects of the architecture slip. After all, the thing that is often forgotten is that if a monolith is maintained well, then it can work. The reverse is also true – if a microservice architecture is left to evolve on its own, it can cause as many problems as a poorly maintained monolith.

Domains

One popular way to break things down is using Domain-Driven Design. Two books which cover most concepts involved in this process are ‘Building Microservices’ by Sam Newman (http://shop.oreilly.com/product/0636920033158.do) and ‘Implementing Domain-Driven Design’ by Vaughn Vernon (http://www.amazon.com/Implementing-Domain-Driven-Design-Vaughn-Vernon/dp/0321834577), which largely references ‘Domain-Driven Design: Tackling Complexity in the Heart of Software’ by Eric Evans (http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215). I recommend Vaughn’s book over Evans’ as the latter is a little dry.

If you take on board even just half the content covered in these books, you’ll be on a reasonable footing to get started. You’ll make mistakes, but as Sam Newman points out (and I’ve seen for myself), that’s inevitable.

Something that seems to be left out of a lot of domain-driven discussions is what happens beyond the basic CRUD processes and domain logic in the application layer. Attention sits primarily with the thin interaction between a web interface and the domain processing carried out by the aggregate in question. When dismantling a monolith architecture into microservices, focusing on just the application layer can give the impression of fast progress, but in reality half the picture is missing. It’s likely that in a few months there will be several microservices, but instead of them operating solely in their sub-domains, they’ll still be tied to the database that the original monolith was using.

Context

It’s hugely important to pull the domain data out of the monolith store, for the very same reasons we segregate service responsibilities into sub-domains. Data pertaining to a given domain may exist in other domains as well, but changes will not necessarily be subject to the same domain rules, and individual records may have different properties. There may be a User record in several sub-domains, each with a Username property, but the logic around how duplicate Usernames are prevented should sit firmly in a single sub-domain. If a service in a different sub-domain needs to update the username, it should either call a public service from the Profile sub-domain or raise a ‘Username Updated’ event that the Profile sub-domain would handle, process and possibly respond to with a ‘Username Update Failed’ event of its own.
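A rough sketch of that interaction is below. The event types, the IEventBus interface and the profile repository are all hypothetical – the point is the shape of the conversation between sub-domains, not any particular messaging library:

public class UsernameUpdated
{
    public int UserId { get; set; }
    public string NewUsername { get; set; }
}

public class UsernameUpdateFailed
{
    public int UserId { get; set; }
    public string Reason { get; set; }
}

public interface IEventBus { void Publish<T>(T evt); }

public interface IProfileRepository
{
    bool UsernameExists(string username);
    void UpdateUsername(int userId, string username);
}

// Outside the Profile sub-domain: raise the event and carry on.
public class AccountService
{
    private readonly IEventBus _bus;
    public AccountService(IEventBus bus) { _bus = bus; }

    public void ChangeUsername(int userId, string newUsername)
    {
        _bus.Publish(new UsernameUpdated { UserId = userId, NewUsername = newUsername });
    }
}

// Inside the Profile sub-domain: own the duplicate rule, respond with an event.
public class UsernameUpdatedHandler
{
    private readonly IProfileRepository _profiles;
    private readonly IEventBus _bus;

    public UsernameUpdatedHandler(IProfileRepository profiles, IEventBus bus)
    {
        _profiles = profiles;
        _bus = bus;
    }

    public void Handle(UsernameUpdated evt)
    {
        if (_profiles.UsernameExists(evt.NewUsername))
        {
            _bus.Publish(new UsernameUpdateFailed { UserId = evt.UserId, Reason = "Duplicate username" });
            return;
        }
        _profiles.UpdateUsername(evt.UserId, evt.NewUsername);
    }
}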

This example may be a little contrived – checking for duplicates could be something that’s implemented everywhere it’s needed. But consider what would happen if it became necessary to check for duplicates within another external system every time a Username is updated. That logic could easily be encapsulated behind the call to the Profile service but having to update every service that updates Usernames wouldn’t be good practice.

So if we are now happy that the same data represented in different sub-domains could at any one time be different (given the previous two paragraphs) then we shouldn’t store the data for both sub-domains in the same table.

Local Data

In fact, we’re now pretty well removed from needing a classic relational database for storing data that’s local to the sub-domain. We’re dealing with data that is limited in scope and is intended for use solely by the microservices built to sit in that sub-domain. NoSQL databases are ideal for this scenario, and no matter which platform you’ve chosen to build on, there are excellent options available. One piece of advice I think is pretty sound is that if you are working in the cloud, you’ll usually get the best performance by using the data services provided by your cloud provider. Make sure you do your homework, though – some have idiosyncrasies that can impact performance if you don’t know about them.

So now we have data stored locally to the sub-domain, but this isn’t where the work stops. It’s likely there’s a team of DBAs jumping around wondering why their data warehouse isn’t getting any new data.

The problem is that the relational database backing the monolith wasn’t just acting as a data-store for the application. There were processes feeding other data-stores for things like customer reporting, machine learning platforms and BI warehouses. In fact, anything that requires a historical view of things will be reading it from one or more stores that are loaded incrementally from the monolith’s relational database. Now data is being stored in a manner best suiting any given sub-domain, there isn’t a central source for that data to be pulled from into these downstream stores.

Shift of Responsibility

Try asking a team of DBAs if they fancy writing CLR-based stored procedures to detect changes and pull new records into their warehouse by querying whatever data-store technologies have been decided on in each case – I doubt they’ll be too receptive. The responsibility for getting data out of each local data-store now has to move closer to the application services.

The data guys are interested in recording historical and aggregated records, which is convenient, as there is a useful, well-known tool for informing different systems that something has happened – an event.

It’s been argued that using events to communicate across sub-domains is misusing an event stream as a message bus. My argument in this case is that the back-end historical data-store is still within the original sub-domain. The data being stored belongs specifically to that sub-domain and still holds the same context as when it was saved. There has been a transition to a new medium of storage, but that’s all.

So we are now free to raise events from our application microservices into event streams, which are then handled by a service specifically designed to transfer data from those events into whatever downstream stores were originally being fed from the monolith database. This gives us full extraction from the monolithic architecture and breaks the sub-domain’s dependency on the monolith database.
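As a sketch of the kind of transfer service that means, imagine a small subscriber that projects each domain event into the warehouse table the monolith’s incremental load used to feed. The event shape, the IWarehouseWriter interface and the table name are all invented for the example:

using System;

public class CustomerCreated
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public DateTime OccurredAt { get; set; }
}

public interface IWarehouseWriter
{
    void Append(string table, object row);
}

public class CustomerEventsProjection
{
    private readonly IWarehouseWriter _warehouse;

    public CustomerEventsProjection(IWarehouseWriter warehouse)
    {
        _warehouse = warehouse;
    }

    // Wired up as a subscriber on the sub-domain's event stream;
    // the monolith's incremental load job goes away.
    public void Handle(CustomerCreated evt)
    {
        _warehouse.Append("dw.CustomerHistory", new
        {
            evt.CustomerId,
            evt.Name,
            ChangeType = "Created",
            evt.OccurredAt
        });
    }
}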

There is also the possibility that we can now provide more fine-grained detail of changes than was being recorded previously.

Gaps in the Monolith Database

Of course, back-end data-stores aren’t the only consumers of the sub-domain’s data. Most likely there will be other application-level queries that used to read the data you’re now saving outside of the monolith database. How you manage these dependencies will depend on whether the read requests are coming from the same sub-domain or another. If they’re from the same sub-domain, then it’s equally correct to pull the data either from an event stream or from microservices within that sub-domain. Gradually, a sub-domain’s dependency on the monolith database will die. If the queries are coming from a different sub-domain, then it’s better to continue to update the monolith database as a consumer of the data stored locally to the sub-domain – the original table no longer holds the master copy of the data relevant to the sub-domain you’re working on.

Switching

Obviously we don’t want to have any gaps in the data being sent to our back-end stores, so as we pull functionality into microservices and add new data-stores local to the sub-domain, we also need to build the pipeline for our new back end processing of domain events into the warehouse. As this gets switched on, the loading processes from the original monolith can be switched off.

External Keys

Very few enterprise systems function in isolation. Most businesses make use of off-the-shelf packages or cloud based services such as Salesforce. Mapping records into these systems usually means using the primary key of each record to create a reference. If this has happened then the primary key from the monolith is most likely being relied on to hold things together. Moving away from the monolith database means the primary key generation has probably been lost.

There are two options here and I’d suggest going with whatever is the easiest – they both have their merits and problems.

  1. Continue to generate unique IDs in the same way the monolith database did, and keep using these IDs for reference across different systems. Don’t rely on the monolith for ID generation here; create a new process in the microservice that continues the same pattern.
  2. Start generating IDs in a new way and copy the new keys out to the external systems for reference. The original keys can eventually be lost.

Deeper than Expected

When planning the transition from monolithic architecture to microservices, there may well be promises from the management team that time will be given to build each sub-domain out properly. Don’t take this at face value – Product Managers will still have their roadmaps to fulfil, and unfortunately an end user will only ever see maybe 30% of any given slice of functionality being pulled out of a monolith. Expect the process to be difficult no matter what promises are made.

What I really want to get across here is that extracting even a small amount of functionality into microservices carries with it a much deeper dive into the enterprise’s tech stack than just creating a couple of application services. It requires time and focus from more than just the dev team, and before it can even be started there has to be an architectural plan spanning the full vertical slice of a sub-domain, from front end to warehoused historical data.

Consequences of Not Going Deep Enough

How difficult do you find it in your organisation to get approval for technical upgrade work, or for dealing with technical debt as a project (which I’m not advocating is a good strategy), or for doing anything which doesn’t have a directly measurable positive impact on new product? In my experience, it isn’t easy and I’m not sure it should be, but that’s for another post.

Imagine you’ve managed to extract maybe 70% of your application layer away from your monolith but you’re still tied to the same data model. Have you achieved what you set out to do? You certainly don’t have loose coupling because everything is tied at the data level. You don’t have domain isolation. You are preventing your data team from getting access to the juicy new events you don’t really need to be raising (because the changed data is already available everywhere). You’ve turned a monolith into an abomination – it isn’t really microservices and it isn’t a classic monolith, it isn’t really any desired pattern at all. Even worse, the work you are missing is pretty big and may not directly carry with it any new features. Will you get agreement to remove coupling with the database as a project itself?

How are your developers doing? How many of them see that the strategy is only going half way? How many are moaning about paying lip service to the architecture? Wasn’t that one of the reasons you started with microservices in the first place?

Can you deploy the microservices without affecting other sub-domains? What if there are schema changes? What if there are schema changes in 2 sub-domains and one needs to be rolled back after release because it wasn’t quite right? Wasn’t this something microservices was supposed to prevent?

How many dodgy hacks or ‘surprises’ are there in your new code, where devs have managed to make domain-isolated services work with a single relational data model? How many devs waste time hand-wringing when they know they’re building something that will be technical debt the moment it goes live?

Ok, so I’m painting a darker picture than you’ll probably feel, but each of these scenarios will almost certainly come up, you just might not get to hear about it.

The crux for me is thinking about the reasons for pursuing a microservice architecture: the flexibility, loose coupling, technology agnosticity (if that’s a real term), and the speed of continuous delivery that you’re looking for. Unless you go deeper than the low-hanging fruit of the application layer, you’ll be cheating yourself out of these benefits. Sure, you’ll see improvements short term, but you are building something which is already technical debt. No matter what architecture you choose, if you don’t invest in maintaining it properly (or even building it properly in the first place) then it will ultimately become your albatross.

Events vs Commands

In the world of service-oriented architectures and CQRS-style processes there is a tendency for nearly everything to raise events. Going back a few years, however, before REST became fashionable, many interactions were RPC based and often the result of processing commands from a queue.

So when did commands become an anti-pattern? Well of course, they never did. These days we just have to understand when it’s more appropriate to send a command or raise an event.

Here’s a table to help you decide what you should be using:

Events | Commands
An event is all about something that has already happened. | A command is all about something that the originating service wants to happen (although it might not be successful).
A service raising an event doesn’t care what happens to it; something consuming an event is not critical to the service’s function. | A service sending a command needs that command to be processed as part of its own functionality.
An event could be consumed by one, many or no consumers. | A command is intended for one specific consumer.
An event can suggest loose coupling between services. | A command definitely indicates tight coupling – the originating service knows about the command target.
A service prevented from raising an event can only report that the event was not raised. | A service prevented from sending a command can report the failure to a team with specific domain knowledge about what will happen downstream if the command is not processed; the service may even be designed to fail its own process if the command fails.

A really good example of the right use of an event is communicating between services within a bounded context that something has happened. The originating service will have successfully completed its function before raising the event. Consumers of the event do something else in addition that the originating service doesn’t really care about.

A good example of the right use of a command is where two different platforms need to be kept in sync with each other. When data is updated in one system a sync command is sent to update the other. If something stops that command getting sent (e.g. an auth issue between the service and a message queue) then the service can react and alert people to the issue, or it may be that the update in the originating service needs to fail.
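To make the distinction concrete, here’s a small sketch. The bus interfaces and message types are hypothetical; the thing to notice is that the event is fire-and-forget, while the command’s outcome is part of the caller’s own success or failure:

using System.Threading.Tasks;

public interface IEventPublisher { void Publish<T>(T evt); }
public interface ICommandSender { Task SendAsync<T>(T command); }

public class Order { public int Id { get; set; } }
public class OrderPlaced { public int OrderId { get; set; } }
public class SyncOrderToCrm { public int OrderId { get; set; } }

public class OrderService
{
    private readonly IEventPublisher _events;
    private readonly ICommandSender _commands;

    public OrderService(IEventPublisher events, ICommandSender commands)
    {
        _events = events;
        _commands = commands;
    }

    public async Task PlaceOrder(Order order)
    {
        Save(order);

        // Event: the order has already been placed; whoever cares can react, we don't wait.
        _events.Publish(new OrderPlaced { OrderId = order.Id });

        // Command: the other system must be kept in sync; if this fails, our own process
        // fails (or alerts someone who understands the downstream impact).
        await _commands.SendAsync(new SyncOrderToCrm { OrderId = order.Id });
    }

    private void Save(Order order) { /* persist to the local store */ }
}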

Both events and commands are important in a distributed system. Using them in the right places makes your intent much clearer and helps keep your system structured.