Don’t Stream JSON Data (Part 2)

I’ve discussed the merits of JSON streaming in two prior posts: Large JSON Responses and Don’t Stream JSON Data. If you haven’t read these yet then take a quick look first; they’re not long reads.

I’m attracted to the highly scalable proposition of scaling out the consumer, so many requests can be made individually rather than returning a huge result from the service. It places the complexity with the consumer, rather than with the service which really shouldn’t be bothered about how it’s being used. At the service side, scaling happens with infrastructure which is a pattern we should be embracing. Making multiple simultaneous requests from the consumer is reasonably straightforward in most languages.

But let’s say our service isn’t deployed somewhere which is easily scalable and that simultaneous requests at a high enough rate to finish in a reasonable time would impact the performance of the service for other consumers. What then?

In this situation, we need to make our service go as fast as possible. One way to do this would be to pull all the data in one huge SQL query and build our response objects from that. It would definitely run as quick as we can go but there are some issues with this:

  1. Complexity in embedded SQL strings is hard to manage.
  2. From a service developer’s point of view, SQL is hard to test.
  3. We’re using completely new logic to generate our objects which will need to be tested. In our example scenario (in Large JSON Responses) we already have tested, proven logic for building our objects but it builds one at a time.

Complexity and testability are pretty big issues, but I’m more interested in issue 3: ignoring and duplicating existing logic. APIs in front of legacy databases are often littered with crazy unknowable logic tweaks; “if property A is between 3 and 10 then override property B with some constant, otherwise set property C to the value queried from some other table but just include the last 20 characters” – I’m sure you’ve seen the like, and getting this right the first time around was probably pretty tough going, so do you really want to go through that again?

We could use almost the same code as for our chunked response, but parallelise the querying of each record. Now our service method would look something like this:

public IEnumerable<Customer> GetByBirthYear(int birthYear)
{
    IEnumerable<int> customerIds = _customersRepository.GetIdsForBirthYear(birthYear);
    IList<Customer> customerList = new List<Customer>();
    Parallel.ForEach(customerIds, id =>
    {
        Customer customer;
        try
        {
            customer = Get(id);
        }
        catch (Exception e)
        {
            customer = new Customer { Id = id, CannotRetrieveException = e };
        }
        // List<T> is not thread safe, so synchronise the add
        lock (customerList) { customerList.Add(customer); }
    });
    return customerList;
}

public Customer Get(int customerId)
{
    ...
}

Firstly, the loop through the customer IDs is no longer done in a foreach loop; we’ve added a call to Parallel.ForEach. This method of parallelisation is particularly clever in that it gradually increases the degree of parallelism to a level determined by available resources – it’s one of the easiest ways to achieve parallel processing. Secondly, we’re now populating a full list of customers and returning the whole result in one go. This is because it’s simply not possible to yield inside the parallel lambda expression. It also means that responding with a chunked response is pretty redundant and probably adds a bit of extra complexity unnecessarily.
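If the default level of parallelism turns out to put too much pressure on the database, Parallel.ForEach also accepts a ParallelOptions argument that caps it. The following is only a minimal sketch of that idea, not the service’s actual code: ParallelLoader, LoadAll and the loadOne delegate (standing in for the existing Get() method) are names I’ve assumed for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelLoader
{
    // loadOne stands in for the existing, thread-safe Get(id) logic (hypothetical delegate).
    public static List<TResult> LoadAll<TResult>(
        IEnumerable<int> ids, Func<int, TResult> loadOne, int maxParallelism)
    {
        // ConcurrentBag can be added to safely from many threads at once.
        var results = new ConcurrentBag<TResult>();

        // Cap the number of operations that are allowed to run concurrently.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(ids, options, id => results.Add(loadOne(id)));

        // The order of the results is not guaranteed to match the order of ids.
        return new List<TResult>(results);
    }
}
```

A sensible value for maxParallelism depends on what the database can tolerate, so it’s something to measure rather than guess.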

This strategy will only work if all code called by the Get() method is thread safe. Something to be very careful with is the database connection: SqlConnection is not thread safe.

Don’t keep your SqlConnection objects hanging around; create a new object every time you want to query the database, unless you need to continue the current transaction. No matter how many SqlConnection objects you create, the number of connections is limited by the server and by what’s configured in the connection string. A new connection will be requested from the pool but will only be retrieved when one is available.
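As a rough sketch of the ‘new connection object per query’ pattern, the repository below asks a factory for a connection each time it runs a query and disposes of it straight afterwards. CustomerNameRepository, the factory delegate, and the table and column names are all assumptions made for this example; in a real service the factory would typically be something like () => new SqlConnection(connectionString).

```csharp
using System;
using System.Data;

public class CustomerNameRepository
{
    // Hypothetical factory; called once per query so no connection object is shared between threads.
    private readonly Func<IDbConnection> _createConnection;

    public CustomerNameRepository(Func<IDbConnection> createConnection)
    {
        _createConnection = createConnection;
    }

    public string GetName(int customerId)
    {
        // With connection pooling enabled, disposing the object hands the
        // underlying connection back to the pool rather than closing it.
        using (IDbConnection connection = _createConnection())
        using (IDbCommand command = connection.CreateCommand())
        {
            // Table and column names are illustrative only.
            command.CommandText = "SELECT Name FROM Customer WHERE CustomerId = @customerId";

            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = "@customerId";
            parameter.Value = customerId;
            command.Parameters.Add(parameter);

            connection.Open();
            return command.ExecuteScalar() as string;
        }
    }
}
```

Because nothing is held between calls, the method can be called from several threads at once without any two of them sharing a connection.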

So now we have an n+1 scenario where we’re querying the database possibly thousands of times to build our response. Even though we may be making these queries on several threads and the processing time might be acceptable, given all the complexity is now in our service we can take advantage of the direct relationship with the database to make this even quicker.

Let’s say our Get() method needs to make 4 separate SQL queries to build a Customer record, each taking one integer value as an ID. It might look something like this:

public Customer Get(int customerId)
{
    var customer = _customerRepository.Get(customerId);
    customer.OrderHistory = _orderRepository.GetManyByCustomerId(customerId);
    customer.Address = _addressRepository.Get(customer.AddressId);
    customer.BankDetails = _bankDetailsRepository.Get(customer.BankDetailsId);
    return customer;
}

To stop each of these .Get() methods hitting the database we can cache the data up front, one SQL query per repository class. This preserves our logic but presents a problem – assuming we are using Microsoft SQL Server, there is a practical limit to the number of items we can add into an ‘IN’ clause, so we can’t just stick thousands of customer IDs in there. If we can select by multiple IDs, then we can turn our n+1 scenario into just 5 queries.

It turns out that we can specify thousands of IDs in an ‘IN’ clause with a sub-query. So our problem shifts to how to create a temporary table with all our customer IDs in it to use in our sub-query. Unless you’re using a very old version of SQL Server, you can have multiple rows in a basic ‘INSERT’ statement. For example:

INSERT INTO #TempCustomerIDs (ID)
VALUES (1), (2), (3), (4), (5), (6)

Which will result in 6 rows in the table with the values 1 through 6 in the ID column. However we will once again hit a limit – it’s only possible to insert 1000 rows in this way with each insert statement.

Fortunately, we’re working one level above raw SQL, and we can work our way around this limitation. An example is in the code below.

public void LoadCache(IEnumerable<int> customerIds)
{
    // Build one INSERT per page of IDs to stay under the 1000 row limit
    string insertCustomerIdsQuery = string.Empty;
    foreach (IEnumerable<int> customerIdList in customerIds.ToPagedList(500))
    {
        insertCustomerIdsQuery +=
            $" INSERT INTO #TempCustomerIds (CustomerId) VALUES ('{string.Join("'),('", customerIdList)}');";
    }
    string queryByCustomerId =
        $@"IF OBJECT_ID('tempdb..#TempCustomerIds') IS NOT NULL DROP TABLE #TempCustomerIds;
CREATE TABLE #TempCustomerIds (CustomerId int);

{insertCustomerIdsQuery}

{CustomerQuery.SelectBase} WHERE c.CustomerId IN (SELECT CustomerId FROM #TempCustomerIds);

IF OBJECT_ID('tempdb..#TempCustomerIds') IS NOT NULL DROP TABLE #TempCustomerIds;";
    var customers = _repo.FindAll(queryByCustomerId);
    foreach (var customer in customers)
    {
        Cache.Add(customer.CustomerId, customer);
    }
}

A few things from the code snippet above:

  • ToPagedList() is an extension method that returns a list of lists of the number of items passed in. So .ToPagedList(500) will break down a list into multiple lists, each with 500 items. The idea is to use a number which is less than the 1000 row limit for inserts. You could achieve the same thing in different ways.
  • The string insertCustomerIdsQuery is the result of concatenating all the insert statements together.
  • CustomerQuery.SelectBase is the select statement that would have had the ‘select by id’ predicate, with that predicate removed.
  • The main SQL statement first drops the temp table if it already exists, and then creates it. We then insert all the IDs into that table. Then we select all matching records where the IDs are in the temp table, and finally drop the temp table.
  • Cache is a simple dictionary of customers by ID.
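ToPagedList() isn’t part of the framework, so for completeness here is one way it might be written. This is a sketch under my own assumptions about its signature (the containing class name PagingExtensions is made up too), and as mentioned above there are other ways to achieve the same thing.

```csharp
using System;
using System.Collections.Generic;

public static class PagingExtensions
{
    // Splits a sequence into consecutive pages of at most pageSize items.
    public static IList<IList<T>> ToPagedList<T>(this IEnumerable<T> source, int pageSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var pages = new List<IList<T>>();
        var page = new List<T>(pageSize);
        foreach (T item in source)
        {
            page.Add(item);
            if (page.Count == pageSize)
            {
                // Page is full: store it and start a new one.
                pages.Add(page);
                page = new List<T>(pageSize);
            }
        }

        // Keep the final, partially filled page.
        if (page.Count > 0) pages.Add(page);
        return pages;
    }
}
```

With this version, a list of 1200 IDs passed to .ToPagedList(500) comes back as three lists of 500, 500 and 200 items.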

Using this method, each repository can have the data we expect to be present loaded into it before the request is made. It’s far more efficient to load these thousands of records in one go rather than making thousands of individual queries.

In our example, we are retrieving addresses and bank details by the ID’s retrieved on the Customer objects. To support this, we need to retrieve the bank detail ID’s and address ID’s from the cache of Customers before loading those caches. Then all subsequent logic will run, but pretty blindingly fast seeing as it’s only accessing memory and not having to make calls to the database.
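Pulling the dependent IDs out of the customer cache is a small LINQ job. The sketch below assumes a cut-down Customer class with just the ID properties used in this post, and DependentIds is a name invented for the example; the resulting lists would be what gets passed to the address and bank details cache loaders.

```csharp
using System.Collections.Generic;
using System.Linq;

// Cut-down, hypothetical Customer with only the properties this sketch needs.
public class Customer
{
    public int CustomerId { get; set; }
    public int AddressId { get; set; }
    public int BankDetailsId { get; set; }
}

public static class DependentIds
{
    // Distinct address IDs referenced by the cached customers.
    public static List<int> AddressIds(IDictionary<int, Customer> customerCache)
    {
        return customerCache.Values.Select(c => c.AddressId).Distinct().ToList();
    }

    // Distinct bank details IDs referenced by the cached customers.
    public static List<int> BankDetailsIds(IDictionary<int, Customer> customerCache)
    {
        return customerCache.Values.Select(c => c.BankDetailsId).Distinct().ToList();
    }
}
```

Using Distinct() means customers who share an address only cause that address ID to be inserted into the temp table once.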

Summing Up

The strategy for the fastest response is probably to hit the database with one big query, but there are downsides to doing this. Specifically we don’t want to have lots of logic in a SQL query, and we’d like to re-use the code we’ve already written and tested for building individual records.

Loading all the ID’s from the database and iterating through the existing code one record at a time would work fine for small result sets where performance isn’t an issue, but if we’re expecting thousands of records and we want it to run in a few minutes then it’s not enough.

Caching the data using a few SQL queries is far more efficient and means we can re-use any logic easily. Even most of the SQL is refactored out of the existing queries.

Running things asynchronously will speed things up even more. If you’re careful with your use of database connections, then the largest improvement can probably be found by running the queries in parallel, as these will probably be your longest running processes.
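As a rough illustration of that last point, the cache loads from the example can be grouped into two stages: the loads which only need customer IDs, and the loads which need IDs taken from the customers. The sketch below assumes each load is wrapped in a delegate that opens its own database connection; CacheWarmup and the parameter names are invented for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class CacheWarmup
{
    // Each delegate is assumed to load one repository's cache using its own connection.
    public static async Task LoadAllAsync(
        Action loadCustomers, Action loadOrderHistory,
        Action loadAddresses, Action loadBankDetails)
    {
        // Customers and order history both need only the customer IDs, so they can run together.
        await Task.WhenAll(Task.Run(loadCustomers), Task.Run(loadOrderHistory));

        // Address and bank detail IDs come from the loaded customers,
        // so these two wait for the first stage to finish.
        await Task.WhenAll(Task.Run(loadAddresses), Task.Run(loadBankDetails));
    }
}
```

Whether this is worth doing depends on how long each query takes; if one query dominates, running the others alongside it won’t save much.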


Don’t Stream JSON Data

I recently published a post about how to stream large JSON payloads from a web service using a chunked response; it’s probably best to read that post before this one. Streaming is a fantastic method of sending large amounts of data with only a small memory overhead on the server, but for JSON data there could well be a better way. First of all, let’s think about the strategy for generating the JSON stream which was discussed in the earlier post:

  1. Query the ID’s of all records that should be included in the response.
  2. Loop through the ID’s and use the code which backs the ‘get by ID’ endpoint to generate each JSON payload for the stream.
  3. Return each JSON payload one at a time by yielding from an enumerator.

Seems straightforward enough, and I’ve seen a service sit there for nearly two hours ticking away returning objects. But is that really a good thing?

Let’s list some of the things which might go wrong.

  1. Network interruption.
  2. In these days of Dev Ops, someone re-deploying the service while it’s ‘mid stream’.
  3. A crash caused by a different endpoint, triggering a service restart.
  4. An exception in the consuming app killing the instance of whatever client is processing the stream.

These things might not seem too likely but the longer it takes to send the full stream, the greater the chance that something like this will happen.

Given a failure in a 2 hour response, how does the client recover? It isn’t going to be a quick recovery, that’s for sure. Even if the client keeps track of each payload in the stream, in order to get back to where the problem occurred and continue processing, the client has to sit through the entire response all over again!

Another Way

Remember that nifty bit of processing our streaming service is doing in order to loop through all the records it needs to send? If we move that to the consumer, then it can request each record in whatever order it likes, as many times as it needs to.

  1. The consumer requests the collection of ID’s for all records it needs in a single request.
  2. The consumer saves this collection, so even if it gets turned off, re-deployed, or if anything else happens, it still knows what it needs to do.
  3. The consumer requests each record from the service by ID, keeping track of which records it has already processed.
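The three steps above might look something like the sketch below on the consumer side. It is deliberately minimal and rests on a few assumptions: fetchRecord stands in for the HTTP call to the ‘get by ID’ endpoint, processedIds would be persisted somewhere durable in a real consumer, and ResumableConsumer is a name made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ResumableConsumer
{
    // allIds: the saved collection returned by the single request for IDs.
    // processedIds: progress so far; a real consumer would persist this.
    // fetchRecord: hypothetical stand-in for the request to the 'get by ID' endpoint.
    // Returns the number of records processed during this run.
    public static int ProcessPending(
        IEnumerable<int> allIds, ISet<int> processedIds,
        Func<int, string> fetchRecord, Action<string> handleRecord)
    {
        int processedThisRun = 0;
        foreach (int id in allIds)
        {
            // Skip anything completed on an earlier run.
            if (processedIds.Contains(id)) continue;

            // May throw; everything processed so far stays recorded.
            string record = fetchRecord(id);
            handleRecord(record);

            // Only mark the record as done once it has been handled successfully.
            processedIds.Add(id);
            processedThisRun++;
        }
        return processedThisRun;
    }
}
```

If a request fails part way through, calling ProcessPending again with the same processedIds picks up at the first record that hasn’t been handled, rather than starting from the beginning.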

This strategy actually doesn’t require any more code than streaming. In fact, given that you don’t have to set up a handler to set the chunked encoding property on the response, it’s actually less code. Not only that, but because there will now be many discrete requests made via the HTTP protocol, these can be load balanced and shared between as many instances of our service as is necessary. The consumer could even spin up multiple processes and make parallel requests, so it could get through the data transfer quicker than if it had to sit there and accept each record one at a time from a single instance.

We can even go one step further. Our client is going to have to retrieve a stack of ID’s which it will use to request each record in turn. Well it’s not that difficult to give it not just a collection of ID’s but a collection of URL’s. It’s not a step that everyone wants to take, but it has a certain cleanliness to it.

And So

If you’re faced with a requirement for a large JSON response, unless you’re running your applications on the most stable tech stack in the world and you’re writing the most reliable, bug free code in the world, then you could probably build something much better by throwing out the idea of returning a single huge response, even if streamed, in favour of multiple single requests for each record individually.

Making Decisions with Cynefin

A friend tweeted recently about how it isn’t always possible to decide late on which product to use for data storage as different products often force an application to use different patterns. This got me thinking about making other decisions in software design. In general it’s accepted that deciding as late as possible is usually a good thing but I think people often misinterpret ‘as late as possible’ to mean ‘make it an after-thought’.

‘As late as possible’ is a wonderfully subjective term. It suggests that although we want to wait longer before making a decision, we might not be able to. We might need to make a decision to support the design of the rest of the system. Or perhaps in some cases, making the decision early might be more important than making the ‘perfect’ decision.

I started thinking about how to decide whether a decision should be put off and I was reminded of the work of Roy Osherove. He suggests that development teams transition through different states and should be managed differently in each state. There is a similar methodology which relates the approach to software design with different categories of problem space. It’s called Cynefin (pronounced like ‘Kevin’ but with an ‘n’ straight after the ‘K’).

To quote Wikipedia:

The framework provides a typology of contexts that guides what sort of explanations or solutions might apply.

There’s a diagram that goes with this and helps give some context (thanks Wikipedia):

[Diagram: the Cynefin framework]


I don’t want to do a deep dive into Cynefin in this article (maybe another day) but to summarise:

  • Obvious – these solutions are easy to see, easy to build, and probably available off the peg. They’re very low complexity and shouldn’t require a lot of attention from subject matter experts.
  • Complicated – these solutions are easier to get wrong. Maybe no-one on the team has implemented anything similar before, but experience and direction is available possibly from another team.
  • Complex – these solutions have lots of possible answers but it’s unclear which will be best. While this is new to your company, other companies have managed to build something similar so you know it is possible.
  • Chaotic – these solutions are totally new and you have no point of reference as to what would be the best way to implement. You may not even know if it’s possible.

In general, an enterprise’s core domain will sit in both Complicated and Complex. Chaotic might be something you do for a while but the focus is to move the solution back into one of the other categories.

So what does this have to do with making decisions?

Well, I suggest that the decision making process might change depending on what category your solution falls into.

  • Obvious – obvious solutions probably don’t have many difficult decisions to make. This is not your core domain (unless you work in a really boring sector) so throwing money and resources at obvious solutions is not sensible. The driver for choosing tech stack might well be just “what’s already available” and might be a constraint right up front. You may want to buy an off the shelf product, in which case a lot of decisions are already made for you. If SQL Server is the quickest path to delivery then it might well be the right thing to use here, even if you’re longing to try a NoSQL approach.
  • Complicated – complicated solutions are often solved by taking advice from an expert. “We found a separate read concern solved half our problems.” and “A relational database just doesn’t work for this.” are both great nuggets of advice someone who’s done this before might put forward. These solutions are in your core domain, you want to avoid code rot and inflexible architectures – deciding late seems generally sensible, but the advice from your experts can help scope those decisions. Focus on finding the abstractions on which to base the solution. You might know that you’ll need the elasticity that only the cloud can provide, but you might leave the decision on which provider until as late as possible.
  • Complex – complex solutions are where experts are harder to get involved. They might be in different teams or hired from a consultancy. The focus should still be on finding the right abstractions to allow critical decisions to be delayed. Running multiple possible solutions in parallel to see what works best is a great approach which will give your team confidence in the chosen option. A subject matter expert might be more useful in explaining how they approached defining a solution rather than just what the solution was.
  • Chaotic – it might seem a terrible idea to try and make decisions in this situation but there are advantages. Chaotic can become Complex if you can find an anchor for the solution. “How do we solve this with ‘x’?” is a lot easier for a team to decide than a more general approach. You’ll almost certainly want to run with two or three possible options in parallel. Keep in mind that whatever decision you make may well eventually be proved incorrect.

I think this shows how the approach to decision making can be affected by what category of solution you’re working on. By picking the right strategy for the right kind of problem, you can focus resources more cost effectively.


My last two clients have had completely contrasting views on PaaS, specifically on whether it should be used at all. Both clients deploy to AWS and Azure. Both want to embrace software volatility. Neither want to introduce unnecessary complexity. Both have a similarly scaled online offering where traffic is subject to peaks and troughs which aren’t always predictable.

With such similar goals and problems to solve I’m intrigued by how different their approaches have been. Admittedly one client has a much more mature relationship with the cloud where the other is jumping in with both feet but still not sure how to swim. Perhaps that’s the crux of the matter and both will eventually become more similar in their approaches.

For this article I want to focus on the perceived issues with PaaS and try to explain why I think many concerns are unfounded.

The Concerns

My current client has raised a number of concerns about PaaS and I’ve dug around on the internet to find what has been worrying other people. Here’s a list of the most popular concerns I’ve seen.

  • Vendor lock in – the fear that if software makes use of PaaS from one cloud provider, it will be too difficult to move to a different provider in future.
  • Compliance – the fear of audit.
  • B.A.U. – the fear of managing a PaaS based solution after the developers have left the building.
  • Lack of published SLAs – the fear that a platform may not be as reliable as you need.
  • Confusing marketing message – the fear of relying on something that isn’t defined the same way by two different providers anywhere.
  • Lack of standard approach – the fear of ending up with software tightly coupled to a single platform.

This is certainly not an exhaustive list but I think it covers all the popular problems and those concerns raised directly to me from my clients. So now let’s try to address things.

Vendor Lock In

This sounds very scary. The idea that once we start allowing our software to make use of the APIs and services provided by one cloud provider, we’ll be unable to move to a different provider next year.

First of all, let’s talk about what’s driving this footloose requirement. At some level in the business, someone or some people are unsure that the currently chosen cloud provider is going to remain so. They may even want to review how suitable they are on an annual basis and have reserved the right to change their minds at that point. This isn’t unusual and it could be the right thing to do – any company that blindly continues to use the same vendors and service providers without questioning if they still offer the right solution is destined to find themselves hindered by a provider who can no longer meet the business needs. So for example, let’s assume that there is a distinct possibility that although AWS is the flavour of the month, this time next year might see a shift to Microsoft Azure.

At the point of that shift to Azure, what is the expectation for existing systems? There has been a year of development effort pushing software into AWS, does the business think that it can be re-deployed into Azure ‘as is’? I would expect that there would be a plan for a period of transition. I would also expect that it would be recognised that there are some systems for which it isn’t worth spending the money to move. New development will undoubtedly happen in Azure with as little effort as possible focused on AWS. The business doesn’t expect a ‘big bang’ change (which would be incredibly high risk).

Now let’s think about how well your software currently running in AWS will run in Azure. Both AWS and Azure offer hosting with the same Operating Systems, so we’re off to a good start – you should at least be able to manually deploy and get something running. The catch is in the way that the virtual environments are delivered. If your app relies on local HD storage, then moving from AWS to Azure may mean quite a hit. At the time of writing this article, the best throughput you can get from Azure Premium storage is 200MB/s whereas AWS’ EBS Provisioned Volumes will give you a throughput of 320MB/s. So moving to Azure could impact your application’s performance under load, especially if it relies on a self managed instance of a database (Mongo DB for example). In fact, if you want high performance storage in Azure then Table Storage or DocumentDB are probably the best options – both of which are PaaS.

This is only one example of how moving cloud provider could impact your software, there are others. The virtual machine options are different – not just in hard disc size but in available memory, processor speeds and in how their performance changes with load. So what you’re deploying quite happily onto half a dozen instances with one cloud provider may require nine or ten instances on another, plus a few tweaks to how the software stores its data.

What I’m trying to highlight here isn’t that using PaaS won’t be a barrier to moving from one cloud provider to another, rather that it isn’t the one you would have to worry about. Changing the API that is used for caching data is a well defined problem with easily understood steps to implement. Understanding the impact of the subtle differences in how each cloud provider delivers your virtual environments – that’s hard.

That’s not the end of this issue. Let’s look at this from the software side. How often do developers use 3rd party software? From my own experience, I don’t think I remember the last time I spent a day writing code which didn’t involve several NuGet Install-Package statements. Every time I’m always careful to prevent tight coupling between my code and the installed packages. Why wouldn’t I expect the same care to be taken when working with PaaS? It’s really good practice to write a client for your PaaS interaction that abstracts the detail of implementation away from the logical flow of your software. This is good programming 101. When moving to another cloud provider the impact of changing the API is predominantly limited to the client. It is far from the biggest problem you’ll have to deal with.
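To make that a little more concrete, here is a minimal sketch of the kind of client abstraction I mean, using caching as the example. Every name in it (ICacheClient, InMemoryCacheClient, ProductNameLookup) is invented for illustration; a real implementation wrapping a particular provider’s cache would sit behind the same interface, and it is only that implementation which would need rewriting when changing provider.

```csharp
using System;
using System.Collections.Generic;

// The only cache contract the rest of the application sees.
public interface ICacheClient
{
    bool TryGet(string key, out string value);
    void Set(string key, string value);
}

// In-memory stand-in, handy for tests. A provider-specific client
// would implement the same interface.
public class InMemoryCacheClient : ICacheClient
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public bool TryGet(string key, out string value)
    {
        return _items.TryGetValue(key, out value);
    }

    public void Set(string key, string value)
    {
        _items[key] = value;
    }
}

// Application logic depends on the interface, never on a provider SDK.
public class ProductNameLookup
{
    private readonly ICacheClient _cache;
    private readonly Func<string, string> _loadFromSource;

    public ProductNameLookup(ICacheClient cache, Func<string, string> loadFromSource)
    {
        _cache = cache;
        _loadFromSource = loadFromSource;
    }

    public string Get(string productId)
    {
        string name;
        if (_cache.TryGet(productId, out name)) return name;

        // Cache miss: load from the source of truth and remember the result.
        name = _loadFromSource(productId);
        _cache.Set(productId, name);
        return name;
    }
}
```

The logical flow in ProductNameLookup never mentions a provider, which is the point.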


Compliance

Depending on what your business does, you may have restrictions on where you can store your data. Conversely, storing your data in some territories may incur restrictions on how that data must be encrypted. Some territories may just not allow certain types of data to be stored at all; or you may need to be certified in some way and prove correct storage policies by external audit.

These rules don’t change if you store your data in a traditional data-center. You still have to be aware of where your data is going and what that means. It isn’t just a cloud provider that might make use of geolocation for resilience purposes. So your problem exists either way.

Cloud providers are aware of this issue and are very clear on where their data is stored and what control you have over this. This is specifically for compliance reasons.


B.A.U.

Once a system is in place and running, the developers are rarely interested in maintaining it from day to day. That job usually falls to a combination of Operations and Dev Ops. The concern with PaaS is that it will in some way be harder for a non-development team to manage than if something well known and self managed is used. I think this falls into the category of ‘fear of the unknown’ – the question I would ask is “will a service that is managed for you be harder to look after than something that you have to fully manage yourself?” Even if you have a dedicated team with a lot of expertise in managing a particular technology, they will still have to do the work to manage it. PaaS is usually configured and then left, with nothing else to do than respond to any alerts which might suggest a need to provision more resources. It’s made resilient and highly available by clicking those buttons during configuration or setting those values in an automation script.

Perhaps there is a concern that in future, it will be harder to find development resource to make changes. This is a baseless fear. No-one debates this problem when referencing 3rd party libraries via NuGet – there really isn’t any difference. Sure there may be some more subtle behaviours of a PaaS service which a developer may not be aware of but any problems should be caught by testing. Often the documentation for PaaS services is pretty good and quite to the point; I’d expect any developer working with a PaaS service to spend as much time in their documentation as they would for any 3rd party library they used.

Take a look at the AWS docs for DynamoDB – the behaviour of the database when spikes take reads or writes beyond what has been provisioned is a pretty big gotcha, but it’s described really well and is pretty obvious just from a quick read through.

There is definitely going to be a need to get some monitoring in place but that is true for the whole system anyway. When establishing the monitoring and alerts, there will have to be some decisions made around what changes are worthy of monitoring and what warrant alerts. Thinking of the utilised PaaS as just something else pushing monitoring events is a pretty good way to make sure the right people will know well in advance if any problems are going to be encountered.

Lack of Published SLAs

This can cause some worries and it’s something that annoys me a lot about AWS in particular. I don’t see any reason why an SLA wouldn’t be published – people want to know what they’re buying and that’s an important part of it. But let’s get our sensible heads on – we’re talking pretty damned decent up times even if it isn’t always 99.999%.

In my opinion, worrying about the SLA for a PaaS service provided by people such as Amazon, Microsoft or Google doesn’t always make much sense. These guys have massive resources behind them – you’re far more likely to mess it up than they are. But let’s think about how failures in any service should be handled. There should always be a failure state which defaults to something which at least isn’t broken, otherwise your SLA is tied to the product of the SLAs of every 3rd party you depend on. Your system has to be resilient to outages of services you rely on. Also, let’s remember where your system is hosted – in the same data centre as the PaaS service is running in. If there is an outage of the PaaS service, it could be also impacting your own system. Leveraging the flexibility of geolocation and availability zones allows you to get around those kinds of outages. I’m not saying you’re guaranteed constant availability but how often have you seen go down?

Given the nature of cloud hosting coupled with a resilient approach to calling 3rd party services, a lack of published SLA isn’t as terrifying as it seems. Code for outages and do some research about what problems have occurred in the past for any given service.
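To put some rough numbers on the point about SLAs compounding, and to show what ‘coding for outages’ can look like at its simplest, here is a small sketch. Resilience, CombinedAvailability and CallOrDefault are names invented for the example, and the arithmetic assumes the dependencies fail independently of each other, which won’t be strictly true when they share a data centre.

```csharp
using System;

public static class Resilience
{
    // Availability of a system that needs every dependency to be up at the same time,
    // assuming the dependencies fail independently.
    public static double CombinedAvailability(params double[] availabilities)
    {
        double combined = 1.0;
        foreach (double availability in availabilities)
        {
            combined *= availability;
        }
        return combined;
    }

    // Calls a dependency and falls back to a safe default if the call fails,
    // so an outage degrades the response instead of breaking it.
    public static T CallOrDefault<T>(Func<T> call, T fallback)
    {
        try
        {
            return call();
        }
        catch (Exception)
        {
            return fallback;
        }
    }
}
```

Three hard dependencies at 99.9% each give CombinedAvailability(0.999, 0.999, 0.999) of roughly 0.997, which is why a sensible fallback for each call matters more than the exact figure any one provider publishes.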

Confusing Marketing Message

This is an interesting one. What is PaaS? Where does infrastructure end and platform begin? That might be pretty easy to answer in a world of traditional data-centers, but in the cloud things are a bit more fluffy. Take Autoscaling Groups, for example, or more specifically the ability to automatically scale the number of instances of your application horizontally across new instances based on some measure. I’ve heard this described as IaaS, PaaS and once as ‘IaaS plus’.

The line between IaaS and PaaS is being continuously blurred by cloud providers who I don’t think are particularly worried about the strict categorisation of the services they provide. With services themselves consisting of several elements, some of which might or might not fall neatly into either PaaS or IaaS, the result is neither.

I think this categorisation is causing an amount of analysis paralysis among some people who feel the need for services to be pigeon holed in some way. Perhaps being able to add a service into a nice, pre-defined category makes it somehow less arduous to decide whether it’s something that could be useful. “Oh, IaaS – yeah, we like that! Use it everywhere.” Such categorisations give comfort with an ivory tower, fully top down approach but don’t change the fundamental usefulness of any given service.

This feels a little 1990’s to me. Architecture is moving on and people are becoming more comfortable with the idea of transferring responsibility for the problematic bits to our cloud provider’s solution. We don’t have to do everything for ourselves to have confidence that it’s as good as it could be – in fact that idea is being turned on its head.

I love the phrase “do the hard things often”. Well, no-one does any of this as often as the people who provide your cloud infrastructure. They do it way more often than you do and they’re far better at it, which is fine – your company isn’t a cloud provider, it’s good at something else.

So should we worry that a service might or might not be neatly described as either PaaS or IaaS? I think it would be far more sensible to ask the question “is it useful?” or even “how much risk is being removed from our architecture by using it?” and that isn’t going anywhere near the cost savings involved.

Lack of Standard Approach

In my mind, this could be a problem as it does seem to push toward vendor lock in. But, let’s consider the differing standards across cloud providers – where are they the same? The different mechanisms for providing hard disks for VMs result in Amazon being half as fast again as Azure’s best offering. What about the available VM types? I’m not sure there is much correlation. What about auto-scaling mechanisms? Now they are definitely completely different. Code deployment services? Definitely not the same.

I suppose what I’m trying to get at is that each cloud provider has come up with their own service which does things in its own specific way. Not surprising really. We don’t complain when an Android device doesn’t have a Windows style Start button, why would we expect two huge feats of engineering which are cloud services to obey the same rules? They were created by different people with different ideas and to initially solve different problems.

So there is a lack of standards, but this doesn’t just impact PaaS. If this is a good reason to fear PaaS then it must be a good reason to fear the cloud altogether. I think we’ve found the 1990’s again.

Round Up

I’m not in any way trying to say that PaaS is some kind of silver bullet, or that it is inherently always less risky than a self managed solution. What I am trying to make clear is that much of the fear around PaaS is from a lack of understanding. The further away an individual is from dealing with the different implementations (writing the code), the harder it is to see the truth of the detail. We’ve had decades of indoctrination telling us that physical architecture forms a massive barrier to change but the cloud and associated technologies (such as Dev Ops) remove that barrier. We don’t have fewer points of contact with external systems, we actually have more, but each of those points is far more easily changed than was once true.


Going Deep Enough with Microservices

Moving from a monolith architecture to microservices is a widely debated process, with many recommendations and nuggets of advice available on the web in blogs like this. There are so many different opinions out there mainly because where an enterprise finds their main complexities lie depends on the skillsets of their technologists, the domain knowledge within the business and the existing code base. During the years I’ve spent as a contractor in a very wide range of enterprises, I’ve seen lots of monolith architectures – all of them causing slightly different headaches because those responsible for developing them let different aspects of the architecture slip. After all, the thing that is often forgotten is that if a monolith is maintained well, then it can work. The reverse is also true – if a microservice architecture is left to evolve on its own, it can cause as many problems as a poorly maintained monolith.


One popular way to break things down is using Domain Driven Design. Two books which cover most concepts involved in this process are ‘Building Microservices’ by Sam Newman and ‘Implementing Domain Driven Design’ by Vaughn Vernon, which largely references ‘Domain Driven Design: Tackling Complexity in the Heart of Software’ by Eric Evans. I recommend Vaughn’s book over Evans’ as the latter is a little dry.

If you take on board even just half the content covered in these books, you’ll be on a reasonable footing to get started. You’ll make mistakes but as Sam Newman points out (and I’ve seen for myself) that’s inevitable.

Something that seems to be left out of a lot of domain driven discussions is what happens beyond the basic CRUD processes and domain logic in the application layer. Attention sits primarily with the thin interaction between a web interface and the domain processing by the aggregate in question. When dismantling a monolith architecture into microservices, focusing on just the application layer can give the impression of fast progress but in reality half the picture is missing. It’s likely that in a few months there will be several microservices but instead of them operating solely in their sub-domains, they’ll still be tied to the database that the original monolith was using.


It’s hugely important to pull the domain data out of the monolith store. This is for the very same reasons we segregate service responsibilities into sub-domains. Data pertaining to a given domain may exist in other domains as well but changes will not necessarily be subjected to the same domain rules and individual records may have different properties. There may be a User record in several sub-domains, each with a Username property but the logic around how duplicate Usernames are prevented should sit firmly in a single sub-domain. If a service in a different sub-domain needs to update the username, it should either call a public service from the Profile sub-domain or raise a ‘Username Updated’ event that the Profile sub-domain would handle, process and possibly respond with a ‘Username Update Failed’ event of its own.
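To make that concrete, here is a minimal sketch of the duplicate check living in one place only. Everything in it is an assumption for illustration: the `ProfileService` class, the in-memory dictionary standing in for the sub-domain's own data-store, the case-insensitive comparison and the shape of the `UsernameUpdateFailed` event are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UsernameUpdateFailed:
    """Hypothetical event the Profile sub-domain raises when it rejects a change."""
    user_id: int
    requested: str
    reason: str


class ProfileService:
    """Hypothetical owner of the duplicate-username rule (in-memory for illustration)."""

    def __init__(self) -> None:
        self._usernames: Dict[int, str] = {}

    def update_username(self, user_id: int, username: str) -> Optional[UsernameUpdateFailed]:
        # The uniqueness rule lives here and nowhere else.
        # Comparing case-insensitively is an assumption made for this sketch.
        taken = any(
            existing.lower() == username.lower() and owner != user_id
            for owner, existing in self._usernames.items()
        )
        if taken:
            return UsernameUpdateFailed(user_id, username, "duplicate username")
        self._usernames[user_id] = username
        return None


profile = ProfileService()
first = profile.update_username(1, "alice")   # accepted, returns None
second = profile.update_username(2, "Alice")  # rejected as a duplicate
```

A service in another sub-domain would call `update_username` (or raise the equivalent event) and react to a `UsernameUpdateFailed` response, rather than carrying its own copy of the rule.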

This example may be a little contrived – checking for duplicates could be something that’s implemented everywhere it’s needed. But consider what would happen if it became necessary to check for duplicates within another external system every time a Username is updated. That logic could easily be encapsulated behind the call to the Profile service but having to update every service that updates Usernames wouldn’t be good practice.

So if we are now happy that the same data represented in different sub-domains could at any one time be different (given the previous two paragraphs) then we shouldn’t store the data for both sub-domains in the same table.

Local Data

In fact, we’re now pretty well removed from needing a classic relational database for storing data that’s local to the sub-domain. We’re dealing with data that is limited in scope and is intended for use solely by the microservices built to sit in that sub-domain. NoSQL databases are ideal for this scenario and no matter which platform you’ve chosen to build on there are excellent options available. One piece of advice I think is pretty sound is that if you are working in the cloud, you’ll usually get the best performance by using the data services provided by your cloud provider. Make sure you do your homework, though – some have idiosyncrasies that can impact performance if you don’t know about them.

So now we have data stored locally to the sub-domain, but this isn’t where the work stops. It’s likely there’s a team of DBAs jumping around wondering why their data warehouse isn’t getting any new data.

The problem is that the relational database backing the monolith wasn’t just acting as a data-store for the application. There were processes feeding other data-stores for things like customer reporting, machine learning platforms and BI warehouses. In fact, anything that requires a historical view of things will be reading it from one or more stores that are loaded incrementally from the monolith’s relational database. Now data is being stored in a manner best suiting any given sub-domain, there isn’t a central source for that data to be pulled from into these downstream stores.

Shift of Responsibility

Try asking a team of DBAs if they fancy writing CLR based stored procedures to detect changes and pull new records into their warehouse by querying whatever data-store technologies have been decided on in each case – I doubt they’ll be too receptive. The responsibility for getting data out of each local data-store now has to move closer to the application services.

The data guys are interested in recording historical and aggregated records, which is convenient as there is a useful well known tool for informing different systems that something has happened – an event.

It’s been argued that using events to communicate across sub-domains is misusing an event stream as a message bus. My argument in this case is that the back-end historical data-store is still within the original sub-domain. The data being stored belongs specifically to that sub-domain and still holds the same context as when it was saved. There has been a transition to a new medium of storage but that’s all.

So we are now free to raise events from our application microservices into event streams which are then handled by a service specifically designed to transfer data from events into whatever downstream stores were originally being fed from the monolith database. This gives us full extraction from the monolithic architecture and breaks the sub-domain’s dependency on the monolith database.
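A rough sketch of that flow, assuming a toy in-memory `EventStream` in place of a real streaming platform and a hypothetical `WarehouseLoader` that turns each event into a row for a downstream store. The class names, the event name and the row layout are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DomainEvent:
    """Hypothetical domain event raised by an application microservice."""
    name: str
    payload: dict
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventStream:
    """In-memory stand-in for a real event stream."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, event):
        for handler in self._handlers:
            handler(event)


class WarehouseLoader:
    """Hypothetical service that copies events into a downstream historical store."""

    def __init__(self):
        self.rows = []  # stands in for the warehouse table

    def handle(self, event):
        # Flatten the event into one row, keeping when it happened.
        self.rows.append(
            {"event": event.name, "at": event.occurred_at.isoformat(), **event.payload}
        )


stream = EventStream()
loader = WarehouseLoader()
stream.subscribe(loader.handle)

# The application service only raises the event; it knows nothing about the warehouse.
stream.publish(DomainEvent("UsernameUpdated", {"user_id": 1, "username": "alice"}))
```

The point of the shape is that the application service and the warehouse never talk to each other directly; the loader is the only piece that knows about both the events and the downstream store.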

There is also the possibility that we can now give more fine grained detail of changes than was being recorded previously.

Gaps in the Monolith Database

Of course back end data-stores aren’t the only consumers of the sub-domain’s data. Most likely there will be other application level queries that used to read the data you’re now saving outside of the monolith database. How you manage these dependencies will depend on whether the read requests are coming from the same sub-domain or another. If they’re from the same sub-domain then it’s equally correct to either pull the data from an event stream or from microservices within that sub-domain. Gradually, a sub-domain’s dependency on the monolith database will die. If the queries are coming from a different sub-domain then it’s better to continue to update the monolith database as a consumer of the data stored locally to the sub-domain. At that point the original table no longer holds data that is relevant to the sub-domain you’re working on.


Obviously we don’t want to have any gaps in the data being sent to our back-end stores, so as we pull functionality into microservices and add new data-stores local to the sub-domain, we also need to build the pipeline for our new back end processing of domain events into the warehouse. As this gets switched on, the loading processes from the original monolith can be switched off.

External Keys

Very few enterprise systems function in isolation. Most businesses make use of off-the-shelf packages or cloud based services such as Salesforce. Mapping records into these systems usually means using the primary key of each record to create a reference. If this has happened then the primary key from the monolith is most likely being relied on to hold things together. Moving away from the monolith database means the primary key generation has probably been lost.

There are two options here and I’d suggest going with whatever is the easiest – they both have their merits and problems.

  1. Continue to generate unique ids in the same way as the monolith database did and continue to use these ids for reference across different systems. Don’t rely on the monolith for id generation here; create a new process in the microservice that continues the same pattern.
  2. Start generating ids in a new way and copy the new keys out to the external systems for reference. The original keys can eventually be dropped.
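As an illustration of the first option, here is a minimal sketch that assumes the monolith issued plain sequential integer keys. The `SequentialIdGenerator` name and the starting value are hypothetical, and this version only works inside a single process; a real service running as several instances would need to keep the counter in a shared, durable store.

```python
import itertools
import threading


class SequentialIdGenerator:
    """Continues an integer key sequence from the last id the monolith issued."""

    def __init__(self, last_issued: int) -> None:
        self._counter = itertools.count(last_issued + 1)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock keeps ids unique when several threads ask at once.
        with self._lock:
            return next(self._counter)


# 10_482 is a made-up value for the highest key found in the monolith's table.
ids = SequentialIdGenerator(last_issued=10_482)
new_ids = [ids.next_id() for _ in range(3)]
```

Because the new ids carry on from where the monolith stopped, references already held in external systems stay valid and new records look no different to them.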

Deeper than Expected

When planning the transition from monolithic architecture to microservices, there may well be promises from the management team that time will be given to build each sub-domain out properly. Don’t take this at face value – Product Managers will still have their roadmaps to fulfill and unfortunately there is maybe only 30% of any given slice of functionality being pulled out of a monolith that an end user will ever see. Expect the process to be difficult no matter what promises are made.

What I really want to get across here is that extracting even a small amount of functionality into microservices carries with it a much deeper dive into the enterprise’s tech stack than just creating a couple of application services. It requires time and focus from more than just the Dev team and before it can even be started, there has to be an architectural plan spanning the full vertical slice of a sub-domain, from front end to warehoused historical data.

Consequences of Not Going Deep Enough

How difficult do you find it in your organisation to get approval for technical upgrade work, or for dealing with technical debt as a project (which I’m not advocating is a good strategy), or for doing anything which doesn’t have a directly measurable positive impact on new product? In my experience, it isn’t easy and I’m not sure it should be, but that’s for another post.

Imagine you’ve managed to extract maybe 70% of your application layer away from your monolith but you’re still tied to the same data model. Have you achieved what you set out to do? You certainly don’t have loose coupling because everything is tied at the data level. You don’t have domain isolation. You are preventing your data team from getting access to the juicy new events you don’t really need to be raising (because the changed data is already available everywhere). You’ve turned a monolith into an abomination – it isn’t really microservices and it isn’t a classic monolith, it isn’t really any desired pattern at all. Even worse, the work you are missing is pretty big and may not directly carry with it any new features. Will you get agreement to remove coupling with the database as a project itself?

How are your developers doing? How many of them see that the strategy is only going half way? How many are moaning about paying lip service to the architecture? Wasn’t that one of the reasons you started with microservices in the first place?

Can you deploy the microservices without affecting other sub-domains? What if there are schema changes? What if there are schema changes in 2 sub-domains and one needs to be rolled back after release because it wasn’t quite right? Wasn’t this something microservices was supposed to prevent?

How many dodgy hacks or ‘surprises’ are there in your new code where devs have managed to make domain isolated services work with a single relational data model? How many devs waste time hand wringing when they know they’re building something that is going to be technical debt the moment it goes live?

Ok, so I’m painting a darker picture than you’ll probably feel, but each of these scenarios will almost certainly come up, you just might not get to hear about it.

The crux for me is thinking about the reasons for pursuing a microservice architecture: the flexibility, loose coupling, technology agnosticity (if that’s a real term) and the speed of continuous delivery that you’re looking for. Unless you go deeper than the low-hanging fruit of the application layer, you’ll be cheating yourself out of these benefits. Sure, you’ll see improvements short term but you are building something which is already technical debt. No matter what architecture you choose, if you don’t invest in maintaining it properly (or even building it properly in the first place) then it will ultimately become your albatross.

Events vs Commands

In the world of service oriented architectures and CQRS style processes there is a tendency for nearly everything to raise events. Going back a few years, however, before REST became fashionable, many interactions were by RPC and often the result of processing commands from a queue.

So when did commands become an anti-pattern? Well of course, they never did. These days we just have to understand when it’s more appropriate to send a command or raise an event.

Here’s a table to help you decide what you should be using:

| Events | Commands |
| --- | --- |
| An event is all about something that has already happened. | A command is all about something that the originating service wants to happen (although it might not be successful). |
| A service raising an event doesn’t care what happens to it. Something consuming an event is not critical to the service’s function. | A service sending a command needs that command to be processed as part of its functionality. |
| An event could be consumed by one, many or no consumers. | A command is intended for one specific consumer. |
| An event can suggest loose coupling between services. | A command definitely indicates tight coupling – the originating service knows about the command target. |
| A service prevented from raising an event can only report that the event was not raised. | A service prevented from sending a command can report the failure to a team with specific domain knowledge about what will happen down stream if the command is not processed. The service may be designed to fail its own process if the command fails. |

A really good example of the right use of an event is communicating between services within a bounded context that something has happened. The originating service will have successfully completed its function before raising the event. Consumers of the event do something else in addition that the originating service doesn’t really care about.

A good example of the right use of a command is where two different platforms need to be kept in sync with each other. When data is updated in one system a sync command is sent to update the other. If something stops that command getting sent (e.g. an auth issue between the service and a message queue) then the service can react and alert people to the issue, or it may be that the update in the originating service needs to fail.
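The difference shows up clearly in code. The sketch below is only an illustration under some assumptions: plain functions stand in for a message queue and an event stream, and `update_customer`, `SyncCustomer` and `CustomerUpdated` are hypothetical names. It also picks the stricter of the two reactions described above, failing the originating update when the command can't be sent.

```python
class CommandFailed(Exception):
    """Raised when the single intended consumer could not process a command."""


def raise_event(subscribers, event):
    # Zero, one or many consumers; a failing consumer never affects the originator.
    for handle in subscribers:
        try:
            handle(event)
        except Exception:
            pass  # a real system would log this; the originator carries on


def send_command(target, command):
    # Exactly one known target; failure is reported back to the sender.
    try:
        target(command)
    except Exception as exc:
        raise CommandFailed(f"{command['type']} was not processed") from exc


def update_customer(store, customer_id, email, sync_target, subscribers):
    previous = store.get(customer_id)
    store[customer_id] = email
    try:
        send_command(sync_target, {"type": "SyncCustomer", "id": customer_id, "email": email})
    except CommandFailed:
        # Design choice for this sketch: undo our own update if the sync fails.
        if previous is None:
            del store[customer_id]
        else:
            store[customer_id] = previous
        raise
    # The work is done, so now we can say that it happened.
    raise_event(subscribers, {"type": "CustomerUpdated", "id": customer_id})


def broken_target(command):
    raise ConnectionError("auth issue between the service and the queue")


store = {}
received = []
update_customer(store, 1, "a@example.com", lambda command: None, [received.append])
try:
    update_customer(store, 2, "b@example.com", broken_target, [received.append])
    failed = False
except CommandFailed:
    failed = True
```

Notice that the event is only raised after the service has finished its own work, and that nothing a subscriber does can make `update_customer` fail, whereas a broken command target does.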

Both events and commands are important in a distributed system. Using them in the right places makes your intent much clearer and helps keep your system structured.

When Things Just Work

A particularly tricky epic hits the development team’s radar. The Product Manager has been mulling it over for a while, has a few use cases from end users and has scoped things pretty well. He’s had a session with the Product Owner who has since fleshed things out into an initial set of high priority stories from a functional point of view.

The Product Owner spends an hour with the Technical Lead and another Developer going over the epic. It quickly becomes apparent that they’re going to have to build some new infrastructure and a new deployment pipeline so they grab one of the Architects to make sure their plans are in line with the technical roadmap. Some technical stories are generated and some new possibilities crop up. The Product Owner takes these to the Product Manager. Between them they agree what is expected functionally and re-prioritise now they have a clearer picture of what’s being built.

In the next grooming session the wider team hit the epic hard. There’s a lot of discussion and the Architect gets called back in when there are some disagreements. The QA has already spent some time considering how testing will work and she’s added acceptance criteria to all the stories which are reviewed by the devs during the meeting. Each story is scored and only when everyone is happy that there is a test approach, a deployment plan and no unresolved dependencies, does a story get queued for planning.

It takes three sprints to get the epic completed, during which the team focus primarily on that epic. They pair program and work closely with the QA and Product Owner who often seem like walking, talking specifications. There are a few surprises along the way and some work takes longer than expected but everyone’s ok with that. Some work gets completed quicker than expected. The Product Manager has been promoting the new feature so when it finally goes fully live the team get some immediate feedback on what users like and dislike. This triggers a few new stories to make some changes that go out in the next release.

Well, that all sounded really simple. Everyone involved did their bit, no-one expected anything unreasonable, the only real pressure came from the team’s own pride in their work and everyone went home on an evening happy. So why is it that so often this isn’t the case? Why do some teams or even entire companies expend huge amounts of effort but only succeed in increasing everyone’s stress levels, depriving people of a decent night’s sleep and releasing product with more bugs than MI5?

What Worked?

If you’re reading this then hopefully team management is something you’re interested in and you’re probably well aware that there is never just one thing that makes it work. In fact, Roy Osherove described three distinctly different states that a team can find itself in which require very different management styles and expose very different team dynamics.

If you haven’t already done so, take a look at Roy’s blog and download his book.

Often team members are affected by influences external to the team – for example, the Product Manager is almost certain to be dealing with multiple teams and end users. Their day could have started pretty badly and that will always colour interactions with others – we’re only human. So at any one given moment of interaction, the team could be in one of many states that could either be conducive to a good outcome or that could be leading to a problem.

Let’s try pulling apart this idealistic scenario and see how many opportunities there were for derailment.

Early Attention

In our scenario, the epic isn’t straight forward so the Product Manager has been thinking about what it means to his end users and where it fits into his overall plan. A lesser Product Manager might not have given it any attention yet and instead of having some stories outlined there might not be much more than a one line description of the functionality. Lack of care when defining work speaks to the rest of the team about how little the epic matters. If the Product Manager doesn’t care enough about the work to put effort into defining it properly, then others will often care just as little.

Early Architectural Input

Right at the beginning the Architect is questioned about how they see the work fitting in with the rest of the enterprise. Without talking tech early the team could waste time pursuing an approach which is at best not preferred and possibly just won’t work.

Product Owner and Technical Lead

The Product Owner and the Technical Lead take on the initial task of trying to get to grips with the story together. These two are a perfect balance of product and development. Moving up the seniority tree, the Product Manager and the Architect can often disagree vehemently, but the less senior pair need the relationship of mutual trust they’ve built. Nowhere is there a better meeting of the two disciplines. Lose either of these roles and the team will suffer.

Changing Things

After looking at things from a technical point of view, the Product Owner goes back to the Product Manager to discuss some changes. If the Product Owner isn’t open to this then many opportunities for quick wins will be missed and it becomes much more likely that the team will be opening at least one can of worms that they’d prefer not to.

Grooming Wide

Although strictly speaking all the work that goes into defining a story is ‘grooming’ and doesn’t have to include the whole team, there should be a point (probably in a grooming session) where the majority of the team gets the chance to review things and give their own opinions. If this doesn’t happen then much of the team’s expertise is being wasted. Also, some team members will be specialists in certain areas and may see things that haven’t yet been considered. Lastly (and maybe most importantly) if the team are simply spoon fed a spec and expected to build it, they aren’t being engaged in the work – they are far more likely to care less about what is being built.

Ready for Planning

The team make sure that the stories have test approaches, reams of acceptance criteria, no unresolved dependencies and that everyone believes the stories are clear enough to work on them. This is a prerequisite to allowing the work to be planned into a sprint. Without this gate to progression, work can be started that just can’t be finished. Working on something that can’t yet be delivered is a waste of time.

Questions with Answers

During the sprints, the Product Owner and QA are on hand all the time for direct face to face discussions about the work. “Does this look right?” “Can we reduce the number of columns on the screen?” “Is this loading quick enough?” – developers are at the sharp end and need to have answers right there and then so they can build. Without Product and QA representation available, a simple question like these can stall a story for an hour, a day or maybe too long to complete in the sprint.

And More

This is one small, quite contrived scenario. In real life things are rarely as straightforward but with every interaction and every piece of work within the team and beyond there is the risk for some kind of derailment. To list every scenario would take forever, so how can problems be avoided?

Knowing Their Jobs

Each individual in this team knew where they fitted into the puzzle. Everyone worked together as a team. In fact things need to go a little further than that for the best outcome – team members should know what each other’s jobs entail. Take the Technical Lead role; they should be protecting their team from ill conceived ideas (even when there seems to have been a lot of effort gone into those ideas). It’s part of their job to question whether proceeding with a given piece of work is the best thing to do. When they do raise concerns, these should be listened to and discussed with a level of maturity and mutual respect that befits someone who is fulfilling one of the requirements of their job. Equally, when a QA suggests that a feature seems flawed even though it passes acceptance criteria, their issue should be addressed, even if nothing is ultimately changed. This is part of the QA’s job – raising issues of quality.

I’d like to outline what I see as an excellent balance of team members and responsibilities. I don’t mean to insinuate that this is the only way a team can work – in fact I encourage every team to find their own way. Regular retrospectives where the team look at how they’re performing, what works and what doesn’t, allow the team to form their own processes. This is nearly always preferable to mandating how things should be done as people are more likely to stick to their own ideas.

That having been said, I believe the lines are there to blur and if I were to define roles around those lines, it would be like this:

Product Manager

The Product Manager is responsible for the product road map. They are focussed on what the end users are missing in the product as it stands and they are working to define feature sets that will better meet the end user’s requirements and better any competition. It’s a trade off between effort, risk, how badly the feature is wanted and when it is technically feasible to implement it.

It is vitally important that their roadmap is fully understood initially by the Product Owner and then by the senior members of the team. Decisions are constantly being made during delivery which benefit from a knowledge of product focus.

The Product Manager is not responsible for writing stories or working with the development team beyond receiving demo’s of completed or near completed work. They’re not responsible for the velocity of the team or for managing the team in any way, although they do have a vested interest in when things are going to get delivered and will almost certainly require an explanation if things can’t or don’t happen as quickly as they’d like.

Product Owner

The Product Owner can best be thought of as an agile Business Analyst. They are responsible for communicating the vision of the Product Manager to the team. They sit within the team and are answerable to the Team Lead rather than the Product Manager. This might seem odd but the team needs them to put their need to understand the product ahead of anything else. Without good understanding of what is being built, the team cannot hope to build the right thing.

The Product Owner wants their stories to be understood, so they will want the team to be involved in deciding what information is recorded in a story and how it is formatted. Story design will usually evolve over a few sprints from discussions during retrospectives. It is the Product Owner’s sole responsibility to make sure that all functional details are described once and only once and that none are left out. They will often work closely with the team’s QA to make sure all functional aspects have acceptance criteria before the story is presented in a grooming session. At a minimum, before a story is allowed to be planned into a sprint, every functional detail must have acceptance criteria to go with it. It is quite acceptable for these acceptance criteria to be considered the specification, rather than there being two different descriptions of the same thing. This reduces duplication, which is desirable because duplication leaves a story open to misunderstanding.


QA

The QA should be focussing on automation as much as possible. The last thing a QA wants to do is spend their time testing – it isn’t a good use of their time. Yes, there are a few things that are still best done by opening the app and having a look and I’m not saying that automation should be solely relied on, but it should be the default approach. Recognising that something can’t be automated should be the exception rather than noticing that something could. Something automated never needs to be tested again until that thing changes. Something manual adds to the existing manual set of tests. Only so much of the latter can go on before the QA has no time to really think about the quality of the product.

The QA should be happy that they understand how they are going to ensure the quality of every single story picked up by the team. They should have recorded acceptance criteria which act as a gateway to ‘done’ for the developers and a reminder for the QA about different aspects of the story.

Sometimes there will be several different levels of criticality to functionality. For example, imagine a piece of work on a back office system which includes modifications to an integration to a customer facing system. It’s not the end of the world if the back office system needs a further tweak to make things absolutely perfect, but if the integration breaks, then end users may be affected and data may be lost. The integration should definitely be tested with automation – for every field and every defined piece of logic involved. The internal system’s UI could probably be tested manually, depending on the circumstances.

The QA should make sure they spend time with the Product Owner looking at new stories and working quite abstractly to define acceptance criteria around the functionality. During grooming, technical considerations will kick up other acceptance criteria and a decision around what needs to be automated and what doesn’t can be made by the team. A decision which the QA should make sure is made actively and not just left to a default recipe.


Architect

The Architect is not really a part of a specific team. They will have their own relationship with the Product Manager and will be familiar with the product road map. They will have three main considerations on their mind:

  1. They will be looking at the direction that technology in general is going and will be fighting the rot that long lived systems suffer from. This isn’t because code breaks, quite the opposite – the problem is that working code could stay in place for years after anyone who had anything to do with writing it has left the company, even if that code is not particularly good. Replacing working systems is a difficult thing for businesses to see a need for but if the technology stack that is relied on isn’t continuously refreshed then it becomes ‘technical debt’. Something that no-one wants to touch and that is expensive to fix.
  2. They will be making sure the development teams have enough information about the desired approach to system building. If following a DDD approach, the teams need to know what the chosen strategy is for getting data from one subdomain to another. The teams want to know what format they should be raising their domain events in and how they should be versioned. Given new ideas and technologies, they need to know when it’s ok to start using them and more importantly, when it’s too premature to start using them.
  3. In conjunction with the Product Roadmap, they will be defining the Technical Roadmap. This document takes into consideration what Product are wanting to do and what the technical teams are wanting to do. It’s almost a given that the Technical Roadmap will have to feedback into the Product Roadmap as it shows what needs to be done to deliver. For this reason, it’s generally a good idea not to consider a Product Roadmap complete until it has been adjusted to accommodate the technical plan.
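To give a feel for the kind of guidance meant in the second point, here is one possible shape for a versioned domain event. It is purely an assumption for illustration: the envelope fields (`type`, `version`, `data`), the `UsernameUpdated` event and the difference between its two versions are all invented, not a standard.

```python
import json


def to_envelope(event_type, version, data):
    """Wrap event data in a hypothetical envelope that always states its version."""
    return json.dumps({"type": event_type, "version": version, "data": data})


def read_username(raw):
    """A consumer that understands both versions of the hypothetical event."""
    envelope = json.loads(raw)
    if envelope["type"] != "UsernameUpdated":
        raise ValueError(f"unexpected event type {envelope['type']}")
    data = envelope["data"]
    if envelope["version"] == 1:
        return data["username"]
    if envelope["version"] == 2:
        # Assumed change in version 2: the username moved under a profile object.
        return data["profile"]["username"]
    raise ValueError(f"unsupported version {envelope['version']}")


v1 = to_envelope("UsernameUpdated", 1, {"username": "alice"})
v2 = to_envelope("UsernameUpdated", 2, {"profile": {"username": "alice"}})
```

With an agreed envelope like this, a team can change the body of an event without breaking consumers that haven't caught up yet, which is exactly the sort of decision that is easier made once by the Architect than separately by every team.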

In the scenario at the start of this post, the Architect was consulted because additional infrastructure was going to be needed. This is something that could happen several times a week and something that the Architect should be prepared for. They need to give definite answers for what should be done right now, not what that decision would be if the question was asked in a month’s time – answering for the future leads to premature adoption of new concepts and technologies that aren’t always fully understood or fully agreed on.


The Developers are at the sharp end. They should have two main focusses, and make no mistake, both are as important as each other:

  1. They need to build the Product.
  2. They need to work out how they can become better developers.

Notice how I haven’t mentioned anything about requirements gathering. This has classically been considered the thing most developers overlook, but in a development team it is done by the Product Owner, the Product Manager and, to a lesser extent, the QA. It’s this support net of walking, talking specifications combined with well-defined stories that allows a developer to focus on doing what they do best: writing code.

Something that can be ignored by less experienced developers is that the Product is not just the code that makes it up; it has to be delivered. If building a web application then this will ultimately have to be deployed, most likely into at least one test environment and into production. Maybe this process is automated or is handed on to a ‘Dev Ops’ team, but it’s still down to the developers to build something that can actually be deployed. For example, if a piece of code that would normally talk to a database is changed so that it now only works with a new version of the database, how do we deploy? If we deploy the database first, then the old code won’t work when it calls it. If we deploy the code first then the call to the old database won’t work. There could be dozens of instances of the code, which could take hours to update one by one – it isn’t usually possible to deploy data changes at exactly the same time as deploying code. Basically, the Developers have to keep bringing their heads out of the code to think about how their changes impact other systems.
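One way out of that deadlock is to write the new code so that it tolerates both versions of the database for a while, which means it can be deployed before or after the schema change. The sketch below is only an illustration of that idea, not a recommendation for any particular system: the `customer` table, the old `name` column, the new `full_name` column and both helper functions are hypothetical, and it uses an in-memory SQLite database purely so that it runs on its own.

```python
import sqlite3


def customer_columns(conn):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute("PRAGMA table_info(customer)")}


def read_customer_name(conn, customer_id):
    # Hypothetical rename: old schema has `name`, new schema has `full_name`.
    # Prefer the new column when it exists, otherwise fall back to the old one.
    column = "full_name" if "full_name" in customer_columns(conn) else "name"
    row = conn.execute(
        f"SELECT {column} FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def make_db(new_schema):
    # Build a throwaway database in either the old or the new shape.
    conn = sqlite3.connect(":memory:")
    column = "full_name" if new_schema else "name"
    conn.execute(f"CREATE TABLE customer (id INTEGER PRIMARY KEY, {column} TEXT)")
    conn.execute(f"INSERT INTO customer (id, {column}) VALUES (1, 'Ada Lovelace')")
    return conn


if __name__ == "__main__":
    # The same code works against both versions of the schema.
    print(read_customer_name(make_db(new_schema=False), 1))
    print(read_customer_name(make_db(new_schema=True), 1))
```

Once every instance of the code is running the tolerant version and the database has been migrated, the fallback branch can be removed in a later release.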

Technical Lead

Technical Lead is a role that is very close to my heart as it’s a role I tend to find myself in. I believe that if a development team delivers a product, then a Technical Lead delivers the team. They are technically excellent but are split focussed – they are as much about growing the team as they are about making sure the team are delivering what is expected.

There is a lot of noise about a good Technical Lead being a ‘servant leader’ and for the most part I think this is true. Still, there are times when it’s necessary to be direct, make a decision and tell people to do it. When to use each tactic is something only experience teaches, but get it wrong and the team will lose focus and cohesion.

It’s quite common to find that a Technical Lead has one or more developers in their team who are better coders than themselves. It’s important for someone in the lead role to realise that this is not a slur on their abilities; the better coder is a resource to be used and a person to grow. They aren’t competition. This often happens when developers don’t want to split their focus away from the technology and onto the people. They become incredible developers and eventually often move into a more architectural role. Because of their seniority, they can also stand in for the Technical Lead when it’s needed, which can allow a lead to occasionally focus on building something specific (it’s what we all enjoy doing, after all).

Ultimately it’s down to the Technical Lead to recognise what is missing or lacking and make sure that it is made available. They see what the team needs and make sure the team gets it (even if sometimes that’s a kick up the arse).

Blurring the Lines

Not every team has enough people to fill all these roles. Not every company is mature enough to have the staff for each of these roles. So this is where the lines begin to blur.

It isn’t always critical to have each of these roles fulfilled as a full time resource. Each company and team’s specific situation needs to be considered on its own merits. What is critical is understanding why these roles exist. This knowledge allows the same individual to wear different hats at different times. But beware of giving too much power to one person or of removing the lines of discussion, which are what generate great ideas.

So What Actually Makes it Work?

In a highly technical environment I find my own opinion on this very surprising. I don’t think it’s any individual’s technical ability or how many different programming languages they can bring to bear. I don’t think it’s having every detail of every epic prepared completely before development work starts. Primarily, consideration must be given to the structure and dynamics of a team. The above is an excellent starting point although, as I’ve already mentioned, it may be that multiple roles are taken by any one individual. Other than this, I believe a successful team will have these qualities:

Patience, mutual respect and a healthy dose of pragmatism.

If there is one constant in every team I’ve ever worked with or in, it’s that some things will go wrong. Mistakes will happen – no-one is immune (and if they claim otherwise don’t believe them). Making progress forwards doesn’t mean always building the most incredibly perfect solution, it means getting product out the door which end users will be happy with. So if there are mistakes, have the patience to realise that it was inevitable. If you make the mistake, have the respect for the rest of your team to realise that they can help fix things – believe that they aren’t judging you by your mistake. Finally, remember that the perfect solution is not the perfect system – there’s more to delivering software than trying to find all the bugs.

A team with these values will ultimately always outperform a collection of technically excellent individuals.