Large JSON Responses

The long slog from a 15-year-old legacy monolith to an agile, microservice-based system will almost inevitably include throwing some APIs in front of a big old database. Building a cleaner view of the domain lets us draw clearer lines between concerns, each with its own service. But inside those services there’s usually a set of ridiculous SQL queries building the nice, clean models being exposed. These ugly queries add time to each response and can bring some complexity with them, but this is the real world; often we can only do so much.

So there you are, a few months into the work, with a handful of services deployed to an on-premises production server. Most responses are pretty quick, never more than a couple of seconds. But now you’ve been asked to build a tool for generating an Excel document with several thousand rows in it. To get the data, a web app will make an HTTP request to the on-prem API. So far so good. But once you’ve written the API endpoint and requested some realistic datasets through it, you realise the response takes over half an hour. What’s more, while that response is being built, the API server runs pretty hot. If more than a couple of users request a new Excel document at the same time, everything slows down.

Large responses from API calls are not always avoidable, but there are a couple of things we can do to lessen the impact they have on resources.

Chunked Response

Firstly, let’s send the response a bit at a time. In ASP.NET Web API this is pretty straightforward to implement. We start with a simple message handler, derived from DelegatingHandler, that marks the response as chunked:

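public class ChunkedResponseHttpHandler : DelegatingHandler
{
    // A per-route handler must hand requests on to the controller dispatcher itself.
    public ChunkedResponseHttpHandler(HttpConfiguration config) => InnerHandler = new HttpControllerDispatcher(config);
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        response.Headers.TransferEncodingChunked = true;
        return response;
    }
}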
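Setting TransferEncodingChunked means the response goes out without a Content-Length header, framed with HTTP/1.1 chunked transfer coding instead. To make that framing concrete, here’s a minimal sketch in Ruby (the client language used later in this post). encode_chunked and decode_chunked are hypothetical helpers written for illustration; they skip chunk extensions and trailers, and in practice the web server and your HTTP client library do this for you.

```ruby
# Sketch of HTTP/1.1 chunked transfer coding (RFC 9112, section 7.1), ignoring
# chunk extensions and trailers. Each chunk is its size in hex, CRLF, the bytes,
# then CRLF; a zero-size chunk marks the end of the body.

def encode_chunked(pieces)
  body = String.new # binary buffer, so sizes are counted in bytes
  pieces.each do |piece|
    bytes = piece.b
    next if bytes.empty? # an empty chunk would be read as the terminator
    body << bytes.bytesize.to_s(16) << "\r\n" << bytes << "\r\n"
  end
  body << "0\r\n\r\n"
end

def decode_chunked(body)
  body = body.b
  out = String.new
  pos = 0
  loop do
    line_end = body.index("\r\n", pos)
    size = body[pos...line_end].to_i(16)
    break if size.zero?
    start = line_end + 2
    out << body.byteslice(start, size)
    pos = start + size + 2 # skip the data and the CRLF that follows it
  end
  out.force_encoding(Encoding::UTF_8)
end
```

Because the framing is added and removed by the HTTP layers on each side, application code only ever sees the reassembled body, which is why chunking on its own doesn’t change how a client parses the JSON.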

Now we need to associate this handler with the controller action that returns our large response. We can’t do that with attribute-based routing, but we can be very specific with a routing template.

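config.Routes.MapHttpRoute(
    name: "CustomersByBirthYear", // illustrative route name
    routeTemplate: "customers",
    defaults: new { controller = "Customers", action = "GetByBirthYear" },
    constraints: new { httpMethod = new HttpMethodConstraint(HttpMethod.Get) },
    handler: new ChunkedResponseHttpHandler(config));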

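In this example, the URL ‘customers’ points specifically at CustomersController.GetByBirthYear() and will only accept GET requests. The handler is passed as the last parameter to MapHttpRoute(). Because it is a per-route handler, it is responsible for passing the request on to the controller, which is why it takes the configuration object.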

The slightly tricky part comes when writing the controller action. Returning everything in chunks won’t help if you load the entire response into memory before sending it. Also, many data access layers (ORMs in particular) don’t make it easy to stream query results, so you need to be a bit creative about how you get the data for your response.

Let’s assume you’re querying a database, and that your endpoint returns a collection of resources you already have a pretty ugly SQL query for retrieving by ID. The scenario is not as contrived as you might think. Dynamically modifying the ‘where’ clause of the ‘select by ID’ query so that it returns all the results you want would probably give the fastest response time. It’s a valid approach, but if you know you’re going to have a lot of results, you risk memory issues that can impact other processes. You’re also likely to end up mashing SQL strings together to share the bulk of the select statement while adding different predicates, which isn’t easily testable. The approach I’m outlining here breaks the processing into two steps. First, query for the IDs of the entities you’re going to return. Second, use your ‘select by ID’ code to retrieve them one at a time, returning them through an enumerator rather than a fully realised collection type. Let’s have a look at what a service method for this might look like.

public IEnumerator<Customer> GetForBirthYear(int birthYear)
{
    IEnumerable<int> customerIds = _customersRepository.GetIdsForBirthYear(birthYear);
    foreach (var id in customerIds)
    {
        Customer customer;
        try
        {
            customer = Get(id);
        }
        catch (Exception e)
        {
            // Don't abort the whole stream; flag the record that failed to load.
            customer = new Customer { Id = id, CannotRetrieveException = e };
        }
        yield return customer;
    }
}

public Customer Get(int customerId)
{
    // ... the existing 'select by ID' code
}

The important things to notice here are:

  1. The first call retrieves only the IDs of the customers we’re interested in.
  2. Each customer is loaded from the same Get(int customerId) method that is used to return customers by ID.
  3. We don’t want to terminate the whole process just because one customer couldn’t be loaded. Equally, we need to let the caller know that some data might be missing. In this example we simply return a customer record containing only the ID and the exception that was thrown while loading it. You might not want to do this if your API is public, as you’re leaking internal details, but for this example let’s not worry about that.
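The same two-step shape can be sketched in Ruby with an Enumerator. This is an illustration rather than a port of the service: it assumes the ID query has already run, and customers_for_birth_year, fetch_by_id and StreamedCustomer are hypothetical stand-ins for the repository calls above. Each full record is loaded only when the consumer asks for the next one, and a failed lookup produces a flagged placeholder instead of ending the stream.

```ruby
# Hypothetical record type; `error` is set when a lookup fails.
StreamedCustomer = Struct.new(:id, :name, :error, keyword_init: true)

# Lazily yields one customer per ID. `ids` is the result of the cheap
# "select IDs" query; `fetch_by_id` stands in for the existing
# "select by ID" code and is only called when the consumer asks for
# the next record.
def customers_for_birth_year(ids, fetch_by_id)
  Enumerator.new do |yielder|
    ids.each do |id|
      customer =
        begin
          fetch_by_id.call(id)
        rescue StandardError => e
          # Don't abort the stream; flag the record that could not be loaded.
          StreamedCustomer.new(id: id, error: e)
        end
      yielder << customer
    end
  end
end
```

Calling first on the returned enumerator triggers a single lookup; nothing else is loaded until the consumer keeps iterating, which is the property the chunked response relies on.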

The controller action which exposes this functionality might look a bit like this:

public IEnumerable<Customer> GetByBirthYear(int birthYear)
{
    using (IEnumerator<Customer> iterator = _customersServices.GetForBirthYear(birthYear))
    {
        while (iterator.MoveNext())
        {
            yield return iterator.Current;
        }
    }
}

Things to notice here are:

  1. There’s no attribute-based routing in use here. Because we need to assign our message handler to the action, we have to use convention-based routing.
  2. At no point are the results loaded into a collection of any kind. We retrieve an enumerator and return one result at a time until there are no more results.

JSON Stream

Using this mechanism is enough to return the response in chunks and to start streaming as soon as there’s a single result ready. But there’s one more piece to the puzzle for our service. Depending on what language the calling client is written in, it may be straightforward to consume the JSON response as we have it here, or it may be easier to consume what’s become known as a JSON stream. For a DotNet consumer, sending our stream as a comma-delimited JSON array is sufficient. If we’re expecting calls from a Ruby client, then we should definitely consider converting our response to a JSON stream.
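The comma-delimited array is still a single JSON document, but it doesn’t have to be built in memory first. You write the opening bracket, then each element with a comma before all but the first, then the closing bracket. Here’s a minimal Ruby sketch of that output shape, assuming a hypothetical write_json_array helper and any IO-like target such as a socket or StringIO. It isn’t how Web API’s JSON formatter is implemented, only what it produces.

```ruby
require "json"
require "stringio"

# Writes a JSON array to `io` one element at a time: "[" first, then each
# element (preceded by a comma after the first), then "]". Only the current
# record is serialized at any moment, and `records` may be a lazy Enumerator.
def write_json_array(io, records)
  io << "["
  records.each_with_index do |record, index|
    io << "," unless index.zero?
    io << JSON.generate(record)
  end
  io << "]"
end
```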

For our customers response, we might send a response which looks like this (but hopefully much bigger):

{"id":123,"name":"Fred Bloggs","birthYear":1977}
{"id":133,"name":"Frank Bruno","birthYear":1961}
{"id":218,"name":"Ann Frank","birthYear":1929}

This response is in a format called Line-Delimited JSON (LDJSON). There’s no opening square bracket to say this is a collection, because it isn’t a collection. It’s a stream of individual records, each of which can be processed without waiting for the entire response to arrive. This makes a lot of sense: just as we don’t want to load the entire response into memory on the server, we don’t want to on the client either.
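Writing and reading LDJSON needs nothing more than a standard JSON library, because each line is parsed on its own. Here’s a minimal Ruby sketch with hypothetical write_ldjson and each_ldjson helpers. The writer relies on JSON.generate producing compact output and escaping any newlines inside strings, so a record never spans two lines.

```ruby
require "json"
require "stringio"

# Line-delimited JSON: one compact JSON document per line, no enclosing array.
# JSON.generate escapes newlines inside strings as \n, so a record can never
# span more than one line.
def write_ldjson(io, records)
  records.each { |record| io.puts(JSON.generate(record)) }
end

# Each line is a complete document, so records can be handled one at a time.
def each_ldjson(io)
  io.each_line do |line|
    next if line.strip.empty? # tolerate blank lines
    yield JSON.parse(line)
  end
end
```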

A chunked response is something most HTTP client packages handle transparently. Unless the client application is coded specifically to process each object as its chunk arrives, receiving a chunked response is no different on the client side from receiving an unchunked one. LDJSON breaks this flexibility, because the response as a whole is not valid JSON: one client will consume it easily, but another would struggle. At the time of writing, the usual DotNet tooling expects standard JSON, whereas in Ruby it’s probably easier to process LDJSON. That’s not to say it’s impossible to consume the collection in Ruby or LDJSON in DotNet; it just requires a little more effort for no real reward. To let different clients consume the same endpoint, we can add a MediaTypeFormatter specifically for the ‘application/x-json-stream’ media type (this isn’t an official media type, but it has become widely used). That way any consumer can ask for either JSON or an LDJSON stream.

public class JsonStreamFormatter : MediaTypeFormatter
{
    public JsonStreamFormatter()
    {
        SupportedMediaTypes.Add(new MediaTypeHeaderValue("application/x-json-stream"));
    }

    public override async Task WriteToStreamAsync(Type type, object value, Stream writeStream, HttpContent content,
        TransportContext transportContext)
    {
        var response = value as IEnumerable;
        if (response == null)
            throw new NotSupportedException($"Cannot format for type {type.FullName}.");

        using (var writer = new StreamWriter(writeStream))
        {
            // One compact JSON object per line: Line-Delimited JSON.
            foreach (var item in response)
            {
                string itemString = JsonConvert.SerializeObject(item, Formatting.None);
                await writer.WriteLineAsync(itemString);
            }
        }
    }

    public override bool CanReadType(Type type)
    {
        return false;
    }

    public override bool CanWriteType(Type type)
    {
        // string implements IEnumerable too, but shouldn't be written one character per line.
        return type != typeof(string) && typeof(IEnumerable).IsAssignableFrom(type);
    }
}

This formatter only works for types implementing IEnumerable, and uses Newtonsoft’s JsonConvert class to serialise each object in turn before writing it to the response stream. Enable the formatter by adding it to the Formatters collection:

config.Formatters.Add(new JsonStreamFormatter());
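Web API’s content negotiation decides which formatter handles a request, mainly from the Accept header. Because JsonStreamFormatter is appended after the built-in JSON formatter, a client that doesn’t explicitly ask for the stream should keep getting an ordinary JSON array. The Ruby sketch below illustrates that decision. negotiate_media_type is hypothetical and much simpler than the real negotiator, since it ignores quality values and wildcards.

```ruby
LDJSON_TYPE = "application/x-json-stream"
JSON_TYPE = "application/json"

# Much-simplified negotiation: honour an explicit request for the LDJSON
# stream, otherwise fall back to a plain JSON array. Parameters such as
# q=0.9 are stripped and ignored rather than used for ranking.
def negotiate_media_type(accept_header)
  requested = accept_header.to_s.split(",").map { |part| part.split(";").first.to_s.strip.downcase }
  requested.include?(LDJSON_TYPE) ? LDJSON_TYPE : JSON_TYPE
end
```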

DotNet Consumer

Let’s take a look at a DotNet consumer coded to expect a normal JSON array of objects delivered in a chunked response.

public class Client
{
    public IEnumerator<Customer> Results()
    {
        var serializer = new JsonSerializer();
        using (var httpClient = new HttpClient())
        {
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            using (var stream = httpClient.GetStreamAsync("").Result)
            using (var jReader = new JsonTextReader(new StreamReader(stream)))
            {
                // Skip the array brackets; deserialize each element as it arrives.
                while (jReader.Read())
                    if (jReader.TokenType != JsonToken.StartArray && jReader.TokenType != JsonToken.EndArray)
                        yield return serializer.Deserialize<Customer>(jReader);
            }
        }
    }
}

Here we’re using the HttpClient class and requesting a response as “application/json”. Instead of reading the content as a string, we’re working with a stream, and GetStreamAsync completes once the response headers have been read, so we can start reading before the body has fully arrived. The really cool part is that we don’t have to do much more than pass that stream to a JsonTextReader (part of Newtonsoft.Json). We can deserialize and yield each object as long as we skip the first and last tokens, which are the opening and closing square brackets of the JSON array. When the reader is positioned at the start of an object, serializer.Deserialize() consumes that whole object, so the next call to jReader.Read() moves on to the following element.

This method allows each object to be returned for processing while the stream is still being received. The client can save on memory usage just as well as the service.
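Ruby’s standard json library only parses complete documents, so consuming the plain JSON array incrementally from Ruby takes a little more effort. This fits the earlier point that LDJSON is easier to process in Ruby. To show roughly what JsonTextReader is doing for the DotNet client, here is a hypothetical JsonArrayObjectSplitter in Ruby. It’s a sketch that assumes the body is a top-level array of objects delivered as UTF-8 text. It tracks nesting depth and string state across chunks and passes each complete element to JSON.parse.

```ruby
require "json"

# Hypothetical helper: pulls complete objects out of a top-level JSON array
# while its text arrives in arbitrary chunks. Tracks nesting depth and
# whether it is inside a string literal, then parses each finished element.
class JsonArrayObjectSplitter
  def initialize(&on_object)
    @on_object = on_object
    @buffer = +""
    @depth = 0
    @in_string = false
    @escaped = false
  end

  def <<(chunk)
    chunk.each_char { |char| consume(char) }
    self
  end

  private

  def consume(char)
    if @depth.zero?
      # Between elements: skip "[", ",", "]" and whitespace until an object starts.
      return unless char == "{"
      @depth = 1
      @buffer = +"{"
      return
    end

    @buffer << char
    if @in_string
      if @escaped
        @escaped = false
      elsif char == "\\"
        @escaped = true
      elsif char == '"'
        @in_string = false
      end
    elsif char == '"'
      @in_string = true
    elsif char == "{" || char == "["
      @depth += 1
    elsif char == "}" || char == "]"
      @depth -= 1
      # Back at depth zero means the element's closing brace has arrived.
      @on_object.call(JSON.parse(@buffer)) if @depth.zero?
    end
  end
end
```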

Ruby Consumer

I have only a year or so of experience with Ruby on Rails, but I’ve found it to be an incredibly quick way to build services. In my opinion, there’s a trade-off between speed of development and complexity: because the language is dynamic, the developer has to write more tests, which can add quite an overhead as a service grows.

To consume our service from a Ruby client, we might write some code such as this:

require 'httparty'
require 'yajl'

def fetch_customers(birth_year)
  uri  = ""
  opts = {
    query: { birthYear: birth_year },
    headers: { 'Accept' => 'application/x-json-stream' },
    stream_body: true,
    timeout: 20000
  }

  parser.on_parse_complete = lambda do |customer|
    # camelCase keys from the service become snake_case, e.g. birthYear => birth_year
    yield customer.deep_transform_keys(&:underscore).with_indifferent_access
  end

  HTTParty.get(uri, opts) do |chunk|
    parser << chunk
  end
end

def parser
  @parser ||= Yajl::Parser.new
end

Here we’re using HTTParty to manage the request, passing it ‘stream_body: true’ in the options and setting the ‘Accept’ => ‘application/x-json-stream’ header. The stream_body option tells HTTParty to hand the body to our block piece by piece as it arrives rather than buffering it all, and the header tells our service to respond with LDJSON.

In the HTTParty.get block, each chunk is passed to a Yajl::Parser, a JSON parser that understands LDJSON. Each chunk may contain a full JSON object, several objects, or only part of one. The parser recognises when it has enough JSON for a complete object and passes it to the lambda assigned to parser.on_parse_complete, where we simply yield the object as a hash with indifferent access and snake_case keys.
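Because every LDJSON record ends at a newline, similar incremental behaviour can be had without Yajl by buffering chunks and splitting on line breaks. This is a swapped-in technique, not what the client above does. LdjsonChunkParser is a hypothetical class that uses only Ruby’s standard json library and assumes the chunks are UTF-8 strings. Inside the HTTParty.get block you would feed it with parser << chunk in the same way, then call finish once the request completes.

```ruby
require "json"

# Hypothetical alternative to Yajl for LDJSON: buffers raw chunks and parses
# each complete line. A chunk can end mid-record, so anything after the last
# newline is kept until the next chunk arrives.
class LdjsonChunkParser
  def initialize(&on_record)
    @on_record = on_record
    @pending = +""
  end

  def <<(chunk)
    @pending << chunk
    while (newline = @pending.index("\n"))
      line = @pending.slice!(0..newline)
      @on_record.call(JSON.parse(line)) unless line.strip.empty?
    end
    self
  end

  # Call once the response ends, in case the last record had no trailing newline.
  def finish
    @on_record.call(JSON.parse(@pending)) unless @pending.strip.empty?
    @pending = +""
  end
end
```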

The Result

Returning responses in chunks is more than just a neat trick: compared with loading the entire result set into memory before responding, a service returning data this way uses a tiny amount of memory. This means more consumers can request these large result sets without impacting other processes on the server.

From my own experience, the Yajl library seems to become unstable after handling a response that streams for more than half an hour or so. I haven’t seen anyone else report the same issue, but on one project I ended up removing Yajl, receiving the entire response with HTTParty and parsing the collection fully in memory. It isn’t ideal, but it works. It also doesn’t stop the service from streaming the response; it’s just the client that waits for the response to load completely before parsing it.

It’s a nice strategy to understand and is useful in the right place, but in an upcoming post I’ll explain why I think it’s often better to avoid large JSON responses altogether, and give my preferred strategy and reasoning.
