AWS S3 at Speed

Why a “503: Slow Down” response from Amazon S3 can actually be good for you!

The official AWS S3 docs on Request Rate and Performance Considerations for S3 clearly state: “Amazon S3 scales to support very high request rates.”
Sometimes this doesn’t appear to be the case in practice: once you start pushing your request rate to a bucket, at around the 700-requests-per-second mark you suddenly get hit with these slow-down responses.

This is actually a good thing!

What Amazon is doing at this point is trying to scale out your bucket. Like all systems, it helps to have a bit of breathing room while doing this, so you get some slow-down responses; give it about ten minutes, and behind the scenes your bucket will have an extra shard.

AWS S3 works like the majority of distributed filesystems: redundancy aside, when more speed is required, the underlying data store is split across multiple servers. This means your keyspace is split, by AWS’ best guess, roughly in half. If your keys are evenly distributed across a–z, then roughly a–m will end up on one shard, and the remaining n–z will be on the second.

This is a good time to remember that S3 is a key-value store; any directories you create don’t actually exist. For example, an S3 bucket with the file:

s3://my-test-bucket/a-file.txt

falls into the shard containing all the keys starting with ‘a’. If that file instead sat at:

s3://my-test-bucket/this/directory/structure.txt

it falls into the shard containing all the keys starting with ‘t’, because the key is ‘this/directory/structure.txt’ and its first letter is ‘t’.
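To make this concrete, here’s a minimal sketch of that idea. The split point and the two-shard layout are hypothetical (S3’s real partitioning is internal); it only illustrates that the whole key, slashes included, is what gets compared:

```python
# Toy model of a two-shard bucket: keys that sort before the
# (hypothetical) split point 'n' land on shard 0, the rest on shard 1.
SPLIT_POINT = "n"

def shard_for_key(key: str) -> int:
    """Return the shard this key would land on in the toy model."""
    return 0 if key < SPLIT_POINT else 1

print(shard_for_key("a-file.txt"))                    # → 0 ('a…' sorts before 'n')
print(shard_for_key("this/directory/structure.txt"))  # → 1 ('t…' sorts after 'n')
```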

Common pitfalls

Date-based directory structures. Often, because you’re optimising for retrieval, you’ll create a directory structure of the form:

s3://my-test-bucket/2017/02/08/my-file.txt

Unfortunately this means the sharding key will always start with the same prefix, ‘2’. Given this, AWS will dig deeper into the key to find a point to shard the bucket on: if you’re delivering 1,000 files per second into the ‘2017/02/08/’ keyspace, AWS is clever enough to see that and shard your bucket in the form:

s3://my-test-bucket/2017/02/08/{a-m} on one shard, and s3://my-test-bucket/2017/02/08/{n-z} on the other shard, again assuming an even keyspace.

Sounds perfect so far, but what about when we roll over to the next day? You suddenly lose all the benefit of the two shards, because every key under ‘2017/02/09/’ sorts after the split point and lands on the second shard alone. You’re now back to the starting point, and you have to wait for your request rate to trigger another scaling event.
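The rollover problem is easy to demonstrate with the same toy model, assuming a hypothetical split point mid-way through the hot day’s keyspace:

```python
# Toy model: after S3 splits the hot '2017/02/08/' keyspace, the
# (hypothetical) split point sits part-way through that day's keys.
SPLIT_POINT = "2017/02/08/n"

def shard_for_key(key: str) -> int:
    return 0 if key < SPLIT_POINT else 1

# During the 8th, traffic is balanced across both shards:
print(shard_for_key("2017/02/08/alpha.txt"))  # → 0
print(shard_for_key("2017/02/08/zulu.txt"))   # → 1

# The moment the date rolls over, every key sorts after the split
# point, so all traffic piles onto one shard again:
day_09 = [f"2017/02/09/{name}" for name in ("alpha.txt", "mike.txt", "zulu.txt")]
print({shard_for_key(k) for k in day_09})  # → {1}
```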

How to overcome this

Prefixing. This makes data retrieval slightly harder, but by no means impossible: add a prefix to your date-based path, something like:

s3://my-test-bucket/a/2017/02/08/my-file.txt
s3://my-test-bucket/b/2017/02/08/my-next-file.txt
s3://my-test-bucket/c/2017/02/08/my-other-file.txt

What this means for your bucket is that AWS S3 will see the files being spread evenly between these prefixes and use them as the sharding key. At this point, with a high enough request rate, Amazon will create three shards: one for data prefixed ‘a’, one for the ‘b’ prefix, and the last for ‘c’.
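One way to assign those prefixes is to derive them from a stable hash of the filename, so both the writer and the reader compute the same prefix without any lookup table. This is a sketch of one possible scheme, not anything S3 mandates; the three-prefix set is an assumption carried over from the example above:

```python
import hashlib

PREFIXES = "abc"  # hypothetical fixed prefix set from the example

def prefixed_key(date_path: str, filename: str) -> str:
    """Derive the prefix from a stable hash of the filename, so the
    same file always maps to the same prefix and files spread
    roughly evenly across 'a/', 'b/', and 'c/'."""
    digest = hashlib.md5(filename.encode()).digest()
    prefix = PREFIXES[digest[0] % len(PREFIXES)]
    return f"{prefix}/{date_path}/{filename}"

key = prefixed_key("2017/02/08", "my-file.txt")
print(key)  # e.g. 'a/2017/02/08/my-file.txt' (which prefix depends on the hash)
```

Because the prefix is a pure function of the filename, a reader that knows the filename can reconstruct the full key directly, without listing the bucket.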

Now you will be able to achieve something much closer to 2000 requests per second to this bucket.

Retrieval is now a touch harder in that your daily files are spread across three folders, but the three GET requests to retrieve them will also complete quicker, because the files sit on multiple servers behind the scenes.

A Warning

Once you have these shards it’s now easy to get files into your bucket, but what about deletions?

One thing to beware of is that the AWS S3 list-objects call returns keys sorted by name. A conventional deletion strategy that walks through that list deleting at speed will get you rate limited, because you’re asking Amazon to delete all the adjacent files on one server, then moving to the next shard and walking through those, and so on. Liberal use of prefixes in the list-objects query, or a bit of shuffling of the responses depending on the number of files, will greatly help here.
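One way to apply the prefix approach here is to list each prefix separately and interleave the results, so consecutive deletions alternate between shards instead of walking one shard’s keyspace end to end. A sketch, with the per-prefix listings hard-coded where real code would page through list-objects calls:

```python
from itertools import chain, zip_longest

def interleave(*key_lists):
    """Round-robin merge of per-prefix key lists, so adjacent
    deletions land on different shards."""
    _SKIP = object()
    merged = chain.from_iterable(zip_longest(*key_lists, fillvalue=_SKIP))
    return [k for k in merged if k is not _SKIP]

# Keys as per-prefix listings would return them, sorted by name:
a_keys = ["a/2017/02/08/one.txt", "a/2017/02/08/two.txt"]
b_keys = ["b/2017/02/08/one.txt"]
c_keys = ["c/2017/02/08/one.txt", "c/2017/02/08/two.txt"]

for key in interleave(a_keys, b_keys, c_keys):
    print(key)  # a/…, b/…, c/…, a/…, c/… — prefixes alternate
```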

S3 also offers a Multi-Object Delete, but this isn’t exempt from these rate limits: if you use it to delete lots of files on one shard, the same limits will apply.
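If you do use Multi-Object Delete, shuffling the full key list before chunking it into requests (S3 accepts up to 1,000 keys per request) spreads each batch across shards rather than concentrating it on one. A sketch of the batching, leaving out the actual delete call:

```python
import random

def delete_batches(keys, batch_size=1000, seed=None):
    """Shuffle keys so each batch mixes shards, then chunk into
    Multi-Object Delete sized requests (max 1,000 keys each)."""
    keys = list(keys)
    random.Random(seed).shuffle(keys)
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

# 2,400 keys spread over the three hypothetical prefixes:
keys = [f"{p}/2017/02/08/file-{n:04d}.txt" for p in "abc" for n in range(800)]
batches = delete_batches(keys)
print(len(batches))            # → 3 (1000 + 1000 + 400 keys)
print(max(map(len, batches)))  # → 1000, within the per-request limit
```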

In Practice

Using these prefixes, I have an application happily running from a bucket that recently peaked at 7,000 requests per second, spread across PUTs, GETs, DELETEs, and a handful of list calls. Scaling to this point was very linear, and now that the bucket has reached those speeds and been sharded appropriately, reliability is great. I have no concerns that there’d be any issues scaling much further.


Also published on Medium.