With the rise of microservice architectures, an increasing amount of what we do is weaving parts of Amazon Web Services (AWS) into distributed applications.

To manage these components, we use a great deal of automation. To use AWS programmatically, you need keypairs (an access key ID plus a secret access key), which are associated with user accounts. In this article, we’ll take a look at our approach to managing these accounts.
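
To make this concrete, here’s a minimal sketch of what using a keypair looks like with the boto3 Python SDK (the key values are placeholders, not real credentials):

    import boto3

    # A keypair is an access key ID plus a secret access key. Any code
    # holding these credentials acts as the user they belong to.
    session = boto3.Session(
        aws_access_key_id="AKIA_EXAMPLE_KEY_ID",     # placeholder
        aws_secret_access_key="example-secret-key",  # placeholder
        region_name="eu-west-1",
    )

    # Every call made through this session runs with that user's privileges.
    ec2 = session.client("ec2")
    print(ec2.describe_instances())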

We’ll talk about AWS here, but the principles covered in this article can be applied more broadly.

Common mistakes - overpowered users

Many organisations simply have one AWS user with administrative privileges and generate a keypair for this user, which they use for everything. Similar to the "God Object" anti-pattern, this is lazy and very dangerous for three reasons.

Firstly, if the API keys for this all-powerful user are compromised, an attacker can do anything in the entire AWS account.

Secondly, if an organisation is using the keypair of a "real" user for its automation, revoking a compromised key means cutting off a human: that user could end up locked out of their own account!

Thirdly, even if there is no compromise, automation that has more privileges than it needs can misbehave - for example, a buggy script might terminate all of your nodes when you only meant to terminate a certain group, or it might drop a database.

Better practice - machine users

The essence of security is the principle of least privilege - grant only the minimum permissions necessary. It follows that our objectives in setting up our automation users are:

  1. Limit what an attacker can do if a keypair is compromised
  2. Use a keypair that is not tied to a real human being
  3. Limit what an application can do if it misbehaves

To do this, we configure machine users. Machine users are user accounts that don’t have a login password — they can’t use the AWS console. They’re not people, they’re simply accounts that are created to fulfil specific purposes. When faced with an automation task, we create a specific AWS user to do that task and only that task — we never give it administrative privileges, or access to services it doesn’t need.
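
As an illustrative sketch (the user name here is hypothetical), creating a machine user with boto3 amounts to creating an IAM user and generating a keypair for it. Crucially, we never create a console password:

    import boto3

    iam = boto3.client("iam")

    # Create a user dedicated to one task. We never call
    # create_login_profile, so this user has no console password.
    iam.create_user(UserName="machine-s3-uploader")

    # Generate the keypair our automation will authenticate with.
    key = iam.create_access_key(UserName="machine-s3-uploader")["AccessKey"]
    print(key["AccessKeyId"])      # store both values securely;
    print(key["SecretAccessKey"])  # the secret is only returned once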

We generally set up multiple machine users for a project: one might have the responsibility of provisioning EC2 nodes, whereas another would be responsible for managing a static S3 bucket where the application uploads files. After all, provisioning servers in DevOps land and end users uploading files in userland are very different concerns, so it makes sense for separate machine users to be assigned these privileges.
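
To make the separation concrete, here’s a sketch of scoping that second machine user to its single bucket with an inline policy (the user, policy and bucket names are hypothetical):

    import json

    import boto3

    iam = boto3.client("iam")

    # The uploader may only read and write objects in one bucket;
    # nothing else in the account is visible to it.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::example-app-uploads/*",
        }],
    }

    iam.put_user_policy(
        UserName="machine-s3-uploader",
        PolicyName="upload-bucket-only",
        PolicyDocument=json.dumps(policy),
    )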

This approach meets our objectives:

  1. If a key is captured, an attacker can only do what that machine user can do.
  2. If a key is captured, no human’s account is compromised. We have a clean separation of responsibilities between humans and automatons.
  3. If the application misbehaves, it is doing so in a "sandbox" and the catastrophe will be contained.

There are further advantages:

  1. The machine user can be invalidated in one fell swoop (see the sketch after this list).
  2. In logs, we can clearly see which machine users are doing what, rather than having some generic user doing everything. This helps us to identify misbehaviour.
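
Invalidation really is a matter of a few API calls. Here’s a minimal sketch with boto3, reusing the hypothetical user name from above:

    import boto3

    iam = boto3.client("iam")
    user = "machine-s3-uploader"  # hypothetical machine user

    # Deactivate every access key the machine user holds. Because no
    # human depends on this account, nothing else is disrupted.
    for meta in iam.list_access_keys(UserName=user)["AccessKeyMetadata"]:
        iam.update_access_key(
            UserName=user,
            AccessKeyId=meta["AccessKeyId"],
            Status="Inactive",
        )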

Essentially, we have restricted each machine user along two axes (illustrated in the sketch below):

  1. What the machine user can do
  2. Where the machine user can do it
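
Both axes map directly onto the IAM policy language: Action controls what the user can do, while Resource and Condition control where. As a hypothetical sketch (the account ID and tag are placeholders), a node-management machine user might be allowed to terminate only instances carrying a particular tag:

    # What: only ec2:TerminateInstances. Where: only instances in this
    # account that carry the tag Group=worker.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "ec2:TerminateInstances",
            "Resource": "arn:aws:ec2:*:123456789012:instance/*",
            "Condition": {
                "StringEquals": {"ec2:ResourceTag/Group": "worker"},
            },
        }],
    }

This is the earlier failure mode, contained: even a buggy script holding this keypair cannot terminate nodes outside the tagged group.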

Similar to the DevOps principle of "treat servers like cattle, not like pets", machine users should be treated as disposable. They are not precious things, so don’t give them a funky name that might cause you to get attached to them!

Wrapping up

Do you use cloud services? Are you using machine users? A good place to start is the AWS documentation on IAM (Identity and Access Management) roles.