Never put AWS temporary credentials in the credentials file (or env vars)—there’s a better way

We need to talk about how AWS credential configuration works. Many people use more than one IAM principal on a regular basis, most likely because they work across multiple accounts, though they may also have multiple principals available to them within a given account. The ways I see a lot of people getting credentials for these principals and switching between them are, in my opinion, inefficient and don’t make full use of the options provided by the AWS CLI and AWS SDKs.

Before you ask: no, stuffing temporary credentials into environment variables is not better.

The confusion is widespread. Here’s a well-trafficked GitHub issue on the CLI claiming that the credentials the CLI stores when using AWS SSO do not conform to “AWS standards”, the purported standard being that the only location for credentials is ~/.aws/credentials. Other issues ask for setting environment variables, which isn’t much better. The actual AWS standard for credential resolution is much richer than this. For example, credential resolution checks for an EC2 instance metadata server, in case the code is running on an EC2 instance that has been given an IAM role.

The fundamental principle here is: code that uses temporary credentials should be able to refresh those credentials. The refreshing should not be required to come from a separate source. Neither environment variables nor ~/.aws/credentials work in a way that code reading credentials from them can refresh them. The good news is, there are better alternatives that do enable refreshable credentials. So let’s dive in and learn how refreshable temporary credentials work in the AWS CLI and SDKs.

Principals and credentials

First, to act as an IAM principal, we need AWS IAM credentials (the kind used for SigV4 signing). These credentials consist of an access key ID, a secret access key, and, in the case of temporary credentials, a session token. Temporary credentials also have an associated expiration timestamp. Only the AWS account root and IAM Users have non-temporary credentials: they receive long-term credentials that do not have an expiration (though they should be rotated periodically!). IAM Roles, assumed through STS.AssumeRole, STS.AssumeRoleWithSAML, or STS.AssumeRoleWithWebIdentity, only ever have temporary credentials. IAM Users and the account root can also get temporary credentials for themselves using STS.GetSessionToken (there’s also the legacy STS.GetFederationToken, which likewise returns temporary credentials, but we’ll ignore it for simplicity).

On the command line, the AWS CLI, a program using an AWS SDK, or a program capable of SigV4-signing requests on its own must be given these credentials in order to successfully call AWS APIs. Note that the code using the credentials does not need to know what principal they represent, though it can introspect using STS.GetCallerIdentity. This is a very useful API call to be familiar with: in addition to the helpful information it returns, it requires no permissions, so it will always work with any valid credentials.

So we need a way of providing the code credentials, and because we’re using multiple principals, we need a way of providing it different credentials at different times — including in the same terminal.

In the examples below, we’ll use the aws sts get-caller-identity command as a stand-in for any use of AWS APIs (or more precisely, the ones that require SigV4 signing, which is almost all of them).
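For scripts, the same identity check can be sketched in Python. This is only an illustration: the helper name whoami is made up for this example, and the STS client is passed in so that anything with a get_caller_identity() method will do.

```python
def whoami(sts_client):
    """Return (account, principal type, ARN) for whatever credentials the client holds."""
    identity = sts_client.get_caller_identity()
    arn = identity["Arn"]
    # The resource part is the sixth colon-separated field of the ARN,
    # e.g. "user/benk" or "assumed-role/MyRole/ben".
    resource = arn.split(":", 5)[5]
    # Its first path segment tells us what kind of principal this is.
    principal_type = resource.split("/", 1)[0]
    return identity["Account"], principal_type, arn


# Usage with real credentials (needs boto3 and makes a network call):
#   import boto3
#   print(whoami(boto3.Session().client("sts")))
```

Taking the client as a parameter keeps the helper independent of how the credentials were obtained, which is the point of everything that follows.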

One popular way of providing credentials is using environment variables. For an IAM User, that might look like this:

$ export AWS_ACCESS_KEY_ID=<benk key id>
$ export AWS_SECRET_ACCESS_KEY=<benk secret key>
$ aws sts get-caller-identity
{
"UserId": "AIDA...",
"Account": "111122223333",
"Arn": "arn:aws:iam::111122223333:user/benk"
}

Assuming a role and using the credentials

Now, to interact in a different account, it’s common for the IAM User to have AssumeRole permissions. So the direct approach to getting credentials for that role, staying with environment variables, looks like this:

$ export AWS_ACCESS_KEY_ID=<benk key id>
$ export AWS_SECRET_ACCESS_KEY=<benk secret key>
$ aws sts assume-role --role-arn arn:aws:iam::777788889999:role/MyRole --role-session-name ben
{
"Credentials": {
"AccessKeyId": "<MyRole key id>",
"SecretAccessKey": "<MyRole secret key>",
"SessionToken": "<MyRole:ben session token>",
"Expiration": "2021-10-06T20:12:56Z"
},
...
}

Now, to act as that role, we have to export those credentials:

$ export AWS_ACCESS_KEY_ID=<MyRole key id>
$ export AWS_SECRET_ACCESS_KEY=<MyRole secret key>
$ export AWS_SESSION_TOKEN=<MyRole:ben session token>
$ aws sts get-caller-identity
{
"UserId": "AROA...:ben",
"Account": "777788889999",
"Arn": "arn:aws:sts::777788889999:assumed-role/MyRole/ben"
}

Great — but now if we want to switch back to using the benk IAM User, we have to re-export those credentials all over again. People write all sorts of scripts and utilities to set and change these environment variables, but there’s a much simpler way.

Additionally, these assumed role credentials expire. The default session duration is 3600 seconds (one hour). So the following (contrived) example will fail:

$ export AWS_ACCESS_KEY_ID=<MyRole key id>
$ export AWS_SECRET_ACCESS_KEY=<MyRole secret key>
$ export AWS_SESSION_TOKEN=<MyRole:ben session token>
$ sleep 3700; aws sts get-caller-identity
An error occurred (ExpiredToken) when calling the GetCallerIdentity operation: The security token included in the request is expired

Long-running processes shouldn’t have their credentials expire on them. Nor should you have to manually refresh these credentials when they expire, while you still have valid credentials to call AssumeRole again. And while you can set a longer session duration, you shouldn’t have to, and shorter durations are preferable for security. And again, there’s a much simpler way.
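The failure above boils down to a timestamp comparison that the exported environment variables give a program no way to make, because the expiration is not exported with them. A minimal sketch of that comparison, assuming the Expiration string has the "Z"-suffixed form shown in the assume-role output earlier (the function name seconds_remaining is hypothetical):

```python
from datetime import datetime, timezone


def seconds_remaining(expiration, now=None):
    """Seconds until the credentials expire; negative once they have expired.

    `expiration` is the "Expiration" value from the assume-role output,
    e.g. "2021-10-06T20:12:56Z".
    """
    expires_at = datetime.strptime(expiration, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).total_seconds()
```

Code that keeps the expiration alongside the credentials can tell when to fetch new ones; code that only sees three environment variables cannot.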

Configuring and using named profiles

What we’re going to do is give names to every configuration we want to use — where a configuration represents, at least, a principal and region. We’ll call these named configurations profiles. So we might have a profile for using the benk IAM User in us-east-1, and a separate profile for benk in us-west-2 — though we could also use the profile with us-east-1 in it and override the region (by setting the AWS_DEFAULT_REGION environment variable).

We’ll call the profile for the benk IAM User, sensibly enough, benk, and for the role, MyRole.

First, let’s put our long-term credentials for the benk IAM User in the ~/.aws/credentials file. In this file, profile section headers are just the profile name.

[benk]
aws_access_key_id = <benk key id>
aws_secret_access_key = <benk secret key>

Now, we’ll put the rest of the profile configuration in ~/.aws/config. We only need the region, but you can put things here like what output format you want the CLI to use (try output = yaml, though you may want to set AWS_DEFAULT_OUTPUT in your .bashrc/.profile so that you don’t have to put it in every profile). In this file, profile section headers have the prefix profile.

[profile benk]
region = us-east-1

Why two files? The short answer is, you want to treat ~/.aws/credentials as a secret, like your SSH keys, but your configuration could be something you share or check into source control. For a detailed explanation of the difference, see this article.
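To make the difference in section headers concrete, here is a small sketch that collects profile names from the text of both files. It is an illustration only, not how the CLI or SDKs parse these files, and the function name list_profiles is made up for this example.

```python
import configparser


def list_profiles(credentials_text, config_text):
    """Return the sorted profile names defined across both files."""
    names = set()

    creds = configparser.RawConfigParser()
    creds.read_string(credentials_text)
    # ~/.aws/credentials: the section name is the profile name itself.
    names.update(creds.sections())

    config = configparser.RawConfigParser()
    config.read_string(config_text)
    for section in config.sections():
        # ~/.aws/config: sections are "profile <name>", except for "default".
        if section.startswith("profile "):
            names.add(section[len("profile "):].strip())
        elif section == "default":
            names.add(section)
    return sorted(names)
```

In practice you don’t need to do this yourself; boto3 exposes the result as boto3.Session().available_profiles.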

So now we reference that profile name when making AWS calls:

$ aws sts get-caller-identity --profile benk
{
"UserId": "AIDA...",
"Account": "111122223333",
"Arn": "arn:aws:iam::111122223333:user/benk"
}

The --profile argument takes precedence over options provided through environment variables.

Ideally, script and application writers also include a --profile option. For example, in Python, your program might look like:

import argparse
import boto3

parser = argparse.ArgumentParser()
parser.add_argument('--profile')
# add other arguments
args = parser.parse_args()
session = boto3.Session(profile_name=args.profile)
# use session
# e.g., session.client('sts').get_caller_identity()

If you use boto3 and you’re not familiar with sessions, here’s an article on what they are and why you should use them.

However, not all programs provide this. So instead, you can set the AWS_PROFILE (or AWS_DEFAULT_PROFILE, there is no difference) environment variable:

$ export AWS_PROFILE=benk
$ aws sts get-caller-identity
{
"UserId": "AIDA...",
"Account": "111122223333",
"Arn": "arn:aws:iam::111122223333:user/benk"
}

Note that due to a long-standing bug in the CLI and some SDKs, you must un-export these environment variables, not just set them to an empty string, otherwise you get an error, even if you’re using an explicit profile parameter:

$ export AWS_PROFILE=""
$ aws sts get-caller-identity --profile benk
The config profile () could not be found
$ export -n AWS_PROFILE
$ aws sts get-caller-identity --profile benk
{
"UserId": "AIDA...",
"Account": "111122223333",
"Arn": "arn:aws:iam::111122223333:user/benk"
}
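A Python script can protect itself from the empty-string case before it creates a session. A minimal sketch, assuming it is acceptable for the script to modify its own process environment; the helper name drop_empty_profile_vars is hypothetical:

```python
import os


def drop_empty_profile_vars(environ=None):
    """Remove profile variables that are set to an empty string."""
    environ = os.environ if environ is None else environ
    for name in ("AWS_PROFILE", "AWS_DEFAULT_PROFILE"):
        # An empty value is treated as a profile named "", which does not exist,
        # so remove it and let normal resolution take over.
        if environ.get(name) == "":
            del environ[name]


# Call this before creating a boto3 session:
#   drop_empty_profile_vars()
#   session = boto3.Session(profile_name=args.profile)
```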

I’ve written a shell function to manage this environment variable, including handling the un-export, and providing autocomplete for profile names (requires the AWS CLI v2, which you should upgrade to!)

So, now we want to add a profile for MyRole. Do we need to call aws sts assume-role --profile benk --role-arn ... and then put the resulting credentials in a MyRole profile in ~/.aws/credentials? NO!! Please, don’t do this!

Remember that IAM Role credentials are temporary. So if you put them in environment variables or the ~/.aws/credentials file, at some point you’re going to have to refresh them — either because an API call failed, or by running some background process to periodically refresh them. Wouldn’t it be great if neither of those were necessary?

Assumed role profiles

You can directly configure a profile to assume a role. The AWS CLI and SDKs understand this configuration: they make the AssumeRole call when the credentials are first needed, cache the results (including the expiration), and call AssumeRole again when the credentials expire, without any extra work from you!

To make an assumed-role profile, you use the role_arn parameter. You then have to tell it what other credentials to use to make the AssumeRole call. If it’s another profile, you set source_profile to that profile name (and that could itself be an assumed-role profile, you can chain them!). Otherwise, you provide credential_source set to one of the self-explanatory Environment, Ec2InstanceMetadata, or EcsContainer.

So let’s do that. ~/.aws/credentials looks the same, but ~/.aws/config now looks like this:

[profile benk]
region = us-east-1
[profile MyRole]
role_arn = arn:aws:iam::777788889999:role/MyRole
source_profile = benk
region = us-east-1

Note that unlike when I was using aws sts assume-role, I didn’t provide a role session name. That’s optional for an assumed-role profile; one will be generated if it’s not provided.

So now we can do this:

$ aws sts get-caller-identity --profile benk
{
"UserId": "AIDA...",
"Account": "111122223333",
"Arn": "arn:aws:iam::111122223333:user/benk"
}
$ aws sts get-caller-identity --profile MyRole
{
"UserId": "AROA...:botocore-session-1633547576",
"Account": "777788889999",
"Arn": "arn:aws:sts::777788889999:assumed-role/MyRole/botocore-session-1633547576"
}

If you’re using an SDK, it will also work. In Python:

import time
import boto3

session = boto3.Session(profile_name='MyRole')
sts = session.client('sts')
# AssumeRole is called for the first time when the credentials are needed, which is here:
arn = sts.get_caller_identity()['Arn']
assert arn.split('/')[1] == 'MyRole'
# if we make another call to the same or any other AWS API, AssumeRole *will not* be called, because the credentials are still valid
session.client('s3').list_buckets()
# if we wait beyond the session duration, which has a default of 3600 seconds (1 hour), AssumeRole *will* get called again automatically because the credentials have expired
time.sleep(3700)
sts.get_caller_identity()

The SDKs cache the credentials in-memory. The AWS CLI doesn’t have memory that persists in between calls, so (in v2 at least) it caches assumed-role credentials in the ~/.aws/cli/cache directory. This cache is not used by the SDKs, and please, please don’t read from it yourself.
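The in-memory caching idea can be sketched in a few lines. This is a simplified illustration of the concept, not botocore’s actual implementation: the class name, the refresh_margin parameter, and the injected clock are all assumptions made for this example.

```python
import time


class RefreshingCredentials:
    """Holds temporary credentials and re-fetches them shortly before they expire."""

    def __init__(self, fetch, refresh_margin=300, clock=time.time):
        self._fetch = fetch            # callable returning (credentials, expiry as epoch seconds)
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._credentials = None
        self._expires_at = 0.0

    def get(self):
        # Fetch on first use, and again once we are within the margin of expiry.
        if self._credentials is None or self._clock() >= self._expires_at - self._margin:
            self._credentials, self._expires_at = self._fetch()
        return self._credentials
```

Here fetch stands in for the AssumeRole call. The caller just asks for credentials every time it signs a request and never has to think about expiration, which is exactly what exported environment variables cannot offer.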

If you need programmatic role assumption (that is, configuring an assumed role without the config file), some SDKs support that directly (e.g., JavaScript), but Python does not, so I’ve made the aws-assume-role-lib library for that.

AWS SSO profiles

It also appears some people who use AWS SSO aren’t aware that you can configure profiles for AWS SSO usage as well, and instead copy in credentials from the browser login page. An AWS SSO profile looks like this:

[profile my-sso-profile]
sso_start_url = https://example.awsapps.com/start
# the region AWS SSO is configured in
sso_region = us-east-1
sso_account_id = 123456789012
sso_role_name = MyRoleName
# the region to use for AWS API calls
region = us-east-2

Then, you authenticate using aws sso login --profile my-sso-profile, though note you only need to call aws sso login once for all your profiles (read about that here), and you can use it like any other profile, e.g., aws sts get-caller-identity --profile my-sso-profile.

You can configure these profiles manually, or using aws configure sso, or an even easier option is using aws-sso-util, which has a command aws-sso-util configure populate that will create profiles for all accounts and roles you have access to.
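If you do write these profiles by hand for more than a couple of accounts, generating the text is straightforward. The sketch below is not how aws-sso-util works internally; it only renders the profile format shown above, and the function name and the shape of the roles argument are assumptions for this example.

```python
def render_sso_profiles(start_url, sso_region, region, roles):
    """Render AWS SSO profiles for ~/.aws/config.

    `roles` is an iterable of (profile_name, account_id, role_name) tuples.
    """
    blocks = []
    for profile_name, account_id, role_name in roles:
        blocks.append(
            f"[profile {profile_name}]\n"
            f"sso_start_url = {start_url}\n"
            f"sso_region = {sso_region}\n"
            f"sso_account_id = {account_id}\n"
            f"sso_role_name = {role_name}\n"
            f"region = {region}\n"
        )
    # Separate the profiles with a blank line.
    return "\n".join(blocks)
```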

Federated sign-in (AssumeRoleWithSAML) — an obstacle

Great, you say, I can configure an assumed-role profile if there are already source credentials with which to call STS.AssumeRole. And I can use AWS SSO profiles if my organization is using AWS SSO. But what if my organization has federated authentication using STS.AssumeRoleWithSAML? Here’s where we come to the way you can configure profiles for any possible method of getting credentials!

First, I’ll note that if your organization uses STS.AssumeRoleWithWebIdentity, that can also be configured directly; your web identity token has to be stored in a file and then you use web_identity_token_file instead of source_profile or credential_source.

Second, I’ll note that if your organization migrates to AWS SSO, that SAML federation happens server-side with AWS SSO, and as mentioned above, you can configure profiles to use it directly.

But with plain STS.AssumeRoleWithSAML, there are no easy answers. Most organizations either use an off-the-shelf tool like saml2aws (one example among many) or write their own. These tools authenticate with your identity provider (with some identity-provider-specific code), cache the resulting session (e.g., a browser cookie), and then take the returned SAML assertion to STS.AssumeRoleWithSAML. Again we find ourselves at the situation we had at the beginning: these temporary credentials often get stuffed into environment variables (as in saml2aws script and saml2aws exec) or into the ~/.aws/credentials file (this is the default for saml2aws login).

credential_process — the answer

Enter credential_process. This is a configuration option that tells the CLI/SDKs a command to invoke, which must print credentials to stdout in a specific format:

{
"Version": 1,
"AccessKeyId": "...",
"SecretAccessKey": "...",
"SessionToken": "...",
"Expiration": "<ISO8601 timestamp when the credentials expire>"
}

Expiration is optional; if it is not provided, the credentials are treated as long-term credentials that do not expire, so they are not refreshed for the lifetime of the process that loaded them.
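Producing that JSON from Python takes only a few lines. A minimal sketch, where the function name is hypothetical and the expiration, if given, is assumed to be a timezone-aware datetime:

```python
import json
from datetime import timezone


def credential_process_output(access_key_id, secret_access_key, session_token=None, expiration=None):
    """Build the JSON document a credential_process command prints to stdout."""
    payload = {
        "Version": 1,
        "AccessKeyId": access_key_id,
        "SecretAccessKey": secret_access_key,
    }
    if session_token is not None:
        payload["SessionToken"] = session_token
    if expiration is not None:
        # Normalize to UTC and emit an ISO 8601 timestamp.
        payload["Expiration"] = expiration.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps(payload)
```

A real credential process would obtain the values however it needs to and then print(credential_process_output(...)) as its only output on stdout.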

This means that a process that calls AssumeRoleWithSAML can take its configuration (e.g., user name, role ARN) as command line arguments and output the resulting credentials in this format, and then you can configure a profile like this:

[profile my-saml-role]
credential_process = saml2aws login --role arn:aws:... --credential-process
region = us-east-1

And then using aws sts get-caller-identity --profile my-saml-role will just work, including caching!

Note that there’s nothing special about credential_process that requires the credentials it returns to come from AssumeRoleWithSAML. Suppose you’re on a Google Compute Engine instance that needs to communicate with AWS, and you’re doing this through long-term credentials created for an IAM User (in fact, providing credentials to non-AWS machines is the only reason to continue using IAM “Users”). You could stash the credentials on the machine, and try to rotate them appropriately. Or, you could put them in GCP’s Secret Manager, and create a credential process that uses the instance’s GCP credentials to access Secret Manager and retrieve the AWS credentials, providing an appropriate expiration that will allow the instance to automatically pick up the new credential values when you rotate them in Secret Manager.
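Here is roughly what such a credential process could look like. It is a sketch under stated assumptions: fetch_secret is a hypothetical callable standing in for the Secret Manager lookup, the secret is assumed to be a JSON document with access_key_id and secret_access_key fields, and the 15-minute lifetime is an arbitrary choice.

```python
import json
from datetime import datetime, timedelta, timezone


def rotating_key_output(fetch_secret, ttl_minutes=15, now=None):
    """Wrap long-term keys from a secret store in credential_process output.

    `fetch_secret` is a hypothetical callable that returns the stored secret
    as a JSON string.
    """
    secret = json.loads(fetch_secret())
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        "Version": 1,
        "AccessKeyId": secret["access_key_id"],
        "SecretAccessKey": secret["secret_access_key"],
        # The keys themselves do not expire. This artificial expiration only
        # makes the SDK run the process again, so rotated keys get picked up.
        "Expiration": (now + timedelta(minutes=ttl_minutes)).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
```

The shorter the lifetime, the sooner a rotation is noticed, at the cost of more frequent lookups against the secret store.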

credential_process solves almost every “how do I get credentials in this weird situation?” problem I’ve seen. For example, with the launch of AWS SSO, not all AWS SDKs supported profiles with AWS SSO configuration. So I built a credential process (docs, code) using an SDK that did (boto3), which reads a profile’s AWS SSO configuration, gets the credentials, and spits them back out. Because credential_process is lower in priority than AWS SSO configuration, a profile can be given both AWS SSO configuration and a credential_process that points back at that same profile: SDKs that understand the AWS SSO configuration use it directly, and SDKs that don’t fall back to the credential_process and still succeed. In fact, this is exactly what the profiles created by aws-sso-util configure populate look like! This setup allows for using the CDK with AWS SSO, because the CDK still uses the JavaScript v2 SDK, while only v3 got AWS SSO support (you can read the thread here and see how many people respond that they are using credential-stuffing workarounds).

Bridging the gap for configuration-oblivious tools

There are situations where credential_process won’t work. The primary one, in particular, is tools that don’t understand the full range of AWS configuration. Maybe they do the signing themselves, and only load credentials from the environment or from the credentials file. Infuriatingly, AWS allows tools to be released that do not do full credential resolution, like the Athena JDBC driver. Please don’t build tools that way yourself, but if you’re forced to use one, you have no choice but to provide explicit credentials (but open a feature request asking for proper support).

The approach I recommend is to set up your configuration properly, and then bridge the gap for tools that fail to correctly understand AWS configuration, rather than always working from the lowest common denominator.

Exactly for that gap-bridging, I’ve built aws-export-credentials, which does exactly one thing: it reads configuration in the proper, complete way (because it uses botocore), retrieves credentials with whatever configuration is provided to it, and prints out the resulting credentials. So if you need explicit credentials, and you need them for an assumed role, you can set up an assumed-role profile as normal and use the --profile argument. It allows you to export credentials to environment variables, to the ~/.aws/credentials file, or to JSON (in the credential_process format, of course).
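The core idea fits in a short sketch. This is not the actual aws-export-credentials code, just an illustration of the approach with made-up function names: let boto3 do the full credential resolution, then print whatever it resolved.

```python
import shlex


def resolve_frozen_credentials(profile_name=None):
    """Resolve credentials through the normal chain and take a snapshot of them."""
    import boto3  # imported lazily so the formatting helper below has no dependencies

    credentials = boto3.Session(profile_name=profile_name).get_credentials()
    if credentials is None:
        raise RuntimeError("no credentials could be resolved")
    return credentials.get_frozen_credentials()


def env_exports(frozen):
    """Format resolved credentials as shell export lines.

    `frozen` needs access_key, secret_key, and token attributes;
    token is None for long-term credentials.
    """
    lines = [
        f"export AWS_ACCESS_KEY_ID={shlex.quote(frozen.access_key)}",
        f"export AWS_SECRET_ACCESS_KEY={shlex.quote(frozen.secret_key)}",
    ]
    if frozen.token:
        lines.append(f"export AWS_SESSION_TOKEN={shlex.quote(frozen.token)}")
    return "\n".join(lines)


# Usage (needs boto3 and a configured profile):
#   print(env_exports(resolve_frozen_credentials("MyRole")))
```

Keep in mind that the exported values are a snapshot: whatever consumes them gets no refreshing, which is why this is a bridge for limited tools rather than a default workflow.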

Another situation is where you’re running a Docker container, and you want AWS credentials from the host injected into it. One option is to use aws-export-credentials to get the credentials as environment variables, but remember, this doesn’t handle refreshing! However, there’s already a way to get refreshable credentials from inside a container: ECS provides this mechanism for when you assign an IAM role to a task. It involves setting up a credential server on the host and then injecting environment variables telling the AWS CLI/SDKs how to talk to that server. aws-export-credentials supports this too, but unfortunately it only works on Linux, due to a combination of Docker networking restrictions and AWS SDK restrictions on hostnames it can contact.

I hope your takeaway from this article is that you almost never need to explicitly set temporary credentials in environment variables or put them into ~/.aws/credentials. Often you can accomplish it directly with profile configuration, and for everything else there’s credential_process, and you can then stuff credentials only if needed using aws-export-credentials. If you’re using a tool for STS.AssumeRoleWithSAML, check if it supports credential_process output, and if not, request it! Don’t be afraid to build your own credential_process tools to overcome credentialing obstacles.

As always, if you’ve got questions or comments, you can hit me up on Twitter.
