I Trust AWS IAM to Secure My Applications. I Don’t Trust the IAM Docs to Tell Me How.

AWS IAM operates at an immense scale, handling more than 400 million operations per second, and the stakes are frankly terrifying: a substantial portion of the internet runs on AWS, and access to those resources is governed by IAM.

I’m therefore glad that the people who design and run IAM are so smart that it’s also frankly terrifying. The depth of expertise and wisdom behind IAM gives me confidence that I’m being given the tools to secure my AWS resources, and that those tools will work as designed.

But to secure my AWS resources, I need to know how to use those tools, not to mention what tools exist in the first place. And here IAM falls woefully short. The IAM documentation is too often incomplete, confusing, or out of date. It’s rarely wrong, per se, but it does not live up to the high standards required of security documentation, and worse, it is not on a trajectory toward getting, and staying, substantially better. Customers are left to piece together the how of IAM from pages scattered across the central IAM docs and buried deep within individual service docs, from examples, and from confusing terminology and incomplete specifications.

We need the documentation of AWS IAM, across IAM itself and all AWS services, to be both more accessible and more comprehensive in depth, so that we can confidently and successfully use the extensive power of IAM to achieve the security we all want.

I was prompted to write this article by a specific issue: the policy evaluation documentation does not sufficiently distinguish between IAM roles and IAM role sessions as it relates to permissions boundaries and resource policy evaluation. The details aren’t important here; the IAM team are now aware of it and have assured me they are thinking through how it can be better presented.

I could compile a large catalog of IAM documentation deficiencies, but I’m not Martin Luther, and the purpose of this article is not to nail 95 theses to IAM’s door. If I did, some of the problems would be within the control of the IAM team, like the iam:AssociatedResourceArn condition key for iam:PassRole going undocumented. Others would be outside their control, because the IAM section of every service’s documentation is the responsibility of that service team, not the central IAM team. So when API Gateway took over a year to document that their resource policies supported tag conditions, that’s on the API Gateway team. But as AWS customers, we shouldn’t have to care about that distinction. To us, it’s all one Identity and Access Management system, and the documentation for all parts of it should be held to the same high standard.
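For readers who haven’t met it: iam:PassRole is the permission a principal needs in order to hand a role to an AWS service, and condition keys narrow where that role can go. The sketch below builds such an identity-based policy as a Python dict. iam:PassedToService is a documented key; reading iam:AssociatedResourceArn as the ARN of the resource the role is being attached to (here, an EC2 instance) is an assumption based on its name, and the account ID, region, and role name are placeholders.

```python
import json

# Placeholder identifiers (assumptions): substitute your own account, region, and role.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleInstanceRole"


def pass_role_policy(role_arn: str, service: str, resource_arn_pattern: str) -> dict:
    """Build an identity-based policy allowing iam:PassRole for one role,
    only to one service, and only for matching destination resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassRoleToSpecificResources",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {
                    # Documented key: the service principal the role is passed to.
                    "StringEquals": {"iam:PassedToService": service},
                    # The key the article says went undocumented. Assumption: it holds
                    # the ARN of the resource the role is being associated with.
                    "ArnLike": {"iam:AssociatedResourceArn": resource_arn_pattern},
                },
            }
        ],
    }


policy = pass_role_policy(
    ROLE_ARN,
    "ec2.amazonaws.com",
    f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}:instance/*",
)
print(json.dumps(policy, indent=2))
```

Without documentation, whether that second condition actually behaves this way is exactly the kind of thing a customer has to verify by testing.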

Any particular documentation issue can be fixed, but the real problem is foundational: these sorts of issues are pervasive.

What frustrates me most is that, internally at AWS, IAM is an extremely well-understood system. Its mechanics are so precisely specified that they can build automated reasoning systems to prove theorems about it, and those systems back features like IAM Access Analyzer. They literally consider it a science. But this precision does not extend to the documentation.

The IAM documentation is trying to balance accessibility with comprehensive depth, and I appreciate that this is a difficult balance to strike, but we’re nowhere near the right balance. The docs have three separate Venn diagrams showing how permissions boundaries interact with each of resource policies, SCPs, and session policies, but nothing showing how resource policies, SCPs, and session policies interact with each other. The docs also use “federated user” to mean two completely separate things: a role session assumed through a federated identity provider, or a principal created from an IAM user through the GetFederationToken API (apparently a legacy mechanism from before IAM roles existed). There is even an IAM docs page that uses both meanings without mentioning the distinction.
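To make concrete what a single combined picture would need to convey, here is a deliberately simplified model, written as a sketch rather than a statement of AWS’s actual algorithm. It treats each policy type as a set of allowed actions for a role session within one account, and assumes any resource policy names the role ARN. It leaves out the cases where a resource policy names an IAM user or a role session ARN directly, which are evaluated differently and are exactly the role versus role session distinction mentioned earlier; it also ignores cross-account access and other policy types.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RequestContext:
    """Each policy type reduced to the set of actions it allows.
    None means that policy type does not apply (e.g. no session policy was passed)."""
    explicit_denies: FrozenSet[str]   # union of Deny statements across all policy types
    identity_allows: FrozenSet[str]
    resource_allows: FrozenSet[str]   # assumption: names the role ARN, not the session ARN
    scp_allows: Optional[FrozenSet[str]] = None
    boundary_allows: Optional[FrozenSet[str]] = None
    session_allows: Optional[FrozenSet[str]] = None


def is_allowed(action: str, ctx: RequestContext) -> bool:
    # 1. An explicit deny in any policy wins.
    if action in ctx.explicit_denies:
        return False
    # 2. Every guardrail that applies (SCP, permissions boundary, session policy)
    #    must allow the action; they intersect rather than grant.
    for guardrail in (ctx.scp_allows, ctx.boundary_allows, ctx.session_allows):
        if guardrail is not None and action not in guardrail:
            return False
    # 3. Within one account, a grant from either the identity-based policy
    #    or the resource-based policy is sufficient.
    return action in ctx.identity_allows or action in ctx.resource_allows


ctx = RequestContext(
    explicit_denies=frozenset(),
    identity_allows=frozenset({"s3:GetObject"}),
    resource_allows=frozenset({"s3:PutObject"}),
    scp_allows=frozenset({"s3:GetObject", "s3:PutObject"}),
    boundary_allows=frozenset({"s3:GetObject", "s3:PutObject"}),
    session_allows=frozenset({"s3:GetObject"}),
)
print(is_allowed("s3:GetObject", ctx))  # True
print(is_allowed("s3:PutObject", ctx))  # False: the session policy doesn't allow it
print(is_allowed("s3:PutObject", replace(ctx, session_allows=None)))  # True
```

Even in this toy form, the interaction of all the policy types fits in one function; the exceptions the model omits are where a precise, documented specification matters most.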

In addition to striking the wrong balance between accessible and in-depth (hint: the answer is to be both, separately), the docs are also incomplete. It’s left as an exercise for the community to document the list of service principals. The page of global context keys should cover all keys that can be used regardless of the action being performed (so, not s3:object-lock-mode), but it only documents the keys with the aws: prefix, leaving out global keys like identitystore:UserId, which can also be used with any action. It’s not present for all principals, but neither is aws:username, which isn’t present for IAM roles. There’s a thrown-in section at the end that basically says “other keys may exist” and gives some examples, but it makes no attempt to be a comprehensive list.

Amazon has 16 Leadership Principles. AWS’s IAM documentation falls short on a lot of them:

Are Right, A Lot: Too often the documentation is incomplete or confusing in a way that can cause customers to come away with an incorrect understanding.

Insist on the Highest Standards: Can anyone really say with a straight face that this is the level of quality we should expect for IAM documentation?

Ownership: The IAM team doesn’t own the IAM documentation within each AWS service, and the service teams don’t consider their IAM documentation to be a priority.

Deliver Results: IAM is 10 years old. That’s more than enough time to have matured well beyond what we have today.

Earn Trust: Most fundamentally, AWS IAM documentation is poor enough that, as a user, I can never take it at its word. I can’t use it as an authoritative reference when someone asks me a question. If I want to be confident in the answer I give them, I have to read the docs, and then build a test suite to check whether the docs are telling me the whole story. I wouldn’t even say it rises to the bar of “trust but verify”.

Which brings me to the path forward: Think Big. On the implementation side, AWS IAM scales to the massive footprint of AWS services and the immense rate of change that comes with the pace of feature and service releases. On the documentation side, all of that is filtered through a small set of humans tasked with producing docs that satisfy both “the docs are up to date” and “the docs are complete and comprehensible”, and given the firehose of updates, they have no chance of accomplishing either. This has been leaving customers in the lurch for years. IAM needs to build a pipeline that takes the programmatic information they already have and delivers it into customer documentation, so the docs scale to the size and pace of AWS today. I don’t pretend this will be easy; for example, the much-needed central list of service principals must not show a new service’s principal until the moment the service is announced, yet after launch, every minute of delay is a detriment to customers. The IAM documentation for individual services needs to be prioritized, and held accountable to the IAM team for its correctness and comprehensiveness.
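The announcement-gating constraint is the kind of rule a generated page can enforce mechanically. A minimal sketch, assuming a hypothetical internal catalog in which each service principal record carries its announcement time (neither the record format nor the catalog is something AWS has described): the publish step filters on that timestamp, so a regeneration run at launch exposes the new principal without waiting on a human.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class ServicePrincipal:
    name: str
    announce_at: datetime  # hypothetical field: when the service is publicly announced


def publishable(principals: Iterable[ServicePrincipal], now: datetime) -> List[str]:
    """Names of principals whose announcement time has passed, sorted for stable output."""
    return sorted(p.name for p in principals if p.announce_at <= now)


# Placeholder catalog; the dates are invented and the second entry is a made-up,
# not-yet-announced service.
catalog = [
    ServicePrincipal("ec2.amazonaws.com", datetime(2020, 1, 1, tzinfo=timezone.utc)),
    ServicePrincipal("unannounced-service.amazonaws.com",
                     datetime(2030, 1, 1, tzinfo=timezone.utc)),
]

print(publishable(catalog, datetime(2025, 6, 1, tzinfo=timezone.utc)))
```

The filter itself is trivial; the point is that once the data exists in machine-readable form, both halves of the constraint (never early, never late) can be met by running the same step at announcement time.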

All of this will be a huge undertaking. But I don’t question whether AWS can accomplish it. The IAM system itself stands as a testament to what AWS can achieve; we (rightly) put the security of our cloud resources in the hands of AWS IAM and we should believe AWS can give us documentation worthy of that trust.