A little background
At my current workplace, we needed to be able to call AWS from the CLI. We try to use AzureAD as our main IdP (rather than our legacy ADFS deployment).
This works great for accessing the AWS console using the Enterprise App already available on the store, but it doesn't work at all for getting API creds for use with tools like the AWS CLI.
Our AWS account team pointed us to a neat tool, the AWS CLI Credentials Provider, which is a sample of how one could integrate other providers and SAML.
The main reason you'd want to use something like AzureAD is to ensure that things like MFA are applied consistently and conditional access policies still apply, without us needing to add that to AWS.
The secret sauce that makes this work is in the V1 on-behalf-of flow available in AzureAD. It's documented well, but it's a long read. The most interesting section is the Service-to-service access token request, because that's what allows us to hand in an OAuth token for one service and request a SAML token for another.
Update
Since deploying this internally, we ended up deploying AWS Directory Services, so AWS SSO now makes sense. I've not decided yet if that should kill this solution.
Also, knowing you can "switch" between token types in AAD is neat.
The architecture required
We're going to need a few components here to make this work:
- An Azure AD Tenant
- The Amazon Web Services (AWS) Enterprise Application deployed to that tenant
- An Application Registration for the CLI component - to identify our user
- An Application Registration for the Middleware component - to transform the OAuth token into a SAML token, using the on-behalf-of flow
- Code that implements our credentials provider app
A "rough" sequence diagram of the events that we need to happen is below:
Setting up the App Registrations
There's a lot to do there, and the instructions are better documented here: AzureAD Setup Instructions
The gist of it, though, is that you need to set your two new applications up as Public client applications. The CLI app will need to be able to claim ID Tokens, and the Middleware app will need both ID Tokens and Access Tokens. Turning on public clients allows you to use the device code flow, amongst other things.
The CLI Application doesn't need any special settings, really. Users are only going to use it for the first authentication hop, to get a token.
The Middleware Application on the other hand will need to:
- Create a scope allowing user_impersonation
- Be delegated permissions by the user (or admins) to call upstream to the AWS app with user_impersonation
- Allow the CLI app to call it, with the aforementioned impersonation scope.
The CLI App
I've written these components in Python, although ADAL (Microsoft's auth library, adal) isn't that hot IMHO. This is probably easier in MSAL, the newer version. That being said, at the time I looked at this, MSAL couldn't do device code flows.
The snippets here are rough. I originally wrote them for Python 3.6 and haven't tested them since. I've got more production-ready code linked at the bottom of this article.
The following will connect to Azure AD and ask for a new device code. With this, we can then log in using a browser on any device, which will authenticate this session.
import adal
# Create an ADAL Auth Context against my tenant, ask for a device code
context = adal.AuthenticationContext(authority_url)
device_code = context.acquire_user_code(RESOURCE, client_id)
print("")
print("Logging in user using device token...")
print(device_code['message'])
From there, we can claim the device code for a token once the user signs in on the browser. Simple, right?
token = context.acquire_token_with_device_code(RESOURCE, device_code, client_id)
Yeah, no. It's not that easy. What happens if the token is never claimed? How do we deal with being able to cancel this blocking request?
We probably want it to be more sensible and secure. As such, we also now need to poll in the background while we wait. This almost certainly isn't the best way to do this, but I'll use ThreadPoolExecutor with a single thread. If you have a saner way to do this, please do let me know. The TPE is going to take our function context.acquire_token_with_device_code and all the arguments to be passed into that function to get a real OAuth token back from AzureAD.
If we don't get a result (i.e. timeout or whatever), we'll specifically tell Azure AD to bin that device_code session, so it can't be stolen.
This seems easy on the face of it, but the code ends up being way more complex.
import concurrent.futures

# Block the thread until we get confirmation that the token has
# been claimed, or we time out
tpe = concurrent.futures.ThreadPoolExecutor(max_workers=1)

# Submit our poll job to the pool; we get a future back
future = tpe.submit(
    context.acquire_token_with_device_code, RESOURCE, device_code, client_id
)

# Store the result if we get one inside 10 seconds, otherwise bail
# Blocking starts here...
result = concurrent.futures.wait([future], timeout=10)
if future in result.done:
    # result.done is a set (not indexable), so ask the future itself
    token = future.result()
else:
    # Stop the background polling before giving up
    context.cancel_request_to_get_token_with_device_code(device_code)
    raise Exception("Device token flow has timed out")
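On the question of a saner way to do this: one option that may read a little tidier is to lean on `Future.result(timeout=...)` rather than `concurrent.futures.wait`. The following is only a sketch under that assumption. `call_with_timeout` and its `on_timeout` hook are names invented here for illustration, and the commented usage assumes the same `context`, `RESOURCE`, `device_code` and `client_id` as the snippets above.

```python
import concurrent.futures


def call_with_timeout(fn, timeout, on_timeout, *args):
    """Run fn(*args) on a worker thread and wait up to `timeout` seconds.

    If no result arrives in time, call on_timeout() (a hypothetical
    cleanup hook) and raise TimeoutError.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        # Blocks here until the worker finishes or the timeout expires
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        on_timeout()
        raise TimeoutError("Device token flow has timed out") from None
    finally:
        # Don't block on the worker; on_timeout is expected to stop it
        executor.shutdown(wait=False)


# Usage with the ADAL objects from above (not run here):
# token = call_with_timeout(
#     context.acquire_token_with_device_code,
#     10,
#     lambda: context.cancel_request_to_get_token_with_device_code(device_code),
#     RESOURCE, device_code, client_id,
# )
```

The trade-off is that the worker thread is left to wind down on its own, so the `on_timeout` hook really does need to make the blocked call return.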
OK, now we are authenticated. Yey. 🌈
Using the OBO flow to swap to SAML
At this point, our CLI app has a token that contains a perfectly valid OAuth token, that's scoped to access our middleware app.
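If you want to convince yourself of that scoping, it can help to peek inside the token. The sketch below decodes the payload of a JWT without verifying the signature, so treat it as a debugging aid only and never as validation. `peek_jwt_claims` is a made-up helper name, and the demo token is fabricated locally so the snippet runs on its own. With a real token you would pass in `token["accessToken"]` and would expect the `aud` claim to identify the middleware app.

```python
import base64
import json


def peek_jwt_claims(jwt):
    """Return the (unverified) claims from a JWT's payload segment."""
    payload = jwt.split(".")[1]
    # JWT segments are base64url encoded with the padding stripped
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _segment(data):
    """Encode a dict the way a JWT segment is encoded (demo only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# A fabricated, unsigned token purely for demonstration
demo_token = ".".join([
    _segment({"alg": "none"}),
    _segment({"aud": "https://example.invalid/middleware"}),
    "",
])
print(peek_jwt_claims(demo_token)["aud"])
```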
We'll draw up a request to fire off at Azure AD, like the sample below:
import base64
import requests

# We'll do this in requests because ADAL won't
oauth_url = f"{authority_url}/oauth2/token"
obo_payload = {
    "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
    "assertion": token["accessToken"],
    "client_id": middleware_client_id,
    "client_secret": middleware_client_secret,
    "resource": appid_of_aws_application,
    "requested_token_use": "on_behalf_of",
    "requested_token_type": "urn:ietf:params:oauth:token-type:saml2",
}
response = requests.post(oauth_url, data=obo_payload)
# Fail loudly on an HTTP error rather than on a missing key below
response.raise_for_status()
r = response.json()

# Pull the SAML token out of the response. It's base64url encoded, so
# restore the padding in case it has been stripped before decoding.
encoded = r["access_token"]
encoded += "=" * (-len(encoded) % 4)
saml_token = base64.urlsafe_b64decode(encoded).decode("utf-8")
This is kind of neat, and there are a few things happening here.
- We take our existing OAuth token issued to our CLI app, in the name of our user
- We use the middleware's client_id and client_secret to allow us to use the on_behalf_of grant
- We tell Azure AD the resource we want to access is the AWS application
- We ask for the token to be exchanged for a SAML assertion
Err, wait. Isn't that needlessly complex? I'm afraid not. What we are doing here is service to service. We can't call the Enterprise App directly from the CLI app because we don't own the Enterprise app and we can't make it support that kind of flow.
Getting deeper...
If all you wanted to know was how to get from one token to the other, you should stop reading now. I'm continuing down the rabbit hole with some of the AWS side, and some nice user bits for the script, because I can.
Taking the token to AWS
OK - so now we have a SAML Assertion. That's important to note, because an assertion alone isn't actually useful to us. We need to turn that into a request AWS understands.
To do that we'll need to parse out a bunch of information.
import datetime
import uuid
import xml.etree.ElementTree as ET

# Use ElementTree to parse our token, because it's XML
root = ET.fromstring(saml_token)

# Read the assertion, and parse out all the claims for AWS roles
aws_roles = []
SAML_NS = '{urn:oasis:names:tc:SAML:2.0:assertion}'
for attr in root.iter(f"{SAML_NS}Attribute"):
    if attr.get('Name') == 'https://aws.amazon.com/SAML/Attributes/Role':
        for val in attr.iter(f"{SAML_NS}AttributeValue"):
            aws_roles.append(val.text)

# We also need to get the issuer for later
saml2_issuer = root.find(f"{SAML_NS}Issuer")

# Format a valid SAML response (as if it came from something like ADFS)
saml_response_tpl = """
<samlp:Response ID="_{response_id}"
    Version="2.0" IssueInstant="{authn_instant}"
    Destination="https://signin.aws.amazon.com/saml"
    xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
  <Issuer xmlns="urn:oasis:names:tc:SAML:2.0:assertion">{issuer}</Issuer>
  <samlp:Status>
    <samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Success"/>
  </samlp:Status>
  {saml_assertion}
</samlp:Response>
"""
# IssueInstant is a UTC timestamp with a trailing Z
now_utc = datetime.datetime.now(datetime.timezone.utc)
saml_response = saml_response_tpl.format(
    response_id=uuid.uuid4(),
    authn_instant=now_utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    issuer=saml2_issuer.text,
    saml_assertion=saml_token
)
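For reference, this is roughly the shape the parsing step above is looking for. The assertion below is a heavily cut-down, hand-written stand-in with placeholder ARNs (a real one from Azure AD carries a signature, subject, conditions and more), and `extract_aws_roles` is just a hypothetical wrapper around the same loop.

```python
import xml.etree.ElementTree as ET

SAML_NS = "{urn:oasis:names:tc:SAML:2.0:assertion}"
ROLE_ATTR = "https://aws.amazon.com/SAML/Attributes/Role"

# Hand-written sample with placeholder values, not real Azure AD output
SAMPLE_ASSERTION = """
<Assertion xmlns="urn:oasis:names:tc:SAML:2.0:assertion">
  <Issuer>https://sts.example.invalid/tenant-id/</Issuer>
  <AttributeStatement>
    <Attribute Name="https://aws.amazon.com/SAML/Attributes/Role">
      <AttributeValue>arn:aws:iam::111111111111:role/Admin,arn:aws:iam::111111111111:saml-provider/AAD</AttributeValue>
      <AttributeValue>arn:aws:iam::111111111111:role/ReadOnly,arn:aws:iam::111111111111:saml-provider/AAD</AttributeValue>
    </Attribute>
  </AttributeStatement>
</Assertion>
"""


def extract_aws_roles(assertion_xml):
    """Return (issuer, [role claim strings]) from a SAML assertion."""
    root = ET.fromstring(assertion_xml)
    roles = []
    for attr in root.iter(f"{SAML_NS}Attribute"):
        if attr.get("Name") == ROLE_ATTR:
            roles.extend(v.text for v in attr.iter(f"{SAML_NS}AttributeValue"))
    issuer = root.find(f"{SAML_NS}Issuer").text
    return issuer, roles


issuer, roles = extract_aws_roles(SAMPLE_ASSERTION)
print(issuer)
for role in roles:
    print(role)
```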
At this point we could fire that at the AWS SAML endpoint and Bob's your aunty. If this was the console, it'd pop up with the role selection dialog. We're in a CLI, so we need to do that ourselves...
# This is a helper because many, many blogs get the order of the two
# ARNs in the SAML role claim mixed up. Whatever order they arrive in,
# normalise every entry to ROLE_ARN,IDP_ARN for the code that follows
normalised_roles = []
for role in aws_roles:
    chunks = role.split(',')
    if 'saml-provider' in chunks[0]:
        # The provider ARN came first, so swap the pair around
        role = f"{chunks[1]},{chunks[0]}"
    normalised_roles.append(role)
aws_roles = normalised_roles
# Allow the user to pick the role they want, or if there's
# only one use that one
if len(aws_roles) > 1:
    print("Please choose the role you would like to use:")
    for i, role in enumerate(aws_roles):
        role_arn = role.split(',')[0]
        print(f"[{i}]: {role_arn}")
    selected_role_index = input("Selection: ")
    # Reject anything that isn't a number from the list above
    if (not selected_role_index.isdigit()
            or int(selected_role_index) > (len(aws_roles) - 1)):
        print("You selected an invalid role number, please try again...")
        raise SystemExit(1)
    role_arn, principal_arn = aws_roles[int(selected_role_index)].split(',')
else:
    role_arn, principal_arn = aws_roles[0].split(',')
# Finally, call STS and get some creds
import base64
import boto3

sts = boto3.client('sts', region_name='us-east-1')
token = sts.assume_role_with_saml(
    RoleArn=role_arn,
    PrincipalArn=principal_arn,
    # STS wants the base64 encoded response as a string, not bytes
    SAMLAssertion=base64.b64encode(saml_response.encode()).decode(),
)
credentials = token['Credentials']
print("STS Token:")
print("Expiration: {}".format(credentials['Expiration']))
print("Access Key: {}".format(credentials['AccessKeyId']))
print("Secret Access Key: {}".format(credentials['SecretAccessKey']))
print("Session Token: {}".format(credentials['SessionToken']))
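Printing the keys is fine for a demo, but to actually feed them to the AWS CLI, one option (not necessarily how the linked repo does it) is the CLI's `credential_process` setting, which runs a command and reads credentials as JSON from its stdout. The sketch below assumes a boto3-style `Credentials` dict from the STS response. `to_credential_process_json` is a hypothetical helper name, and the values are placeholders.

```python
import datetime
import json


def to_credential_process_json(credentials):
    """Format an STS 'Credentials' dict for the CLI's credential_process."""
    expiration = credentials["Expiration"]
    if isinstance(expiration, datetime.datetime):
        # boto3 hands back a datetime; the CLI expects an ISO 8601 string
        expiration = expiration.isoformat()
    return json.dumps({
        "Version": 1,
        "AccessKeyId": credentials["AccessKeyId"],
        "SecretAccessKey": credentials["SecretAccessKey"],
        "SessionToken": credentials["SessionToken"],
        "Expiration": expiration,
    })


# Placeholder values standing in for the 'Credentials' part of the
# STS response
fake_credentials = {
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-session-token",
    "Expiration": datetime.datetime(2030, 1, 1, tzinfo=datetime.timezone.utc),
}
print(to_credential_process_json(fake_credentials))
```

A profile in `~/.aws/config` would then point `credential_process` at whatever script wraps all of the above and prints that JSON, and nothing else, to stdout.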
What a saga.
Again, thanks to the AWS team in Perth and Sydney - they pointed me in the right direction on the AWS side.
The github repo I've created for this (with slightly more sensible code for production) can be found at github.com/elliotsegler/aws-aad-creds.