Keystone Federated Swift – Multi-region cluster, multiple federation, access same account

Welcome to the final post in the series, it has been a long time coming. If required/requested I’m happy to delve into any of these topics deeper, but I’ll attempt to explain the situation, the best approach to take and how I got a POC working, which I am calling the brittle method. It definitely isn’t the best approach but as it was solely done on the Swift side and as I am a OpenStack Swift dev it was the quickest and easiest for me when preparing for the presentation.

To first understand how we can build a federated environment where we have access to our account no matter where we go, we need to learn about how keystone authentication works from a Swift perspective. Then we can look at how we can solve the problem.

Swift’s Keystoneauth middleware

As mentioned in earlier posts, there isn’t any magic in the way Swift authentication works. Swift is an end-to-end storage solution and so authentication is handled via authentication middlewares. further a single Swift cluster can talk to multiple auth backends, which is where the `reseller_prefix` comes into play. This was the first approach I blogged about in these series.

 

There is nothing magical about how authentication works, keystoneauth has it’s own idiosyncrasies, but in general it simply makes a decision whether this request should be allowed. It makes writing your own simple, and maybe an easily way around the problem. Ie. write an auth middleware to auth directly to your existing company LDAP server or authentication system.

 

To setup keystone authentication, you use keystones authtoken middleware and directly afterwards in the pipeline place the Swift keystone middleware, configuring each of them in the proxy configuration:

pipeline = ... authtoken keystoneauth ... proxy-server

The authtoken middleware

Generally every request to Swift will include a token, unless it’s using tempurl, container-sync or to a container that has global read enabled but you get the point.

As the swift-proxy is a python wsgi app the request first hits the first middleware in the pipeline (left most) and works it’s way to the right. When it hits the authtoken middleware the token in the request will be sent to keystone to be authenticated.

The resulting metadata, ie the user, storage_url, groups, roles etc, and dumped into the request environment and then passed to the next middleware. The keystoneauth middleware.

The keystoneauth middleware

The keystoneauth middleware checks the request environment for the metadata dumped by the authtoken middleware and makes a decision based on that. Things like:

  • If the token was one for one of the reseller_admin roles, then they have access.
  • If the user isn’t a swift user of the account/project the request is for, is there an ACL that will allow it.
  • If the user has a role that identifies them as a swift user/operator of this Swift account then great.

 

When checking to see if the user has access to the given account (Swift account) it needs to know what account the request is for. This is easily determined as it’s defined by the path of the URL your hitting. The URL you send to the Swift proxy is what we call the storage url. And is in the form of:

http(s)://<url of proxy or proxy vip>/v1/<account>/<container>/<object>

The container and object elements are optional as it depends on what your trying to do in Swift. When the keystoneauth middleware is authenticating it’ll check that the project_id (or tenant_id) metadata dumped by authtoken, when this is concatenated with the reseller_prefix, matches the account in the given storage_url. For example let’s say the following metadata was dumped by authtoken:

{
"X_PROJECT_ID": 'abcdefg12345678',
"X_ROLES": "swiftoperator",
...
}

And the reseller_prefix for keystone auth was AUTH_ and we make any member of the swiftoperator role (in keystone) a swift operator (a swift user on the account). Then keystoneauth would allow access if the account in the storage URL matched AUTH_abcdefg12345678.

 

When you authenticate to keystone the object storage endpoint will point not only to the Swift endpoint (the swift proxy or swift proxy load balancer), but it will also include your account. Based on your project_id. More on this soon.

 

Does that make sense? So simply put to use keystoneauth in a multi federated environment, we just need to make sure no matter which keystone we end up using and asking for the swift endpoint always returns the same Swift account name.

And there lies our problem, the keystone object storage endpoint and the metadata authtoken dumps uses the project_id/tenant_id. This isn’t something that is synced or can be passed via federation metadata.

NOTE: This also means that you’d need to use the same reseller_prefix on all keystones in every federated environment. Otherwise the accounts wont match.

 

Keystone Endpoint and Federation Side

When you add an object storage endpoint in keystone, for swift, the url looks something like:

http://swiftproxy:8080/v1/AUTH_$(tenant_id)s

 

Notice the $(tenant_id)s at the end? This is a placeholder that keystone internally will replace with the tenant_id of the project you authenticated as. $(project_id)s can also be used and maps to the same thing. And this is our problem.

When setting up federation between keystones (assuming keystone 2 keystone federation) you generate a mapping. This mapping can include the project name, but not the project_id. Theses ids are auto-generated, not deterministic by name, so creating the same project on different federated keystone servers will have different project_id‘s. When a keystone service provider (SP) federates with a keystone identity provider (IdP) the mapping they share shows how the provider should map federated users locally. This includes creating a shadow project if a project doesn’t already exist for the federated user to be part of.

Because there is no way to sync project_id’s in the mapping the SP will create the project which will have a unique project_id. Meaning when the federated user has authenticated their Swift storage endpoint from keystone will be different, in essence as far as Swift is concerned they will have access but to a completely different Swift account. Let’s use an example, let’s say there is a project on the IdP called ProjectA.

           project_name        project_id
  IdP      ProjectA            75294565521b4d4e8dc7ce77a25fa14b
  SP       ProjectA            cb0d5805d72a4f2a89ff260b15629799

Here we have a ProjectA on both IdP and SP. The one on the SP would be considered a shadow project to map the federated user too. However the project_id’s are both different, because they are uniquely  generated when the project is created on each keystone environment. Taking the Object Storage endpoint in keystone as our example before we get:

 

          Object Storage Endpoint
  IdP     http://swiftproxy:8080/v1/AUTH_75294565521b4d4e8dc7ce77a25fa14b
  SP      http://swiftproxy:8080/v1/AUTH_cb0d5805d72a4f2a89ff260b15629799

So when talking to Swift you’ll be accessing different accounts, AUTH_75294565521b4d4e8dc7ce77a25fa14b and AUTH_cb0d5805d72a4f2a89ff260b15629799 respectively. This means objects you write in one federated environment will be placed in a completely different account so you wont be able access them from elsewhere.

 

Interesting ways to approach the problem

Like I stated earlier the solution would simply be to always be able to return the same storage URL no matter which federated environment you authenticate to. But how?

  1. Make sure the same project_id/tenant_id is used for _every_ project with the same name, or at least the same name in the domains that federation mapping maps too. This means direct DB hacking, so not a good solution, we should solve this in code, not make OPs go hack databases.
  2. Have a unique id for projects/tenants that can be synced in federation mapping, also make this available in the keystone endpoint template mapping, so there is a consistent Swift account to use. Hey we already have project_id which meets all the criteria except mapping, so that would be easiest and best.
  3. Use something that _can_ be synced in a federation mapping. Like domain and project name. Except these don’t map to endpoint template mappings. But with a bit of hacking that should be fine.

Of the above approaches, 2 would be the best. 3 is good except if you pick something mutable like the project name, if you ever change it, you’d now authenticate to a completely different swift account. Meaning you’d have just lost access to all your old objects! And you may find yourself with grumpy Swift Ops who now need to do a potentially large data migration or you’d be forced to never change your project name.

Option 2 being unique, though it doesn’t look like a very memorable name if your using the project id, wont change. Maybe you could offer people a more memorable immutable project property to use. But to keep the change simple being able simply sync the project_id should get us everything we need.

 

When I was playing with this, it was for a presentation so had a time limit, a very strict one, so being a Swift developer and knowing the Swift code base I hacked together a varient on option 3 that didn’t involve hacking keystone at all. Why, because I needed a POC and didn’t want to spend most my time figuring out the inner workings of Keystone, when I could just do a few hacks to have a complete Swift only version. And it worked. Though I wouldn’t recommend it. Option 3 is very brittle.

 

The brittle method – Swift only side – Option 3b

Because I didn’t have time to simply hack keystone, I took a different approach. The basic idea was to let authtoken authenticate and then finish building the storage URL on the swift side using the meta-data authtoken dumps into wsgi request env. Thereby modifying the way keystoneauth authenticates slightly.

Step 1 – Give the keystoneauth middleware the ability to complete the storage url

By default we assume the incoming request will point to a complete account, meaning the object storage endpoint in keystone will end with something like:

'<uri>/v1/AUTH_%(tenant_id)s'

So let’s enhance keystoneauth to have the ability to if given only the reseller_prefix to complete the account. So I added a use_dynamic_reseller option.

If you enable use_dynamic_reseller then the keystoneauth middleware will pull the project_id from authtoken‘s meta-data dumped in the wsgi environment. This allows a simplified keystone endpoint in the form:

'<uri>/v1/AUTH_'

This shortcut makes configuration easier, but can only be reliably used when on your own account and providing a token. API elements like tempurl  and public containers need the full account in the path.

This still used project_id so doesn’t solve our problem, but meant I could get rid of the $(tenant_id)s from the endpoints. Here is the commit in my github fork.

Step 2 – Extend the dynamic reseller to include completing storage url with names

Next, we extend the keystoneauth middleware a little bit more. Give it another option, use_dynamic_reseller_name, to complete the account with either project_name or domain_name and project_name but only if your using keystone authentication version 3.

If you are, and want to have an account based of the name of the project, then you can enable use_dynamic_reseller_name in conjuction with use_dynamic_reseller to do so. The form used for the account would be:

<reseller_prefix><project_domain_name>_<project_name>

So using our example previously with a reseller_preix of AUTH_, a project_domain_name of Domain and our project name of ProjectA, this would generate an account:

AUTH_Domain_ProjectA

This patch is also in my github fork.

Does this work, yes! But as I’ve already mentioned in the last section, this is _very_ brittle. But this also makes it confusing to know when you need to provide only the reseller_prefix or your full account name. It would be so much easier to just extend keystone to sync and create shadow projects with the same project_id. Then everything would just work without hacking.