
POC WireGuard + FRR Setup a.k.a dodgy meshy test network

It’s hackweek at SUSE! Probably one of my favourite times of year, though I think they come up every 9 months or so.

Anyway, this hackweek I’ve been on a WireGuard journey. I started reading the paper and all the docs. Briefly looking into the code, sitting in the IRC channel and joining the mailing list to get a feel for the community.

There is still 1 day left of hackweek, so I hope to spend more time in the code, and maybe, just maybe, see if I can fix a bug.. although they don’t seem to have a tracker like most projects, so let’s see how that goes.

The community seems pretty cool. The tech is, frankly, pretty amazing; even I, coming from a cloud storage background, understood most of the paper.

I had set up a tunnel, tcpdumped traffic, and used wireshark to look closely at the packets as I read the paper; it was very informative. But I really wanted to get a feel for how this tech could work. They do have a wg-dynamic project which is planning on using wg as a building block to do cooler things, like mesh networking. This sounds cool, so I wanted to sink my teeth in and see if I could build something similar, not with wg-dynamic, but out of existing OSS tech, and see where the gotchas are, aside from it obviously being less secure. It seemed like a good way to better understand the technology.

So on Wednesday, I decided to do just that. Today is Thursday and I’ve gotten to a point where I can say I partially succeeded. And before I delve in deeper and try to figure out my current stumbling block, I thought I’d write down where I am.. and how I got here.. to:

  1. Point the WireGuard community at it, in case they’re interested.
  2. So you all can follow along at home, because it’s pretty interesting, I think.

As the title suggests, the plan is/was to set up a bunch of tunnels and use FRR to set up some routing protocols to talk via these tunnels, auto-magically 🙂

UPDATE: The problem I describe in this post, routes becoming stale, only seems to happen when using RIPv2. When I changed it to OSPFv2 all the routes worked as expected!! Will write a follow-up post to explain the differences.. in fact I may rework the notes for it too 🙂

The problem at hand

Test network VM topology

A picture is worth 1000 words. The basic idea is to simulate a bunch of machines and networks connected over WireGuard (WG) tunnels. So I created 6 VMs, connected as you can see above.

I used Chris Smart’s ansible-virt-infra project, which is pretty awesome, to build up the VMs and networks as you see above. I’ll leave my build notes as an appendix to this post.

Once I had the infrastructure set up, I built all the tunnels as they are in the image. Then I went ahead and installed FRR on all the nodes with tunnels (nodes 1, 2, 4, and 5). To keep things simple, I started with the easiest routing protocol to configure, RIPv2.

Believe it or not, everything seemed to work.. well, mostly. I can jump on, say, node 5 (wireguard-5 if you’re playing along at home) and:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Looks good, right? We see routes for networks 172.16.{0,2,3,4,5}.0/24. Network 1 isn’t there, but hey, that’s quite far away, maybe it hasn’t made it yet. Which leads to the real issue.

If I go and run ip r again, soon all these routes will become stale and disappear. Running ip -ts monitor shows just that.
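If you want to watch it happen yourself, restricting the monitor to route events, something like the following shows the routes being added and then withdrawn:

sudo ip -ts monitor route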

So the question is, what’s happening to the RIP advertisements? And yes, they’re still being sent. Then how come some made it to node 5, and then never again?

The simple answer is, it was me. The long answer is, I’ve never used FRR before, and it just didn’t seem to be working. So I started debugging the environment. To debug, I had a tmux session open on the KVM host with a tab for each node running FRR. I’d go to each tab and run tcpdump to check whether the RIP traffic was making it through the tunnel. And almost instantly, I saw traffic, like:

suse@wireguard-5:~> sudo tcpdump -v -U -i wg0 port 520
tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
03:01:00.006408 IP (tos 0xc0, ttl 64, id 62964, offset 0, flags [DF], proto UDP (17), length 52)
10.0.4.105.router > 10.0.4.255.router:
RIPv2, Request, length: 24, routes: 1 or less
AFI 0, 0.0.0.0/0 , tag 0x0000, metric: 16, next-hop: self
03:01:00.007005 IP (tos 0xc0, ttl 64, id 41698, offset 0, flags [DF], proto UDP (17), length 172)
10.0.4.104.router > 10.0.4.105.router:
RIPv2, Response, length: 144, routes: 7 or less
AFI IPv4, 0.0.0.0/0 , tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 10.0.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 10.0.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.0.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 172.16.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.4.0/24, tag 0x0000, metric: 1, next-hop: self

At first I thought it was good timing. I jumped to another host, and when I tcpdumped, the RIP packets turned up instantaneously. This happened again and again.. and yes, it took me longer than I’d like to admit before it dawned on me.

Why are routes going stale? It seems as though the packets are getting queued/stuck in the WG interface until I poke it with tcpdump!

The RIPv2 Request packet is sent as a broadcast, not directly to the other end of the tunnel. To get it to not be dropped, I had to widen my WG peer allowed-ips from the /32 to a /24.
So now I wonder whether the broadcast, or just the fact that it’s only 52 bytes, means it gets queued up and not sent through the tunnel, that is until I come along with a hammer and tcpdump the interface?
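For reference, widening it is just another wg set on the peer; a sketch of what I mean on wireguard-5, with the peer public key left as a placeholder (note that wg set replaces the allowed-ips list rather than appending to it):

sudo wg set wg0 peer <node4-public-key> allowed-ips 10.0.4.0/24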

Maybe one way I could test this is to speed up the RIP broadcasts and hopefully fill a buffer, or see if I can put WG, or rather the kernel, into debugging mode.
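One way to get more visibility (assuming the module and kernel were built with dynamic debug support) is to flip on WireGuard’s dynamic debug messages and watch the kernel log:

echo module wireguard +p | sudo tee /sys/kernel/debug/dynamic_debug/control
sudo dmesg -wT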

Build notes

As promised, here is the current form of my build notes; they make reference to the topology image above.

BTW I’m using openSUSE Leap 15.1 for all the nodes.

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file, which you can dump in the inventory/ folder; I called it wireguard.yml:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined. We do this in the kvmhost inventory file; here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The above infrastructure tool uses cloud-init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure WireGuard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the WireGuard repo to the nodes and install it. I look forward to kernel 5.6, where wireguard will be included in mainline:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have two tunnels:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, let’s create all the WireGuard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

We’ll be using RIPv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ripd=no/ripd=yes/' /etc/frr/daemons"

And with that, now we just need to do all the per-server things like adding IPs and configuring all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands; obviously in a real env you’d want to use systemd-networkd or something to make these stick.
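If you did want them to stick, one option would be a wg-quick style config instead of raw ip/wg commands. A minimal sketch for the wireguard-1 wg0 tunnel (values taken from the next section; the private key is shown as a placeholder since wg-quick wants the key material inline), dropped into /etc/wireguard/wg0.conf:

[Interface]
Address = 10.0.2.101/24
ListenPort = 51821
PrivateKey = <contents of /etc/wireguard/wg0-privatekey>

[Peer]
PublicKey = P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s=
AllowedIPs = 10.0.2.0/24
Endpoint = 172.16.2.102:51822

Then something like sudo systemctl enable --now wg-quick@wg0 should bring it up at boot. But as I said, for this POC plain ip and wg commands are fine.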

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
network wg1
no passive-interface wg1
EOF

sudo systemctl restart frr

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0

network wg1
no passive-interface wg1
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

When this _is_ all working, we’d probably need to open up the allowed-ips on the WG tunnels. We could start by just adding 172.16.0.0/16 to the list. That might allow us to route packets to the other networks.

If you want to go find other routes out to the internet, then we may need 0.0.0.0/0. But I’m not sure how WG will route that, as it’s using the allowed-ips and public keys as a routing table. I guess it may not care, as we only have a 1:1 mapping on each tunnel, and if we can route to the WG interface, it’s pretty straightforward.
This is something I hope to test.
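As a sketch of what that would look like on, say, wireguard-5 (remembering that wg set replaces the allowed-ips list, so the tunnel subnet has to be restated):

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,172.16.0.0/16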

Another really beneficial test would be to rebuild this environment using IPv6 and see if things work better, as we wouldn’t have any broadcasts anymore, only uni- and multicast.

As well as trying some other routing protocol in general, like OSPF.

Finally, having to continually adjust allowed-ips, and seemingly having to either open it up more or add more ranges, makes me realise why the wg-dynamic project exists, and why they want to come up with a secure routing protocol to use through the tunnels, to do something similar. So let’s keep an eye on that project.

Keystone Federated Swift – Multi-region cluster, multiple federation, access same account

Welcome to the final post in the series; it has been a long time coming. If required/requested I’m happy to delve into any of these topics deeper, but I’ll attempt to explain the situation, the best approach to take, and how I got a POC working, which I am calling the brittle method. It definitely isn’t the best approach, but as it was done solely on the Swift side, and as I am an OpenStack Swift dev, it was the quickest and easiest for me when preparing for the presentation.

To first understand how we can build a federated environment where we have access to our account no matter where we go, we need to learn about how keystone authentication works from a Swift perspective. Then we can look at how we can solve the problem.

Swift’s Keystoneauth middleware

As mentioned in earlier posts, there isn’t any magic in the way Swift authentication works. Swift is an end-to-end storage solution and so authentication is handled via authentication middlewares. Further, a single Swift cluster can talk to multiple auth backends, which is where the `reseller_prefix` comes into play. This was the first approach I blogged about in this series.

 

There is nothing magical about how authentication works. keystoneauth has its own idiosyncrasies, but in general it simply makes a decision whether this request should be allowed. It makes writing your own simple, and maybe an easy way around the problem, i.e. write an auth middleware to auth directly to your existing company LDAP server or authentication system.

 

To set up keystone authentication, you use keystone’s authtoken middleware and directly afterwards in the pipeline place the Swift keystone middleware, configuring each of them in the proxy configuration:

pipeline = ... authtoken keystoneauth ... proxy-server

The authtoken middleware

Generally every request to Swift will include a token, unless it’s using tempurl, container-sync or going to a container that has global read enabled, but you get the point.

As the swift-proxy is a python wsgi app, the request first hits the first middleware in the pipeline (left most) and works its way to the right. When it hits the authtoken middleware, the token in the request will be sent to keystone to be authenticated.

The resulting metadata, i.e. the user, storage_url, groups, roles etc., is dumped into the request environment and then passed to the next middleware: the keystoneauth middleware.

The keystoneauth middleware

The keystoneauth middleware checks the request environment for the metadata dumped by the authtoken middleware and makes a decision based on that. Things like:

  • If the token was one for one of the reseller_admin roles, then they have access.
  • If the user isn’t a swift user of the account/project the request is for, is there an ACL that will allow it.
  • If the user has a role that identifies them as a swift user/operator of this Swift account then great.

 

When checking to see if the user has access to the given account (Swift account) it needs to know what account the request is for. This is easily determined as it’s defined by the path of the URL you’re hitting. The URL you send to the Swift proxy is what we call the storage URL, and it is in the form of:

http(s)://<url of proxy or proxy vip>/v1/<account>/<container>/<object>

The container and object elements are optional, as it depends on what you’re trying to do in Swift. When the keystoneauth middleware is authenticating, it’ll check that the project_id (or tenant_id) metadata dumped by authtoken, when concatenated with the reseller_prefix, matches the account in the given storage_url. For example, let’s say the following metadata was dumped by authtoken:

{
"X_PROJECT_ID": 'abcdefg12345678',
"X_ROLES": "swiftoperator",
...
}

And say the reseller_prefix for keystone auth was AUTH_ and we make any member of the swiftoperator role (in keystone) a swift operator (a swift user on the account). Then keystoneauth would allow access if the account in the storage URL matched AUTH_abcdefg12345678.
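To make that concrete, hitting that account would look something like the request below, assuming the proxy is at swiftproxy:8080 as in the endpoint examples further down, and with $TOKEN standing in for the keystone token:

curl -i -H "X-Auth-Token: $TOKEN" http://swiftproxy:8080/v1/AUTH_abcdefg12345678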

 

When you authenticate to keystone, the object storage endpoint will point not only to the Swift endpoint (the swift proxy or swift proxy load balancer), but it will also include your account, based on your project_id. More on this soon.

 

Does that make sense? Simply put, to use keystoneauth in a multi-federated environment, we just need to make sure that no matter which keystone we end up using, asking for the swift endpoint always returns the same Swift account name.

And therein lies our problem: the keystone object storage endpoint and the metadata authtoken dumps both use the project_id/tenant_id. This isn’t something that is synced or can be passed via federation metadata.

NOTE: This also means that you’d need to use the same reseller_prefix on all keystones in every federated environment. Otherwise the accounts won’t match.

 

Keystone Endpoint and Federation Side

When you add an object storage endpoint in keystone, for swift, the url looks something like:

http://swiftproxy:8080/v1/AUTH_$(tenant_id)s

 

Notice the $(tenant_id)s at the end? This is a placeholder that keystone internally will replace with the tenant_id of the project you authenticated as. $(project_id)s can also be used and maps to the same thing. And this is our problem.

When setting up federation between keystones (assuming keystone-to-keystone federation) you generate a mapping. This mapping can include the project name, but not the project_id. These ids are auto-generated, not deterministic by name, so creating the same project on different federated keystone servers will result in different project_ids. When a keystone service provider (SP) federates with a keystone identity provider (IdP), the mapping they share shows how the provider should map federated users locally. This includes creating a shadow project if a project doesn’t already exist for the federated user to be part of.

Because there is no way to sync project_ids in the mapping, the SP will create the project, which will have a unique project_id. Meaning when the federated user has authenticated, their Swift storage endpoint from keystone will be different; in essence, as far as Swift is concerned, they will have access, but to a completely different Swift account. Let’s use an example: let’s say there is a project on the IdP called ProjectA.

           project_name        project_id
  IdP      ProjectA            75294565521b4d4e8dc7ce77a25fa14b
  SP       ProjectA            cb0d5805d72a4f2a89ff260b15629799

Here we have a ProjectA on both IdP and SP. The one on the SP would be considered a shadow project to map the federated user to. However, the project_ids are different, because they are uniquely generated when the project is created in each keystone environment. Taking the Object Storage endpoint in keystone as in our example before, we get:

 

          Object Storage Endpoint
  IdP     http://swiftproxy:8080/v1/AUTH_75294565521b4d4e8dc7ce77a25fa14b
  SP      http://swiftproxy:8080/v1/AUTH_cb0d5805d72a4f2a89ff260b15629799

So when talking to Swift you’ll be accessing different accounts, AUTH_75294565521b4d4e8dc7ce77a25fa14b and AUTH_cb0d5805d72a4f2a89ff260b15629799 respectively. This means objects you write in one federated environment will be placed in a completely different account, so you won’t be able to access them from elsewhere.

 

Interesting ways to approach the problem

Like I stated earlier the solution would simply be to always be able to return the same storage URL no matter which federated environment you authenticate to. But how?

  1. Make sure the same project_id/tenant_id is used for _every_ project with the same name, or at least the same name in the domains that the federation mapping maps to. This means direct DB hacking, so not a good solution; we should solve this in code, not make ops go hack databases.
  2. Have a unique id for projects/tenants that can be synced in federation mapping, and also make this available in the keystone endpoint template mapping, so there is a consistent Swift account to use. Hey, we already have project_id, which meets all the criteria except mapping, so that would be easiest and best.
  3. Use something that _can_ be synced in a federation mapping, like domain and project name. Except these don’t map to endpoint template mappings. But with a bit of hacking that should be fine.

Of the above approaches, 2 would be the best. 3 is good except that if you pick something mutable like the project name, and you ever change it, you’d now authenticate to a completely different swift account, meaning you’d have just lost access to all your old objects! And you may find yourself with grumpy Swift ops who now need to do a potentially large data migration, or you’d be forced to never change your project name.

Option 2, being unique, though it doesn’t look like a very memorable name if you’re using the project id, won’t change. Maybe you could offer people a more memorable immutable project property to use. But to keep the change simple, being able to simply sync the project_id should get us everything we need.

 

When I was playing with this, it was for a presentation so I had a time limit, a very strict one, so being a Swift developer and knowing the Swift code base I hacked together a variant on option 3 that didn’t involve hacking keystone at all. Why? Because I needed a POC and didn’t want to spend most of my time figuring out the inner workings of Keystone, when I could just do a few hacks to have a complete Swift-only version. And it worked. Though I wouldn’t recommend it. Option 3 is very brittle.

 

The brittle method – Swift only side – Option 3b

Because I didn’t have time to simply hack keystone, I took a different approach. The basic idea was to let authtoken authenticate and then finish building the storage URL on the swift side using the metadata authtoken dumps into the wsgi request env, thereby modifying the way keystoneauth authenticates slightly.

Step 1 – Give the keystoneauth middleware the ability to complete the storage url

By default we assume the incoming request will point to a complete account, meaning the object storage endpoint in keystone will end with something like:

'<uri>/v1/AUTH_%(tenant_id)s'

So let’s enhance keystoneauth with the ability, when given only the reseller_prefix, to complete the account itself. So I added a use_dynamic_reseller option.

If you enable use_dynamic_reseller, then the keystoneauth middleware will pull the project_id from authtoken’s metadata dumped in the wsgi environment. This allows a simplified keystone endpoint of the form:

'<uri>/v1/AUTH_'

This shortcut makes configuration easier, but can only be reliably used when acting on your own account and providing a token. API elements like tempurl and public containers need the full account in the path.

This still uses project_id, so it doesn’t solve our problem, but it meant I could get rid of the $(tenant_id)s from the endpoints. Here is the commit in my github fork.

Step 2 – Extend the dynamic reseller to include completing storage url with names

Next, we extend the keystoneauth middleware a little bit more. Give it another option, use_dynamic_reseller_name, to complete the account with either project_name or domain_name and project_name, but only if you’re using keystone authentication version 3.

If you are, and want to have an account based off the name of the project, then you can enable use_dynamic_reseller_name in conjunction with use_dynamic_reseller to do so. The form used for the account would be:

<reseller_prefix><project_domain_name>_<project_name>

So using our previous example, with a reseller_prefix of AUTH_, a project_domain_name of Domain and our project name of ProjectA, this would generate the account:

AUTH_Domain_ProjectA

This patch is also in my github fork.

Does this work? Yes! But as I’ve already mentioned in the last section, this is _very_ brittle. It also makes it confusing to know when you need to provide only the reseller_prefix or your full account name. It would be so much easier to just extend keystone to sync and create shadow projects with the same project_id. Then everything would just work without hacking.

Monasca + Swift: Sending all your Swift metrics Monasca’s way

Last week was SUSE Hackweek. A week every employee is given to go have fun hacking something or learning something they find interesting. It’s an awesome annual event that SUSE runs. It’s my second and I love it.

While snowed in at the Dublin PTG a while ago I chatted with Johannes, a monasca dev and very intelligent teammate at SUSE. And I heard that Monasca has a statsd endpoint, as part of the monasca agent, that you can fire stats at. As a Swift developer this interests me greatly. Every Swift daemon dumps a plethora of statsd metrics. So can I put the 2 together? Can I simply install monasca-agent on each storage and proxy node and then point the statsd endpoints for all swift services locally?

 

I started the week attempting to do just that. Because I’m new to monasca, and didn’t want to go attempt to set it up, I just ran a devstack + SAIO environment.

The devstack was just a simple monasca + keystone + horizon configuration and the SAIO was a standard Swift All In One.

 

Next I installed the monasca-agent on the SAIO and then updated Swift to point at it. In Swift, each config supports statsd server endpoint configuration options:

 

# You can enable StatsD logging here:
# log_statsd_host =
# log_statsd_port = 8125
# log_statsd_default_sample_rate = 1.0
# log_statsd_sample_rate_factor = 1.0
# log_statsd_metric_prefix =
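So on the SAIO, with monasca-agent’s statsd listener running locally (it listens on the standard statsd port 8125 by default, as far as I can tell), pointing a daemon at it is just uncommenting and filling these in, e.g. in object-server.conf:

log_statsd_host = localhost
log_statsd_port = 8125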

 

So pointing swift at it is easy. I then uploaded a few objects to swift and bingo, inside Monasca’s influxdb instance I can see the Swift measurements.

 

account-auditor.passes
account-auditor.timing
account-replicator.attempts
account-replicator.no_changes
account-replicator.successes
account-replicator.timing
account-server.GET.timing
account-server.HEAD.timing
account-server.PUT.timing
account-server.REPLICATE.timing
container-auditor.passes
container-auditor.timing
container-replicator.attempts
container-replicator.no_changes
container-replicator.successes
container-replicator.timing
container-server.GET.timing
container-server.PUT.timing
container-server.REPLICATE.timing
container-updater.no_changes
container-updater.successes
container-updater.timing
monasca.collection_time_sec
monasca.thread_count
object-auditor.timing
object-replicator.partition.update.count.sdb1
object-replicator.partition.update.count.sdb2
object-replicator.partition.update.count.sdb3
object-replicator.partition.update.count.sdb4
object-replicator.partition.update.timing
object-replicator.suffix.hashes
object-server.HEAD.timing
object-server.PUT.sdb1.timing
object-server.PUT.sdb2.timing
object-server.PUT.sdb3.timing
object-server.PUT.sdb4.timing
object-server.PUT.timing
object-server.REPLICATE.timing
object-updater.timing
proxy-server.account.GET.200.first-byte.timing
proxy-server.account.GET.200.timing
proxy-server.account.GET.200.xfer
proxy-server.object.HEAD.404.timing
proxy-server.object.HEAD.404.xfer
proxy-server.object.PUT.201.timing
proxy-server.object.PUT.201.xfer
proxy-server.object.policy.1.HEAD.404.timing
proxy-server.object.policy.1.HEAD.404.xfer
proxy-server.object.policy.1.PUT.201.timing
proxy-server.object.policy.1.PUT.201.xfer

 

NOTE: This isn’t the complete list, as the measures are added when new metrics are fired, and the SAIO is a small healthy swift cluster, so there aren’t many 500-series errors etc. But it works!

 

And better yet I have access to them in grafana via the monasca datasource!

 

swift_recon check plugin

I thought that was easy, but Swift actually provides more metrics than just that. Swift has a reconnaissance API (recon) on all the wsgi servers (account, container and object servers) that you can hit either via REST or the swift-recon tool. So next I thought: I wonder how hard it would be to write a swift_recon check plugin for Monasca?

Some of the recon metrics you can get aren’t really grafana friendly. But some would be awesome to have in the same place and closer to horizon where ops are looking.
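If you haven’t poked at it before, the CLI side of recon looks something like this; the check plugin essentially gathers the same data over the REST API:

swift-recon object -r     # last/oldest replication completion times
swift-recon -d            # disk usage across the object ring
swift-recon --md5         # compare ring and config md5s across nodes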

 

So I went and wrote one. Like I said I couldn’t get all the metrics, but I got most:

 

swift_recon.account.account_auditor_pass_completed
swift_recon.account.account_audits_failed
swift_recon.account.account_audits_passed
swift_recon.account.account_audits_since
swift_recon.account.attempted
swift_recon.account.failure
swift_recon.account.replication_last
swift_recon.account.replication_time
swift_recon.account.success
swift_recon.container.attempted
swift_recon.container.container_auditor_pass_completed
swift_recon.container.container_audits_failed
swift_recon.container.container_audits_passed
swift_recon.container.container_audits_since
swift_recon.container.container_updater_sweep
swift_recon.container.failure
swift_recon.container.replication_last
swift_recon.container.replication_time
swift_recon.container.success
swift_recon.disk_usage.mounted
swift_recon.object.async_pending
swift_recon.object.attempted
swift_recon.object.auditor.object_auditor_stats_ALL.audit_time
swift_recon.object.auditor.object_auditor_stats_ALL.bytes_processed
swift_recon.object.auditor.object_auditor_stats_ALL.errors
swift_recon.object.auditor.object_auditor_stats_ALL.passes
swift_recon.object.auditor.object_auditor_stats_ALL.quarantined
swift_recon.object.auditor.object_auditor_stats_ALL.start_time
swift_recon.object.auditor.object_auditor_stats_ZBF.audit_time
swift_recon.object.auditor.object_auditor_stats_ZBF.bytes_processed
swift_recon.object.auditor.object_auditor_stats_ZBF.errors
swift_recon.object.auditor.object_auditor_stats_ZBF.passes
swift_recon.object.auditor.object_auditor_stats_ZBF.quarantined
swift_recon.object.auditor.object_auditor_stats_ZBF.start_time
swift_recon.object.expirer.expired_last_pass
swift_recon.object.expirer.object_expiration_pass
swift_recon.object.failure
swift_recon.object.object_updater_sweep
swift_recon.object.replication_last
swift_recon.object.replication_time
swift_recon.object.success
swift_recon.quarantined
swift_recon.unmounted

 

Some of the metric names might need tidying up, but so far, so good. One of the really interesting things Swift ops usually want to keep an eye on is when all the replicators have completed a cycle. Why? Well, one example: while ring rebalancing on a large and busy cluster you want to avoid too much data movement, so when adding new drives you will raise their weights slowly. But you also want to make sure a complete replication cycle has finished before you rebalance again. So knowing when you pushed a new ring out, and the timestamps of the last replication run, tells you when it’s safe. These are coming through nicely:

 

 

Unfortunately there are some metrics I can’t quite get though. You can use recon to get md5s of the rings and configs on each node, but I found md5s can’t get pushed through. You can also ask recon what version of swift is installed on each node (nice in a large deployment and when upgrading), but the version number also had issues. Both of these are probably not insurmountable, but I still need to figure out how.

 

swift_handoffs check plugin

I’ve been involved in the Swift community for quite a while now, and I’d heard of another awesome metric one of the Swiftstack cores came up with to give an awesome visualisation of the Swift cluster. He even provided a gist to the community that others could use and adapt. I thought, why not make sure everyone can use it; let’s add it as another check plugin to the monasca agent.

 

Everything in Swift is treated as an object, and an object has a number of devices in the cluster that are considered primary (that store that object). When a drive gets full, or there is too much load on, say, an object PUT, and a primary is unavailable to meet the durability contract, another node will store the object (this node would be called a handoff for that object). The handoff node will push the handoff object to the primary as soon as it can (the drive is replaced, or comes back online, etc).

Further, a ring in Swift is divided into logical segments called partitions. And it’s these partitions that devices are responsible for storing (or think of it as: a device has to store all objects that belong to a partition). When we rebalance the ring, either by adding or removing drives or changing weights, these partitions shift around the cluster, either to, say, drain a drive or to move to where there is more space. Swift is really good at keeping this movement to a minimum. So after a rebalance, nodes that used to be primaries for some partitions won’t be anymore. They’ll suddenly be handoffs, and the back-end consistency engine will move them to their new home.

So what’s interesting to note there is, it all involves handoff partitions.

 

Turns out, just watching the number of primary partitions vs the number of handoffs on each storage node gives you a great health indicator. When should I do a rebalance? When the handoff numbers are down. Is there a build-up of handoffs in a region? Maybe write affinity and WAN links are saturated, or there is some network/disk/server issue on one of the nodes around there, etc.

Here are the metrics:

 

swift_handoffs.handoffs
swift_handoffs.primary

 

And here is a simplified snapshot. This is my SAIO with 4 simulated nodes. This is watching the storage nodes as a whole, but you can break it down to the drive. There is a graph for each node and each Swift ring. The rise in handoffs (Object – Policy 0 SAIO[1-3]) is due to me turning off the consistency engine and then changing the weights back to a nicely weighted cluster:

See Object - Policy 0. SAIO0’s weight has increased, so the other nodes now have handoff partitions to give it. If I now went and turned the consistency engine back on, you’d see more primary partitions on SAIO0.

 

Where’s the code

UPDATE: I’ve now pushed up the checks to monasca. They can be found here:

  • https://review.openstack.org/#/c/583876/
  • https://review.openstack.org/#/c/585067/

Setting up a basic keystone for Swift + Keystone dev work

As a Swift developer, most of the development work happens in a Swift All In One (SAIO) environment. This environment simulates a multinode swift cluster on one box. All the SAIO documentation points to using tempauth for authentication. Why?

Because most of the time authentication isn’t the thing we are working on. Swift has many moving parts, and so tempauth, which only exists for testing swift and is configured in the proxy.conf file, works great.

However, there are times you need to debug or test keystone + swift integration. In this case, we tend to build up a devstack just for the keystone component. But if all we need is keystone, then can we just throw one up on a SAIO?… yes. So this is how I do it.

Firstly, I’m going to assume you have a SAIO already set up. If not, go do that first. Not that it really matters, as we only configure the SAIO keystone component at the end. But I will be making keystone listen on localhost, so if you are doing this on another machine, you’ll have to change that.

Further, this will set up a keystone server in the form you’d expect from a real deploy (setting up the admin and public interfaces).

 

Step 1 – Get the source, install and start keystone

Clone the source code:
cd $HOME
git clone https://github.com/openstack/keystone.git

Setup a virtualenv (optional):
mkdir -p ~/venv/keystone
virtualenv ~/venv/keystone
source ~/venv/keystone/bin/activate

Install keystone:
cd $HOME/keystone
pip install -r requirements.txt
pip install -e .
cp etc/keystone.conf.sample etc/keystone.conf

Note: We are running the services from the source, so config lives in the source etc/ directory.

 

The fernet keys seem to assume a full /etc path, so we’ll create it. Maybe I should update this to put all config there, but for now, meh:
sudo mkdir -p /etc/keystone/fernet-keys/
sudo chown $USER -R /etc/keystone/

Setup the database and fernet:
keystone-manage db_sync
keystone-manage fernet_setup

Finally we can start keystone. Keystone is a wsgi application and so needs a server to pass it requests. The current keystone developer documentation seems to recommend uwsgi, so let’s do that.

 

First we need uwsgi and the python plugin; on a debian/ubuntu system:
sudo apt-get install uwsgi uwsgi-plugin-python

Then we can start keystone, by starting the admin and public wsgi servers:
uwsgi --http 127.0.0.1:35357 --wsgi-file $(which keystone-wsgi-admin) &
uwsgi --http 127.0.0.1:5000 --wsgi-file $(which keystone-wsgi-public) &

Note: Here I am just backgrounding them; you could run them in tmux or screen, or set up uwsgi to run them all the time. But that’s out of scope for this.

 

Now a netstat should show that keystone is listening on ports 35357 and 5000:
$ netstat -ntlp | egrep '35357|5000'
tcp 0 0 127.0.0.1:5000 0.0.0.0:* LISTEN 26916/uwsgi
tcp 0 0 127.0.0.1:35357 0.0.0.0:* LISTEN 26841/uwsgi

Step 2 – Setting up keystone for swift

Now that we have keystone started, it’s time to configure it. Firstly you need the openstack client to configure it, so:
pip install python-openstackclient

Next we’ll use all keystone defaults, so we only need to pick an admin password. For the sake of this how-to I’ll pick the developer documentation example of `s3cr3t`. Be sure to change this. So we can do a basic keystone bootstrap with:
keystone-manage bootstrap --bootstrap-password s3cr3t

Now we just need to set up some openstack env variables so we can use the openstack client to finish the setup. To make them easy to access I’ll dump them into a file you can source. But feel free to dump these in your bashrc or whatever:
cat > ~/keystone.env <<EOF
export OS_USERNAME=admin
export OS_PASSWORD=s3cr3t
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_ID=default
export OS_PROJECT_DOMAIN_ID=default
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_URL=http://localhost:5000/v3
EOF


source ~/keystone.env

 

Great, now we can finish configuring keystone. Let’s first set up a service project (tenant) for our Swift cluster:
openstack project create service

Create a user for the cluster to auth as when checking user tokens, and add the user to the service project. Again we need to pick a password for this user, so `Sekr3tPass` will do.. don’t forget to change it:
openstack user create swift --password Sekr3tPass --project service
openstack role add admin --project service --user swift

Now we will create the object-store (swift) service and add the endpoints for the service catalog:
openstack service create object-store --name swift --description "Swift Service"
openstack endpoint create swift public "http://localhost:8080/v1/AUTH_\$(tenant_id)s"
openstack endpoint create swift internal "http://localhost:8080/v1/AUTH_\$(tenant_id)s"

Note: We need to define the reseller_prefix we want to use in Swift. If you change it in Swift, make sure you update it here.

 

Now we can add roles that will match to roles in Swift, namely an operator (someone who will get a Swift account) and reseller_admins:
openstack role create SwiftOperator
openstack role create ResellerAdmin

Step 3 – Setup some keystone users to auth as.

TODO: create all the tempauth users here

 

Here, it would make sense to create the tempauth users devs are used to using, but I’ll just go create a user so you know how to do it. First create a project (tenant) for this example demo:
openstack project create --domain default --description "Demo Project" demo

Create a user:
openstack user create --domain default --password-prompt matt

We’ll also go create a basic user role:
openstack role create user

Now connect the 3 pieces together by adding user matt to the demo project with the user role:
openstack role add --project demo --user matt user

If you wanted user matt to be a swift operator (have an account) you’d:
openstack role add --project demo --user matt SwiftOperator

or even a reseller_admin:
openstack role add --project demo --user matt ResellerAdmin

If you’re in a virtualenv, you can leave it now, because next we’re going back to your already set up swift to do the Swift -> Keystone part:
deactivate

Step 4 – Configure Swift

To get swift to talk to keystone we need to add 2 middlewares to the proxy pipeline, and in the case of a SAIO, remove the tempauth middleware. But before we do that we need to install keystonemiddleware to get one of the 2 middlewares, keystone’s authtoken:
sudo pip install keystonemiddleware

Now you want to replace your tempauth middleware in the proxy path pipeline with authtoken keystoneauth so it looks something like:
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache bulk tempurl ratelimit crossdomain container_sync authtoken keystoneauth staticweb copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server

Then in the same ‘proxy-server.conf’ file you need to add the paste filter sections for both of these new middlewares:
[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory
auth_host = localhost
auth_port = 35357
auth_protocol = http
auth_uri = http://localhost:5000/
admin_tenant_name = service
admin_user = swift
admin_password = Sekr3tPass
delay_auth_decision = True
# cache = swift.cache
# include_service_catalog = False

[filter:keystoneauth]
use = egg:swift#keystoneauth
# reseller_prefix = AUTH
operator_roles = admin, SwiftOperator
reseller_admin_role = ResellerAdmin

Note: You need to make sure that if you change the reseller_prefix here, you change it in keystone too. And notice this is where you map operator_roles and reseller_admin_role in swift to those in keystone. Here anyone with the keystone role admin or SwiftOperator is a swift operator, and those with the ResellerAdmin role are reseller_admins.

 

And that’s it. Now you should be able to restart your swift proxy and it’ll go off and talk to keystone.

 

You can use your Python swiftclient now to go talk to it, and what’s better, swiftclient understands the OS_* variables, so you can just source your keystone.env and talk to your cluster (to be admin), or export some new envs for the user you’ve created. If you want to use curl you can, but it’s _much_ easier to use swiftclient.
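For example, after sourcing the env file from earlier, something as simple as this should show the storage URL (and therefore the account) that keystoneauth resolved for you:

source ~/keystone.env
swift stat -v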

 

Tip: You can use swift auth to get the auth_token if you want to then use curl.

 

If you want to authenticate via curl then for v3, use: https://docs.openstack.org/developer/keystone/devref/api_curl_examples.html

 

Or for v2, I use:
url="http://localhost:5000/v2.0/tokens"
auth='{"auth": {"tenantName": "demo", "passwordCredentials": {"username": "matt", "password": ""}}}'

 

curl -s -d "$auth" -H 'Content-type: application/json' $url |python -m json.tool

 

or

curl -s -d "$auth" -H 'Content-type: application/json' $url |python -c "import sys, json; print json.load(sys.stdin)['access']['token']['id']"

To just print out the token. Although a simple swift auth would do all this for you.

pudb debugging tips

As an OpenStack Swift dev I obviously write a lot of Python. Further, Swift is a cluster, and so it has a bunch of moving pieces. So debugging is very important. Most of the time I use pudb and then jump into the PyCharm debugger if I get really stuck.

Pudb is a curses-based version of pdb, and I find it pretty awesome, and you can use it while ssh’d somewhere. So I thought I’d write up the tips that I use, mainly so I don’t forget 🙂

The first and easiest way to run pudb is to use pudb as the python runner.. i.e:

pudb <python script>

On first run, it’ll start with the preferences window up. If you want to change preferences you can just hit ‘<ctrl>+p’. However you don’t need to remember that, as hitting ‘?’ will give you a nice help screen.

I prefer to see line numbers, I like the dark vim theme, and best of all, I prefer my interactive python shell to be ipython.

While you’re debugging, like in pdb, there are some simple commands:

  • n – step over (“next”)
  • s – step into
  • c – continue
  • r/f – finish current function
  • t – run to cursor
  • o – show console/output screen
  • b – toggle breakpoint
  • m – open module
  • ! – Jump into interactive shell (most useful)
  • / – text search

There are obviously more than that, but they are what I mostly use. The open module is great if you need to set a breakpoint somewhere deeper in the code base, so you can open it, set a breakpoint and then happily press ‘c’ to continue until it hits. The ‘!’ is the most useful; it’ll jump you into an interactive python shell at the exact point the debugger is at. So you can jump around, check/change settings and poke in areas to see what’s happening.

As with pdb you can also use code to insert a breakpoint so pudb will be triggered, rather than having to start a script with pudb. I give an example of how in the nosetests section below.

nosetests + pudb

Sometimes the best way to use pudb is to debug unit tests, or even write a unit (or functional or probe) test to get you into an area you want to test. You can use pudb to debug these too. And there are 2 ways to do it.

The first way is by installing the ‘nose-pudb’ pip package:

pip install nose-pudb

Now when you run nosetests you can add the --pudb option and it’ll break into pudb if there is an error, so you can go poke around in ‘post-mortem’ mode. This is really useful, but doesn’t allow you to actually trace the tests as they run.
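For example, something like:

nosetests --pudb test/unit/common/middleware/test_cname_lookup.py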

So the other way of using pudb in nosetests is to actually insert some code in the test that will trigger a breakpoint and start up pudb. To do so is exactly how you would with pdb, except substitute in pudb. So just add the following line of code to your test where you want to drop into pudb:

import pudb; pudb.set_trace()

And that’s it.. well, mostly. Because pudb is command line based, you need to tell nosetests to not capture stdout with the ‘-s’ flag:

nosetests -s test/unit/common/middleware/test_cname_lookup.py

testr + pudb

No problem here, it uses the same approach as above, where you programmatically set a trace, as you would for pdb. Just follow the ‘Debugging (pdb) Tests’ section on this page (except substitute pudb for pdb).

 

Update – run_until_failure.sh

I’ve been trying to find some intermittent unit test failures recently. So I whipped up a quick bash script that I run in a tmux session that really helps find and deal with them. I thought I’d add it to this post as I can then add nose-pudb to make it pretty useful.

#!/bin/bash

n=0
while [ True ]
do 
  clear
  $@
  if [ $? -gt 0 ]
  then 
    echo 'ERROR'
    echo "number " $n
    break
  fi
  let "n=n+1"
  sleep 1
done

With this I can simply:
run_until_failure.sh tox -epy27

 

It’ll stop looping once the command passed returns something other than 0.

Once I have an error, I then focus in on the area where it happens (to speed up the search a bit). I can also use nose-pudb to drop me into post-mortem mode so I can poke around in ipython; for example, I’m currently running:

 

run_until_failure.sh nosetests --pudb test/unit/proxy/test_server.py

 

Then I can come back to the tmux session, and if I’m dropped into a pudb interface, I can go poke around.

Swift Container sharding – locked db POC – Benchmarking observations

The latest POC is at the benchmarking stage, and for the most part it’s going well. I have set up 2 clusters in the cloud, not huge, but 2 proxies and 4 storage nodes each. A benchmarking run involves pointing an ssbench master at each cluster and putting each cluster under load. In both cases we only use 1 container, and on one cluster this container will have sharding turned on.

So far it’s looking pretty good. I’ve done many runs, and usually find a bug at scale.. but recently I’ve done two runs of the latest revision, alternating the sharded cluster (the cluster that will be benchmarked with the container with sharding on). Below is the grafana statsd output of the second run. Note that cluster 2 is the sharded cluster in this run:

2016-12-22-0928_cluster2_run2_smaller

Looking at the picture there are a few observations we can make: the peaks in the ‘Container PUT Latency – Cluster 2’ graph correspond to when a container is sharded (in this case, the one container and then its shards sharding).

As I mentioned earlier ssbench is running the benchmark and the benchmark is very write (PUT) heavy. Here is the sharding scenario file:

{
  "name": "Sharding scenario",
  "sizes": [{
    "name": "zero",
    "size_min": 0,
    "size_max": 0
  }],
  "initial_files": {
    "zero": 100
  },
  "run_seconds": 86400,
  "crud_profile": [200, 50, 0, 5],
  "user_count": 2,
  "container_base": "shardme",
  "container_count": 1,
  "container_concurrency": 1,
  "container_put_headers": {
  "X-Container-Sharding": "on"
  }
}

The only difference between this and the non-sharding one is not setting the X-Container-Sharding meta on the initial container PUT. The crud profile shows that we are heavy on PUTs and GETs. But because jobs are randomised, I don’t expect exactly the same numbers when it comes to object count on the servers; however there is a rather large discrepancy between the object counts on the two clusters:

Cluster 1:

HTTP/1.1 204 No Content
Content-Length: 0
X-Container-Object-Count: 11291190
Accept-Ranges: bytes
X-Storage-Policy: gold
X-Container-Bytes-Used: 0
X-Timestamp: 1482290574.52856
Content-Type: text/plain; charset=utf-8
X-Trans-Id: tx9dd499df28304b2d920aa-00585b2d3e
Date: Thu, 22 Dec 2016 01:32:46 GMT

Cluster 2:

Content-Length: 0
X-Container-Object-Count: 6909895
X-Container-Sharding: True
X-Storage-Policy: gold
X-Container-Bytes-Used: 0
X-Timestamp: 1482290575.94012
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
X-Trans-Id: txba7b23743e0d45a68edb8-00585b2d61
Date: Thu, 22 Dec 2016 01:33:27 GMT

So cluster 1 has about 11 million objects and cluster 2 about 7 million. That’s quite a difference, which gets me wondering what’s causing such a large difference in PUT throughput?

The only real difference in the proxy object PUT when comparing sharded to unsharded is finding the shard container the object server will need to update, in which case another request is made to the root container asking for the pivot (if there is one). Is this extra request really causing an issue? I do note the object-updater (last graph in the image) is also working harder, as the number of successes during the benchmarks is much higher, meaning there are more requests falling into async pendings.

Maybe the extra updater work is because of the extra load on the container server (this additional request)?

To test this theory, I can push the sharder harder and force container updates into the root container. This would stop the extra request.. but force more traffic to the root container (which we are kind of doing anyway). We should still see benefits, as the root container would be much smaller (because it’s sharded) than the non-sharded counterpart. And this will allow us to see if this is what’s causing the slower throughput.

Update: I’m currently running a new scenario which is all PUTs, so let’s see how that fares. Will keep you posted.

Simple Squid access log reporting.

Squid is one of the biggest and most used proxies on the interwebs, and generating reports from its access logs is already a done deal: there are many commercial and OSS apps that support the squid log format. But I found myself in a situation where I wanted stats but didn’t want to install a web server on my proxy, or use syslog to push my logs to a centralised server running such software, and I also wasn’t in a position to go buy one of those off-the-shelf wiz-bang Squid reporting and graphing tools.

As a Linux geek I surfed the web to see what others have done. I came across a list provided by the Squid website and, following a couple of links, found an awk script called ‘proxy_stats.gawk’ written by Richard Huveneers.

I downloaded it and tried it out… unfortunately it didn’t work. Looking at the code.. which he nicely commented.. showed that he had set it up for access logs from version 1.* of squid. Now the squid access log format from squid 2.6+ hasn’t changed too much from version 1.1: all they have really done is add a “content type” entry at the end of each line.
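
For reference, a native-format access.log line from squid 2.6+ looks something like this (a made-up example):

1328515200.123    341 10.1.10.217 TCP_MISS/200 4512 GET http://www.example.com/index.html - DIRECT/93.184.216.34 text/html

The fields are the epoch timestamp, elapsed milliseconds, client address, result code/HTTP status, bytes, method, URL, ident, hierarchy code/peer and, the new bit, the content type at the end.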

So, as a good Linux geek does, I upgraded the script. My changes include:

  • Support for squid 2.6+.
  • Removed the use of deprecated switches that are no longer supported by the sort command.
  • Now that there is an actual content type “column”, use it to improve the ‘Object type report’.
  • Added a users section, as this was an important report I required that was missing.
  • And, in a further hacked version, an auto-generated width for the first “name” column.

Now, with the explanation out of the way, let me show it to you!

For those who are new to awk, this is how I’ve been running it:

zcat <access log file> | awk -f proxy_stats.gawk > <report-filename>

NOTE: I’ve been using it for some historical analysis, so I’m running it on old rotated files, which are compressed, hence the zcat.

You can pass more than one file at a time and the order doesn’t matter, as each line of an access log contains the date in epoch time:

zcat `find /var/log/squid/ -name "access.log*"` |awk -f proxy_stats.gawk

The script produces an ASCII report (see the end of this blog entry for an example), which could be generated and emailed via cron. If you want it to look nice in an HTML-capable email client then I suggest wrapping it in <pre> tags:

<html>
<head><title>Report Title</title></head>
<body>
<pre>
... Report goes here ...
</pre>
</body>
</html>

For the experienced Linux sysadmins out there, cron + ‘find -mtime’ would be a very simple way of producing an automated daily, weekly or even monthly report.
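For example, a daily report could be as simple as an /etc/cron.d entry like this one (a rough sketch only: the schedule, paths and recipient are placeholders, and it assumes the rotated logs are gzipped):

30 0 * * * root zcat $(find /var/log/squid/ -name "access.log*.gz" -mtime -1) | awk -f /usr/local/bin/proxy_stats.gawk | mail -s "Daily squid report" admin@example.com
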
But like I said earlier, I was working on historic data: hundreds of files in a single report.. hundreds because, for business reasons, we have been rotating the squid logs every hour. So I did what I do best and wrote a quick bash script to find all the files I needed to cat into the report:

#!/bin/bash
# Print every squid access log (plain or gzipped) that contains entries
# for the given month. Usage: $0 <month number, e.g. 02>

ACCESS_LOG_DIR="/var/log/squid/access.log*"
MONTH="$1"

# Return the first line of a log, transparently handling gzipped files.
function getFirstLine() {
	if [ -n "`echo $1 |grep "gz$"`" ]
	then
		zcat $1 |head -n 1
	else
		head -n 1 $1
	fi
}

# Return the last line of a log, transparently handling gzipped files.
function getLastLine() {
	if [ -n "`echo $1 |grep "gz$"`" ]
	then
		zcat $1 |tail -n 1
	else
		tail -n 1 $1
	fi
}

for log in `ls $ACCESS_LOG_DIR`
do
	# Each access log line starts with an epoch timestamp, so the month a
	# file starts in is just the first field of its first line.
	firstLine="`getFirstLine $log`"
	epochStr="`echo $firstLine |awk '{print $1}'`"
	month=`date -d @$epochStr +%m`

	if [ "$month" -eq "$MONTH" ]
	then
		echo $log
		continue
	fi

	# The file might start in an earlier month but roll into the one we
	# want, so also check the last line.
	lastLine="`getLastLine $log`"
	epochStr="`echo $lastLine |awk '{print $1}'`"
	month=`date -d @$epochStr +%m`

	if [ "$month" -eq "$MONTH" ]
	then
		echo $log
	fi
done
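
The script just prints the matching file names, so wiring it into a report is a one-liner. Assuming it’s saved as find_month_logs.sh (a name I’ve made up for this example) and that, like mine, all the rotated logs are gzipped:

zcat $(./find_month_logs.sh 02) | awk -f proxy_stats.gawk > february_report.txt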

So there you go: thanks to the work of Richard Huveneers there is a script that I think generates a pretty good ASCII report, and it can be automated or integrated easily into any Linux/Unix workflow.

If you’re interested in getting hold of the most up-to-date version of the script, you can get it from my sysadmin github repo here.

As promised earlier here is an example report:

Parsed lines  : 32960
Bad lines     : 0

First request : Mon 30 Jan 2012 12:06:43 EST
Last request  : Thu 09 Feb 2012 09:05:01 EST
Number of days: 9.9

Top 10 sites by xfers           reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
213.174.155.216                   20   0.1% 100.0%   0.0%        0.0   0.0%   0.0%       1.7       2.5
30.media.tumblr.com                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      48.3      77.4
28.media.tumblr.com                1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      87.1       1.4
26.media.tumblr.com                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
25.media.tumblr.com                2   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      49.2      47.0
24.media.tumblr.com                1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%     106.4     181.0
10.1.10.217                      198   0.6% 100.0%   0.0%       16.9   0.9%   0.0%      87.2    3332.8
3.s3.envato.com                   11   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       7.6      18.3
2.s3.envato.com                   15   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       7.5      27.1
2.media.dorkly.cvcdn.com           8   0.0% 100.0%  25.0%        3.2   0.2%   0.3%     414.1     120.5

Top 10 sites by MB              reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
zulu.tweetmeme.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       3.1     289.6
ubuntu.unix.com                    8   0.0% 100.0% 100.0%        0.1   0.0% 100.0%       7.5     320.0
static02.linkedin.com              1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      36.0     901.0
solaris.unix.com                   2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       3.8     223.6
platform.tumblr.com                2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       1.1     441.4
i.techrepublic.com.com             5   0.0%  60.0% 100.0%        0.0   0.0% 100.0%       6.8    2539.3
i4.zdnetstatic.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      15.3     886.4
i4.spstatic.com                    1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       4.7     520.2
i2.zdnetstatic.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       7.8    2920.9
i2.trstatic.com                    9   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       1.5     794.5

Top 10 neighbor report          reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
www.viddler.com                    4   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.turktrust.com.tr              16   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.trendmicro.com                 5   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.reddit.com                     2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.linkedin.com                   2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.google-analytics.com           2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.facebook.com                   2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.dynamicdrive.com               1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.benq.com.au                    1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
wd-edge.sharethis.com              1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0

Local code                      reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
TCP_CLIENT_REFRESH_MISS         2160   6.6% 100.0%   0.0%        7.2   0.4%   0.0%       3.4      12.9
TCP_HIT                          256   0.8% 100.0%  83.2%       14.0   0.8% 100.0%      56.0    1289.3
TCP_IMS_HIT                      467   1.4% 100.0% 100.0%       16.9   0.9% 100.0%      37.2    1747.4
TCP_MEM_HIT                      426   1.3% 100.0% 100.0%       96.5   5.3% 100.0%     232.0    3680.9
TCP_MISS                       27745  84.2%  97.4%   0.0%     1561.7  85.7%   0.3%      59.2      18.2
TCP_REFRESH_FAIL                  16   0.0% 100.0%   0.0%        0.2   0.0%   0.0%      10.7       0.1
TCP_REFRESH_MODIFIED             477   1.4%  99.8%   0.0%       35.0   1.9%   0.0%      75.3    1399.4
TCP_REFRESH_UNMODIFIED          1413   4.3% 100.0%   0.0%       91.0   5.0%   0.0%      66.0     183.5

Status code                     reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
000                              620   1.9% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
200                            29409  89.2% 100.0%   2.9%     1709.7  93.8%   7.7%      59.5     137.1
204                              407   1.2% 100.0%   0.0%        0.2   0.0%   0.0%       0.4       1.4
206                              489   1.5% 100.0%   0.0%      112.1   6.1%   0.0%     234.7     193.0
301                               82   0.2% 100.0%   0.0%        0.1   0.0%   0.0%       0.7       1.5
302                              356   1.1% 100.0%   0.0%        0.3   0.0%   0.0%       0.8       2.7
303                                5   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       1.5
304                              862   2.6% 100.0%  31.2%        0.4   0.0%  30.9%       0.4      34.2
400                                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
401                                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
403                               47   0.1%   0.0%      -        0.0   0.0%      -         -         -
404                              273   0.8%   0.0%      -        0.0   0.0%      -         -         -
500                                2   0.0%   0.0%      -        0.0   0.0%      -         -         -
502                               12   0.0%   0.0%      -        0.0   0.0%      -         -         -
503                               50   0.2%   0.0%      -        0.0   0.0%      -         -         -
504                              344   1.0%   0.0%      -        0.0   0.0%      -         -         -

Hierarchie code                 reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
DIRECT                         31843  96.6%  97.7%   0.0%     1691.0  92.8%   0.0%      55.7      44.3
NONE                            1117   3.4% 100.0% 100.0%      131.6   7.2% 100.0%     120.7    2488.2

Method report                   reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
CONNECT                         5485  16.6%  99.2%   0.0%      132.8   7.3%   0.0%      25.0       0.3
GET                            23190  70.4%  97.7%   4.9%     1686.3  92.5%   7.8%      76.2     183.2
HEAD                            2130   6.5%  93.7%   0.0%        0.7   0.0%   0.0%       0.3       1.1
POST                            2155   6.5%  99.4%   0.0%        2.9   0.2%   0.0%       1.4       2.0

Object type report              reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
*/*                                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.6       3.2
application/cache-digest         396   1.2% 100.0%  50.0%       33.7   1.8%  50.0%      87.1    3655.1
application/gzip                   1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      61.0      30.8
application/javascript           227   0.7% 100.0%  12.3%        2.2   0.1%   7.7%       9.9      91.9
application/json                 409   1.2% 100.0%   0.0%        1.6   0.1%   0.0%       4.1       6.0
application/ocsp-response        105   0.3% 100.0%   0.0%        0.2   0.0%   0.0%       1.9       2.0
application/octet-stream         353   1.1% 100.0%   6.8%       81.4   4.5%   9.3%     236.1     406.9
application/pdf                    5   0.0% 100.0%   0.0%       13.5   0.7%   0.0%    2763.3      75.9
application/pkix-crl              96   0.3% 100.0%  13.5%        1.0   0.1%   1.7%      10.6       7.0
application/vnd.google.sa       1146   3.5% 100.0%   0.0%        1.3   0.1%   0.0%       1.1       2.4
application/vnd.google.sa       4733  14.4% 100.0%   0.0%       18.8   1.0%   0.0%       4.1      13.4
application/x-bzip2               19   0.1% 100.0%   0.0%       78.5   4.3%   0.0%    4232.9     225.5
application/x-gzip               316   1.0% 100.0%  59.8%      133.4   7.3%  59.3%     432.4    3398.1
application/x-javascript        1036   3.1% 100.0%   5.8%        9.8   0.5%   3.4%       9.7      52.1
application/xml                   46   0.1% 100.0%  34.8%        0.2   0.0%  35.1%       3.5     219.7
application/x-msdos-progr        187   0.6% 100.0%   0.0%       24.4   1.3%   0.0%     133.7     149.6
application/x-pkcs7-crl           83   0.3% 100.0%   7.2%        1.6   0.1%   0.4%      19.8      10.8
application/x-redhat-pack         13   0.0% 100.0%   0.0%       57.6   3.2%   0.0%    4540.7     156.7
application/x-rpm                507   1.5% 100.0%   6.3%      545.7  29.9%   1.5%    1102.2     842.8
application/x-sdlc                 1   0.0% 100.0%   0.0%        0.9   0.0%   0.0%     888.3     135.9
application/x-shockwave-f        109   0.3% 100.0%  11.9%        5.4   0.3%  44.5%      50.6     524.1
application/x-tar                  9   0.0% 100.0%   0.0%        1.5   0.1%   0.0%     165.3      36.4
application/x-www-form-ur         11   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       9.9      15.4
application/x-xpinstall            2   0.0% 100.0%   0.0%        2.5   0.1%   0.0%    1300.6     174.7
application/zip                 1802   5.5% 100.0%   0.0%      104.0   5.7%   0.0%      59.1       2.5
Archive                           89   0.3% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
audio/mpeg                         2   0.0% 100.0%   0.0%        5.8   0.3%   0.0%    2958.2      49.3
binary/octet-stream                2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       5.5      14.7
font/ttf                           2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      15.5      12.5
font/woff                          1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      42.5    3539.6
Graphics                         126   0.4% 100.0%   0.0%        0.1   0.0%   0.0%       0.6       2.5
HTML                              14   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.1       0.1
image/bmp                          1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.3       3.9
image/gif                       5095  15.5% 100.0%   2.4%       35.9   2.0%   0.7%       7.2       9.5
image/jpeg                      1984   6.0% 100.0%   4.3%       52.4   2.9%   0.6%      27.0      62.9
image/png                       1684   5.1% 100.0%  10.3%       28.6   1.6%   1.9%      17.4     122.2
image/vnd.microsoft.icon          10   0.0% 100.0%  30.0%        0.0   0.0%  12.8%       1.0       3.3
image/x-icon                      72   0.2% 100.0%  16.7%        0.2   0.0%   6.0%       3.2      15.0
multipart/bag                      6   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      25.2      32.9
multipart/byteranges              93   0.3% 100.0%   0.0%       16.5   0.9%   0.0%     182.0     178.4
text/cache-manifest                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       3.1
text/css                         470   1.4% 100.0%   7.9%        3.4   0.2%   5.8%       7.4      59.7
text/html                       2308   7.0%  70.7%   0.4%        9.6   0.5%   0.6%       6.0      14.7
text/javascript                 1243   3.8% 100.0%   2.7%       11.1   0.6%   5.2%       9.1      43.3
text/json                          1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.5       0.7
text/plain                      1445   4.4%  99.4%   1.5%       68.8   3.8%   5.5%      49.0      41.9
text/x-cross-domain-polic         24   0.1% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       1.7
text/x-js                          2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      10.1       6.4
text/x-json                        9   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       3.0       8.5
text/xml                         309   0.9% 100.0%  12.9%       12.9   0.7%  87.5%      42.8     672.3
unknown/unknown                 6230  18.9%  99.3%   0.0%      132.9   7.3%   0.0%      22.0       0.4
video/mp4                          5   0.0% 100.0%   0.0%        3.2   0.2%   0.0%     660.8      62.7
video/x-flv                      117   0.4% 100.0%   0.0%      321.6  17.6%   0.0%    2814.9     308.3
video/x-ms-asf                     2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.1       4.7

Ident (User) Report             reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
-                              32960 100.0%  97.8%   3.5%     1822.6 100.0%   7.2%      57.9     129.0

Weekly report                   reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
2012/01/26                     14963  45.4%  97.6%   3.6%      959.8  52.7%   1.8%      67.3     104.5
2012/02/02                     17997  54.6%  98.0%   3.4%      862.8  47.3%  13.2%      50.1     149.4

Total report                    reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
All requests                   32960 100.0%  97.8%   3.5%     1822.6 100.0%   7.2%      57.9     129.0

Produced by : Mollie's hacked access-flow 0.5
Running time: 2 seconds

Happy squid reporting!