CloudFormation 2 – importing resources into a CDK stack

In my previous post I mentioned that I was starting to migrate some existing CloudFormation templates to be generated using TypeScript and Amazon’s Cloud Development Kit. I started with some simple static websites that are hosted from S3 buckets, and for these simple sites I found it easiest to completely tear down the old infrastructure and recreate it with the CDK. I’m now starting to work on infrastructure that I don’t want to destroy – systems like the EC2 instance that hosts this blog.

That post also mentioned the possibility of importing existing resources into a CloudFormation stack. The CDK doesn’t provide support for this, but it’s still possible. Let’s start with a simple resource that I can recreate if something goes wrong: an IAM user.

Removing the user from the old stack

Let’s start with a simple stack (OldStack) that contains a user (Leigh), and investigate how we can move this to a new stack (NewStack) managed using the CDK. Here’s the old stack:

Resources:
  Leigh:
    Type: AWS::IAM::User
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AdministratorAccess
        - arn:aws:iam::aws:policy/job-function/Billing
      UserName: leigh

We want to drop the user from this stack so that we can use it in the new one. We can do this by adding a DeletionPolicy and then removing the resource (in two separate steps). First add the policy:

 Resources:
   Leigh:
     Type: AWS::IAM::User
+    DeletionPolicy: Retain
     Properties:
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/AdministratorAccess
         - arn:aws:iam::aws:policy/job-function/Billing
       UserName: leigh

Deploy the new template so the policy is applied within CloudFormation. We can then remove the resource from the stack and deploy again. For this example we only have one resource, so let’s just delete the whole stack. Here are the stack events as reported by the AWS console:

The user wasn’t deleted because we added the DeletionPolicy. We now have an orphaned IAM user that’s not attached to any stacks.

Building the new stack

Let’s build a new stack within the CDK that will (eventually) host it. Here’s some code for an empty stack:

import { Stack } from '@aws-cdk/core';
import type { Construct, StackProps } from '@aws-cdk/core';

class NewStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);
    }
}

export default NewStack;

We can deploy this as usual with the CDK.
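For completeness, the stack class needs to be wired into a CDK app before cdk deploy can find it. Here’s a minimal sketch of an entry point (the import path is an assumption; adjust it to match your project layout):

import { App } from '@aws-cdk/core';

import NewStack from './new-stack'; // wherever the class above lives

const app = new App();
new NewStack(app, 'NewStack');
app.synth();

With that in place, cdk deploy NewStack creates the (still empty) stack.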

Preparing the template for import

Once the new stack has been created we’re ready to import the user. First we need to add code that describes it:

+import { User } from '@aws-cdk/aws-iam';
 import { Stack } from '@aws-cdk/core';
 import type { Construct, StackProps } from '@aws-cdk/core';

 class NewStack extends Stack {
     constructor(scope: Construct, id: string, props?: StackProps) {
         super(scope, id, props);
+
+        new User(this, 'User', {
+            userName: 'leigh',
+            managedPolicies: [
+                { managedPolicyArn: 'arn:aws:iam::aws:policy/AdministratorAccess' },
+                { managedPolicyArn: 'arn:aws:iam::aws:policy/job-function/Billing' },
+            ],
+        });
     }
 }

 export default NewStack;

We can now generate the stack template using cdk synth:

$ cdk synth NewStack
Resources:
  User00B015A1:
    Type: AWS::IAM::User
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AdministratorAccess
        - arn:aws:iam::aws:policy/job-function/Billing
      UserName: leigh
    Metadata:
      aws:cdk:path: NewStack/User/Resource
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Analytics: v2:deflate64:H4sIAAAAAAAAAyWMQQqAIBAA39LdtoQIugX9oOgBsm1gkcKu1kH8e0mnOcwwGoYOdDWaR2rcziahZ4K0BIOnmryTwBGDmnY3k/jISFmV1poL0irERRXmrJzfCA5pbv0te2irQ6ytObpgL4L55wuPOD5FcQAAAA==
    Metadata:
      aws:cdk:path: NewStack/CDKMetadata/Default

Unfortunately this template can’t be used to import the user:

  • “Each resource to import must have a DeletionPolicy attribute in your template” (source) but this is missing.
  • The AWS::CDK::Metadata resource changed when generating the updated template. CloudFormation doesn’t allow you to change existing resources at the same time as an import.

Luckily these problems can easily be resolved manually: we can add a DeletionPolicy and revert the metadata to match the value that’s currently deployed (you can get this from the Template tab within CloudFormation). Save the YAML output from cdk synth to a file, or edit the generated template directly (by default it’s written to the cdk.out directory).

 Resources:
   User00B015A1:
     Type: AWS::IAM::User
+    DeletionPolicy: Retain
     Properties:
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/AdministratorAccess
         - arn:aws:iam::aws:policy/job-function/Billing
       UserName: leigh
     Metadata:
       aws:cdk:path: NewStack/User/Resource
   CDKMetadata:
     Type: AWS::CDK::Metadata
     Properties:
-      Analytics: v2:deflate64:H4sIAAAAAAAAAyWMQQqAIBAA39LdtoQIugX9oOgBsm1gkcKu1kH8e0mnOcwwGoYOdDWaR2rcziahZ4K0BIOnmryTwBGDmnY3k/jISFmV1poL0irERRXmrJzfCA5pbv0te2irQ6ytObpgL4L55wuPOD5FcQAAAA==
+      Analytics: v2:deflate64:H4sIAAAAAAAAAzPUszTRM1R0SCwv1k1OydZPzi9K1asOLklMztZxzs8rLikqTS7RcU7LC0otzi8tSk6t1cnLT0nVyyrWLzME6jTTM1DMKs7M1C0qzSvJzE3VC4LQAMvlLV1YAAAA
     Metadata:
       aws:cdk:path: NewStack/CDKMetadata/Default

Is it safe to deploy with out-of-date metadata? I think so. If we decode the metadata values (these are base64-encoded gzip) then we get the following:

$ echo -n 'H4sIAAAAAAAAAzPUszTRM1R0SCwv1k1OydZPzi9K1asOLklMztZxzs8rLikqTS7RcU7LC0otzi8tSk6t1cnLT0nVyyrWLzME6jTTM1DMKs7M1C0qzSvJzE3VC4LQAMvlLV1YAAAA' | base64 -d | gunzip
1.94.1!@aws-cdk/core.{Stack,Construct,CfnResource},node.js/v14.16.0!jsii-runtime.Runtime
$ echo -n 'H4sIAAAAAAAAAyWMQQqAIBAA39LdtoQIugX9oOgBsm1gkcKu1kH8e0mnOcwwGoYOdDWaR2rcziahZ4K0BIOnmryTwBGDmnY3k/jISFmV1poL0irERRXmrJzfCA5pbv0te2irQ6ytObpgL4L55wuPOD5FcQAAAA==' | base64 -d | gunzip
1.94.1!@aws-cdk/{core.{Stack,Construct,CfnResource},aws-iam.{User,CfnUser}},node.js/v14.16.0!jsii-runtime.Runtime

This doesn’t look very important, especially when the data is stored under a key called “Analytics”. I can’t find any documentation for the AWS::CDK::Metadata resource, so I think it’s reasonable to conclude this doesn’t have any meaningful impact on the environment. In any case we’ll fix it later.

Importing the user into the new stack

Select the new stack and choose Import resources into stack to get the process started:

Upload the template that we’ve edited:

Tell CloudFormation how to find the user we want to import:

Preview the changes and import:

Cleaning up

We’ve now successfully imported the user into the new stack, but the template doesn’t match that defined using the CDK because of the manual changes we made:

$ cdk diff NewStack
Stack Root
Resources
[~] AWS::IAM::User User User00B015A1
 └─ [-] DeletionPolicy
     └─ Retain

Deploy the stack in order to revert these changes and bring CloudFormation into alignment with the CDK code.

Chiming at St Paul’s

The COVID pandemic has affected the life of almost everybody on earth in innumerable ways. One of its lesser-known effects has been the near-total suspension of the ringing of church bells in the English style (change ringing). This is certainly trivial in the grand scheme of things, but the cancellation of my favourite hobby has had a large impact on me personally. Gathering twelve ringers in a (generally) stuffy room has been, when not actually illegal, firmly discouraged by both church and state.

St Paul’s Cathedral has an “Ellacombe” apparatus installed on six of the bells – a system of ropes and chiming hammers that allows a single ringer to sound the bells. This is quite different from the usual way of ringing with rope and wheel, but has allowed the bells to continue to sound when Sunday services have been taking place. I was chiming for one of these services when Rosie Oliver happened to make a recording. Rosie is an audio producer and London explorer who organises The London Ear guided walk around the City of London.

Her recording follows; all mistakes are mine!

You can read more about this on her blog, including the following reflection:

In many ways chiming feels like any performance. Apprehension is supplanted by concentration: that sort of concentration where time starts to drift and other concerns fade away. Having worked as an organist these sensations certainly felt familiar, but chiming brings its own character. Bells are so much more audible than other instruments and the bells of St Paul’s even more so, but my experience is entirely disconnected from any “audience”. Hidden behind the walls of the ringing chamber I might have an audience of thousands, or of none.

After nervous glances at the clock it’s time to stop. The loneliness intensifies. The walls that divide me from the world used to be a welcome home for our band of ringers but now it is only I. The sound of the bells fades into a vacuum.

I don’t suppose I’ll be invited to become Poet Laureate any time soon!

CloudFormation

I’ve been using AWS CloudFormation for many years to manage infrastructure created on the Amazon Web Services platform. Trying to configure such an environment manually is tedious and error-prone. It’s more efficient and reliable to do so automatically – managing infrastructure as code. It’s actually a few years since I last used CloudFormation professionally, and I was pleased to discover some new functionality while making some updates for a personal project.

CloudFormation stacks can only manage a maximum of 500 resources. Nested stacks are used to avoid reaching this limit: a nested stack can itself contain another 500 resources but only counts as one within its parent. Nested stacks are also the primary form of abstraction within CloudFormation. A stack can be created multiple times with different parameter values. Unfortunately they’re also a little clunky.

  • It’s hard to refactor by moving resources around. Moving a resource from one stack to another will create an entirely new version within AWS. If the resource has a unique name then the update will fail (the new resource is created before the old one is removed, so there’s a collision). If the resource has internal state (e.g. data) then that will be lost when the old resource is deleted.
  • Updates start at the root and ripple down through the tree of stacks. Making a simple change can take a long time as every stack is checked for changes.

It’s now possible to import existing resources into a CloudFormation stack. If somebody happened to create a server manually then you can now add it to your stack and start managing it properly. This also makes it possible to move resources between stacks, solving the first problem above (refactoring).

More excitingly, AWS released the Cloud Development Kit back in 2019, which makes it easy to create and manage multiple stacks programmatically using a language like TypeScript or Python. This makes it possible to solve the second problem: abstraction in CloudFormation can instead be handled within a general-purpose programming language, eliminating the need for complex hierarchies of nested stacks.
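As a sketch of what that abstraction can look like (the construct, its properties and the bucket names below are invented for illustration), a reusable TypeScript construct plays the role that a parameterised nested stack used to:

import { Bucket } from '@aws-cdk/aws-s3';
import { Construct, Stack } from '@aws-cdk/core';

interface StaticSiteProps {
    domainName: string;
}

// One class replaces a nested-stack template plus its parameters.
class StaticSite extends Construct {
    constructor(scope: Construct, id: string, props: StaticSiteProps) {
        super(scope, id);
        new Bucket(this, 'Bucket', { bucketName: props.domainName });
    }
}

class WebsitesStack extends Stack {
    constructor(scope: Construct, id: string) {
        super(scope, id);
        // Instantiate the abstraction as many times as needed.
        new StaticSite(this, 'Blog', { domainName: 'blog.example.com' });
        new StaticSite(this, 'Docs', { domainName: 'docs.example.com' });
    }
}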

I’m now starting to migrate all my CloudFormation templates to CDK TypeScript. The CDK itself outputs a template so I’m still on familiar ground and it’s easy to see where the code isn’t quite lined up with the current configuration. Watch this space for further updates…

Service testing redux

My previous post on Service Testing has become a favourite of mine. I often find myself looking up details of how to check a particular service (and particularly how to use OpenSSL’s s_client). That post is now 7½ years old and I thought it due for a refresh.

HTTP

It’s still possible to type raw HTTP into a terminal and receive a webpage in response from a server. This only works with HTTP/1.0 and HTTP/1.1 (which are plain-text protocols); newer versions are binary protocols that are less scrutable.

The most important difference between the two text protocols is that HTTP/1.1 requires a Host: header. Without one a server can only host a single domain, so the older protocol is now rarely used.

$ telnet example.com 80
Trying 93.184.216.34...
Connected to example.com.
Escape character is '^]'.
$ GET / HTTP/1.1
$ Host: example.com
$
HTTP/1.1 200 OK
Age: 438726
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 13 Feb 2021 18:56:23 GMT
Etag: "3147526947+ident"
Expires: Sat, 20 Feb 2021 18:56:23 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (nyb/1D20)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

<!doctype html>
<html>
<head>
    <title>Example Domain</title>
...

You can replace the first / in the GET command with any request path, and add additional request headers if desired.

HTTPS

HTTP communication can be encrypted via TLS. Secure HTTP is usually hosted on port 443, and we can use OpenSSL‘s s_client to connect:

$ openssl s_client -connect example.com:443 -crlf -quiet
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert TLS RSA SHA256 2020 CA1
verify return:1
depth=0 C = US, ST = California, L = Los Angeles, O = Internet Corporation for Assigned Names and Numbers, CN = www.example.org
verify return:1
$ GET / HTTP/1.1
$ Host: example.com
$
HTTP/1.1 200 OK
...

SMTP

The venerable Simple Mail Transfer Protocol is nearly 40 years old but (after several extensions) remains the standard way to send email between machines on the Internet.

Plain SMTP

Mail is still commonly sent as plain text. Let’s try it:

$ telnet example.com 25
Trying 1.2.3.4...
Connected to example.com.
Escape character is '^]'.
220 example.com ESMTP Postfix

Say “hello” and check it responds:

$ EHLO example.com
250-example.com
250-PIPELINING
250-SIZE 20480000
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-DSN
250-SMTPUTF8
250 CHUNKING

Try sending a message to a local user:

$ MAIL FROM: <nobody@example.com>
250 2.1.0 Ok
$ RCPT TO: <somebody@example.com>
250 2.1.5 Ok
$ DATA
354 End data with <CR><LF>.<CR><LF>
$ To: <somebody@example.com>
$ From: <nobody@example.com>
$ Subject: Test message
$
$ Test message content
$ .
$
250 2.0.0 Ok: queued as 49B47827F2
$ QUIT
221 2.0.0 Bye
Connection closed by foreign host.

STARTTLS

Mail can be sent encrypted (using TLS). After connecting to a server, the STARTTLS command triggers negotiation of a secure transport. This isn’t something we can do by hand over telnet, but s_client supports it:

$ openssl s_client -connect example.com:25 -crlf -starttls smtp -quiet
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = example.com
verify return:1
250 CHUNKING
$ EHLO ...

Logging in

In order to relay mail (send it on to another destination) it’s necessary to log in to the remote server. You can log in wherever you see AUTH in the response to the EHLO command. We’ve not seen this yet because it’s common to use a dedicated port (587) for message submission (leaving port 25 for receiving messages destined for the server itself).

$ openssl s_client -connect example.com:587 -crlf -starttls smtp -quiet
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = example.com
verify return:1
250 CHUNKING
$ EHLO example.com
250-example.com
250-PIPELINING
250-SIZE 20480000
250-ETRN
250-AUTH PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-DSN
250-SMTPUTF8
250 CHUNKING

This server supports login via two mechanisms: PLAIN and LOGIN.

PLAIN

This login mechanism concatenates the username and password together and encodes them.

$ echo -ne '\000username\000password' | base64  # in a shell
AHVzZXJuYW1lAHBhc3N3b3Jk

$ AUTH PLAIN  # in an SMTP session
334
$ AHVzZXJuYW1lAHBhc3N3b3Jk
235 2.7.0 Authentication successful

You’ll obviously need to replace “username” and “password” with real values. After authentication it should be possible to send mail to anywhere on the Internet.
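If you’d rather not build the token in a shell, the same value can be computed with a couple of lines of Node (a sketch; the credentials are placeholders):

// AUTH PLAIN token: NUL + username + NUL + password, base64-encoded
const username = 'username';
const password = 'password';
const token = Buffer.from(`\u0000${username}\u0000${password}`).toString('base64');
console.log(token); // AHVzZXJuYW1lAHBhc3N3b3Jk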

LOGIN

For this mechanism we send the username and password separately.

$ echo -n 'username' | base64
dXNlcm5hbWU=
$ echo -n 'password' | base64
cGFzc3dvcmQ=

$ AUTH LOGIN
334 VXNlcm5hbWU6
dXNlcm5hbWU=
334 UGFzc3dvcmQ6
cGFzc3dvcmQ=
235 2.7.0 Authentication successful

If you base64-decode the server’s responses above you’ll find these are “Username:” and “Password:” respectively.
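The same Buffer trick decodes them, if you want to check (a quick sketch):

console.log(Buffer.from('VXNlcm5hbWU6', 'base64').toString()); // Username:
console.log(Buffer.from('UGFzc3dvcmQ6', 'base64').toString()); // Password: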

SMTPS

It used to be common to host SMTP over an entirely encrypted connection on port 465. s_client can connect to this too:

$ openssl s_client -connect example.com:465 -crlf -quiet
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = example.com
verify return:1
220 example.com ESMTP Postfix
$ EHLO ...

IMAP

Plain IMAP (usually on port 143) isn’t as common as IMAPS (port 993). Again we use s_client. If you need to talk unencrypted IMAP then just use telnet.

$ openssl s_client -connect example.com:993 -quiet
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = example.com
verify return:1
* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ AUTH=PLAIN AUTH=LOGIN]

Log in:

$ a1 LOGIN "username" "password"
a1 OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY LITERAL+ NOTIFY SPECIAL-USE] Logged in

List some folders:

$ a2 LIST "" "*"
... lots of stuff

Find what’s in the Inbox:

$ a3 EXAMINE INBOX
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft Junk NonJunk $Forwarded)
* OK [PERMANENTFLAGS ()] Read-only mailbox.
* 135 EXISTS
* 0 RECENT
* OK [UNSEEN 6] First unseen.
* OK [UIDVALIDITY 1345668496] UIDs valid
* OK [UIDNEXT 50898] Predicted next UID
* OK [HIGHESTMODSEQ 132256] Highest
a3 OK [READ-ONLY] Examine completed (0.001 + 0.000 secs).

Exit:

$ a4 LOGOUT
* BYE Logging out
a4 OK Logout completed (0.001 + 0.000 secs).

Saving space in large git repositories

In mid-2017 I set up an automated scrape of a frequently-updated website. Every day my script would crawl the website, download its contents, and commit these to GitHub. This allowed me to back up not just the site contents but the complete history of changes.

This scrape did its thing for nearly 4 years until I came to decommission the server where it runs. I was a little surprised to find that the repository had grown to well over 1 GB despite the site’s only containing around 80 MB of data. The root volume of the server was only 8 GB so this scrape was using a pretty big proportion of disk!

I shouldn’t have been surprised: storing the entire history of a website will quickly add up, especially over such a long period. I wanted to keep the entire history of the site, but I realised I didn’t need to store it on the server itself (GitHub does a fine job of hosting repositories, after all). It was time to go digging for a better solution.

git shallow clone

I’d heard about the idea of a “shallow clone,” where one clones only recent commits from a repository rather than the whole thing. git clone supports the --depth option which allowed me to clone only the most recent commit from the repository.

--depth

Create a shallow clone with a history truncated to the specified number of commits.

Git – git-clone Documentation (git-scm.com)

Let’s take a look how this works. First I’m going to create a source repository that has a couple of commits. You can skip this step if you’d prefer to experiment with a real repository.

# Make a bare "remote" repository that does an impression of GitHub

leigh:~$ mkdir remote
leigh:~$ cd remote
leigh:~/remote$ git init --bare
Initialized empty Git repository in /home/leigh/remote/
leigh:~/remote$ cd ..

# Clone the "remote" and add some commits

leigh:~$ git clone file:///home/leigh/remote local
Cloning into 'local'...
warning: You appear to have cloned an empty repository.
leigh:~$ cd local
leigh:~/local$ git commit --allow-empty -m 'First commit'
[master (root-commit) 3c315ce] First commit
leigh:~/local$ git commit --allow-empty -m 'Most recent commit'
[master dc999d5] Most recent commit
leigh:~/local$ git push
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 251 bytes | 251.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To file:///home/leigh/remote
 * [new branch]      master -> master

# Take a look at the results

leigh:~/local$ git log
commit dc999d56edcb14345da39ea25799879dadc406c7 (HEAD -> master)
Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:26:26 2021 +0000

    Most recent commit

commit 3c315ceef3d3b5da0e02b0ea0249dfd2052175b3
Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:26:14 2021 +0000

    First commit

Now let’s clone this repository again, but only capture the most recent commit:

# Clear out the original copy

leigh:~/local$ cd ..
leigh:~$ rm -rf local

# Clone again, passing --depth

leigh:~$ git clone --depth 1 file:///home/leigh/remote local
Cloning into 'local'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2/2), done.

# Take a look at the results

leigh:~$ cd local
leigh:~/local$ git log
commit dc999d56edcb14345da39ea25799879dadc406c7 (grafted, HEAD -> master, origin/master, origin/HEAD)

Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:26:26 2021 +0000

    Most recent commit

This is useful: we can clone all the files in a repository but ignore all its history.

What next?

I now know how to clone only a single commit, making it much easier to migrate this script to a new server (I have to download only 80 MB rather than > 1 GB).

Unfortunately this doesn’t quite solve the entire problem. In another four years I’ll have accumulated another 1 GB of new commits.

# Add another commit

leigh:~/local$ git commit --allow-empty -m 'New commit'
[master 2555676] New commit
leigh:~/local$ git push
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 187 bytes | 187.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0)
To file:///home/leigh/remote
   dc999d5..2555676  master -> master

# See what we have

leigh:~/local$ git log
commit 2555676490bee5e32b109dc8653596b4bd0de206 (HEAD -> master, origin/master, origin/HEAD)
Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:37:57 2021 +0000

    New commit

commit dc999d56edcb14345da39ea25799879dadc406c7 (grafted)
Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:26:26 2021 +0000

    Most recent commit

New commits are added to the history and stored locally as usual. Usefully I discovered that git fetch also supports the --depth option:

--depth

Limit fetching to the specified number of commits from the tip of each remote branch history. If fetching to a shallow repository created by git clone with --depth=<depth> option (see git-clone[1]), deepen or shorten the history to the specified number of commits. Tags for the deepened commits are not fetched.

Git – git-fetch Documentation (git-scm.com)

Let’s try it!

leigh:~/local$ git fetch --depth 1
remote: Total 0 (delta 0), reused 0 (delta 0)
leigh:~/local$ git log
commit 2555676490bee5e32b109dc8653596b4bd0de206 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: Leigh Simpson <code@simpleigh.com>
Date:   Sat Feb 13 17:37:57 2021 +0000

    New commit

This is perfect: we can create and push a new commit and then throw away previous revisions. GitHub retains the full history of the crawl and I use a lot less disk space on my server.

Putting it together

I ended up with a script that looks a little like this:

#!/usr/bin/env bash

cd "$(dirname "$0")"

# Download content
wget --config wgetrc http://example.com/

# Craft a new commit and push
git add .
git commit -m "Update for $(date +%Y-%m-%d)"
git push

# Trim history to the specified number of commits and garbage-collect
git fetch --depth 1
git gc

I use wget to do the scrape, and configure it using a local file (called wgetrc). The git gc call at the end shouldn’t really be necessary but doesn’t hurt.

Other thoughts

Running out of disk is a pretty disastrous situation for a server, and I’m always keen to minimise this risk. The scrape job described above opens up an interesting attack vector: if the site owner were to upload a large file then my script would happily try to download it. In the process it would consume all available disk space and bring my server to a halt!

An easy way to resolve this is to move the scrape onto its own dedicated disk. The server will then carry on running even if that disk fills up. The server is an EC2 instance running on Amazon Web Services so this was trivially easy: I created a new volume, attached it to the instance, and mounted it within the operating system. This is a good pattern for any directory that may grow without bound: even logs can explode in volume during an incident.

If the disk does fill up then I still want to know about it so I can fix the scrape. This is also pretty simple using AWS: I already use CloudWatch Logs, and the same agent process can publish metrics such as disk space. I monitor all disk volumes within CloudWatch and trigger alerts when disks start to fill up.
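Since I’m moving my infrastructure to the CDK anyway, an alarm on that metric only takes a few lines. This is just a sketch in CDK v1 style: the construct names, mount point and threshold are assumptions, and the exact dimensions depend on how the agent is configured.

import { Alarm, ComparisonOperator, Metric } from '@aws-cdk/aws-cloudwatch';
import { Duration, Stack } from '@aws-cdk/core';
import type { Construct } from '@aws-cdk/core';

class DiskAlarmStack extends Stack {
    constructor(scope: Construct, id: string) {
        super(scope, id);

        // Fires when the scrape volume passes 80% full; attach an SNS action
        // to it to actually notify somebody.
        new Alarm(this, 'ScrapeDiskAlarm', {
            metric: new Metric({
                namespace: 'CWAgent',                 // published by the CloudWatch agent
                metricName: 'disk_used_percent',
                dimensions: { path: '/mnt/scrape' },  // hypothetical mount point
                statistic: 'Maximum',
                period: Duration.minutes(5),
            }),
            threshold: 80,
            evaluationPeriods: 1,
            comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
        });
    }
}

export default DiskAlarmStack;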

I’ll follow up if this doesn’t work, but hope that won’t be for a few more years.

Mail blacklists and third-party dependencies

Mail blacklists are a sad fact of life. They exist because of spam and work like this:

  1. I send an email from my computer
  2. my computer connects to the server that hosts my email
  3. my server connects to the server that hosts the recipient’s mail
  4. the recipient server looks up my server’s details in some blacklists
  5. if my server is found in a blacklist then my mail is rejected

Disreputable servers that send a lot of spam end up on blacklists and find that they can’t send mail any more. Problem solved!

What about false positives?

A false positive occurs when a blacklist contains a server that it shouldn’t: a server is labelled as a bad actor when it’s actually polite and friendly. I’ve been lucky so far: I only send a small amount of personal email from my server so haven’t run into trouble.

I have seen this problem under different circumstances:

  • I used to send mail from a shared server used by many other people. This worked well for several years until the server started to be blacklisted every few days (almost certainly because another person using the server was sending spam). I eventually had to change my hosting arrangements to resolve the problem. This is a false positive because I wasn’t sending spam – a shared server was blacklisted because of one bad actor.
  • My server used to use a blacklist (NJABL) that was shut down. When that occurred my server started rejecting all mail sent to it: every message was marked as spam. I reconfigured the server to stop using the defunct blacklist.
  • By way of counterexample: an email account on a server owned by my then employer was compromised and used to send spam. The system worked as it should in this case and the server was blacklisted (taking down our corporate email with it). This was a true positive, but after resetting the password on the compromised account it still took a few days to get the server removed from all the blacklists that listed it.

The second issue described above happened last week to a mail provider used by many of my university friends.

What happened?

I tried to send an email to a contact and received the following bounce message:

Reporting-MTA: dns; aws.simpleigh.com
X-Postfix-Queue-ID: <redacted>
X-Postfix-Sender: rfc822; <redacted>@simpleigh.com
Arrival-Date: Sun, 31 Jan 2021 15:35:49 +0000 (UTC)

Final-Recipient: rfc822; <redacted>@cantab.net
Original-Recipient: rfc822;<redacted>@cantab.net
Action: failed
Status: 5.7.1
Remote-MTA: dns; mta02.prd.rdg.aluminati.org
Diagnostic-Code: smtp; 554 5.7.1 Service unavailable; Client host
    [46.137.167.228] blocked using bl.spamcop.net

My server (at the time of writing this is aws.simpleigh.com, with the address 46.137.167.228) is trying to send a message to my friend at cantab.net. Their server (mta02.prd.rdg.aluminati.org) rejected the message because it thinks my server is listed on a blacklist (hosted at bl.spamcop.net).

This was a serious worry: if my server is on a blacklist then I can’t send mail to anyone! I navigated to spamcop.net to see what was going on and found this:

Oops. Their domain name expired and the blacklist went offline. According to the domain’s WHOIS record this happened at 05:00 on 30th January at which point their site was replaced with their domain registrar’s holding page.

Why does an expiring domain break email?

Domain names expire all the time. Microsoft famously allowed hotmail.co.uk to expire in 2003 and Foursquare went completely offline in 2010 for the same reason. Why is this so catastrophic for email? To understand this we need to know more about how blacklists work.

Mail blacklists are also known as Domain Name System-based Blackhole Lists (DNSBLs) because they use DNS – the system that we use every day to find sites on the web. DNS converts friendly domain names like google.com into server IP addresses like 216.58.204.46 (which happens to be the server providing google.com for me today).

Mail servers check blacklists by converting server IP addresses into domain names. Here’s what happened while I was trying to email my friend:

  • My server (aws.simpleigh.com, 46.137.167.228) connected to mta02.prd.rdg.aluminati.org to send my message.
  • The recipient server reversed my server’s IP address and appended the blacklist domain, in this case obtaining 228.167.137.46.bl.spamcop.net.
  • The recipient server made a DNS request for that name and received a response: 91.195.240.87.
  • The positive response indicated my server was in the blacklist and my message was rejected.

What is this magic 91.195.240.87? It turns out this is the registrar’s domain holding page! When the domain name expired the registrar started to return that page for any subdomains of spamcop.net. This meant that every server on the internet was temporarily included in the blacklist!
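Here’s a sketch of that check in a few lines of Node/TypeScript (the function is mine, not anything a real mail server runs); it makes clear why a wildcard answer from the registrar listed every address on the Internet:

import { promises as dns } from 'dns';

const isListed = async (ip: string, zone: string): Promise<boolean> => {
    // Reverse the octets and append the blacklist zone,
    // e.g. 46.137.167.228 -> 228.167.137.46.bl.spamcop.net
    const query = ip.split('.').reverse().join('.') + '.' + zone;
    try {
        await dns.resolve4(query);
        return true;   // any answer at all counts as "listed"
    } catch {
        return false;  // NXDOMAIN (or any failure) means "not listed"
    }
};

isListed('46.137.167.228', 'bl.spamcop.net').then(console.log);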

How big was this issue?

Mail servers using the SpamCop blacklist rejected all messages during the outage. This lasted from the expiration of the spamcop.net domain name until its renewal later that day (potentially slightly longer because DNS responses are commonly cached). If their domain expired at 05:00 and was renewed at around 18:00 then the incident might have lasted for 13 hours.

It’s not really possible to determine how many servers are configured to use the SpamCop blacklist. Large mail providers such as Google and Microsoft maintain their own blacklists but many smaller providers will be using public lists provided for free. SpamCop is frequently recommended so is likely to be popular.

What can we do about it?

The uptime of a service cannot exceed that of its critical dependencies.

If the above estimate of 13 hours is accurate then SpamCop’s uptime dropped to 98% for the month of February. The cantab.net email service’s uptime cannot exceed this limit: their service was down for the entirety of the SpamCop outage and may have been down at other times for their own maintenance work.

How can we manage this risk?

Contracts

Companies manage risks posed by their suppliers by agreeing formal contracts with documented terms. If cantab.net lost money because a supplier was negligent then perhaps they can recover their losses. In practice this wouldn’t have helped: SpamCop doesn’t work like this.

NO WARRANTY OR LIABILITY: BY USING THE SCBL, OR ANY INFORMATION CONTAINED ON THE SPAMCOP WEBSITE, YOU ACKNOWLEDGE AND AGREE THAT THE SCBL IS PROVIDED “AS IS”, SPAMCOP DOES NOT GUARANTEE THE EFFECTIVENESS OR RESULTS OF THE SCBL OR ANY OTHER SERVICE OR PRODUCT PROVIDED BY SPAMCOP, AND ANY AND ALL WARRANTIES, IMPLIED OR OTHERWISE, ARE EXPRESSLY EXCLUDED. IN NO EVENT SHALL SPAMCOP, OR ITS PARENT, SUBSIDIARIES OR LICENSORS, BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES OF ANY KIND ARISING OUT OF OR IN CONNECTION WITH YOUR USE OF THE SCBL OR THE SPAMCOP WEBSITE, HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY.

SpamCop.net – SpamCop FAQ: What is the SpamCop Blocking List (SCBL)?

If the risk is significant enough then no doubt commercial blacklists will exist that do provide such contractual guarantees.

Monitoring and alerting

If a server is rejecting 100% of the email it receives then there’s almost certainly something wrong! The rate at which mail is being rejected is an important metric that should be measured. Dispatching an alert to an administrator could allow any problem to be resolved quickly.
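As a rough sketch of the kind of measurement I mean (the log path and the Postfix log format are assumptions about one particular setup, not a general recipe):

import { createReadStream } from 'fs';
import { createInterface } from 'readline';

// Count rejections logged by Postfix; a sudden spike (or a run of rejections
// from every remote host) is the signal that should alert an administrator.
const countRejections = async (logPath = '/var/log/mail.log'): Promise<number> => {
    let rejections = 0;
    const lines = createInterface({ input: createReadStream(logPath) });
    for await (const line of lines) {
        if (line.includes(': reject:')) {
            rejections += 1;
        }
    }
    return rejections;
};

countRejections().then((count) => console.log(`${count} rejected messages`));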

Regardless of the technical aspect, it’s prudent for businesses to measure the performance of their suppliers (even free ones)!

Eliminate

If a blacklist is unreliable then we should stop using it. Unfortunately it’s difficult to determine the reliability (or otherwise) of a particular blacklist. MxToolBox lists 94 separate blacklists. How can we know which of these are safe to use? In practice administrators rely on recommendations from others (as described above). SpamCop isn’t a tiny independent company, or even just a free hobby service: it’s owned by Cisco Systems, a multinational tech giant. As we’ve observed already this doesn’t guarantee the ability to renew domain registrations on time!

Companies can eliminate the risks posed by third-party providers by bringing work back in-house. Large email providers do exactly this, but the cost of establishing an independent email blacklist is likely to be significant.

Perhaps we should stop using all blacklists? Sadly this isn’t a great option: blacklists are an important weapon in the fight against spam.

Accept

The cost of SpamCop (free) is greatly outweighed by its benefit. Despite last week’s incident the risk of failure remains low. Alternatives are expensive or present the same risks.

Until last week I’d not heard of SpamCop, but based on their reputation I will soon be configuring my email server to use their blacklist. At least they’re unlikely to let their domain name expire again.

Why I’m leaving WhatsApp

TLDR: privacy. I’m on Signal now.

I’ve had a love-hate relationship with Facebook for many years. I still have an account but rarely log in. It sometimes sends me emails and occasionally I open up a private browser window to take a look. I don’t want them following me around the Internet (their tracking tags are pervasive).

WhatsApp was acquired by Facebook in 2014 but any personal data shared with the service could be kept separately. Now that’s changing.

WhatsApp has long prided itself on its commitment to security and privacy, with encrypted conversations and other important technologies integrated into the app.

But the new announcement has sparked fears of the exact opposite: that people’s information is not being kept secret but instead shared with Facebook.

WhatsApp new privacy terms: What do new rules really mean for you? | The Independent

I’m lucky to live in the UK which (despite Brexit) still provides strong privacy protections based on the GDPR:

“There are no changes to WhatsApp’s data sharing practices in the European region (including UK) arising from the updated Terms of Service and Privacy Policy,” a spokesperson said.

…and indeed WhatsApp do advertise different terms for European users. Despite this I’m still worried. I used to trust WhatsApp; now I don’t. Some sources also suggest that UK data will begin to fall under WhatsApp’s US jurisdiction in the future (despite GDPR being part of UK law):

I don’t know how true that is, but still think it’s time to move. From now on I’ll be on Signal, and am pleased that many others are making the same choice:

Metadata

I should make it clear that the content of WhatsApp messages remains confidential (and encrypted). The following data are still up for grabs:

  • [Other peoples’] phone numbers in your mobile address book, including those of both the users of our Services and your other contacts. You confirm you are authorized to provide us such numbers. [emphasis mine]
  • a favorites lists of your contacts
  • groups and broadcast lists
  • how you use our services, how you interact with others using our Services, and the like
  • whether you are online, when you last used our Services

These data are still important! If you don’t believe this then take a look at Why Metadata Matters from the EFF or the linked article ‘We Kill People Based on Metadata’.

Metadata — data about your data — is almost as powerful as the actual data.

Cyjax CISO Ian Thornton-Trump via WhatsApp Beaten By Apple’s New iMessage Privacy Update (forbes.com)

Why Signal?

Signal is an independent nonprofit. We’re not tied to any major tech companies, and we can never be acquired by one either.

Signal >> Home

The difference in data collection was exposed by the new privacy labels that Apple have added to their App Store. Here’s a comparison, with Signal on the left and WhatsApp on the right:

Here’s a more interesting reason, which luckily doesn’t apply to me!

Do I really expect everybody to follow me?

I’m not naïve enough to think that all my contacts will migrate to Signal. I know I’m taking the risk of missing out on news or other communications.

Migrations have to start somewhere, and if I can influence just one person to follow me then that’s still a victory.

More coverage

Shortly after Facebook acquired WhatsApp for $19 billion in 2014, its developers built state-of-the-art end-to-end encryption into the messaging app.

In 2016, WhatsApp gave users a one-time ability to opt out of having account data turned over to Facebook. Now, an updated privacy policy is changing that. Come next month, users will no longer have that choice.

Under the new terms, Facebook reserves the right to share collected data with its family of companies.

WhatsApp gives users an ultimatum: Share data with Facebook or stop using the app | Ars Technica

New year, new blog

I know it doesn’t look very new, but this blog is now running on brand new hosting with brand new PHP. Hurrah!

I might even update it occasionally…

Some handy Eclipse plugins (2)

This is an updated version of http://blog.simpleigh.com/2013/06/some-handy-eclipse-plugins/ for Eclipse Mars.

One of the most exciting features of Eclipse is the plugin ecosystem. Eclipse has been around for ages, and there are plenty of fantastic plugins adding additional programming languages and features. Here’s a list of some I’ve found useful:

Geppetto

Puppet is a great way to automate the configuration of new servers. Geppetto (the puppet maker) is an IDE for this: providing syntax highlighting and integration with the Puppet forge.

Json Editor

Support for JSON.

Mars Updates

Eclipse provides much of its core functionality via plugins, and there’s plenty available to extend the default configuration.

  • Update site: http://download.eclipse.org/releases/mars/
  • Documentation: http://www.eclipse.org/
  • Tick:
    • C/C++ Autotools support
    • C/C++ Development Tools
    • C/C++ Unit Testing Support
    • Dynamic Languages Toolkit – ShellEd IDE
    • Eclipse Web Developer Tools
    • Eclipse XML Editors and Tools
    • Eclipse XSL Developer Tools
    • JavaScript Development Tools
    • PHP Development Tools (PDT)
    • Subversive SVN Team Provider [Subversion integration]
    • Web Page Editor
    • … anything else you fancy!

Markdown Editor

Simple syntax highlighting and document outline for Markdown.

PyDev

An IDE for Python.

ReST Editor

As advertised in my last post, syntax highlighting makes editing ReStructured Text a lot easier…

TeXlipse

Support for LaTeX.

I hope to update this list as I discover new plugins: feel free to add suggestions in the comments.

Myjson Crawler

What is myjson.com

Myjson describes itself as “a simple JSON store for your web or mobile app”. You can save JSON data using an interface on the site itself (example) or programmatically via an API (example). Each saved piece of JSON can be accessed via a randomly-generated URL, but unfortunately the random part of the URL is very short: it seems to be three or four characters drawn from a limited alphabet. This means that it’s easy to guess valid URLs, or even to enumerate all of them. OWASP describe this class of problem as an “insecure direct object reference”, and list it in fourth place in their 2013 Top 10 list of security risks. It’s not fair to criticise Myjson for this as they never advertised their system as secure, but I think it is fair to take a look at what people are storing.

Crawler implementation

The most obvious way (to me, at least) to implement a crawler is as follows:

  1. Generate a list of URLs using scripting language du jour
  2. Use xargs and cURL to crawl.

This has a couple of advantages:

  • It’s really simple
  • xargs has a handy multi-threaded mode allowing us to crawl several pages in parallel.

Unfortunately that would be too easy, so I decided to use JavaScript.

How it works

Full implementation available on Github

We’re going to need a function to output results. I decided to output HTML of this form:

<dl>
<dt>aaa</dt><dd>{ "json" : "that came from", "url" : "myjson.com/aaa" }</dd>
<!-- ... -->
</dl>

Here’s a tiny utility function to create this output:

/**
* Adds a row to the list of results with the query and the response
* @param {string} bin
* @param {string} contents
*/
var outputResult = function (bin, contents) {
    'use strict';

    var binElement = document.createElement('dt'),
    contentsElement = document.createElement('dd');

    binElement.textContent = bin || '';
    contentsElement.textContent = contents || '';

    document.getElementById('results').appendChild(binElement);
    document.getElementById('results').appendChild(contentsElement);
};

We’ll also need a function to crawl the site. XMLHttpRequest is the obvious tool. We write output as each response comes back using an anonymous function which closes over the current bin name.

/**
* Looks up the contents of a myjson bin and adds them to the list of results
* @param {string} bin
*/
var lookupBin = function (bin) {
    'use strict';

    var xhr = new XMLHttpRequest();

    xhr.open('GET', 'http://api.myjson.com/bins/' + bin);
    xhr.onload = function () {
        if (this.status === 200) {
            outputResult(bin, this.responseText);
        }
    };
    xhr.send();
};

Finally we need to iterate over the possible bin names. Some nested loops would handle this well enough, but it’s probably tidier to encapsulate this functionality. Here’s a function to iterate over an alphabet of characters:

/**
* Iterates over a list of characters
* @param {string} alphabet List to iterate across
* @param {string} prefix String to prepend before calling CALLBACK
* @param {function} callback Callback function, called with current string
*/
var iterateCharacters = function (alphabet, prefix, callback) {
    'use strict';
    var i;
    for (i = 0; i < alphabet.length; i = i + 1) {
        callback(prefix + alphabet[i]);
    }
};

For each character in the alphabet we prepend an existing string and then pass the result on to the defined callback. Iterating over all three-character bin names is then simple. This example calls our output function directly without crawling each bin:

var alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';

var run = function () {
    'use strict';

    iterateCharacters(alphabet, '', function (string) {
        iterateCharacters(alphabet, string, function (string) {
            iterateCharacters(alphabet, string, outputResult);
        });
    });
};

run();

Finally we need an HTML document to host all this:

<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <dl id="results"></dl>
    <script type="text/javascript" src="script.js"></script>
  </body>
</html>

Browsers try hard to download data as quickly as possible, and our crawl issues many requests in parallel without extra effort.

Results

Scope

I restricted the crawl to include three-character names beginning with the letters ‘a’ to ‘d’. The above code will crawl all combinations of three-character bin names, and can also be easily extended to crawl four- and five-character names. It’s sensible to reduce the scope, however:

  • Crawling lots of bins takes lots of time.
  • It isn’t nice to load myjson.com heavily.
  • Thoughts of Weev make me nervous.
Brief Analysis

The sample includes 1,637 rows. The top ten JSON strings are as follows:

String                                             Count
{}                                                   226
{"key":"value"}                                       92
{"foo":"bar"}                                         42
{"hello":"world"}                                     34
{"key":"value","key2":"value2"}                       30
{"glossary":{"title":"example glossary",…             29
{"key_updated":"value_updated"}                       26
[]                                                    23
{"test":"test"}                                       17
{"key":"updated value","key2":"updated value2"}       16

We can therefore estimate that around 14% of bins contain only the empty object. Many of the examples above seem likely to have been created to test the service, and 69% of the extracted strings contain only 50 characters or fewer.

It will be interesting to run a similar scrape in the future and see if the distribution of data changes: how many people are using this service as intended?

Scope for Evil

The API supports updating JSON data by sending an HTTP PUT request. It would only take a few minutes to overwrite all data stored by Myjson. Myjson doesn’t advertise a secure service, and they obviously aren’t worried that data is disclosed. They ought to be worried that somebody might trash everything they have stored.

AWS CloudFront via CloudFormation

Amazon Web Services’ CloudFormation is a great way to define stacks of related resources. I don’t tend to find myself making more than one version of each stack, but have still seen some big advantages:

  • I no longer have to configure resources through the AWS management console, saving a heap of time.
  • Stack configuration is now in source control so all changes are logged.
  • I’ve learnt a great deal more about AWS and how its components interact.

Unfortunately I’ve had to pay for these with another heap of time, spent learning how to use CloudFormation and how to deal with it when things don’t quite work. I’ve wasted a lot of time trying to set up CloudFront CDN distributions, and thought I’d write up a couple of the gremlins I found in case this proves useful to anyone (including me).

S3 domain name isn’t a suitable domain name

Pointing a distribution at an S3 bucket is harder than you might think. All you need is the domain name, but CloudFormation won’t give it up easily.

According to the documentation:

DomainName
Returns the DNS name of the specified bucket.
Example: mystack-mybucket-kdwwxmddtr2g.s3.amazonaws.com

Unfortunately that’s not quite what happens. As described on the AWS forum, the domain name is slightly different and CloudFront won’t accept it. Instead you have to build it yourself:

{ "Fn::Join" : [ "", [ { "Ref" : "BucketName" }, ".s3.amazonaws.com" ] ] }

It sometimes doesn’t work

Unfortunately I can’t currently do any better than that. This forum post implies that adding aliases might break things but I’ve managed to define distributions with aliases.

A second or so after the distribution starts being created it decides to delete itself again, with the message “Resource creation cancelled”. One day I’ll try to put together a reduced testcase. Try defining a very small template, and adding in other resources by updating the stack after it’s worked for the first time. This is good general advice for CloudFormation: test parts of a large template in isolation to save time on each iteration.

Downgrading VMWare Tools

I’ve run into an amusing issue with the tools that come packaged with VMware Player. If I removed content from the middle of a file on the host, this change wasn’t replicated properly on the guest: the file would be reduced in size, but by removing bytes from the end rather than the middle, which isn’t entirely helpful.

I’m experimenting with downgrading the version of VMware Tools, and found handy instructions for how to do this here:

  1. Go to http://softwareupdate.vmware.com/cds/vmw-desktop/player/ and navigate to the version you want. The tools installation is bundled in a tar under the packages folder.
  2. Grab the file, extract it to get an executable, and run it.
  3. Go to VMware player and choose Player -> Manage -> Reinstall VMware Tools…

AWS Architecture Diagrams

AWS release amazing architecture diagrams. The best of these must surely be the diagram of the architecture used to host the Obama for America campaign. The diagrams look great, but aren’t accessible to others as there’s no publicly downloadable set of icons… until now. Someone has created their own for download: http://blog.domenech.org/2013/04/aws-diagrams-adobe-illustrator-object-collection-first-release.html.

Eclipse Memory Limits

Update: I’ve had some instability issues since installing this fix, and have now reverted.

As I use Eclipse more and more, I’ve occasionally run into memory and garbage-collector limits when working on large projects or files. This page provides the solution: bump up the limits in eclipse.ini (found in the Eclipse programme directory):

-Xms512m
-Xmx1024m
-XX:PermSize=64m
-XX:MaxPermSize=128m

Apart from this I’ve been pretty happy with Eclipse. I’m stuck on the Juno version at the moment as some plugins don’t seem to play nicely with Kepler, and haven’t found the time to work out how to upgrade cleanly. I’m certainly preferring it to Aptana at the moment, primarily due to the plugin support. The wider userbase also comes with a corresponding improvement in the quantity and quality of online resources and documentation.

Service Testing

Here’s a collection of services, and ways to go about testing them:

25: SMTP

Simple Mail Transfer Protocol.

Test with telnet:

$ telnet simpleigh.com 25
Trying 46.137.167.228...
Connected to simpleigh.com.
Escape character is '^]'.
220 aws.simpleigh.com ESMTP Postfix (Ubuntu)

Check it responds:

$ EHLO example.com
250-aws.simpleigh.com
250-PIPELINING
250-SIZE 20480000
250-ETRN
250-STARTTLS
250-AUTH PLAIN LOGIN
250-AUTH=PLAIN LOGIN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN

Try sending a message to a local user:

$ MAIL FROM: <nobody@example.com>
250 2.1.0 Ok
$ RCPT TO: <blog@simpleigh.com>
250 2.1.5 Ok
$ DATA
354 End data with <CR><LF>.<CR><LF>
$ Email body goes here
$ 
$ .
$ 
250 2.0.0 Ok: queued as 18E73288

Try logging in:

$ AUTH LOGIN
334 VXNlcm5hbWU6
$ [ base64-encoded username ]
334 UGFzc3dvcmQ6
$ [ base64-encoded password ]
235 2.7.0 Authentication successful

Exit:

$ QUIT
221 2.0.0 Bye
Connection closed by foreign host.

You can test STARTTLS (where a secure channel is negotiated for an existing connection) functionality using OpenSSL‘s s_client:

$ openssl s_client -connect simpleigh.com:25 -crlf -starttls smtp
... loads of stuff
250 DSN
$ EHLO ...

80: HTTP

Hypertext Transfer Protocol.

Test with telnet:

$ telnet simpleigh.com 80
Trying 46.137.167.228...
Connected to simpleigh.com.
Escape character is '^]'.
$ GET / HTTP/1.1
$ Host: simpleigh.com
$ 
HTTP/1.1 301 Moved Permanently
Date: Sun, 11 Aug 2013 19:54:30 GMT
Server: Apache
Location: http://www.simpleigh.com/
Vary: Accept-Encoding
Content-Length: 233
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.simpleigh.com/">here</a>.</p>
</body></html>
Connection closed by foreign host.

110: POP3

Post Office Protocol.

Test with telnet:

$ telnet simpleigh.com 110
Trying 46.137.167.228...
Connected to simpleigh.com.
Escape character is '^]'.
+OK Dovecot ready.

Try logging in:

$ USER [ username ]
+OK
$ PASS [ password ]
+OK Logged in.

List messages:

$ LIST
+OK 20 messages:
... loads of stuff

Exit:

$ QUIT
DONE

143: IMAP

Internet Message Access Protocol.

Test with telnet:

$ telnet simpleigh.com 143
Trying 46.137.167.228...
Connected to simpleigh.com.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN AUTH=LOGIN] Dovecot ready.

Try logging in:

$ a1 LOGIN [ username ] [ password ]
a1 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS] Logged in

List folders:

$ a2 LIST "" "*"
... loads of stuff

Find out what’s in the Inbox:

$ a3 EXAMINE INBOX
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
* OK [PERMANENTFLAGS ()] Read-only mailbox.
* 20 EXISTS
* 1 RECENT
* OK [UNSEEN 10] First unseen.
* OK [UIDVALIDITY 1345668496] UIDs valid
* OK [UIDNEXT 2543] Predicted next UID
* OK [HIGHESTMODSEQ 1] Highest
a3 OK [READ-ONLY] Select completed.

Exit:

$ a5 LOGOUT
* BYE Logging out
a5 OK Logout completed.
closed

443: HTTPS

Just like HTTP, but use s_client:

$ openssl s_client -connect simpleigh.com:443
... loads of stuff

465: SMTPS

Check using s_client:

$ openssl s_client -connect simpleigh.com:465 -crlf
... loads of stuff
220 aws.simpleigh.com ESMTP Postfix (Ubuntu)

993: IMAPS

Check using s_client:

$ openssl s_client -connect simpleigh.com:993
... loads of stuff
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN AUTH=LOGIN] Dovecot ready.

995: POP3S

Check using s_client:

$ openssl s_client -connect simpleigh.com:995
... loads of stuff
+OK Dovecot ready.

Summary

Almost everything can be driven via Telnet. If you need TLS, use s_client.

There’s a really handy OpenSSL command-line summary at http://www.madboa.com/geek/openssl/.

Undocumented Features

I was amused to discover grumbling around the Internet about Google’s weather API, and their decision to remove it. This describes what happened:

Last month, Google announced plans to shutter iGoogle, among a bunch of other services. Many developers and users were (and still are) outraged, but at least they have some time to breathe: iGoogle isn’t going away until November 1, 2013. That means there are still 15 months left to adjust and export your data. Yet some changes are already starting to take effect: the company’s private Weather API, for example, appears to be dead in the water.

There are some real gems further down:

Why should Google expect support tickets? The big clue is in the word “undocumented.” More from the article:

Web developer Jordan Stephens isn’t even bothering to look for alternatives. In fact, CurrentlyIn has been completely shut down as a result of the Google Weather API outage, according to an error message from the service.

Indeed, navigating to the CurrentlyIn site brings us the following:

currentlydown : (

Google has shut down its undocumented weather API (which was used by this site).

currentlyin.net will be down for the forseeable future.

The statement “let’s use this undocumented API from Google to get weather information for free” seems like a great idea, but undocumented features aren’t really features at all. Anything without documentation is an implementation detail, and subject to change without notice. In this case, change happened to include removal.

There’s some great examples at The Old New Thing:

Three examples off the top of my head of the consequences of grovelling into and relying on undocumented structures.

Defragmenting things that can’t be defragmented

In Windows 2000, there are several categories of things that cannot be defragmented. Directories, exclusively-opened files, the MFT, the pagefile… That didn’t stop a certain software company from doing it anyway in their defragmenting software. They went into kernel mode, reverse-engineered NTFS’s data structures, and modified them on the fly. Yee-haw cowboy! And then when the NTFS folks added support for defragmenting the MFT to Windows XP, these programs went in, modified NTFS’s data structures (which changed in the meanwhile), and corrupted your disk.
Of course there was no mention of this illicit behavior in the documentation. So when the background defragmenter corrupted their disks, Microsoft got the blame.

[…]

I hope you understand why I tend to go ballistic when people recommend relying on undocumented behavior. These weren’t hobbyists in their garage seeing what they could do. These were major companies writing commercial software.

Posted in Computing | Leave a comment

Some handy Eclipse plugins

Update: this is currently tested for Eclipse Juno, although I hope to update this for Kepler in the near future.

I’ve been making more use of Eclipse recently as an IDE. I’ve been using Aptana (which is based on Eclipse) for a few years now, but I’m spending less time coding in PHP, so it now makes sense to graduate onto the bigger tool, especially as not all Eclipse plugins play nicely with Aptana. One of the most exciting features of Eclipse is its plugin ecosystem: Eclipse has been around for ages, and there are plenty of fantastic plugins adding support for extra programming languages and features. Here’s a list of some I’ve found useful:

AWS Toolkit

Would you like to manage your Amazon Web Services resources directly while writing code? This sounds a little excessive, but is actually quite useful: Amazon’s management console is slow and will only show a few list items at a time. This plugin makes it easy to find particular details quickly among a huge list of items, and also comes with syntax highlighting for CloudFormation templates. It’s not entirely supported on the latest version of Eclipse (Juno), but the parts I need all work well.

EclipseFP

Haskell support for Eclipse, including syntax highlighting and direct support for loads of Haskell tools and frameworks (cabal, Hoogle, HLint, HTF, Alex, Happy, UUAGC, Snap, Yesod, …). You can run code directly from the IDE, calling out to GHCi.

Geppetto

Puppet is a great way to automate the configuration of new servers. Geppetto (the puppet maker) is an IDE for this, providing syntax highlighting and integration with the Puppet Forge.

Json Tools

Support for JSON.

  • Update site:
  • Documentation:
  • Tick:
    • Json Tools

Juno Updates

Eclipse provides much of its core functionality via plugins, and there’s plenty available to extend the default configuration.

  • Update site: http://download.eclipse.org/releases/juno/
  • Documentation: http://www.eclipse.org/
  • Tick:
    • C/C++ Development Tools
    • Eclipse Web Developer Tools
    • Eclipse XML Editors and Tools
    • Eclipse XSL Developer Tools
    • JavaScript Development Tools
    • PHP Development Tools (PDT)
    • Subversive SVN Team Provider [Subversion integration]
    • Web Page Editor
    • … anything else you fancy!

Markdown Editor

Simple syntax highlighting and document outline for Markdown.

PyDev

An IDE for Python.

ReST Editor

As advertised in the last post, syntax highlighting makes editing ReStructured Text a lot easier…

ShellEd

BASH syntax highlighting.

TeXlipse

Support for LaTeX.

I hope to update this list as I discover new plugins: feel free to add suggestions in the comments.

Posted in Computing | Leave a comment

ReST Syntax Highlighting

I’ve talked before about writing documentation using Sphinx. Sphinx uses ReST (ReStructured Text) which is great, but sometimes a pain to edit without any form of syntax highlighting. Luckily it was pretty easy to track down an Eclipse plugin to do this.

First add a new software source URL like this:

[screenshot: rest1]

Then tick the box to install the plugin:

[screenshot: rest2]

Hurrah!

Posted in Computing | 1 Response

MongoDB

I’ve recently started working with MongoDB at work: it forms a core part of our tracking and reporting infrastructure, and all events that we track are slotted tidily into a Mongo database. Mongo has been getting some criticism of late, and while most of this has been largely misdirected (not using the right tool for the job doesn’t mean the tool was the problem), this piece was particularly interesting.

MongoDB does not give us much control over where data is placed, so the frequently accessed data (or data that is scanned together) may be spread over a large area. When scanning data only once, there is no way to prevent that data evicting the more frequently accessed data from memory. Once the frequently accessed data is no longer in memory, MongoDB becomes IO bound and lock contention becomes an issue.

My initial introduction to MongoDB was MongoDB in Action. I remember being struck by this quote:

Database tuning, which in most RDBMSs means tinkering with a wide array of parameters controlling memory allocation and the like, has become something of a black art. MongoDB’s design philosophy dictates that memory management is better handled by the operating system than by a DBA or application developer. Thus, data files are mapped to a system’s virtual memory using the mmap() system call. This effectively offloads memory management responsibilities to the OS kernel.

This sounds great. Why bother configuring when the kernel will probably do a better job than you ever could? It turns out this may have been a poor design decision: the kernel manages memory well in general, but not as well as it could, because it doesn’t know enough about how that memory is used within MongoDB.
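
One way to see the effect (assuming a mongod of this era, using the default mmap-based storage engine) is to ask the server how much it has mapped, and compare that with the machine’s RAM:

$ mongo
> db.serverStatus().mem
... loads of stuff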

It’s always difficult to separate the hype from the reality with any new technology, as this issue shows.

1. Use Mongo as WEB SCALE DOCUMENT STORE OF CHOICE LOL

2. Assume basic engineering principles applied throughout due to HEAVY MARKETING SUGGESTING AWESOMENESS.

3. Spend 6 months fighting plebbery across the spectrum, mostly succeed.

4. NIGHT BEFORE INVESTOR DEMO, TRY UPLOADING SOME DATA WITH “{$ref: ‘#/mongodb/plebtastic'”

5. LOL WTF?!?!? PYMONGO CRASH?? :OOO LOOOL WEBSCALE

6. It’s 4am now. STILL INVESTIGATING

7. DISCOVER PYMONGO DOES NOT CHECK RETURN VALUES IN MULTIPLE PLACES. DISCOVER ORIGINAL AUTHOR SHOULD NOT BE ALLOWED NEAR COMPUTER

8. REALIZE I CAN CRASH 99% OF ALL WEB 3.9 SHIT-TASTIC WEBSCALE MONGO-DEPLOYING SERVICES WITH 16 BYTE POST

9. REALIZE 10GEN ARE TOO WORTHLESSLY CLUELESS TO LICENCE A STATIC ANALYZER THAT WOULD HAVE NOTICED THIS PROBLEM IN 0.0000001 NANOSECONDS?!!?!?@#

Posted in Computing | Leave a comment

AWS Summit

I’ve just been to this year’s AWS Summit in London. I arrived at the event to find a suspiciously large density of suit-wearing managers. After shaking off the initial fear that I might be at the wrong event, I forged on…

The Business Design Centre is an agreeable sort of place for a conference. There’s plenty of space for cows (it did start life as the Royal Agricultural Hall), and there’s probably a joke about conference delegates and sheep around here somewhere. Lots of people complained about the queue to get in (the organisers had the bright idea of printing name tags as people arrived rather than having them ready), but I’d arrived slightly early and didn’t have to wait for long. I soon tracked down some breakfast and loaded up a sandwich. Note to BDC: £1.10 is too expensive for a sausage (even if they were rather tasty). 60p is certainly too much for two slices of slightly-stale white bread.

Initial worries about the dress sense of those attending were unfortunately completely justified, as the first keynote speech turned out to be an extended sales pitch for cloud computing. I didn’t quite understand the point of this – surely lots of people attending had received the invitation, like me, because they were already AWS customers? A late night, early start, and 1½ hours of boredom made the decision to sit next to my manager rather risky, but I did manage to remain awake for the entire thing, mostly by reading Twitter.

The keynote was punctuated by testimonials from current customers. Interest was maintained by the dubious choice of guests – first a chap from News International and then one from an oil company. These interludes turned out to be the most irritating aspect of the day – most speakers took full advantage of the opportunity to pitch their product, and few said much more than “we like AWS ’cause it’s cheap and it scales.” Even the more-technical talks were hobbled by this requirement, with customer talks failing to fit in to the remainder of the content, and rarely adding any information of interest.

Twitter remained great fun, with the Chinese whispers soon getting out of control:

I’m not sure I was helping…

A lecture about Amazon’s information security was more interesting, and it was illuminating to hear about some of what they do:

  • Staff are only granted access to any system for 60 days at a time, after which their rights must be renewed by their manager.
  • SSH access to production servers requires a change ticket or issue number, and all activity is logged.
  • Hard disks may only leave their facilities once they’ve been physically shredded or otherwise destroyed.

This lecture was better – but still felt like it was for managers (“don’t worry you can trust us with your data”) rather than developers (“look at our security, it’s cool”).

The afternoon promised more interest, with deeper studies of particular AWS products. Some of these talks were great (presentations about DynamoDB and OpsWorks being highlights of the day), and delivered on the promise of a technology conference – with more detailed information (DynamoDB indexing) and a live demo (using OpsWorks to deploy a web stack during the lecture). Other talks weren’t, with an “Advanced Topics” lecture about “Architecting for High Availability” covering little more than what was in the product overview pages for Elastic Load Balancing and Auto Scaling.

Ultimately I was expecting a tech conference which gave some deeper insight into AWS products, and thought that exposure to the AWS team might well provide that. Unfortunately most of the content was pitched at a very low level. I don’t necessarily think this is Amazon’s fault: I evidently wasn’t the target audience, but I was a little bored. I can’t complain too much – the food was pretty good for a free conference!

The next day an email arrived inviting me to supply feedback on my experience, and I thought I might as well do so (the offer of a free Kindle didn’t sway my decision at all, *ahem*). Their survey was hosted on a third-party site run by a company called “Qualtrics,” but quality was mostly lacking. For a start, radio buttons aren’t meant to do this:

[screenshot: aws_survey_1]

Oh well, I could at least supply some feedback at the end:

[screenshot: aws_survey_2]

… or not – as the input box was nowhere to be found.

I think it’s fair to say my appreciation of the day was fairly mixed. I got a day off work, and some free stuff (stickers, food and beer). I paid for it though, as I’m sure I’m stupider now.

There was one astonishingly cool feature – the presentations all used lovely little graphics (see http://awsofa.info/ for a great example). It’d be great if these were made freely available.

Posted in Computing | Leave a comment