Developing in a Linux Virtual Machine on Windows

Sadly for this post, but luckily for the rest of us, VMWare have now released version 5 of the Player software. Everything seems to work as before and most functionality appears in the same place, but the screenshots are now out-of-date.

I like Windows. I know my way around it, I’ve tried the alternatives and I still like it.

The most important part of writing any code is testing it, and it’s hard to test properly on a remote machine. Every change has to be uploaded (over a slow connection) before it can be run. It’s obviously a bad idea to test on a live site, so now we need two remote machines. Now the cost is mounting along with the frustration.

I tend to work in PHP, with sites hosted on Apache and using MySQL databases. As it turns out, these can all be installed on a Windows system, and there are easy packages available to do that (like WAMP or XAMPP). I’ve tried one of these (can’t remember which), and despite a clunky interface everything did seem to work. I soon ran into problems, however, when maintaining and developing PHP command-line scripts built for a UNIX machine, finding that:

  1. Everything’s in the wrong place. The current scripts expected to look in /usr/bin, but this directory didn’t exist. The closest match might be C:\Program Files\, which isn’t the same at all.
  2. Shell scripts written for BASH don’t work in cmd.exe. Nothing I could do was going to make them work.

These problems aside, testing code in one environment and then deploying to another doesn’t seem like a good idea. I needed access to a UNIX box.

The most obvious option is to install a UNIX system locally and work on that. This sounds fantastic – you edit code and it’s already uploaded! Unfortunately for me this isn’t an option ’cause I like Windows. Luckily I’d heard of virtualisation. It didn’t take long to start up a Linux virtual machine which ran cleanly within Windows, and here’s how you can do it too.

There’s a wide variety of host software for running virtual machines within Windows, and I’d already heard of VirtualBox, VMWare Player and Virtual PC. The latter doesn’t officially support Linux guests, and a brief comparison led me to pick VMWare’s Player. Installing this was very easy, and I soon found myself ready to start creating a virtual machine.

Choosing the option to create the VM starts a wizard to guide you through the settings. To install an operating system you’d commonly need an installation CD, but the player can cope with .iso image files directly, so burning the physical CD isn’t necessary. I grabbed the latest version of Ubuntu server from their download site and was ready to go.

I’m probably over-suspicious, but I tend not to like letting applications help me out with operating system installations, and VMWare Player will jump in and ask you if you want help with the process. The best way to get it out of the way is to avoid telling it what you’re up to. I chose the option to install the operating system later. (I’ve had things not work properly after using the guided installer.)

Again we keep our intentions secret, avoiding telling the player that we’re installing Linux.

At the end of the wizard, click the “Customize Hardware” button to tweak the VM’s hardware settings.

Again I’m probably being over-suspicious, but I think operating systems like an easy life and so my first step was to remove any unnecessary hardware (e.g. USB controllers / printers). Once you’ve done that it’s worth bumping up the available memory to a sensible level, and it’s finally time to tell the player what CD image we want to put in the drive.

I want to set up a webserver and this will expect to be connected to a network in a normal manner. I therefore chose to bridge the host network adapter, allowing the guest operating system to connect directly on to the network.

Once you’ve done this you’re ready to boot up. The CD image will be used directly, and the virtual machine will boot into the Ubuntu setup program. This is fairly easy to work through, and there’s lots of help and support on the Ubuntu website. Towards the end of the process you’ll find an option to automatically install a load of software. My suspicious nature kicked in again and I decided not to bother, opting to install what I wanted manually later.

VMWare Player comes with a suite of software (the VMWare Tools) which we’ll need. Hiding the operating system we’re installing kept the process smooth, but prevented the tools installing automatically. This is easily rectified: first shut down the VM, then run VMWare Player again. You’ll have an option to edit the virtual machine settings.

Choose “Linux” and “Ubuntu 64-bit” (I installed the 64-bit variety of Ubuntu).

Now the player knows what operating system is running it will offer to install the tools.

Before we do this we need to get the operating system ready. Run the following as root:

apt-get update
apt-get upgrade
apt-get dist-upgrade
apt-get autoremove
apt-get install build-essential
shutdown -r now

When the machine reboots you’re ready to install the tools. You may need to change the filenames as version numbers change – use the shell’s tab autocompletion to help with this.

mount /dev/cdrom /media/cdrom
cp /media/cdrom/VMwareTools-8.8.4-743747.tar.gz .
umount /dev/cdrom
tar -xzf VMwareTools-8.8.4-743747.tar.gz
vmware-tools-distrib/vmware-install.pl

You should be able to get by simply accepting all the default options. I’ve had a few problems getting this to work recently, possibly because the latest Ubuntu comes with version 3 of the Linux kernel whereas the tools build “Using kernel 2.6 build system”. Making sure your system is totally up-to-date usually does the trick.
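Rather than editing the version number by hand each time, the copy-and-extract step could also be scripted. Below is a minimal Python sketch, assuming the CD image is already mounted at /media/cdrom as above and that the tarball keeps the VMwareTools-<version>-<build>.tar.gz naming seen in the example; the helper names are my own and not part of VMware’s tools.

```python
import re
import tarfile
from pathlib import Path

# Assumed naming scheme, based on the one filename shown above,
# e.g. "VMwareTools-8.8.4-743747.tar.gz".
TOOLS_TARBALL = re.compile(r"VMwareTools-(\d+(?:\.\d+)*)-(\d+)\.tar\.gz$")


def version_key(path: Path) -> tuple:
    """Turn 'VMwareTools-8.8.4-743747.tar.gz' into (8, 8, 4, 743747)."""
    match = TOOLS_TARBALL.match(path.name)
    version, build = match.group(1), match.group(2)
    return tuple(int(part) for part in version.split(".")) + (int(build),)


def extract_latest_tools(cdrom: Path, dest: Path) -> Path:
    """Extract the newest VMware Tools tarball in `cdrom` into `dest`.

    Returns the path of the installer script inside the extracted tree.
    """
    candidates = [p for p in cdrom.iterdir() if TOOLS_TARBALL.match(p.name)]
    if not candidates:
        raise FileNotFoundError(f"no VMwareTools-*.tar.gz found in {cdrom}")
    # Compare versions numerically so that 8.10 counts as newer than 8.8.
    newest = max(candidates, key=version_key)
    with tarfile.open(newest, "r:gz") as archive:
        archive.extractall(dest)
    return dest / "vmware-tools-distrib" / "vmware-install.pl"
```

Something like extract_latest_tools(Path("/media/cdrom"), Path.cwd()) would return the path of vmware-install.pl, which you would then run as root just as before.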

We’ve now got an easily-accessible shell, but nothing else. We’ve not got the file sharing which had been one of our original aims. This is easy to fix now the tools are installed. You can get to the Virtual Machine Settings dialog from the Virtual Machine menu even though the VM is still running, and set up a file share.

Create a shared folder, and remember its name. I’ve got some web code in E:\Documents\Code\www, and I shared this with the VM. Click OK until you’re back at the shell. VMWare calls this file sharing the “Host Guest File System”, and puts all shares in /mnt/hgfs. A simple ls /mnt/hgfs/www showed that everything was working properly and the host’s files were showing through.

Now we’ve got a Linux server running locally, and I’ve shared files from my host system. I can edit any code from within Windows, and the files will appear immediately on the guest, making it dead easy to debug code. There are just a few more steps to cover to complete the installation, and we’ll take a look at these next time.

Posted in Computing | 2 Responses

Migration – Further Progress

Now my diary is settling in on its new server, I thought I’d add a couple of thoughts:

  • It now runs far faster, feeling much more snappy and responsive. No other users are competing for server resources, and the server’s a lot nearer (Ireland rather than the US).
  • NameCheap have spoilt themselves, and their server has been listed on a spam blacklist at least four times in the past month. Not being able to send email is really irritating.
  • Microsoft are so pleased I’ve signed up for Azure that they’re trying to ‘phone me “to find out about your plans with regards to Windows Azure platform and welcome you as a new user.” Although this is probably an attempt to sell me more, it feels a lot more welcoming than, well, every other online service I’ve ever used.
Posted in Computing | Leave a comment

Migration: Aches and Pains

I recently migrated my diary to a new server, and was amused by some of the problems I faced doing so. Deploying PHP applications should be a simple affair – set up the database, upload the code – but moving to a newer version of the language brought many problems. PHP now bothers to check for errors in a lot of circumstances where it previously didn’t, causing warnings and notices to pop up all over the place (and filling my logs with nonsense). PHP has always fought hard to try and be the least secure computing platform in existence, but even I was amazed to see how many SQL injection vulnerabilities I’d managed to include in the code when I originally wrote it a few years ago. While it’s pleasing that I’ve learnt so much in a short space of time (I now know what an “SQL injection vulnerability” is, for example), I can’t help feeling a little bit frustrated about the situation. Just a couple of examples:

  • While other languages provide proper abstractions to access databases (e.g. the fantastic SQLAlchemy library for Python), PHP positively encourages you to just mysql_connect() and start firing queries into the database (at least the manual page for mysql_connect() now carries a warning).
  • Some languages give you a framework to access data passed in by the user. At least PHP no longer spreads it around as global variables (register_globals has gone), but it does make it easily available through superglobals such as $_GET and $_POST – so easily that it’s natural to grab hold and start using them, even if that’s the security equivalent of leaving the front door wide open.
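The usual cure for SQL injection is to keep user input out of the SQL text entirely and hand it to the database as a bound parameter. Here is a small sketch of the difference using Python’s built-in sqlite3 module (PHP’s PDO prepared statements follow the same idea); the table, columns and data are made up for illustration.

```python
import sqlite3


def find_entries_unsafe(conn, author):
    # DON'T: the user's text is pasted into the SQL statement itself.
    query = f"SELECT title FROM entries WHERE author = '{author}'"
    return [row[0] for row in conn.execute(query)]


def find_entries_safe(conn, author):
    # DO: the "?" placeholder sends the value separately, so it is never parsed as SQL.
    query = "SELECT title FROM entries WHERE author = ?"
    return [row[0] for row in conn.execute(query, (author,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("Public post", "leigh"), ("Private draft", "admin")],
)

malicious = "nobody' OR '1'='1"
print(find_entries_unsafe(conn, malicious))  # both rows leak out
print(find_entries_safe(conn, malicious))    # []
```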

Apologies for what’s turned into a rather snarky rant, but I’m in good company. It turns out there’s a whole community of PHP-haters: you can find out more via http://phpsadness.com/, http://www.phpwtf.org/, http://two-pi-r.livejournal.com/622760.html, and PHP Turtles for examples. I’ve recently really enjoyed reading this post:

I can’t even say what’s wrong with PHP, because— okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it’s one of those weird tri-headed things. Okay, well, that’s not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don’t have those serrated surfaces; it’s flat and smooth. That’s less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there’s no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what’s the problem with these tools? They’re all I’ve ever used and they work fine!” And the carpenters show you the houses they’ve built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That’s what’s wrong with PHP.

Web development, especially secure web development, isn’t easy. Even helpful articles are full of errors: this page recommends SHA1 for password hashing (and assumes that the only problem with the algorithm is hash collisions), even though that technique went out with the ark. What’s irritating is that PHP makes it incredibly easy: easy enough to shoot yourself in the foot again and again.
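For contrast, here is a sketch of the general approach usually recommended instead of a single SHA1 pass: a random salt per password and a deliberately slow key-derivation function. It uses PBKDF2 from Python’s standard hashlib (bcrypt or scrypt are common alternatives); the iteration count is an illustrative assumption, not a tuned recommendation.

```python
import hashlib
import hmac
import os

# Illustrative work factor: choose one that takes a noticeable fraction of a second.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived key) to store; every password gets a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)
```

Because each password has its own salt, two users with the same password end up with different stored keys, and the slow derivation makes brute-forcing a stolen table far more expensive than with one fast hash.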

Posted in Computing | Leave a comment

Compression

A link recently bubbled up the front page of Hacker News, and I found myself rather curious to find out more. The headline rather proudly proclaimed: “RJSON: compress JSON to JSON”. JSON is a useful and simple way of transmitting data between machines on the Internet, and is commonly used by many dynamic websites today (see Wikipedia for more details). Surely a way of compressing these transmissions would speed up the world? Hurrah!

Let’s take a look at the page itself. Basically, it’s a library which lets you get from here:

{
	"id": 7,
	"tags": ["programming", "javascript"],
	"users": [
		{"first": "Homer", "last": "Simpson"},
		{"first": "Hank", "last": "Hill"},
		{"first": "Peter", "last": "Griffin"}
	],
	"books": [
		{"title": "JavaScript", "author": "Flanagan", "year": 2006},
		{"title": "Cascading Style Sheets", "author": "Meyer", "year": 2004}
	]
}

To here:

{
	"id": 7,
	"tags": ["programming", "javascript"],
	"users": [
		{"first": "Homer", "last": "Simpson"},
		[2, "Hank", "Hill", "Peter", "Griffin"]
	],
	"books": [
		{"title": "JavaScript", "author": "Flanagan", "year": 2006},
		[3, "Cascading Style Sheets", "Meyer", 2004]
	]
}
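To make the transformation concrete, here is a simplified Python sketch of the idea as it appears in the example above: keep the first object of a list, then flatten later objects with the same keys into one array of values. It is not RJSON’s actual algorithm. In the page’s output the leading number appears to identify the set of keys; here the caller simply supplies it as a hypothetical schema_id, and the decoder uses the first object as its template.

```python
import json


def pack_objects(items, schema_id):
    """Keep the first object; flatten later objects with identical keys into one array."""
    if len(items) < 2 or not all(isinstance(obj, dict) for obj in items):
        return items
    keys = list(items[0])
    if not keys or any(list(obj) != keys for obj in items[1:]):
        return items  # mixed shapes: leave the list alone
    flat = [schema_id]
    for obj in items[1:]:
        flat.extend(obj[key] for key in keys)
    return [items[0], flat]


def unpack_objects(packed):
    """Reverse pack_objects, rebuilding objects from the first object's keys."""
    if (len(packed) != 2 or not isinstance(packed[0], dict) or not packed[0]
            or not isinstance(packed[1], list)):
        return packed
    keys = list(packed[0])
    values = packed[1][1:]  # drop the schema id
    rest = [dict(zip(keys, values[i:i + len(keys)]))
            for i in range(0, len(values), len(keys))]
    return [packed[0]] + rest


users = [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"},
]
packed = pack_objects(users, schema_id=2)
print(json.dumps(packed))
# [{"first": "Homer", "last": "Simpson"}, [2, "Hank", "Hill", "Peter", "Griffin"]]
assert unpack_objects(packed) == users
```

A real format needs an unambiguous marker for packed arrays; this sketch would misread an ordinary two-element list whose second element happens to be a list.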

The page also helpfully links to alternative compression schemes called JSON DB and CJSON, which each produce slightly different compressed output.

Let’s consider a number of questions:

  • Does this really compress the file?
  • Is it useful?
  • Should I use it?

Does this really compress the file?

When comparing the compressed versions I was curious to note that the two sites use slightly different examples. The RJSON site used an abridged version of the example on the JSON DB page. Let’s take a look at these examples and how they compress. For the purposes of the analysis I’ve used UNIX line-endings and not minified the examples in any way. To produce the data I used the online demos for RJSON and CJSON.

  • Example 1 (Short version from RJSON page)
    • Original: 340 bytes
    • RJSON (claimed): 279 bytes
    • RJSON (via demo): er, 549 bytes
    • CJSON (via demo): 308 bytes
  • Example 2 (Longer version from JSON DB page)
    • Original: 711 bytes
    • RJSON: 725 bytes
    • JSON DB: 548 bytes
    • CJSON: 429 bytes

Let’s be charitable and assume that RJSON really works fine, but the demo is broken. If we take the claimed compression then we save an underwhelming (340 – 279) / 340 = 17.9% of the file size. JSON DB manages 22.9%, although I suspect much of this improvement is caused by the larger example input, which contains more repeated strings. These algorithms target these repeated strings (the keys for the JSON objects) and so perform better with a larger input. The compression performance will tend to a limit as the size increases, and a back-of-the-envelope calculation shows that this could be quite a large limit for RJSON if we keep adding more “users” to their example (around 50%). JSON DB and CJSON work similarly, and I’m not going to discuss them further.
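The percentages here are just (original − compressed) / original. A quick Python helper applied to the byte counts from the first table makes them easy to check, including the cases where the output grew:

```python
def saving(original: int, compressed: int) -> float:
    """Fraction of the original size saved; negative means the output grew."""
    return (original - compressed) / original


# Byte counts from the first table above.
figures = {
    "RJSON (claimed), example 1": (340, 279),
    "RJSON (via demo), example 1": (340, 549),
    "CJSON, example 1": (340, 308),
    "RJSON, example 2": (711, 725),
    "JSON DB, example 2": (711, 548),
    "CJSON, example 2": (711, 429),
}
for label, (original, compressed) in figures.items():
    print(f"{label}: {saving(original, compressed):+.1%}")
```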

I can’t help feeling we’re missing something crucial here – with such large potential gains why is the filesize increasing? Clearly the examples are far too short, but they also have a crucial flaw: the whitespace. Indentation and newlines make it clear what’s going on, but they are bytes that key-based compression doesn’t target, so they dilute the saving and hide the further compression we could get by stripping them as well. JSMin is a tool for stripping such spaces, and there’s an online version too to make things easy.

  • Example 1
    • Original: 340 bytes
    • Original minified: 284 bytes
    • RJSON minified: 232 bytes
  • Example 2
    • Original: 711 bytes
    • Original minified: 456 bytes
    • RJSON minified: 326 bytes
  • Example 3, made from the first example by retaining only the user section and repeating it so that it appears 32 times:
    • Original: 4015 bytes
    • Original minified: 3243 bytes
    • RJSON minified: 1632 bytes

Here’s the start of Example 3:

{
"users": [
	{"first": "Homer", "last": "Simpson"},
	{"first": "Hank", "last": "Hill"},
	{"first": "Peter", "last": "Griffin"},
	{"first": "Homer", "last": "Simpson"},
	{"first": "Hank", "last": "Hill"},
	...

Now we’re getting somewhere, with percentage savings of 18.3%, 28.5% and 49.7%. It’s easy to show that, as we repeat the user section more times, this ratio will tend to a maximum of around 50.5%.
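The 50.5% figure comes from looking at a single repeat of the user section: once the wrapper and the first object are negligible, each object costs its keys, values and punctuation, while the packed array costs only the values and commas. A Python sketch of that calculation, assuming the minifier strips all insignificant whitespace in the same way as json.dumps with compact separators:

```python
import json

# One copy of the repeated user section from Example 3.
USERS = [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"},
]


def compact(value) -> str:
    """Minify by dropping all insignificant whitespace."""
    return json.dumps(value, separators=(",", ":"))


def limiting_saving(users) -> float:
    """Saving per repeat once the one-off overhead is negligible.

    Each object costs its minified text plus a separating comma; in the packed
    array it costs only its values (without the list brackets) plus a comma.
    """
    as_objects = sum(len(compact(u)) + 1 for u in users)
    as_values = sum(len(compact(list(u.values()))) - 2 + 1 for u in users)
    return (as_objects - as_values) / as_objects


print(f"{limiting_saving(USERS):.1%}")  # 50.5%
```

Here the three objects cost 101 bytes per repeat and their values 50, giving 51 / 101.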

Finally, it’d be useful to compare to another compression algorithm to keep things in context, and gzip is a good example. Minifying and gzipping we get:

  • Example 1: 208 bytes
  • Example 2: 260 bytes
  • Example 3: 127 bytes
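Figures like these can be reproduced with a few lines of Python, sketched below with the standard json and gzip modules. The compact json.dumps output stands in for JSMin (an assumption that fits this simple data), and the gzip sizes may differ by a few bytes from the command-line tool, which can record a file name in its header.

```python
import gzip
import json


def minified_and_gzipped_sizes(document) -> tuple[int, int]:
    """Return (minified bytes, minified-then-gzipped bytes) for a JSON document."""
    minified = json.dumps(document, separators=(",", ":")).encode("utf-8")
    return len(minified), len(gzip.compress(minified, compresslevel=9))


users = [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"},
]
example_3 = {"users": users * 32}  # the user section, 32 times over
print(minified_and_gzipped_sizes(example_3))
```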

RJSON is beaten marginally in the first case, convincingly in the second case, but resoundingly in the third. The astonishing compression for the third example is another artefact of the test data: gzip is really good at compressing repeated strings, and 32 copies of the same data compress really well.

Is it useful?

Compression Rate

Well, we’ve got some compression, but not as good as gzip. Clearly the system will be useful in situations where gzip isn’t available, but isn’t worthwhile otherwise.

Support

To use the algorithm we need the code (which, ironically, needs to be sent to the client before it can be used). The code for gzip is already built into web browsers, meaning we get that for free when downloading data from websites. There’s still some use, however, because data sent from the browser to the server won’t be compressed.

Alternatives

The first question in my mind on seeing these schemes was: “Why?” The developer of the application has control over the data being transmitted, so here’s a quick way to compress the file: change the field names. Changing “first” to “f” and “last” to “l” for Example 3 above drops the minified filesize from 3243 to 2571 bytes, a 20.7% saving for virtually no effort. This clearly isn’t as good as RJSON, but has one crucial advantage: we’re still transmitting plain JSON, and don’t need any special code to decompress it. Restructuring the data further could bring gains similar to the RJSON output without mangling it too much.
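The renaming is easy to apply mechanically just before the data is serialised. A minimal Python sketch, with a hypothetical SHORT_KEYS mapping that client and server would both need to agree on; rebuilding Example 3 with compact separators reproduces the byte counts quoted above.

```python
import json

# Hypothetical mapping; any scheme agreed between client and server would do.
SHORT_KEYS = {"first": "f", "last": "l"}


def shorten_keys(value, mapping=SHORT_KEYS):
    """Recursively rename object keys; the result is still ordinary JSON."""
    if isinstance(value, dict):
        return {mapping.get(k, k): shorten_keys(v, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [shorten_keys(v, mapping) for v in value]
    return value


def compact(value) -> str:
    """Minify by dropping all insignificant whitespace."""
    return json.dumps(value, separators=(",", ":"))


users = [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"},
]
example_3 = {"users": users * 32}
print(len(compact(example_3)), len(compact(shorten_keys(example_3))))  # 3243 2571
```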

Should I use it?

JSON is already a pretty slender format (when compared, say, to XML), so does it really need much compression? Using short keys and only sending the minimum of data required will save a large amount of transmission with very little effort. I see two big disadvantages with any JSON compression scheme implemented in code:

  • It’s implemented in code. Code is really dangerous – all programmers should try really hard to write less code. Code needs to be maintained, adds to the complexity of a project, and contains bugs. Your transmission is failing: now you’ve got a whole extra chunk of code that could be causing the problem.
  • It’s not as transparent. At least the data is still JSON, but it’s not the clear, elegant JSON which is really easy to read. The first step in investigating the failing transmission would be to take a look at what’s being transmitted, but that is now a trickier proposition.

As far as I’m concerned, anything which introduces complexity at the same time as making your application harder to debug is a lose-lose proposition, especially when the gains are reasonably marginal compared with common-sense optimisations. So is this trade-off ever worth it? The answer, of course, depends on what you’re building.

Posted in Computing | Leave a comment

Virtualisation and web hosting

I’ve been gradually starting to migrate bits of this site over to a VPS (Virtual Private Server) hosted on Amazon’s AWS. I’d originally come across the concept of a VPS at a previous job which had been offered a Slicehost instance for test purposes. While I’d experimented with Ubuntu on the desktop before, and had used a Linux shell routinely on shared hosting machines, I’d never really had a reason to learn what was going on behind the scenes. The combination of root access with a reason to use it provided a substantial motivation to learn more, and the fantastic Slicehost articles soon had me running GeoServer on Tomcat behind an Apache reverse proxy. No matter what it all meant – an exciting new world beckoned, and I wanted my own part of it.

I’d heard of AWS, and investigating more turned up an unexpected bonus. Amazon’s masterplan (which is actually a really good deal) is to offer a “Free Usage Tier”, providing enough for a free VPS for a whole year. I signed up immediately and soon got things going, learning my way around both Amazon’s infrastructure and Ubuntu server.

Although I’ve made great use of the server for learning and testing, https://aws.simpleigh.com/ is still looking pretty spartan, and this is in part due to my reluctance to go too far with AWS. While I could have migrated this whole site (including hosting, databases, DNS settings, and data storage) to AWS, the end of the free year would have resulted in either costly bills or a time-consuming migration. In addition, my current hosts (Namecheap) have always done a pretty good job for a pretty low price. I’ve installed Linux on the desktop again (Xubuntu this time), and use VMWare’s Player to run a local copy of Ubuntu server from within Windows. Did I need to continue with a VPS: could I justify the cost?

A number of factors have pushed me further towards using a VPS. As I’ve gained knowledge of Linux, I’ve increasingly wanted to use servers set up my own way. If I need another binary or plugin then I can install it. If I need Apache reconfiguring, I can do it. Namecheap have moved to a new server and terminated my shell access (moving a big folder over FTP takes ages), and their server has started to return 503 responses occasionally when it’s overloaded. I’ve also recently discovered an annoying problem where removing a subdomain doesn’t quite work (they’ve failed to configure a default virtual host, meaning that random sites get returned under my subdomains until DNS entries expire). On my own VPS I don’t have those problems. I’ve therefore started to migrate more things over, with my diary having made the leap already, despite requiring substantial reworking (goodness knows how I managed to write in that many SQL injection vulnerabilities – a story for another day).

The question now becomes one of price – while I can afford £12.48 a month (at current exchange rates), I’d like to do better. I think some of the better options are:

  • AWS offers “Reserved Instances”. Basically I can pay £64.69 upfront to reduce the monthly cost to £4.59 (averaging out to £6.39 pm over the three-year term). This is a lot better, but at the expense of signing up for three years – if I want to stop the instance then a fraction of the upfront payment will be wasted.
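The averaged figure is just the upfront payment spread over the 36-month term plus the monthly charge, sketched in Python below with the prices quoted above. The wasted_upfront helper is a rough, hypothetical model of the downside: it treats the upfront payment as forfeited pro rata for unused months, which is an assumption rather than AWS’s actual terms.

```python
def effective_monthly_cost(upfront: float, monthly: float, months: int) -> float:
    """Spread a one-off payment evenly over the term and add the running cost."""
    return upfront / months + monthly


def wasted_upfront(upfront: float, months: int, months_used: int) -> float:
    """Rough model: the share of the upfront payment covering unused months."""
    unused = max(months - months_used, 0)
    return upfront * unused / months


print(f"£{effective_monthly_cost(64.69, 4.59, 36):.2f} per month")  # £6.39 per month
print(f"£{wasted_upfront(64.69, 36, 12):.2f} unused if stopped after a year")
```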
  • Hetzner also offer a VPS for £6.39pm (by coincidence). This is without the 3-year signup, but is slightly inferior in terms of RAM (and presumably processing power and network connectivity too).

There’s recently been quite a bit of buzz around Windows Azure, which is now offering Linux VPSes. It also comes with a 3-month free trial, so seemed worth having a go. Perhaps most astonishingly, it looks like the price is currently the best – if I’m reading things right an “XS” non-Windows instance costs only £5.68 per month! (Even if I’m not reading things right there’s a cushion of nearly £1 left for extra costs like data storage).

So I now have two VPSes (http://azure.simpleigh.com/ is also pretty spartan), with still not much to do with them… I think the current plan is to get to know Azure’s infrastructure a bit better (the management console is incredibly slick) and then see which bill is the biggest.

Posted in Computing | 3 Responses

Dvorak – settled in

Current wpm: 98.

I’ve been enjoying trying to get my particular flavour of Dvorak working on Linux, and will post more details of how it’s going in the near future…

Posted in Computing | Leave a comment

Dvorak Day n

Current wpm: 56 (up from 20).

Starting to get to grips with things now. I wrote out all the letters on sticky-labels and affixed them to the keyboard, which really did make an enormous difference. I no longer had to feel around for a key, or look at a separate diagram, but simply needed to look down at the keys themselves.

Posted in Computing | Leave a comment

Dvorak Day 4

Current wpm: 20 (stable from 20).

A weekend away from the computer has prevented any further speed improvements. Maybe tomorrow will see a better result. I’m starting to remember where all the keys are, but am not at all fluent and have to make heavy use of the “backspace” key.

On a side note, in these days of hackers and malware, Dvorak is noticeably less secure. This is because I can no longer be bothered to type in passwords, and therefore never log out of anything.

Posted in Computing | Leave a comment

Dvorak Day 3

Current wpm: 20 (up from 12).

Did some serious practice today. Also made another keyboard layout so that I don’t have to re-learn all the punctuation. Will this process continue until all the keys are restored?

Using the computer is very frustrating. Typing long passages is tiring, and it’s really annoying when doing something as simple as renaming a file to have to stop and think about every keypress. Most infuriating is that something which used to be second nature is now really hard.

I probably won’t be able to get to the computer tomorrow, so will take a well-earned rest from typing.

Posted in Computing | Leave a comment

Dvorak Day 2

Current wpm: 12 (down from 100 or so).

Using the MS Keyboard Layout Creator, I’ve put together a UK Dvorak Variant. Am considering using speech recognition instead of typing, which is now tedious.

Posted in Computing | Leave a comment

The Dvorak Simplified Keyboard

I’m currently typing this very slowly having changed my keyboard layout to the Dvorak Simplified Keyboard. It’s proving quite slow-going, but will hopefully speed up my typing and reduce the strain on my wrists. Next report to follow.

Posted in Computing | Leave a comment

It’s OK, the software knows best

I find the attitude of some software developers ridiculously arrogant. They all think that they know best. They know the best algorithms, they write in the best language, and above all, they make the best decisions.

Sometimes, however, I’m not so sure.

Today I decided to clear out the small toolbar which acts as a menu bar in Internet Explorer 7. This shares the same space as the tabs for each page I have open, so I like it kept small and tidy so it doesn’t infringe on my browsing. I don’t need the “Home Page” button as I can Google-search via the address bar. I certainly don’t need a Skype button; if I want to make a call I’ll start the application myself.

Next time I start IE, what do I find? Yes, that’s right, the Skype add-on has restored its toolbar button, despite me having just removed it seconds ago. It’s OK, the developers know best. They know that I couldn’t possibly do without their piece of software.

Well, tough luck, I know best. I’ve disabled all your browser plug-ins, rendering your button-replacing system impotent. Ha!

Posted in Computing | Leave a comment

Cooking

  • Action: "A drop of alcohol will spice up this bolognese sauce."
    Reaction: It now smells like an accident in a mulled wine factory.
  • Action: Trying to cook bolognese at all.
    Reaction: My shirt is covered in red spots.

Equal and opposite? Not so sure.

Posted in Food | Leave a comment

More lunch blogging

Today sees a serious sandwich combination, namely ham and mustard. I recently provided a buffet supper for some friends, and included this classic combination along with some others, all unashamedly stolen from the Hotel Felix afternoon tea menu:

  • Ham & Mustard
  • Cream Cheese & Cucumber
  • Smoked Salmon
  • Cheese & Pickle

(admittedly their combinations were a little more upmarket, stating the source and variety of each item)

Posted in Food | Leave a comment

In Defence of Spam

Nobody could really be surprised when spam emails started appearing. Every other communication medium is filled with advertisements and unsolicited intrusion. Newspapers are filled with adverts, and we are harassed by junk mail, telemarketing and door-to-door salesmen.

Two experiences have given me something more to think about when it comes to spam.

Some time ago I was reading the traffic of an email list of which I am a member. For some reason somebody posted to the list a spam message that they had received. Somebody else said “hang on, I’ve received that as well.” All of a sudden everybody was looking through their spam folders, and lots had received the same message. The list was in uproar – how had the spammer got their details?

Had their email addresses been grabbed from the list server, probably hacked by a malicious Russian teenager and sold for thousands? Their email addresses had in fact been collected by hand by another member of the list, whose friend was starting a company and was trying to drum up some publicity. Many on the list responded angrily. It seemed like they’d collected together all their hatred of email spam and directed it at this one person. This seemed a little harsh to me, especially when it became apparent he’d received a threatening phone call, which takes a petty argument to a whole new level.

The second event was more recent. A group with whom I am involved were organising an event. They’d collected lots of names and addresses, and also two-hundred or so email addresses which they wanted to contact about the event. I declined. Should I have?

Now we all know about evil spam. I don’t want tablets or surgery in an attempt to improve my sexual prowess. Nor do I want illegal drugs or to help Nigerians transfer money around. Over here we have banks for that. That sort of spam is a complete waste of packets, and probably clogs up the tubes.

But what about the other sort of Spam?

Supposing I were launching an event, or starting up a business. I’d probably print some fliers and put them through a lot of doors. I’d probably send some emails to people I knew, and to people they knew, and maybe to people I didn’t know at all. Surely if my business is providing a useful service then maybe it’s in their interest to know about it.

In turn, I sometimes quite like getting junk through the door. It’s nice to know that some kids down the street will mow my lawn if I get bored of doing it myself, and I’m grateful for some of the takeaway menus. Email is a far better medium for this communication. It takes about the same time to decide what’s important and what isn’t, but there’s no impact on the environment or effort required to dispose of the item.

So I say bring on the spam. Just make sure it’s interesting, relevant, and that I might want to use what you’re selling.

Posted in Computing | Leave a comment

Lunch blogging

Ham and sun-dried tomato omelette, salad with thousand-island dressing.

Maybe I ought to do more of this.

Posted in Food | Leave a comment

Leigh’s Backup Strategy Begins…

I’ve never really had a backup strategy. Now and again everything’s turned to toast, but previously it’s not been much of a worry. As one gets older, however, things start to seem more important, and constant nagging focuses the mind as well.

Each loss of data eats away at your soul a little bit. Admittedly I haven’t had a big crash for a while, but little losses happen all the time. I seem quite adept at overwriting newer versions of files when synchronising online and offline copies of this site, for example. Every time this sort of thing occurs I find myself reminded of the fragility of digital data.

Secondly, as things get older they come to represent more and more hours of effort. After a while the thought of losing a big project becomes more acute.

Some initial worries of data loss got me thinking about online storage. Collecting your email via IMAP is an easy way to start the process. Now the death of a computer doesn’t mean you’ve lost all your previous emails. Since most of us tend to accumulate emails containing useful information, this is a definite plus.

As part of this online storage drive, I moved my diary online. I also uploaded my budget and some other bits and bobs. Of course one does have to make the assumption that the hosting company are better at looking after data than I am. I don’t think this is a bad assumption – after all, that is what I’m paying them for. Not only this, but a few bad data losses would almost certainly destroy their business. These people have a backup strategy, unlike me.

The last time I lost data badly was when I was still at school. I think it could have been related to the widely-publicised “Deathstar” drive errors, but all I can really remember is that the hard drive was toast. So the first part of the strategy was to buy another one.

So, we’ve now got backup potential and a speed bonus by putting programs and data on separate hard drives. Maybe it could be time for a dual-boot Linux installation as well…

I’ll let you know how I get on.

Posted in Computing

What’s in your phone number?

Recently I’ve been entering a lot of phone numbers in a database. This is dull. Phone numbers do, however, open up a load of (slightly) interesting geekery.

One of the problems I had was with the London dialling code. Numbers in London tend to start with 0207 or 0208, so I decided that these seemed like reasonable dialling codes. Some people tended to list their phone number as (020) 8…… but I discounted that – after all, it looks very European and not at all like other British numbers.

Unfortunately my slightly Euro-phobic decision turned out to be completely wrong. I was labouring under a common misapprehension: the London area code is simply 020, and the 7, 8 or 3 that follows is the first digit of an eight-digit local number. Luckily I’m not alone, and it isn’t really my fault. After all, even BT seem not to understand, listing 0207, 0208 and 0203 as separate codes.

From quite an amusing rant:

Worse still, on some telephone networks the call return (1471) service reads out numbers in these wrong forms. For any company to have let this bug slip through in the first place is bad enough, but I can’t for the life of me understand how any of them can have not fixed it by now. You’d think that basic knowledge of how phone numbers work is essential to the ability of a phone service to function. In any case, anybody who cannot understand such a simple aspect of a phone numbering system has absolutely no business to be working in a phone company. But perhaps worst of all is that some websites devoted to providing information on dialling codes, such as UK Phone Info, are giving false information in this department.

Luckily, reading further down the page gives some clear instructions. UK area codes take one of these forms:

  • (02x) [London]
  • (011x)
  • (01x1)
  • (01xxx)
  • (01xxxx) [very rare]
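The rules above are mechanical enough to put into code, which is handy when tidying up a database full of phone numbers. Below is a minimal Python sketch; `split_uk_number` and `format_uk_number` are hypothetical helpers, and it assumes every number is an 11-digit geographic (01/02) number. The rare six-digit codes can't be told apart from five-digit ones by looking at the digits alone, so the caller has to pass them in (for example, copied from a published list).

```python
from typing import AbstractSet, Tuple


def split_uk_number(number: str, six_digit_codes: AbstractSet[str] = frozenset()) -> Tuple[str, str]:
    """Split an 11-digit UK geographic number into (area code, local number).

    six_digit_codes is a hypothetical caller-supplied set of the rare
    (01xxxx) codes, since they can't be recognised from the digits alone.
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 11 or not digits.startswith("0"):
        raise ValueError(f"expected an 11-digit national number, got {number!r}")
    if digits.startswith("02"):
        code_len = 3                      # (02x) xxxx xxxx
    elif digits[:6] in six_digit_codes:
        code_len = 6                      # (01xxxx) xxxxx, very rare
    elif digits.startswith("01") and "1" in (digits[2], digits[3]):
        code_len = 4                      # (011x) or (01x1) xxx xxxx
    elif digits.startswith("01"):
        code_len = 5                      # (01xxx) xxxxxx
    else:
        raise ValueError(f"not a geographic 01/02 number: {number!r}")
    return digits[:code_len], digits[code_len:]


def format_uk_number(number: str, six_digit_codes: AbstractSet[str] = frozenset()) -> str:
    """Format a number with its area code in brackets, e.g. (020) 7946 0000."""
    code, local = split_uk_number(number, six_digit_codes)
    if len(code) == 3:
        local = f"{local[:4]} {local[4:]}"    # 8-digit local number: 4 + 4
    elif len(code) == 4:
        local = f"{local[:3]} {local[3:]}"    # 7-digit local number: 3 + 4
    return f"({code}) {local}"
```

With this, `format_uk_number("0207 946 0000")` comes out as `(020) 7946 0000`, which is exactly the form I'd wrongly discounted. A few rural areas have numbers shorter than 11 digits; this sketch simply rejects those rather than guessing.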

If you want to know what all the area codes are, then you can, as always, consult Wikipedia. Their list is particularly useful in that it reveals the origins of many area codes as mnemonic versions of their place names. For example, Cambridge starts with the letters “CA”, and on a phone keypad both of those letters sit on the 2 key. So the Cambridge area code starts 0122x. In this case it’s the third area code of this form (01223), some others being Cardiff, Aberdeen, Bath and Carlisle.
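The mnemonic itself fits in a few lines of Python. This sketch assumes the standard keypad layout (2 = ABC through 9 = WXYZ), ASCII place names, and that a code is formed from "01" plus the keys for the first two letters of the name; `mnemonic_prefix` is a hypothetical helper, and plenty of real codes don't follow the pattern.

```python
# Standard phone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
# Reverse lookup: letter -> key it sits on.
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def mnemonic_prefix(place: str, letters: int = 2) -> str:
    """Return '01' followed by the keypad digits of the place name's first letters."""
    initials = [ch for ch in place.upper() if ch.isalpha()][:letters]
    return "01" + "".join(LETTER_TO_KEY[ch] for ch in initials)


if __name__ == "__main__":
    for town in ["Cambridge", "Cardiff", "Aberdeen", "Bath", "Carlisle"]:
        print(town, mnemonic_prefix(town))   # every one gives 0122
```

All five places named above start with letters from the 2 key, which is why they ended up sharing the 0122x block.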

I think it’s very satisfying learning about the origins of complex systems such as the UK area codes and their history. And now you can too.

Posted in Uncategorized

Link of the day

MSDN:

All print providers must export the initialization function, InitializePrintProvidor. Pointers to all the other functions must be supplied in a PRINTPROVIDOR structure. (Note that these two names are misspelled, but are consistent with the names that appear in the header file, winsplp.h.)

Of course, one can’t correct the spelling of one’s functions, else one breaks the principle of backwards-compatibility. (via Raymond)

Posted in Uncategorized

Open Source Software – a Chink in the Armour

The collective wisdom of the Internet states pretty categorically that Microsoft is evil, Google is never evil, Macs are cool and Linux is best of all.

Of course not all of that is true, and you can’t always believe what you read on the Internet. Google isn’t evil but colludes with dictatorships to curb the freedom of their citizens, Macs look cool but have an increasing tendency to break, and Linux has a few small issues.

We’ll gloss over the fact that I couldn’t install it on my new PC, and the fact that for non-technical users it’s a bit of a nightmare to use and configure.

A couple of days ago I was amused to receive an email from one of the hosting companies I use along the following lines (paraphrased):

Big exploit in Debian!

Bad!

Change all your passwords and run for your lives.

Linux distributions all exist in a sort of family tree. One of the big strengths of open source software (OSS) is that you can edit the source and create your own version of it to suit your needs. This becomes a problem when a flaw is introduced, because every distribution derived from the flawed one inherits it. Looking at this chart, we can see that this problem probably affects:

  • Debian
  • Ubuntu
  • KNOPPIX
  • About 20 other derived distributions

And the problem itself?

The problem is in Debian’s version of OpenSSL, a widely used security library. The Debian package maintainers commented out a line which turned out to be quite important, dramatically reducing the security offered. There are some good links available from Schneier on Security.
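To see why that one line mattered so much, here is a toy Python sketch (not OpenSSL code). As widely reported at the time, the change left the process ID as essentially the only random input to key generation, and on Linux process IDs only go up to 32,768 by default, so an attacker can simply try every one. `toy_keygen` and `recover_pid` are hypothetical names, and Python's `random` module stands in for OpenSSL's generator.

```python
import random
from typing import Optional

# Default Linux pid_max: process IDs run from 1 to 32767 unless reconfigured.
PID_MAX = 32768


def toy_keygen(pid: int) -> int:
    """Hypothetical key generator whose only entropy is the process ID.

    Python's `random` module is not cryptographically secure anyway; the point
    is that a 128-bit output can still take at most PID_MAX distinct values.
    """
    rng = random.Random(pid)        # seeded with nothing but the PID
    return rng.getrandbits(128)     # looks like a 128-bit key


def recover_pid(key: int) -> Optional[int]:
    """Brute-force every possible PID until the generated key matches."""
    for pid in range(1, PID_MAX):
        if toy_keygen(pid) == key:
            return pid
    return None


if __name__ == "__main__":
    victim_key = toy_keygen(4242)       # an "unknown" key made by PID 4242
    print(recover_pid(victim_key))      # recovers 4242 in at most 32767 tries
```

A 128-bit key ought to need around 2^128 guesses; here the search space is tens of thousands, which a laptop gets through in well under a second.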

Ben Laurie explains more about the problems:

I’ve ranted about this at length before, I’m sure. But now Debian have proved me right (again) beyond my wildest expectations. Two years ago, they “fixed” a “problem” in OpenSSL. The result of this is that for the last two years anyone doing pretty much any crypto on Debian (and hence Ubuntu) has been using easily guessable keys. This includes SSH keys, SSL keys and OpenVPN keys.

What can we learn from this? Firstly, vendors should not be fixing problems (or, really, anything) in open source packages by patching them locally – they should contribute their patches upstream to the package maintainers. Had Debian done this in this case, we (the OpenSSL Team) would have fallen about laughing, and once we had got our breath back, told them what a terrible idea this was. But no, it seems that every vendor wants to “add value” by getting in between the user of the software and its author.

This can happen with any code, but it’s a particular problem with OSS. Someone writes code one way for a reason, but that reason isn’t always obvious to those maintaining the software downstream. All it takes is for someone to do something daft and break it.

Posted in Computing