It's the spec, stupid

Johannesburg, 28 Nov 2012

I recently had a frustrating experience with a local online retailer. I was receiving promotional e-mails from them, despite having unsubscribed. Spam! No, not really - just incompetence.


The problem was that I had used an e-mail address tag when I first interacted with the site. Something like: jon+sitename@itweb.co.za. Using that sort of +tag is a useful way of keeping track of what address you use where, but although the site had happily signed me up with that address, its unsubscribe process had choked on it.

To compound matters, the unsub page then reported success, claiming the address had been removed. So I kept getting promotional e-mail, despite thinking I'd successfully unsubscribed. From the outside, that looks very spammy indeed, and it's the sort of behaviour that gets a site reported.
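The retailer's actual bug was never disclosed, so what follows is only a guess at one common way this failure happens: an unsubscribe link that carries the address in a query string without encoding it. In a query string a bare plus sign decodes to a space, so the server ends up looking for an address that was never signed up. The example.com URL and the "email" parameter name are invented for illustration.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

address = "jon+sitename@itweb.co.za"

# Hypothetical unsubscribe links: the host and the "email" parameter are made up.
broken_link = "https://example.com/unsubscribe?email=" + address
safe_link = "https://example.com/unsubscribe?email=" + quote_plus(address)


def address_from_link(link):
    """Return the address a server would see after decoding the query string."""
    return parse_qs(urlsplit(link).query)["email"][0]


broken_result = address_from_link(broken_link)  # "+" arrives as a space
safe_result = address_from_link(safe_link)      # "%2B" arrives as "+"

print(repr(broken_result))
print(repr(safe_result))
```

If the lookup for the mangled address finds nothing and the page reports success anyway, the user sees exactly the behaviour described above.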

The root cause was certainly nothing more than a little sloppy programming and I got it straightened out. But there's the risk - your crummy code can easily turn into complaints and negative publicity. In extreme cases, you'll end up on an e-mail blacklist, which is a bit of a problem if you have revenue hinging on e-mail promotion. Getting off an RBL is no fun at all. But the problem only starts there - this sort of glitch can be a sign of much deeper problems.

Everyone does it

This was an unusual incident - more commonly, sites just reject those address tags, thinking the plus sign is an invalid character. It is not - it's a perfectly normal part of the e-mail address specification - and it can be frustrating when that happens. So why do so many sites reject it?
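To make that concrete, here is a rough sketch contrasting an over-strict pattern of the kind many sites seem to use (TOO_STRICT is my invention, not taken from any real site) with a check built on the characters RFC 5322 allows in an unquoted local part. It is deliberately not a full implementation of the spec: quoted local parts, comments and address literals are ignored, and the domain check is a simplification.

```python
import re

# Hypothetical over-strict pattern: letters, digits, dot, underscore, hyphen only.
TOO_STRICT = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# The "atext" characters RFC 5322 permits in an unquoted local part.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"
DOT_ATOM_LOCAL = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")

# Simplified domain check (an assumption): dot-separated labels, at least two.
SIMPLE_DOMAIN = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def looks_valid(address):
    """Loose check: dot-atom local part, '@', then a plausible domain."""
    local, sep, domain = address.rpartition("@")
    return bool(
        sep
        and DOT_ATOM_LOCAL.fullmatch(local)
        and SIMPLE_DOMAIN.fullmatch(domain)
    )


tagged = "jon+sitename@itweb.co.za"
print(TOO_STRICT.fullmatch(tagged) is not None)  # the strict pattern rejects it
print(looks_valid(tagged))                       # the dot-atom check accepts it
```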

They reject it because their developers are lazy or inept or both. And I'm glad when they do, because it's like a shining beacon hanging in the sky saying: "Incompetent! Don't do business here!"

"Hold on," you're saying. "That's a bit harsh. How attached can you be to a plus sign? And why do you expect sites to support a feature that very few people use at all?"

Conspiracy theorists will identify with this, and claim that sites are doing this specifically to prevent e-mail tags from identifying e-mail abusers. Hanlon's Razor suggests otherwise: it's no harder to quietly strip +tag from an address after the fact, if that's what you want to do. No, it's just laziness and incompetence: either the coder was clueless, or made the call that it's easier to inconvenience a small number of users than to do it right.
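For what it's worth, stripping a tag really is only a few lines. This is a minimal sketch, assuming the common convention that everything from the first plus sign to the @ is the tag; strip_tag is a made-up helper, and whether a given mail provider treats the plus sign this way is up to that provider.

```python
def strip_tag(address):
    """Drop a '+tag' suffix from the local part, leaving the rest untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" at all: not our problem to fix here
    base = local.split("+", 1)[0]
    if not base:
        return address  # local part starts with "+": leave it alone
    return f"{base}@{domain}"


print(strip_tag("jon+sitename@itweb.co.za"))
print(strip_tag("jon@itweb.co.za"))
```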

Tainted data

The reason I care is nothing to do with the inconvenience of untagged e-mail. It's that the e-mail address rejection is just one symptom of how the site handles data, and mishandled data means problems. Quick-and-dirty solutions are exactly where more serious vulnerabilities spring up.

Cutting corners to ship in a hurry is all very Web 2.0, but it's the reason SQL injection is such a widespread problem on the Web. Data validation matters, and if you're skimping on parsing e-mail addresses, why should I believe you're doing a better job anywhere else?
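For anyone who hasn't seen it up close, the difference between the quick way and the right way is small. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the subscribers table and the hostile input are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO subscribers VALUES (?)",
    [("jon+sitename@itweb.co.za",), ("someone@example.com",)],
)

# Input crafted so the quote characters break out of the string literal.
hostile = "nobody@example.com' OR '1'='1"

# Quick and dirty: paste the input straight into the SQL text.
unsafe_sql = f"SELECT email FROM subscribers WHERE email = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Done properly: a placeholder keeps the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT email FROM subscribers WHERE email = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))  # every row leaks
print(len(safe_rows))    # nothing matches
```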

Security people tend to be particularly alert to small symptoms that suggest deeper, possibly exploitable, issues. Crummy data validation is very much one of those symptoms.

This really is far from a new problem: many years ago, I identified a vulnerability at an ISP, whose sign-up server was passing new account details through a Unix command line without escaping them. Attackers could run arbitrary commands on their account servers by including special characters in the password field, very similar to the SQL injection attacks of today. And this was the server I was supposed to feed my credit card number into? I don't think so.
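I no longer have the details of that ISP's script, so the following is only an illustration of the general class of bug, not a reconstruction. It assumes a POSIX shell, and a harmless echo stands in for whatever account-creation command was really being run.

```python
import shlex
import subprocess

# A "password" containing a shell metacharacter.
hostile_password = "hunter2; echo INJECTED"

# Unsafe: the value is pasted into a command line and handed to the shell,
# which treats ";" as the start of a second command.
unsafe = subprocess.run(
    f"echo {hostile_password}", shell=True, capture_output=True, text=True
)

# Safer: no shell at all; the value is passed as a single argument.
safe = subprocess.run(
    ["echo", hostile_password], capture_output=True, text=True
)

# If a shell is unavoidable, quote the value first.
quoted = subprocess.run(
    f"echo {shlex.quote(hostile_password)}",
    shell=True,
    capture_output=True,
    text=True,
)

print(unsafe.stdout.splitlines())  # two lines: the second command ran
print(safe.stdout.strip())         # the password, intact
print(quoted.stdout.strip())       # the password, intact
```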

To be fair, parsing e-mail addresses in complete accordance with the spec is actually very hard. It includes bizarre fringe cases that no one in their right mind would ever require. And although many Web programming languages do include built-in functions or external libraries that try to detect valid e-mail addresses (or URLs, which are even worse), they are often flawed. Many site devs end up rolling their own, and they do it as quickly as possible because it's a distraction.

I sympathise. I've been there. But I also don't care. If you can't take the time to write or find a piece of code (this is an old problem and there are a lot of known solutions to it) that solves the problem properly, then I don't believe you're taking the job seriously. The risk of related security problems is thus higher, and if we're doing e-commerce or exchanging personally valuable data, I'm unlikely to take the risk.
