Stocator: A High Performance Object Store Connector for Spark

Behind Picton Street

IBM have published a lovely paper on their Stocator 0-rename committer for Spark

Stocator is:
  1. An extended Swift client
  2. magic in their FS to redirect mkdir and file PUT/HEAD/GET calls under the normal MRv1 __temporary paths to new paths in the dest dir
  3. generating dest/part-0000 filenames using the attempt & task attempt ID to guarantee uniqueness and to ease cleanup: restarted jobs can delete the old attempts
  4. Commit performance comes from eliminating the COPY, which is O(data),
  5. And from tuning back the number of HTTP requests (probes for directories, mkdir 0 byte entries, deleting them)
  6. Failure recovery comes from explicit names of output files. (note, avoiding any saving of shuffle files, which this wouldn't work with...spark can do that in memory)
  7. They add summary data in the _SUCCESS file to list the files written & so work out what happened (though they don't actually use this data, instead relying on their swift service offering list consistency). (I've been doing something similar, primarily for testing & collection of statistics).

Page 10 has their benchmarks, all; of which are against an IBM storage system, not real amazon S3 with its different latencies and performance.

Table 5: Average run time

Read-Only 50GB
Read-Only 500GB
Hadoop-Swift Base
S3a Base
Hadoop-Swift Cv2
S3a Cv2
S3a Cv2 + FU

The S3a is the 2.7.x version, which has the stabilisation enough to be usable with Thomas Demoor's fast output stream (HADOOP-11183). That stream buffers in RAM & initiates the multipart upload once the block size threshold is reached. Provided you can upload data faster than you run out of RAM, it avoids the log waits at the end of close() calls, so has significant speedup. (The fast output stream has evolved into the S3ABlockOutput Stream (HADOOP-13560) which can buffer off heap and to HDD, and which will become the sole output stream once the great cruft cull of HADOOP-14738 goes in)

That means in the doc, "FU" == fast upload, == incremental upload & RAM storage. The default for S3A will become HDD storage, as unless you have a very fast pipe to a compatible S3 store, it's easy to overload the memory

Cv2 means MRv2 committer, the one which does  single rename operation on task commit (here the COPY), rather than one in task commit to promote that attempt, and then another in job commit to finalise the entire job. So only: one copy of every byte PUT, rather than 2, and the COPY calls can run in parallel, often off the critical path

 Table 6: Workload speedups when using Stocator

Read-Only 50GB
Read-Only 500GB
Hadoop-Swift Base
S3a Base
Hadoop-Swift Cv2
S3a Cv2
S3a Cv2 + FU

Their TCP-DS benchmarks show that stocator & swift is slower than TCP-DS Hadoop 2.7 S3a + Fast upload & MRv2 commit. Which means that (a) the Hadoop swift connector is pretty underperforming and (b) with fadvise=random and columnar data (ORC, Parquet) that speedup alone will give better numbers than swift & stocator. (Also shows how much the TCP-DS Benchmarks are IO heavy rather than output heavy the way the tera-x benchmarks are).

As the co-author of that original swift connector then, what the IBM paper is saying is "our zero rename commit just about compensates for the functional but utterly underperformant code Steve wrote in 2013 and gives us equivalent numbers to 2016 FS connectors by Steve and others, before they started the serious work on S3A speedup". Oh, and we used some of Steve's code to test it, removing the ASF headers.

Note that as the IBM endpoint is neither the classic python Openstack swift or Amazon's real S3, it won't exhibit the real issues those two have. Swift has the worst update inconsistency I've ever seen (i.e repeatable whenever I overwrote a large file with a smaller one), and aggressive throttling even of the DELETE calls in test teardown. AWS S3 has its own issues, not just in list inconsistency, but serious latency of HEAD/GET requests, as they always go through the S3 load balancers. That is, I would hope that IBM's storage offers significantly better numbers than you get over long-haul S3 connections. Although it'd be hard (impossible) to do a consistent test there, I 'd fear in-EC2 performance numbers to be actually worse than that measures.

I might post something faulting the paper, but maybe I'll should to do a benchmark of my new committer first. For now though, my critique of both the swift:// and s3a:// clients is as follows

Unless the storage services guarantees consistency of listing along with other operations, you can't use any of the MR commit algorithms to reliably commit work. So performance is moot. Here IBM do have a consistent store, so you can start to look at performance rather than just functionality. And as they note, committers which work with object store semantics are the way to do this: for operations like this you need the atomic operations of the store, not mocked operations in the client.

People who complain about the performance of using swift or s3a as a destination are blisfully unaware of the key issue: the risk of data loss due inconsistencies. Stocator solves both issues at once.

Anyway, means we should be planning a paper or two on our work too, maybe even start by doing something about random IO and object storage, as in "what can you do for and in columnar storage formats to make them work better in a world where a seek()+ read is potentially a new HTTP request."

(picture: parakeet behind Picton Street)


Dissent is a right: Dissent is a duty. @Dissidentbot

It looks like the Russians interfered with the US elections, not just from the alleged publishing of the stolen emails, or through the alleged close links with the Trump campaign, but in the social networks, creating astroturfed campaigns and repeating the messages the country deemed important.

Now the UK is having an election. And no doubt the bots will be out. But if the Russians can do bots: so can I.

This then, is @dissidentbot.

Dissident bot is a Raspbery Pi running a 350 line ruby script tasked with heckling politicans
unrelated comments seem to work, if timely
It offers:
  • The ability to listen to tweets from a number of sources: currently a few UK politicians
  • To respond pick a random responses from a set of replies written explicitly for each one
  • To tweet the reply after a 20-60s sleep.
  • Admin CLI over Twitter Direct Messaging
  • Live update of response sets via github.
  • Live add/remove of new targets (just follow/unfollow from the twitter UI)
  • Ability to assign a probability of replying, 0-100
  • Random response to anyone tweeting about it when that is not a reply (disabled due to issues)
  • Good PUE numbers, being powered off the USB port of the wifi base station, SSD storage and fanless naturally cooled DC. Oh, and we're generating a lot of solar right now, so zero-CO2 for half the day.
It's the first Ruby script of more than ten lines I've ever written; interesting experience, and I've now got three chapters into a copy of the Pickaxe Book I've had sitting unloved alongside "ML for the working programmer".  It's nice to be able to develop just by saving the file & reloading it in the interpreter...not done that since I was Prolog programming. Refreshing.
Strong and Stable my arse
Without type checking its easy to ship code that's broken. I know, that's what tests are meant to find, but as this all depends on the live twitter APIs, it'd take effort, including maybe some split between Model and Control. Instead: broken the code into little methods I can run in the CLI.

As usual, the real problems surface once you go live:
  1. The bot kept failing overnight; nothing in the logs. Cause: its powered by the router and DD-WRT was set to reboot every night. Fix: disable.
  2. It's "reply to any reference which isn't a reply itself" doesn't work right. I think it's partly RT related, but not fully tracked it down.
  3. Although it can do a live update of the dissident.rb script, it's not yet restarting: I need to ssh in for that.
  4. I've been testing it by tweeting things myself, so I've been having to tweet random things during testing.
  5. Had to add handling of twitter blocking from too many API calls. Again: sleep a bit before retries.
  6. It's been blocked by the conservative party. That was because they've been tweeting 2-4 times/hour, and dissidentbot originally didn't have any jitter/sleep. After 24h of replying with 5s of their tweets, it's blocked.
The loopback code is the most annoying bug; nothing too serious though.

The DM CLI is nice, the fact that I haven't got live restart something which interferes with the workflow.
  Dissidentbot CLI via Twitter DM
Because the Pi is behind the firewall, I've no off-prem SSH access.

The fact the conservatives have blocked me, that's just amusing. I'll need another account.

One of the most amusing things is people argue with the bot. Even with "bot" in the name, a profile saying "a raspberry pi", people argue.
Arguing with Bots and losing

Overall the big barrier is content.  It turns out that you don't need to do anything clever about string matching to select the right tweet: random heckles seems to blend in. That's probably a metric of political debate in social media: a 350 line ruby script tweeting random phrases from a limited set is indistinguishable from humans.

I will accept Pull Requests of new content. Also: people are free to deploy their own copies. without the self.txt file it won't reply to any random mentions, just listen to its followed accounts and reply to those with a matching file in the data dir.

If the Russians can do it, so can we.


The NHS gets 0wned

Friday's news was full of breaking panic about an "attack" on the NHS, making it sound like someone had deliberately made an attempt to get in there and cause damage.

It turns out that it wasn't an attack against the NHS itself, just a wide scale ransomware attack which combined click-through installation and intranet propagation by way of a vulnerability which the NSA had kept for internal use for some time.

Laptops, Lan ports and SICP

The NHS got decimated for a combination of issues:
  1. A massive intranet for SMB worms to run free.
  2. Clearly, lots of servers/desktops running the SMB protocol.
  3. One or more people reading an email with the original attack, bootstrapping the payload into the network.
  4. A tangible portion of the machines within some parts of the network running unpatched versions of Windows, clearly caused in part by the failure of successive governments to fund a replacement program while not paying MSFT for long-term support.
  5. Some of these systems within part of medical machines: MRI scanners, VO2 test systems, CAT scanners, whatever they use in the radiology dept —to name but some of the NHS machines I've been through in the past five years.
The overall combination then is: a large network/set of networks with unsecured, unpatched targets were vulnerable to a drive-by attack, the kind of attack, which, unlike a nation state itself, you may stand a chance of actually defending against.

What went wrong?

Issue 1: The intranet. Topic for another post.

Issue 2: SMB.

In servers this can be justified, though it's a shame that SMB sucks as a protocol. Desktops? It's that eternal problem: these things get stuck in as "features", but which sometimes come to burn you. Every process listening on a TCP or UDP port is a potential attack point. A 'netstat -a" will list running vulnerabilities on your system; enumerating running services "COM+, Sane.d? mDNS, ..." which you should review and decide whether they could be halted. Not that you can turn mDNS off on a macbook...

Issue 3: Email

With many staff, email clickthrough is a function of scale and probability: someone will, eventually. Probability always wins.

Issue 4: The unpatched XP boxes.

This is why Jeremy Hunt is in hiding, but it's also why our last Home Secretary, tasked with defending the nation's critical infrastructure, might want to avoid answering questions. Not that she is answering questions right now.

Finally, 5: The medical systems.

This is a complication on the "patch everything" story because every update to a server needs to be requalified. Why? Therac-25.

What's critical here is that the NHS was 0wned, not by some malicious nation state or dedicated hacker group: it fell victim to drive-by ransomware targeted at home users, small businesses, and anyone else with a weak INFOSEC policy This is the kind of thing that you do actually stand a chance of defending against, at least in the laptop, desktop and server.


Defending against malicious nation state is probably near-impossible given physical access to the NHS network is trivial: phone up at 4am complaining of chest pains and you get a bed with a LAN port alongside it and told to stay there until there's a free slot in the radiology clinic.

What about the fact that the NSA had an exploit for the SMB vulnerability and were keeping quiet on it until the Shadow Brokers stuck up online? This is a complex issue & I don't know what the right answer is.

Whenever critical security patches go out, people try and reverse engineer them to get an attack which will work against unpatched versions of: IE, Flash, Java, etc. The problems here were:
  • the Shadow Broker upload included a functional exploit, 
  • it was over the network to enable worms, 
  • and it worked against widely deployed yet unsupported windows versions.
The cost of developing the exploit was reduced, and the target space vast, especially in a large organisation. Which, for a worm scanning and attacking vulnerable hosts, is a perfect breeding ground.

If someone else had found and fixed the patch, there'd still have been exploits out against it -the published code just made it easier and reduced the interval between patch and live exploit

The fact that it ran against an old windows version is also something which would have existed -unless MSFT were notified of the issue while they were still supporting WinXP. The disincentive for the NSA to disclose that is that a widely exploitable network attack is probably the equivalent of a strategic armament, one step below anything that can cut through a VPN and the routers, so getting you inside a network in the first place.

The issues we need to look at are
  1. How long is it defensible to hold on to an exploit like this?
  2. How to keep the exploit code secure during that period, while still using it when considered appropriate?
Here the MSFT "tomahawk" metaphor could be pushed a bit further. The US govt may have tomahawk missiles with nuclear payloads, but the ones they use are the low-damage conventional ones. That's what got out this time.

WMD in the Smithsonia

One thing that MSFT have to consider is: can they really continue with the "No more WinXP support" policy? I know they don't want to do it, the policy of making customers who care paying for the ongoing support is a fine way to do it, it's just it leaves multiple vulnerabilites. People at home, organisations without the money and who think "they won't be a target", and embedded systems everywhere -like a pub I visited last year whose cash registers were running Windows XP embedded; all those ATMs out there, etc, etc.

Windows XP systems are a de-facto part of the nation's critical infrastructure.

Having the UK and US governments pay for patches for the NHS and everyone else could be a cost effective way of securing a portion of the national infrastructure, for the NHS and beyond.

(Photos: me working on SICP during an unplanned five day stay and the Bristol Royal Infirmary. There's a LAN port above the bed I kept staring at; Windows XP Retail packaging, Smithsonian aerospace museum, the Mall, Washington DC)


Is it time to fork Guava? Or rush towards Java 9?

Lost Crew WiP

Guava problems have surfaced again.

Hadoop 2.x has long-shipped Guava 14, though we have worked to ensure it runs against later versions, primarily by re-implementing our own classes of things pulled/moved across versions.

Hadoop trunk has moved up to Guava 21.0, HADOOP-10101.This has gone and overloaded the Preconditions.checkState() method, such that: if you compile against Guava 21, your code doesn't link against older versions of Guava. I am so happy about this I could drink some more coffee.

Classpaths are the gift that keeps on giving, and any bug report with the word "Guava" in it is inevitably going to be a mess. In contrast, Jackson is far more backwards compatible; the main problem there is getting every JAR in sync.

What to do?

Shade Guava Everywhere
This is going too be tricky to pull off. Andrew Wang has taken on this task. this is one of those low level engineering projects which doesn't have press-release benefits but which has the long-term potential to reduce pain. I'm glad someone else is doing it & will keep an eye on it.

Rush to use Java 9
I am so looking forward to this from an engineering perspective:

Pull Guava out
We could do our own Preconditions, our own VisibleForTesting attribute. More troublesome are the various cache classes, which do some nice things...hence they get used. That's a lot of engineering.

Fork Guava
We'd have to keep up to date with all new Guava features, while reinstating the bits they took away. The goal: stuff build with old Guava versions still works.

I'm starting to look at option four. Biggest issue: cost of maintenance.

There's also the fact that once we use our own naming "org.apache.hadoop/hadoop-guava-fork" then maven and ivy won't detect conflicting versions, and we end up with > 1 version of the guava JARs on the CP, and we've just introduced a new failure mode.

Java 9 is the one that has the best long-term potential, but at the same time, the time it's taken to move production clusters onto Java 8 makes it 18-24 months out at a minimum. Is that so bad though?

I actually created the "Move to Java 9": JIRA in 2014. It's been lurking there, Akira Ajisaka doing the equally unappreciated step-by-step movement towards it.

Maybe I should just focus some spare-review-time onto Java 9; see what's going on, review those patches and get them in. That would set things up for early adopters to move to Java 9, which, for in-cloud deployments, is something where people can be more agile and experimental.

(photo: someone painting down in Stokes Croft. Lost Crew tag)


Mocking: an enemy of maintenance

Bristol spring

I'm keeping myself busy right now with HADOOP-13786, an O(1) committer for job output into S3 buckets. The classic filesystem relies on rename() for that, but against S3 rename is a file-by-file copy whose time is O(data) and whose failure mode is "a mess", amplified by the fact that an inconsistent FS can create the illusion that destination data hasn't yet been deleted: false conflict.
. This creates failures like SPARK-18512., FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A, as well as long commit delays.

I started this work a while back, making changes into the S3A Filesystem to support it. I've stopped focusing on that committer, and instead pulled in the version which Netflix have been using, which has the advantages of a thought out failure policy, and production testing. I've been busy merging that with the rest of the S3A work, and am now at the stage where I'm switching it over to the operations I've written for the first attempt, the "magic committer". These are in S3A, where they integrate with S3Guard state updates, instrumentation and metrics, retry logic, etc etc. All good.

The actual code to do the switchover is straightforward. What is taking up all my time is fixing the mock tests. These are failing with false positives "I've broken the code", when really the cause is "these mock tests are too brittle". In particular, I've had to rework how the tracking of operations goes, as a Mock Amazon S3Ciient is no longer used by the committer, instead its associated with the FS instance, which then is shared by all operations in a single test method. And the use of S3AFS methods shows up where its failing due to the mock instance not initing properly. I ended up spending most of Tuesday simply implementing the abort() call, now I'm doing the same on commit(). The production code switches fine, it's just the mock stuff.

This has really put me off mocking. I have used it sporadically in the past, and I've occasionally had to work other people's. Mocking has some nice features
  • Can run in unit tests which don't need AWS credentials, so Yetus/Jenkins can run them on patches.
  • Can be used to simulate failures and validate outcomes.
But the disadvantage is I just think they are too high maintenance. One test I've already migrated to being an integration test against an object store; I retained the original mock one, but just deleted that yesterday as it was going to be too expensive to migrate, and, with
that IT test, obsolete.

The others, well: the changes for abort() should help, but every new S3A method that gets called triggers new problems which I need to address. This is, well, "frustrating".

It's really putting me off mocking. Ignoring the Jenkins aspect, the key benefit is structure fault injection. I believe I could implement that in the IT tests too, at least in those tests which run in the same JVM. If I wanted to, I could probably even do it in the forked VMs by f propagating details on the desired failures to the processes. Or, if I really wanted to be devious, by running an HTTP proxy in the test VM and simulating network failures for the AWS client code itself to hit. That wouldn't catch all real-world problems (DNS, routing), but I could raise authentication, transient HTTP failures, and of course, force in listing inconsistencies. This is tempting, because it will help me qualify the AWS SDK we depend on, and could be re-used for testing the Azure storage too. Yes, it would take effort —but given the cost of maintaining those Mock tests after some minor refactoring of the production code, it's starting to look appealing.

(photo: Garage door, Greenbank, Bristol)


The interruption economy

With the untimely death of a laptop in Boston in February, I've rebuilt two laptops recently.

The first: a replacement for the dead one: a development macbook pro wired up to the various bits of work infra: MS office, VPN,  even hipchat. The second, a formerly dead 2009 macbook brought back to life with a 256GB SSD and a boost of its RAM to 8GB (!).

Doing this has brought home to be a harsh truth

The majority of applications you install on an OSX laptop consider it not just a right, but a duty, to interrupt you while you are trying to work.

It's not just the things where someone actually want's to talk  to (e.g. skype), it's pretty much everything you can install

For example, iTunes wants to be able to interrupt me, including playing sounds. It's a music player application, and it also wants to make beeping noises? Same for spotify. Why should background music apps or foreground media playback apps think they need to be able to interrupt you when they are running in the background?

iTunes wants to interrupt me

Dropbox. I didn't realise this was doing notifications until it suddenly popped up to tell me the good news that it was keeping itself up to date automatically.

Dropbox interrupting me with a random fact

Keeping your installation up to date is something we should expect all applications to do. It should not be so important that you should pop up a dialog box "good news, you are only at risk from 0-day exploits we haven't found or patched yet!". Once I was aware that dropbox was happy to interrupt me, I went to its settings, only to discover that it also wants to interrupt me on "comments, share's and @mentions", and on synced files.
Dropbox wants to harass me

I hadn't noticed that a tool I used to sync files across machines had evolved into a groupware app where people could @mention me, but clearly it has, and in teams, interruptions whenever someone comments on things is clearly considered good. It also wants to interrupt me on files syncing. Think about that. We have an application whose primary purpose is "synchronising files across machines", and suddenly it wants to start popping up notifications when it is doing its job? What else should we have? Note taking applications sharing the good news that they haven't crashed yet?
Apple Notes wants to interrupt me

Maybe, because amongst the apps which also consider interruption and inalienable right are: OneNote and macOS notes app. I have no idea what they want to interrupt me about: Notes doesn't specify what it wants to alert me about, only that it wants to notify me on locked screens and make a noise. OneNote? Lets you spec which notebooks can trigger interrupts, but again, the why is missing.

The list goes on. My password manager, text editor, IDE. Everything I install defaults to interrupting me.

Yes, you can turn the features off, but on a newly installed machine, that means that you have to go through every single app and disable every single interruption point. Miss out some small detail and while you are trying to get some work done, something pops up to say "lucky you! Something has happened which Photos thinks it is so important you should stop what you are doing and use it instead!". when you are building up two laptops, it means there's about 20+ times I've had to bring up the notifications preference pane, scroll down to whichever app last interrupted me, turn off all its notifications, then continue until something else chooses to break my concentration.

The web browsers want to let web pages interrupt you too.

Firefox you can't disable it, at least not without delving into about:config.

Firefox doesn't seem to let me utterly disable interrupts

You can block it in the OS notifications settings, which implies it is at least integrated with the OS and the system-wide do-not-disturb feature.

Chrome: you can manage it in the browser —even though google don't want you to stop it, but it doesn't appear to  integrated with the OS;

Google chrome recommends interruptiblity

With the OS integration, OSX's do-not-disturb feature won't work. will work here, so if you do let Chrome notify you, webapps gain the right to interrupt you during presentations, watching media content, etc.
Safari lets you disable web site notifications, you just have to clear the check box
Safari? Permitted, but OS controlled, completely blockable. This doesn't mean that webapps shouldn't be able to interrupt you: google calendar is a good example, it's just the easier we make it to do this, the more sites will want to.

The OS isn't even consistent itself. There is no way to tell time machine to not annoy you with the fact that it hasn't updated for 11 days. It's not part of the notification system, even though it came from the same building. What kind of example is that to set for others?

Because the default behaviour of every application is to interrupt, I have to go through every single installed app to disable it else my life is a constant noise of popups stating irrelevant facts. You may not notice that as you install one application at a time, turning off the settings individually, but when you build up a new box, the arrogance of all these applications becomes obvious, as it takes some time to actually stop your attention being attacked by the software you install.

Getting users to look at your app, your web site, is roped in as "The attention economy". That certainly applies to things like twitter, facebook, snapchat, etc. But how does translate into dropbox trying to get my attention to tell me that it's keeping itself up to date? Or whatever itunes or photos wants to interrupt me on? Why does OneNote need to tell me something about a saved workbook? This isn't "the attention economy". This is "interruption economy": people terrified that users may not be making full use of their features, so trying to keep popping up to encourage you to use the app or whatever new feature they've just installed

Interrupting people while they are trying to work is not a good use of the life of people whose work depends on "getting things done without interruptions". As my colleagues should know, though some of them forget, I don't run with hipchat on precisely because I hate getting popups "hey Steve, can i just ask..." , where the ask is something that I'd google for the answer myself, so why somebody asks me to google for them, I don't know. But even with the workflow interrupts off, things keep trying to stop me getting anything done

Then there's the apps which interrupt without any warning at all. I got caught out at this at Dataworks summit, where halfway through a presentation GPGMail popped up telling me there was a new version. This was a presentation where I'd explicitly set "do not disturb" on and war running full screen, but GPG mail checks weren't using it. Lesson: turn off the wifi as well as setting everything to do-not-disturb/offline.

Those update prompts, they are important. But everything keeps going "update me! now!" they end up being an irritant to ignore, just like the way the "service now!" alert pops up our car when we use it. It's just another low-level hint, not something which matters like "low pressure in tyres".

What it does really highlight is that having an applications keep itself up to date with security patches is still considered, on OSX, to be something worth interrupting the user to let them know about. All I can say it's a good thing that Linux apps don't feel the same way, or apt-get upgrade would be unbearable.

Finally, there's the OS
  • It'd be good if the OS recognised when a full screen media/presentation app was underway and automatically went into silent mode at that point.
  • All the OS's own notifications "upgrade available", "no time machine backups" should be integrated with the same notification mechanisms for app viewers. That's to help the users, but also set an example for all others.

What to to really do about it?

I'd really like to be able to tell the OS that the default settings for any newly installed app is "no notifications". Maybe now I've built up the laptops I won't have to go through the torment of disabling it across many apps, so it'll just be that case by case irritant. Even so, there's still the pain of being reminded of update options even

What I can do though, is promise not to personally write applications which interrupt people by default.

Here then, is my pledge:
  1. I pledge to give my users the opportunity to live a life free of interruptions, at least from my own code.
  2. I pledge not to write applications which bring up notification boxes to tell you that they have kept themselves up to date automatically, that someone has logged in to another machine, or that someone else is viewing a document a user has co-authored.
  3. Ideally, the update mech should integrate that from the OS, and so it can handle the notifications (or not).
  4. If I then add a notifications in an application for what I consider to be relevant information, I pledge for the default state to be "don't".
  5. They will all go away when left alone.
  6. Furthermore, I pledge to use the OS supplied mechanism and integrate with any do- not-disturb mechanism the OS implements.
I know, I haven't done do client side code for a long time, but I can assure people, if I did: I'd try to be much less annoying than what we have today. Because I recognise how much pain this causes.


    The Great S3 Outage of February 2017

    On tuesday the world split into different groups
    1. Those who knew that S3 was down, and the internet itself was in crisis.
    2. Those who knew that some of the web sites and phone apps they used weren't working right, but didn't know why.
    3. Those who didn't notice and wouldn't have cared.

    I was obviously in group 1, the engineers, who whisper to each other, "where were you when S3 went down".
    S3 Outage: Increased Error Rate

    I was running the latest hadoop--aws s3a tests, and noticed as some of my tests were failing. Not the ones to s3 Ireland, but those against the landsat bucket we use in lots of our hadoop test as it is a source of a 20 MB CSV file where nobody has to pay download fees, or spend time creating a 20 MB CSV file. Apparently there are lots of landsat images too, but our hadoop tests stop at: seeking in the file. I've a spark test which does the whole CSV parse thing., as well as one I use in demos as an example not just of dataframes against cloud data, but of how data can be dirty, such as with a cloud cover of less than 0%.

    Partial test failures: never good.

    It was only when I noticed that other things were offline that I cheered up: unless somehow my delayed-commit multipart put requests had killed S3: I wasn't to blame. And with everything offline I could finish work at 18:30 and stick some lasagne in the oven. (I'm fending for myself & keeping a teenager fed this week).

    What was impressive was seeing how deep it went into things. Strava app? toast. Various build tools and things? Offline.

    Which means that S3 wasn't just a SPOF for my own code, but a lot of transitive dependencies, meaning that things just weren't working -all the way up the chain.

    S3 Outage: We can update our status page

    S3 is clearly so ubiquitous a store that the failure of US-East enough to have major failures, everywhere.

    Which makes designing to be resilient to an S3 outage so hard: you not only have to make your own system somehow resilient to failure, you have to know how your dependencies cope with such problems. For which step one is: identify those dependencies.

    Fortunately, we all got to find out on Tuesday.

    Trying to mitigate against a full S3A outage is probably pretty hard. At the very least,
    1. replicated front end content across different S3 installations would allow you to present some kind of UI.
    2. if you are collecting data for processing, then a contingency plan for the sink being offline: alternate destinations, local buffering, discarding (nifi can be given rules here).
    3. We need our own status pages which can be updated even if the entire infra we depend on is missing. That is: host somewhere else, have multiple people with login rights, so an individual isn't the SPOF. Maybe even a facebook page too, as a final backup
    4. We can't trust the AWS status page no more.
    Is it worth putting in lots of effort to eliminating an S3 outage as a SPOF? Well, the failure rate is such that it's a lot of effort for a very rare occurence. If you are user facing, some app like strava, maybe it's easiest to say "no". If you are providing a service for others though, availability, or at least the ability to degrade QoS is something to look at.

    Anyway, we can now celebrate the fact that the entire internet now runs in four places: AWS, Google, Facebook and Azure. And we know what happens when one of them goes offline.


    Why HTTPS is so essential, and auto-updating apps so dangerous

    I'm building up two laptops right now. One, a work one to replace the four year old laptop which died. The other, a mid 2009 macbook pro which I've refurbed with an SSD and clean built up.

    As I do this, I'm going through every single thing I'm installing to make sure I do somewhat trust it. That's me ignoring homebrew and where it pulls stuff from when I type something like "brew install calc". What I am doing is checking the provenance of everything else I pull down: validating any SHA-256 hashes they declare; making sure they come off HTTPS URLs, etc. The foundational stuff.

    We have to recognise that serving software up over HTTP is something to be phasing out, and, if it is done, for the SHA-256 checksum to be published over HTTPS, or,  even better, for the checksum to be signed by a GPG key, after which it can be served anywhere. while OSX supports signed DMG files since OS/X El Capitan, and unless you expect the disk image to be signed, you aren't going to notice when you pick up an unsigned malware variant.

    It's too easy for an open wifi station to redirect HTTP connections to somewhere malicious, and we all roam far too much. I realised while I was travelling, that all it would take to get lots of ASF developers on your malicious base station is simply to bring it up in the hotel foyer or in a quiet part of the conference area, giving it the name of the hotel or conference respectively. We conference-goers don't have a way to authenticate these wifi networks.

    Anyway, most binaries I am downloading and installing are coming off HTTPS, which is reassuring.

    One that doesn't is virtualbox: Oracle are still serving these up over HTTP. They do at least serve up the checksums over HTTP, but they don't do much in highlighting how much checking matters. No "to ensure that these binaries haven't been replaced by malicious one anywhere between your laptop and us, you MUST verify the checksums. No, it's just a mild hint, " You might want to compare the SHA256 checksums or the MD5 checksums to verify the integrity of downloaded packages".

    Not HTTPS then, but with the artifacts something whose checksum I can validate from HTTPS. These are on the dev box, happily.

    But here's something that I've just installed on the older, household laptop, "dogbert": Garmin Express. This is little app which looks at the data in a USB mounted Garmin bike computer, grabs the latest activities and updates them to Garmin's cloud infrastructure, where they make their way to Strava, somehow. Oh, and pushes firmware updates the other direction.

    The Garmin Express application is downloaded over HTTP, no MD5, SHA1 or anything else. And while the app itself is signed, OSX can and will run unsigned apps if the permissions are set. I have to make sure that the "allow from anywhere" option is not set in the security panel before running any installer.

    Here's the best bit though: that application does auto updates, any time, anywhere.
    Garmin Express D/Ls from HTTP; autoupdate by default
    Which means that little app, set to automatically run on boot, is out there checking for notifications of an updated application, then downloading it. It doesn't install it, but it will say "here's an update" and launch the installer.

    Could I use this to get something malicious onto a machine? Maybe. I'd have to see if the probes for updates were on HTTP vs HTTPS, and if HTTP, what the payload was. If it was HTTPS, well, you are owned by whoever has their CAs installed on your system. That's way out of scope. But if HTTP is used, then getting the Garmin app to install an unsigned artifact looks straightforward. In fact, even if the update protocol is over HTTPS, given the artifact names of the updates can be determined, you could just serve up malicious copies all the time and hope that someone picks it up That's less aggressive through, and harder to guarantee any success from subverted base stations at a conference.

    Rather than go to the effort of wireshark, we can play with lsof to see what network connections are set up on process launch

    # lsof -i -n -P | grep -i garmin
    Garmin 9966 12u 0x5ccb80e39679382b>
    Garmin 9966 16u 0x5ccb80e39679382b>
    Garmin 9967 10u 0x5ccb80e396b4a82b>
    Garmin 9967 13u 0x5ccb80e39687182b>
    Garmin 9967 15u 0x5ccb80e3910b7a1b>
    Garmin 9967 16u 0x5ccb80e39669e63b>
    Garmin 9967 17u 0x5ccb80e396b4a82b>
    Garmin 9967 18u 0x5ccb80e39687182b>
    Garmin 9967 19u 0x5ccb80e3910b7a1b>
    Garmin 9967 20u 0x5ccb80e3960c782b>
    Garmin 9967 21u 0x5ccb80e39669e63b>
    Garmin 9967 22u 0x5ccb80e3979fa63b>
    Garmin 9967 23u 0x5ccb80e3910b4d43>
    Garmin 9967 24u 0x5ccb80e3910b4d43>
    Garmin 9967 25u 0x5ccb80e3979fa63b>
    Garmin 9967 26u 0x5ccb80e3960c782b> turns out to be https://garmin.com/, so it is at least checking in over HTTPS there. What about the address? Interesting indeed. tap that into firefox as and then go through the advanced bit of the warning, and you can see that the certificate served up is valid for a set of hosts:

    dc.services.visualstudio.com, eus-breeziest-in.cloudapp.net, eus2-breeziest-in.cloudapp.net, cus-breeziest-in.cloudapp.net, wus-breeziest-in.cloudapp.net, ncus-breeziest-in.cloudapp.net, scus-breeziest-in.cloudapp.net, sea-breeziest-in.cloudapp.net, neu-breeziest-in.cloudapp.net, weu-breeziest-in.cloudapp.net, eustst-breeziest-in.cloudapp.net, gate.hockeyapp.net, dc.applicationinsights.microsoft.com

    That's interesting because it means its something in azure space. in particular, rummaging around brings up hockeyapp.net as a key possible URL, given that Hockeyapp Is a monitoring service for instrumented applications. I distinctly recall selecting "no" when asked if I wanted to participate in the "help us improve our product" feature, but clearly something is being communicated. All these requests seem to go away once app launch is complete, but it may be on a schedule. At least now I can be somewhat confident that the checks for new versions are being done over HTTPS; I just don't trust the downloads that come after.


    Towards a doctrine of the Zero Day

    The Stuxnet/Olympic games malware is awesome and the engineering teams deserve respect. There, I said it. The first in-the-field sighting of a mil-spec virus puts the mass market toys to shame. It is the difference between the first amateur rockets and the V1 cruise and V2 ballistic missiles launched against the UK in WWII. It also represents that same change in warfare.

    V1 Cruise missle and V2 rocket

    I say this having watched the documentary Zero Days about nation-state hacking. One thing I like about it is it's underdramatization of the coders. Gone the clich├ęd angled shots of the hooded faceless hacker coding in darkness to a bleeping text prompt on a screen that looks like something from the matrix. Instead: offices with fluorescent lights compensating for the fact that the only people allocated windows are managers. What matrix-esque screen shots there were contained x86 assembly code in the font of IDA, showing asm code snippets accurate enough to give me flashbacks of when I wrote Win32/C++ code. Add some music and coffee mugs and it'd start to look like the real world.

    The one thing they missed out on is the actual engineering; the issue tracker, with OLYMPIC-342, "doesn't work with Farsi version of Word" being the topic of the standup; the monthly regression test panic when when windows or flash updates shipped and everyone feared the upgrade had fixed the exploits. Classic engineering, hampered by the fact that the end users would never send stack traces. Even determining if your code worked in production would depend on intermittent status reports from the UN or order numbers for new parts from down the centrifuge supply chain. Let's face it: even getting the test hardware must have been an epic achievement of its own.

    Because Olympic Games was not just a piece of malware using multiple zero days and stolen driver certificates to gain admin access on gateway systems before jumping the airgap over USB keys and then slowly sabotage the Iranian centrifuges. It was evidence that the government(s) behind decided that cyber-warfare (a term I really hate) had moved from a theoretical "look, this uranium stuff has energy" to the strategic "let's call this the manhattan project"

    And it showed that they were prepared to apply their work against a strategic asset of another country, during peacetime. And had a larger program Nitro Zeus, intended to be the opening move of a war with Iran.

    As with those missiles and their payloads, the nature of war has been redefined.

    In Churchill's epic five volume history of WWII, he talks about the D-day landings, and how he wanted to watch it from a destroyer, but was blocked by King George, you ware too valuable". Churchill wrote that everyone on those beaches felt that they were too valuable to be there too -and that the people making the decisions should be there to see the consequences of them. He shortly thereafter goes on to discuss the first V1 attacks on London, discussing their morality. He felt that the "war-head". (a new word) was too indiscriminate. He was right - but given this was 14 months ahead of August 1945, his morality didn't run that deep. Or the V1 and V2 bombings had convinced him that it was the future. (Caveat: I've ignored RAF Bomber Command as it would only complicate this essay).

    Eric Schlosser's book, Command and Control, discussed the post-war evolution of defence strategy in a nuclear age, and how nuclear weapons scared the military. before: 1000 bombers to destroy a city like Hamburg or Coventry. Now only one plane had to get through the air defences, and the country had lost. Which changed the economics and logistics of destroying nearby countries. The barrier to entry had just been reduced.

    The whole strategy of Mutually Assured Destruction evolved there, which, luckily for us, managed to scrape us though to the twenty-first century: to now. But that doctrine wasn't immediate, and even there, the whole notion of tactical vs. strategic armaments skirted around the fact that once the first weapons went off over Germany or Korea, things were going to escalate.

    Looking back though, you can see those step changes in technology and how the leading edge technologies of each war enabled the doctrine of the next. the US civil war: rifles, machine guns, ironclad naval vessels, the first wire obstacles on the battlefield. WWI: the trenches with their barbed wire and machine guns; planes and tanks the new tech, radio the emergent communications alongside those telegraphs issuing orders to "go over the top!" . WWII and Blitzkreig was built around planes and trains, radio critical to choreograph it; the Spanish civil war used to hone the concept and to inure Europe to the acceptance of bombing cities.

    And in the Cold War, as discussed, missiles, computers and nuclear weapons were the tools of choice.

    What now? Nuclear missiles are still the game-over weapons for humanity, but the non-nuclear weapons have changed and so the tactics of war have changed at. And just as the Manhattan Project showed how easy it was to flatten a city, the Olympic Games has shown how much damage you can do with laptops and a dedicated engineering team.

    One of the screenshots in the documentary was of the North Korean dev team. They don't look like a dev team I'd recognise. It looks like the place where "breaking the build" carries severe punishment rather than having to keep the "I broke the build!" poster(*) up in your cubicle until a successor inherited it. But it was an engineering team, and a lot less expensive than their same government's missile program. And, it's something which can be used today, rather than used as a threat you dare not use.

    What now? We have the weapons, perhaps a doctrine will emerge. What's likely is that you'll see multiple levels of attack

    The 2016 election; the Sony hack: passive attack: data exfiltration and anonymous & selective release. We may as well assume the attacks are common, it's only in special cases that we get to directly see the outcome so tangibly.

    Olympic Games and the rumoured BTC pipeline attack: destruction of targets -in peacetime, with deniability. These are deliberate attacks on the infrastructures of nations, executed without public announcement.

    Nitro Zeus (undeployed) : this is the one we all have to fear in scale, but do we have to fear it's use? As the opening move to an invasion, it's the kind of thing that could be deployed against Estonia or other countries previously forced into the CCCP against their will. Kill all communications, shut down the the cities and within 24h Russian Troops could be in there "to protect Russian speakers from the chaos". China as a precursor to a forced reunification with Taiwan. Then there's North Korea. It's hard to see what a country that irrational would do -especially if they thought they could get away with it.

    Us in the west?

    Excluding Iraq, the smaller countries that Trump doesn't like: Cuba, N. Korea lack that infrastructure to destroy. The big target would be his new enemy, China -but hopefully the entirety of new administration isn't that mad. So instead it becomes a deterrent against equivalent attacks from other nation states with suitable infrastructure.

    What we can't do though is use to as a deterrent for Stuxnet-class attacks, not just on account of the destruction it would cause, but because it's so hard to attribute blame.

    I suspect what is going to happen is something a bit like the evolution of the Drone Warfare doctrine under Obama: it'll become acceptable to deploy Stuxnet-class attacks against other countries, in peacetime. Trump would no doubt love the power, though his need to seek public adulation will hamper the execution. You can't deny your work when your president announces it on twitter.

    At the same time, I can imagine the lure of non-attributable damage to a competing nation state. Something that hurts and hinders them -but if they can't point the blame , what's not to lose.? That I could the Trump Regime going for -and if it does happen to, say, China, and they work it out -well, it's going to escalate.

    Because that has always been the problem with the whole tactical to strategic nuclear arsenal. Once you've made the leap from conventional to nuclear weapons, it was going to escalate all the way.

    Do we really think "cyber-weaponry" isn't going to go the same way? From deleting a few files, or shutting down a factory to disrupting transport, a power grid?

    (*) the poster was a photo of the George Bush "mission accomplished" carrier landing, as I recall.


    TRIDENT-877 missile veered towards wrong continent; hemisphere

    Apparently a test of an submarine launched trident missile went wrong, it started to head in the wrong direction and chose to abort its flight. The payload ended up in the Bahamas.

    Aeronautics Museum

    The whole concept of software engineering came out of a NATO conference in 1968.

    The military were the first to hit this, because they were building the most complex systems: airplanes, ships, submarines, content-wide radar systems. And of course: missiles.

    Missiles whose aim in life is to travel from a potentially mobile launch location to a preplanned destination, via a suborbital ballistic trajectory. It's inevitably a really complex problem: you've got a multistage rocket designed to be moved around in a submarine for decades, designed to be launched without much preparation at a target a few thousand miles away. Which must make the navigation a fun little problem.

    We can all use GPS to work out where we are, even spacecraft which know to use the other solution to the GPS timing equation - the one which doesn't have a solution close to the geode, our model of the Earth's surface. Submarines can't use GPS while under water and they, like their deliverables, can't rely on the GPS constellation existing at the time of use. Which leaves what? Gyroscopic compasses, and inertial navigation systems: mindnumbingly complex bits of sensor trying to work out acceleration on different axes, use that, time, and its knowledge of its starting point to work out where it is. Then there's a little computer nearby using that information to control the rocket engines.

    Once above enough of the atmosphere to see stars in daylight, the missiles switch to astronomy. This turns out to be an interesting area of ongoing work -IR CCDs can position vehicles at sea level when it's not cloudy (tip: always choose your war zones in desert climates). While the Trident missiles are unlikely to have been updated, a full submarine refresh is bound to have installed the shiny new stuff. And in an qualification test of a real launch -that's something you'd want to try. Though of course you would compare any celestial position data with the GPS feed.

    Yet somehow it failed. Apparently this was a "telemetry problem", the missile concluded that something had gone wrong and chose to crash into the sea instead. I'm really curious about the details now, though we'll never get the specifics at a level to be that informative. First point: telemetry from the submarine to the missile? That is, something tracking the launch and providing (authenticated?) data to the missile which it could compare with its own measures? Or was it the other way around: missile data to submarine? As that would seem more likely -having the missile broadcast out an encrypted stream of all its engine data and sensor input would be exactly what you want to identify launch time problems. Perhaps it was some new submarine software which got confused, or got fed bad data somehow. If that was the case, then, if you could replicate the failure by feeding in the same telemetry, then yes, you could fix it and be confident that the specific failure was found and addressed. Except: you can't be confident that there weren't more problems from that telemetry, or other things to go wrong -problems which didn't show up as the missile had been aborted
    Or it was in-missile; sensor data on the rockets misleading the navigation system. In which case: why use the term "telemetry".

    We aren't ever going to know the details, which is a pity as it would be interesting to know. It's going to be kept a secret though, not just for the sake of whoever we consider our enemies to be —but because it would scare us all.

    I don't see that you can say the system is production ready if there was any software problem. One with wiring up, maybe, or some other hardware problem where a replacement board -a well qualified board- could be swapped in. Maybe even an operations issue which can be addressed with changes in the runbook. But software? No.

    How do you show it works then? Well, testing is the obvious tactic, except, clearly, we can't afford to. Which is a good argument in favour of cruise missiles over ICBMs: they cost less to test.

    Tomahawk Cruise missile

    Governments just don't take into account the software engineering and implementation details of modern systems into account, of which missiles are a special case, but things like the F-35 Joint Strike Fighter another. Some the software from that comes from BAe Systems a few miles away, and from what I gather, it's a tough project. The usual: over-ambitious goals and deadlines, conflicting customers, integration problems, suppliers blaming each other, etc, etc. Which is why the delivery and quality of the software is called out a a key source of delays, this in what is self-admittedly the world's largest defence programme.

    It's not that the teams aren't competent —it's that the systems we are trying to build are beyond what we can currently do, despite that ~50+ years of Software Engineering.