How long does FileSystem.exists() take against S3?

Ice on the downs

One thing I've been working on with my colleagues is improving performance of Hadoop, Hive and Spark against S3, one exists() or getFileStatus() call at a time.

Why? This is a log of a test run showing how long it takes to query S3 over a long haul link. This is midway through the test, so the HTTPS connection pool is up, DNS has already resolved the hostnames. So these should be warm links to S3 US-east. Yet it takes over a second just for one probe.
2016-12-01 15:47:10,359 - op_exists += 1  ->  6
2016-12-01 15:47:10,360 - op_get_file_status += 1  ->  20
2016-12-01 15:47:10,360 (S3AFileSystem.java:getFileStatus) -
  Getting path status for s3a://hwdev-stevel/numbers_rdd_tests
2016-12-01 15:47:10,360 - object_metadata_requests += 1 -> 39
2016-12-01 15:47:11,068 - object_metadata_requests += 1 -> 40
2016-12-01 15:47:11,241 - object_list_requests += 1 -> 21
2016-12-01 15:47:11,513 (S3AFileSystem.java:getFileStatus) -
  Found path as directory (with /)
The way we check for a path p in Hadoop's S3 Client(s) is
HEAD p
HEAD p/
LIST prefix=p, suffix=/, count=1
A simple file costs one HEAD; a directory marker, two; a path with no marker but 1+ children, three. In this run it's an empty directory, so two of the probes are executed:
HEAD p => 708ms
HEAD p/ => 445ms
LIST prefix=p, suffix=/, count=1 => skipped
That's 1153ms from invocation of the exists() call to it returning true: long enough for you to see the log pause during the test run. Think about that: determining which operations to speed up not through some fancy profiler, but by watching when the log stutters. That's how dramatic the long-haul cost of object store operations is. It's also why a core piece of the S3Guard work is to offload that metadata storage to DynamoDB. I'm not doing that code, but I am doing the committer to go with it. To be ruthless, I'm not sure we can reliably do that O(1) rename, massively parallel rename being the only way to move blobs around, and the committer API as it stands precluding me from implementing a single-file-direct-commit committer. We can do the locking/leasing in dynamo though, along with the speedup.

What it should really highlight is that an assumption in a lot of code "getFileStatus() is too quick to measure" doesn't hold once you move into object stores, especially remote ones, and that any form of recursive treewalk is potentially pathologically bad.
Remember that next time you edit your code.
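To make the request counting concrete, here is a minimal sketch of the probe sequence described above. It runs against a hypothetical in-memory `FakeStore`, not the real S3A client; the class, the method names and the sample keys are all made up for illustration.

```scala
// Hypothetical in-memory stand-in for an object store; NOT the S3A implementation.
final class FakeStore(keys: Set[String]) {
  var requests = 0

  // HEAD: does an object with exactly this key exist?
  def head(key: String): Boolean = { requests += 1; keys.contains(key) }

  // LIST: is there at least one object under this prefix?
  def listAny(prefix: String): Boolean = { requests += 1; keys.exists(_.startsWith(prefix)) }
}

object ProbeDemo {
  /** Returns (exists, number of requests issued for this one call). */
  def exists(store: FakeStore, p: String): (Boolean, Int) = {
    val before = store.requests
    val found =
      store.head(p) ||          // probe 1: a simple file
      store.head(p + "/") ||    // probe 2: an empty-directory marker
      store.listAny(p + "/")    // probe 3: a directory implied by its children
    (found, store.requests - before)
  }

  def main(args: Array[String]): Unit = {
    val store = new FakeStore(Set("data/file.csv", "empty/", "tree/child/part-0000"))
    Seq("data/file.csv", "empty", "tree", "missing").foreach { p =>
      val (found, cost) = exists(store, p)
      println(s"$p: exists=$found after $cost request(s)")
    }
  }
}
```

Multiply each request by a few hundred milliseconds of long-haul latency and a missing path, which needs all three probes before it can answer "no", becomes the most expensive case of all.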


Film Review: Arrival — Whorfian propaganda

Montepelier and beyond

Given the audience numbers for Arrival, in the first fortnight of its public release, more people will have encountered linguistic theory and been introduced to the Sapir-Whorf hypothesis than in the entire history of the study of linguistics (or indeed CS & AI courses, where I presume I first encountered it).

But it utterly dodges Chomsky's critique, that being the second irony: more people know Noam Chomsky(*) for his political opinions than for his contributions to linguistics and his seminal work on Grammar; regexp being type 3, and HTML being very much not. While I'm happy to willingly suspend my disbelief about space aliens appearing from nowhere, the notion that S-W implies learning a new language changes the semantics of happens-before grated on me. I'd have really preferred an ending where the lead protagonists retreat and admit defeat to the government, wherein Chomsky does a cameo, "told you!" before turning to the person by his side and going "More tea, Lamport?"

The whole premise of S-W, hence the film, is that language constrains your thinking: new languages enable new thoughts. That's very true in computing languages; you do think of solutions to problems in different ways, once you fully grasp the tenets of languages like Lisp and Prolog. In human language: less clear. It certainly exposes you to a culture, and what that culture values (hint: there is no single word for Trainspotting in Italian, nor an English equivalent of Passeggiata). And the S-W work was based on the different notions of time in Hopi, plus that "13 words for snow" story which implies the Inuit see snow differently from the rest of us. Bad news there: take up Scottish Winter Mountaineering and you not only end up with lots of words for Snow (snow, hail, slush, hardpack, softpack, perennial snowfield, ET-met snow, MF-met snow, powder, rime, cornice, verglas, sastrugi, ...), you end up with more words for rain. Does knowing the word Dreich make you appreciate it more? No, just that you have more of a scale of miserable.

Chomsky argued the notion of language comprehension being hardwired into our brain, the Front Temporal Lobe being the convention. Based on my own experiments, I'm confident that the location of my transient parser failures was separate from where numbers come from, so I'm kind of aligned with him here. After all: we haven't had a good conversation with a dolphin yet, and only once we can do that could we begin to make the case for what'd happen if we met other sentient life forms.

To summarise: while enjoying the lovely cinematography and abstract nature of the film, I could only sit there in disbelief about the language theme, wondering why they weren't asking the interesting questions, like The Halting Problem, whether P = NP, or even more fundamental: does maths exist, or is it something we've just made up?

Maybe that'll be the sequel.

Further reading

[Alford80] Demise of the Whorf Hypothesis.

(*) This has made me realise, I should add Chomsky to the list of CS grandees I should seek to be gently chided by, now having ticked Milner, Gray and Lamport off the list.

(picture: 3Dom on Moon Lane)


Moving Abroad

Earlier this year I moved to a different country.

Whenever I think I've got accustomed to this country's differences, something happens. A minister proposes having companies publish lists of the numbers of non-British employees working for them. A newspaper denounces judges as Enemies of the People for having the audacity to rule that parliament must have a vote on government actions which remove rights from its citizens. And then you realise: it's only just begun.
Boris meets Trump at Westmoreland House
A large proportion of the American population have just moved to the same country. For which I can issue a guarded "hello". I would say "welcome", except the country we've all moved to doesn't welcome outsiders —it views them with suspicion, especially if they are seen as different in any way. Language, religion and skin tone are the usual markers of "difference", but not sharing the same fear and hatred of others highlights you as a threat.

Because we have all moved from an apparently civilised country to one where it turns out half the people are the pitchfork waving barbarians who are happy to burn their opponents. That while we thought that humanity had put behind them the rallies for "the glorious leader" who blamed all their problems on the outsider —be it The Migrant, the Muslim, The Jew, The Mexican or some other folk demon, we hadn't; we'd just been waiting for glorious leaders that looked slightly better on colour TV.

Bristol Paintwork

One thing I've seen in the UK is that whenever something surfaces which shows how much of a trainwreck things will be (collapse in exchange rates, banks planning to move), the brexit advocates are unable to recognise or accept that they've made a mistake. Instead they blame: "the remainers", the press "talking down the country", the civil service "secretly working against brexit", the judicial system (same); disloyal companies. Being pro-EU is becoming as much a crime as being from the EU.

That's not going to go away: it's only going to amplify as the consequences of brexit become apparent. Every time the Grand Plan stumbles, when bad news reaches the screens, someone will be needed to take the blame. And I know who it's going to be here in England: troublemakers like me.
We're sitting through a history book era. And not in a good way.

If there's one change from the past, it's that forty years from now, PhD students studying the events, "the end of the consensus", "the burning of the elites", "the rise of the idiotocracies", or whatever it is called, will be using Facebook post archives and a snapshot of the twitter firehose dataset to model society. That is: unless people have gone back and deleted their posts/tweets to avoid being recorded as Enemies of the State.


ps: Happy Kristallnacht/Berliner Mauer Tag to all! If you are thinking of something to watch on television tonight, consider: The Lives of Others


What shall we do with the Europeans in our midst?

Most of my family members are European. There: I said it. I have two German uncles. And a wife born in Nairobi, a son born in Oregon, a mother in Glasgow, a father in Ulster —a father who spent the last 20 years of his life living in France. I was born in Scotland, grew up in London, now living in Bristol. When I exercise my inherited right to an Irish passport, I shall officially remain an EU citizen, regardless of what happens in Britain.

We are all Europeans; a continent whose history of warfare is abysmal compared to the Chinese empire (before the UK started the Opium wars), Feudal Japan (before the US turned up and demanded access at gunpoint), North America (before the European colonists decided they wanted most of the land), and pretty much everywhere else. The reason Europe embraced guns, while China used gunpowder for fireworks, is that one place was a stable area that liked to party, the other place a mess which liked to kill neighbours on account of: different religion. Different interpretation of the same religion. Speaking a different language. Differences in which individual was considered ruler of the area.

The post-1945 EU project was an attempt to address this, by removing the barriers, boosting trade and mutual industry, making visiting the other countries easy, and making it easy to live in the other countries. Why does the latter matter? It was aimed at preventing the mass unemployment scenarios of the 1930s from developing again —or at least spreading the pain, so one country didn't get trapped in a downward economic spiral: the people weren't going to be trapped, awaiting a Glorious Leader to rescue them. Instead they could follow the jobs.

Britain 2016 is not Germany 1934.

We haven't burned the books yet, though I wonder how long before the list of forbidden web sites starts to be a mandatory feature of home broadband links, rather than an optional item

And, we are a long, long way from the memorials to the people killed by an oppressive state
Berlin Buzzwords

(though I note we don't own up to our history in Slavery; there's one slow-motion holocaust we pretend is nothing to do with us).

But: the hate is beginning, and I fear where it will lead.

Right now, if someone was asking for advice as to where in Europe to set up a small software startup, to hang round with like-minded hackers, to enjoy a lifestyle of which coding is an integral part, I'm going to say: Not London. Not Bristol. Berlin.


Britain has a party of hate in power. In four months we've gone from a referendum on being in/out the EU project, to one where politicians are proposing companies provide lists of who is "foreign"; where the Prime Minister is saying the National Health Service only needs staff and doctors from the rest of the continent "for an interim period".

All of a sudden we've gone from being one of the most diverse and welcoming countries in the continent to one where there's already an implicit message "don't buy a house here —you won't be staying that long."


And what is the nominal opposition party doing? Are they standing up and condemning such atrocious nativist xenophobia? No, they are too busy in their internal bickering to look around, and when they do say things, it's almost going along with it, believing they need to appear to be harsh on immigration, accepting of the Brexit referendum, as if that is needed for power, and that their getting re-elected is more important than preventing what the conservative party is trying to do. Where are the protests? The "we are all European" demonstrations? Because I'd be there. As it is, we have Nicola Sturgeon of the Scottish National Party being the sole mainstream politician to denounce this.

Berlin Buzzwords

Does anyone really think things would stop at collecting lists of "foreigners"? Or will that just legitimise the growing number of racist hate crimes which have started up after the Brexit referendum; crimes that have got as far as murder. It is only going to make things worse, and in the absence of a functional opposition, there is nothing to stop this.

I don't know how my friends and colleagues from the rest of Europe feel about this —I haven't spoken to any this morning. But I know this: I don't feel at home in this country any more.


s/macOS Sierra/macOS Vista/

I've been using macOS Sierra for about ten or eleven days now, and I've rebooted my laptop 6+ times because the system was broken.

Two recurrent problems: failure to wake in the morning, gradual lockup of finder and transitive app failure.

Failure to wake: I go up to the laptop, hit the keyboard and mouse, nothing happens. Only way to fix: hold down the power and wait for a hard restart.

Back in 1999 I worked on a project with HP's laptop group, where we instrumented a set of colleagues' laptops with a simple data collection app, then collected a few months of data. At the time this was considered "a lot of data". The result, the paper: The Secret Life of Notebooks. This showed that people tended to have a limited set of contexts, where context was defined as system setup: power, display, IP addr, and application: mail, ppt. And that people were so predictable in their use models, that doing some HMM to predict contexts wouldn't have been hard.

I ended up writing some little app which essentially did that: based on IPAddr and app (PPT, Acroread) full screen, could choose: power policy, network proxy options, sound settings (mute in meetings, etc). It was fairly popular amongst colleagues, because it would turn proxy stuff on and off for you, and know to turn off display timeouts when giving presentations; crank up the savings when on the move. When I look at Windows 8+ adaptation to network settings, or OSX's equivalent of that and the "When on battery ...", I see the same ideas. You don't get any HMM on the laptops though; for that you have to pick up an android phone and look at Google Now, something which really is trying to understand you and your habits. And, because it can read your emails, correlate those habits with emailed plans. If it really wanted to be innovative/scary it would work out who you were associated with (family, friends, colleagues, fellow students...) and use their actions to predict yours. Maybe someday.

User-wise, another interesting feature was how people viewed mail so differently when online vs offline. Offline, you'd see this workflow of outlook-> word-> outlook-> ppt-> outlook-> acroread-> outlook, ... etc, very fast cycles. It seemed uncontrolled window tabbing at first, until you realise it's people going through attachments. Online, people's workflow pulled in IE (it was a long time ago), and you'd get a cycle where IE was the most popular app cycled to from outlook. Email was already so full of links that the notion of reading email "offline" was dying. You could do it, but your workflow would be crippled. And that was 15+ years ago. Nowadays things would only be worse.

There was a second paper which was internal, plus circulated/presented to Microsoft. There I looked at system uptime, and especially the common sequence in the log

1998-08-23 18:15 event: hibernate
1998-08-24 09:00 event: boot


1998-09-01 11:20 event: suspend
1998-09-01 11:30 event: boot

That is: a lot of the time the laptop thought it was going to sleep, it was really crashing.

My theory was that alongside the official ACPI sleep states S1-S5 there was actually a secret state S6, "the sleep you never awake from". Some more research showed that it was generally on startup that the process failed, and it was strongly correlated with external state changes: networks, power, monitors. It wasn't that the laptop made a mess of suspending, it was that when it came back up it couldn't cope with the changed state.
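Spotting "state S6" in a log is a simple pattern match: a suspend or hibernate whose next event is a boot rather than a wake-up. A minimal sketch, assuming the `timestamp event: kind` line format of the excerpt above; the "resume" event name and the last two sample lines are my own additions, not taken from the original data.

```scala
object SleepCrashScan {
  final case class Event(timestamp: String, kind: String)

  private val Marker = " event: "

  /** Parse "1998-08-23 18:15 event: hibernate"; lines without the marker are skipped. */
  def parse(line: String): Option[Event] = {
    val idx = line.indexOf(Marker)
    if (idx < 0) None
    else Some(Event(line.take(idx).trim, line.drop(idx + Marker.length).trim))
  }

  /** Pairs of (sleep event, boot event) where the machine never came back up. */
  def failedWakes(lines: Seq[String]): Seq[(Event, Event)] = {
    val events = lines.flatMap(parse)
    // Compare every event with its immediate successor.
    events.zip(events.drop(1)).filter { case (a, b) =>
      (a.kind == "suspend" || a.kind == "hibernate") && b.kind == "boot"
    }
  }

  def main(args: Array[String]): Unit = {
    val log = Seq(
      "1998-08-23 18:15 event: hibernate",
      "1998-08-24 09:00 event: boot",
      "1998-09-01 11:20 event: suspend",
      "1998-09-01 11:30 event: boot",
      "1998-09-02 09:00 event: suspend",   // hypothetical healthy cycle
      "1998-09-02 09:05 event: resume")    // "resume" is an assumed event name
    failedWakes(log).foreach { case (sleep, boot) =>
      println(s"${sleep.kind} at ${sleep.timestamp} ended in a boot at ${boot.timestamp}")
    }
  }
}
```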

I don't know if macOS sierra has that issue: I do know that it has that problem if left attached to an external display overnight. Looking in the system logs, you can see powernap wakeups regularly (that's all displays off), but come the first user interaction event —where the displays are meant to kick off— they don't come up. This is resulting in system logs not far off from the '99 experiment

2016-09-27 22:20 powerd: suspend
2016-09-27 23:00 powerd: powernap wake

2016-09-27 23:30 powerd: powernap wake

2016-09-28 00:30 powerd: powernap wake
2016-09-28 08:30 powerd: powernap wake
2016-09-28 09:30 event: boot

That last one: that's me trying to use it.

I've turned off powernap to see if that makes a difference there.

That's the nightly problem. What's happened 3+ times is the lockup of Finder, with a gradual degradation of other applications as they go near its services.

First finder goes, and restarts do nothing
Finder not responding
Then the other apps fail, usually when you go near the filesystem dialogs, or the photo collection.
Safari not responding
As with finder, restart does nothing.

If it was my own code, I'd assume a lock is being acquired in the kernel on some filesystem resource and never being released. This is why locks should always have leases. Root cause of that lock/release problem? Who knows. I can't help wondering, though, if it's related to the new iCloud sync feature, as that's the biggest filesystem change. I've also noticed that I usually have a USB stick plugged in; I'm going to go without that to see if it helps.
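For anyone who hasn't met the idea, here is a minimal sketch of a lock with a lease: ownership lapses after a fixed time, so a holder that hangs cannot block everyone else forever. The `LeasedLock` class and its injectable clock are hypothetical, written only to illustrate the point; they say nothing about what the macOS kernel actually does.

```scala
object LeaseDemo {
  /** A lock whose ownership expires `leaseMillis` after it was last acquired. */
  final class LeasedLock(leaseMillis: Long,
                         now: () => Long = () => System.currentTimeMillis()) {
    private var holder: Option[String] = None
    private var expiry: Long = 0L

    /** Succeeds if the lock is free, already ours (renewal), or the lease has lapsed. */
    def tryAcquire(owner: String): Boolean = synchronized {
      val t = now()
      if (holder.isEmpty || holder.contains(owner) || t >= expiry) {
        holder = Some(owner)
        expiry = t + leaseMillis
        true
      } else false
    }

    /** Release only if the caller is the current holder. */
    def release(owner: String): Boolean = synchronized {
      if (holder.contains(owner)) { holder = None; true } else false
    }
  }

  def main(args: Array[String]): Unit = {
    var clock = 0L                                  // fake clock, in milliseconds
    val lock = new LeasedLock(100L, () => clock)
    println(lock.tryAcquire("finder"))              // finder takes the lock, then hangs
    println(lock.tryAcquire("safari"))              // refused: lease still valid
    clock = 150L                                    // the lease runs out
    println(lock.tryAcquire("safari"))              // safari can now proceed
  }
}
```

The catch is that an expired holder may still believe it owns the lock, so anything it does afterwards has to be checked or fenced off. That is the price of not deadlocking.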

When I get this slow failure, I don't rush to reboot. It takes about 10 minutes to get my dev environment back up and running again: the IDEs, the terminal windows, etc, 2FA signing in to webapps, etc. I really don't want to have to do it. Instead I end up with bits of the UI keeling over, while I stick to the IDE, chrome, terminals. I had a bit of a problem on Thursday evening when calendar locked up to the extent I couldn't get the URLs for some conf calls; I had to use the phone to get the links and type them in.

Anyway, come the evening, after the conf calls and some S3a Scale tests, I kick off a shutdown.

And here a flaw of the OSX UI comes in: it assumes that whatever reason you are trying to do a shutdown for, it is not because finder has crashed. And it gives any application the right to veto the shutdown. You can't just select "shut down..." on the menu, you have to wait for any apps to block it, stop them and then continue. And even after doing all of that, I come in this morning and find the laptop, fans spinning away, me logged out but some dialog box about keychain access required. This is not shutting down, this is a half-hearted attempt at maybe shutting down sometimes if your OS hasn't got into a mess.

It's notable that Windows has some hard coded assumptions that a shutdown is caused by the failure of something. It also has, from the world of Windows Server, the concept that the user may not be waiting at the console waiting to click OK dialogs popped up by apps. Thus it has a harsher workflow.

  1. A WM_QUERYENDSESSION message comes out saying "we'd like to shut down, is that OK?" Apps get the opportunity to veto the session end, but not if it's tagged as a critical shutdown. And if you don't service that event, you are considered dead and don't get a veto.
  2. The WM_ENDSESSION event is sent to apps to say "you really are going down: get over it".
  3. There is a registry entry WaitToKillAppTimeout you can use to control how long the OS waits for applications to terminate, WaitToKillServiceTimeout for services, and even HungAppTimeout to control how long an app has to respond to an exit menu request (WM_EXIT?) before being considered dead and so killed.
See? Microsoft know that things hang, that even services can hang, and that if you want to shut down then you want to shut down, not find out 12 hours later that it had stopped with a dialog box.

In contrast macOS Sierra has implicit assumptions that apps rarely hang, the OS services never deadlock, and that shutting down is a rare activity where you are happy to wait for all the applications to interact with you —even the ones that have stopped responding.

This may have held for OS X, but for macOS all those assumptions are invalid. And that makes shutdown far more painful and unreliable than it need be.

Now if you go low level and do a "man shutdown", you can see that a similar escalation process is built in there

Upon shutdown, all running processes are sent a SIGTERM followed by a SIGKILL.  The SIGKILL will follow the SIGTERM by an intentionally indeterminate period of time.  Programs are expected to take only enough time to flush all dirty data and exit.  Developers are encouraged to file a bug with the OS vendor, should they encounter an issue with this functionality.
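The same ask-then-insist escalation is easy to sketch for a single child process using the JVM's `Process` API. This is an illustration of the pattern only: the `stop` helper and its grace period are my own, and how `destroy()` maps to a signal is up to the JVM (typically SIGTERM on Unix-like systems).

```scala
import java.util.concurrent.TimeUnit

object Escalate {
  /**
   * Ask a process to exit, wait up to `graceMillis`, then kill it.
   * Returns true if the process went away on the polite request alone.
   */
  def stop(p: Process, graceMillis: Long): Boolean = {
    p.destroy()                        // polite request; typically SIGTERM on Unix-like JVMs
    if (p.waitFor(graceMillis, TimeUnit.MILLISECONDS)) true
    else {
      p.destroyForcibly()              // no veto this time; typically SIGKILL
      p.waitFor()                      // block until the process has really gone
      false
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumes a Unix-like system with a `sleep` command on the PATH.
    val p = new ProcessBuilder("sleep", "60").start()
    val polite = stop(p, 2000)
    println(s"exited on the polite request: $polite; still alive: ${p.isAlive}")
  }
}
```

The design point is the bounded wait: the process gets time to flush its state, but it never gets to hold things up indefinitely.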

I think from now on, it'll be a shutdown command from the console.

Anyway, because of all these problems, I do currently regret installing macOS sierra. It shipped to meet a deadline, rather than because it was ready.

macOS Sierra is not ready for use unless you are prepared to reboot every day, and are aware that the only way to reboot reliably is from the console.


macOS Sierra and the rediscovered world of desktop agents

I sort of accidentally upgraded to macOS Sierra on the day it came out.

It wasn't that I'd been awaiting it, holding my breath until it was announced to a shrieking audience. Rather, my laptop had been complaining for 3+ weeks that there were critical security patches to apply, patches which needed machine reboots. I was now two patches behind. Firefox had just updated to 49.0 and wanted a reboot, and Chrome was saying "chrome is old". With both browsers needing restarting, I may as well take the hit for the OS patches too, so went to the App Store for an update. And what should the front page have but a "macOS Sierra" banner. So I went for it.

Installation: slow. I'll take "29 minutes remaining" as the macOS equivalent of Vista's "Step 3 of 3 100% complete" message: a warning that the next operation's completion time is a Halting Problem kind of estimate.

I went out into the sunlight to read about Scala collections, and after doing that long enough to get a headache checked back upstairs again "29 minutes remaining", so went to the household 2009 imac (not upgradeable) and did some collaborative editing on google docs. Which is why google docs, for all its lack of offline-ness, is such a good tool. If you have a browser with a network connection, you have a word processor.

After it's there, what do you see?
  • Cuter animations
  • Sound in the UI. I don't like sound; there is a way to turn beeps off.
  • A new look notification bar
  • A messaging app that looks more like the iOS one.
  • The ability to turn on auto delete of backups trash
  • The option to have all your docs backed up to iCloud
  • Safari doesn't tell web sites when you have Flash, silverlight or Java installed, makes you go through some minor hoops to run the plugins.
  • Siri

  • IntelliJ IDEA scrolling (really, all Java Swing apps and scrolling). Very sensitive.
  • Keyboard remapping with Karabiner. I need this as I have a UK keyboard plugged into a US-keyboarded Mac. Even with Karabiner elements things aren't  right —I fear another reboot is coming. 
What don't you get
  • Time Machine complaints about backups haven't moved to the notification API. This means I can't configure them not to keep telling me off. Probably a sign that Time Machine isn't getting attention, at least not in its current form: presumably they are working on the Apple File System equivalent, so ignoring this one.
  • Any tangible improvements in the UI of mail
  • Any tangible improvements in the UI of OS Calendar. I could go on an extended diatribe against that at some point, like why, when I decline an event, it stays in the calendar and I have to actually delete it, after which it asks me if I really really want to do that. It's as if the apple team want me to attend meetings. Or how the event details window is tall and thin, which is the wrong shape for wide screens, and is utterly the wrong shape to display URLs and dial in details for conference calls. I think Conway's law is telling me about Apple's teamwork model there. Meetings you can't duck out of; a focus on F2F meets with a short roomname "dogcow upper" rather than a url to a webex event.
  • APFS being production ready. I may reformat my external SSD drive (The one I keep VBox images on), to try it. Not sure I'd gain much from it though. Perhaps I'll start with a 64MB SSD card.

The safari flash/java changes are wonderful. Why?

The lack of header information is good on its own
  1. Stops brokered malware ads bidding to submit adverts to users in europe-but-not-russia with a specific version of flash installed.
  2. Makes it harder to distinguish users uniquely by their plugins. There's still probably fonts and things, but at least one information point is gone
As for the hoops, as well as defending from malware, it provides extra motivation for everyone to move off flash. That includes BBC news, which happily serves up HTML 5 videos to iOS devices, yet complains about flash on the desktop. They are going to have to change that policy fast, which is great for those of us who are trying to move to a 0-flash desktop for household infosec reasons alone.

Now, Siri
I asked it where my wife Bina was. Instead it tried to arrange a marriage between myself and a colleague

Siri arranges a marriage for me

I did work out the root cause here: she's in my google contacts but not exchange ones, and OSX contacts is hooked up to exchange contacts only. I added a record to that contact list and then I could bond. Funny though.

I've tried a few other commands, some work, some don't. A problem here is that I like to work with background music, I don't feed it out from the laptop; got an old ipod plugged into an amplifier across the room. I'd need to embrace itunes and either choppy-during-builds bluetooth or the long wire to have siri damping down the music when I talk to it.

I think the thing about Siri is you have to consider the user base of most people who use macOS: they don't use terminals. Their interaction is via the finder, perhaps spotlight. In this context, Siri is An Agent, in the terminology of Yoav Shoham; you say something, it gets translated (Modern ML algorithms), then that is used to generate The Plan, which is then executed. I'd assume that final execution could be done with AppleScript: instrument your app with state queries and operators, and perhaps Siri could work on it.

Funnily enough, Agents with distributed computation were actually the first thing I worked on at HP Labs, the sole trace of which appears to be some citations, some locally scanned documents of mine, and a printing of a usenet post. There it was typing, not talking: the little Windows/386 / HP NewWave agent sent the message to the Prolog interpreter running on a workstation (TCP? I forget. If so, probably the first time I ever encountered socket error codes). We were actually trying to do multimodal stuff: you could point to something and then ask for actions. I got up to demo state, but, relying on Prolog parsing, it wasn't real natural language, no more than SQL is. Hopefully, Siri does a better job of it. And they could go multimodal once Siri goes into continuous listening: select something in finder and ask siri to act on it.

Given that past work then, I guess I should give it a go


Scalatest: thoughts and ideas

Spark uses Scalatest for its testing.

I alternate between "really liking this" and "bemoaning aspects of it".

Overall: I think the test language is better than JUnit, especially once you embrace more of the scala language. But the test execution could be improved, especially maven integration, and, to a lesser extent, the test reporting.

France 2016



  eventually(timeout, policy) { closure };

This does a busy wait for the closure to complete (i.e. not throw an exception), policies include exponential backoff.

Pro: easy to write clauses in a test case which block until a state is reached or things timeout

Con: there's no way by default for those clauses to declare that they will never complete. That happens, especially if you are waiting for some state such as the number of events to be exactly 3 in a queue. If the queue size goes to 4, it's never going to drop to 3, so the clause could fail fast.

I think the fix there would be to define a specific "unrecoverable assertion failure" exception and raise that, have "eventually()" not swallow them.
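A minimal sketch of that idea, written in plain Scala rather than against ScalaTest's own `eventually`: the retry loop, the `UnrecoverableFailure` exception and the `awaitSize` helper are all hypothetical names of mine, there only to show the shape of the fix.

```scala
object FailFastEventually {
  /** Hypothetical marker: the awaited state can no longer be reached. */
  final class UnrecoverableFailure(msg: String) extends RuntimeException(msg)

  /** Retry `probe` until it stops throwing or `timeoutMillis` has elapsed. */
  def eventually[T](timeoutMillis: Long, intervalMillis: Long)(probe: => T): T = {
    val deadline = System.currentTimeMillis() + timeoutMillis
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(probe)
      } catch {
        case e: UnrecoverableFailure => throw e       // never swallowed, never retried
        case e: Exception =>
          if (System.currentTimeMillis() >= deadline) throw e
          Thread.sleep(intervalMillis)                // fixed interval; no backoff here
      }
    }
    result.get
  }

  /** Wait for a queue to hold exactly `expected` entries, giving up at once on overshoot. */
  def awaitSize(expected: Int, size: () => Int): Unit =
    eventually(timeoutMillis = 5000, intervalMillis = 10) {
      val n = size()
      if (n > expected) throw new UnrecoverableFailure(s"size $n already exceeds $expected")
      if (n != expected) throw new IllegalStateException(s"size is $n, waiting for $expected")
    }

  def main(args: Array[String]): Unit = {
    var n = 0
    awaitSize(3, () => { n += 1; n })                 // succeeds once the size reaches 3
    try awaitSize(3, () => 4)                         // 4 will never drop back to 3
    catch { case e: UnrecoverableFailure => println(s"failed fast: ${e.getMessage}") }
  }
}
```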

the === check,

 assert(3 === events.queueSize)

This does good reporting on what the values are on an equality failure. It does need rigorousness about the order; I follow the junit assertEquals policy of expected-value-first. Projects really need a style guide/policy here.

Weakness: no typechecking at compile time. Better get those types right. Though as your tests should run through all assertions unless it is somehow nondeterministic, you'll find out soon enough.


  val ex = intercept[IOException] {
    // the operation expected to fail goes here
  }
  assert(ex.toString.contains("host"), s"no 'host' in $ex")

Here there's a straightforward catch of a specific type (and failure raised if it didn't happen). As the exception is returned, the test case can continue. And with string formatting with the s"" syntax, reasonable groovy-esque diagnostics.

Matchers and the declarative specification syntax

  update("submissions") should be(5)

Ignoring the fact that RFC 2119 means that the syntax ought to be "must" and not "should", this style is fairly similar to JUnit 4 matchers, which is not something I ever do much of.

Pro: looks nicer than assert() clauses
Con: you can add more details in assert() clauses, provided you sit down and do it.

Could try harder


the before and after mechanisms: beforeclass and afterclass

While this seems an easy way to declare things to run before tests, it doesn't work well with traits/mixins. In contrast, having setup() and teardown() methods, combined with Scala's strict order of evaluating traits and what "super.setup()" would mean in a trait, allows you to write tests with traits where different traits can execute their setup/teardown clauses in a deterministic manner. Note that JUnit's before/after stuff is also a bit ambiguous, so it's not a denunciation of Scalatest here, more a commentary on the setup problem.
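Here is a small self-contained sketch of that deterministic ordering. The trait names are invented for the example, and nothing here depends on Scalatest; it only shows how Scala's trait linearization decides what `super.setup()` means.

```scala
import scala.collection.mutable.ArrayBuffer

object TraitOrderDemo {
  val log = ArrayBuffer[String]()
  private def record(s: String): Unit = { log += s; () }

  trait Base {
    def setup(): Unit = record("base setup")
    def teardown(): Unit = record("base teardown")
  }

  // Each mixin sets up after its parent and tears down before it.
  trait WithFilesystem extends Base {
    override def setup(): Unit = { super.setup(); record("fs setup") }
    override def teardown(): Unit = { record("fs teardown"); super.teardown() }
  }

  trait WithCluster extends Base {
    override def setup(): Unit = { super.setup(); record("cluster setup") }
    override def teardown(): Unit = { record("cluster teardown"); super.teardown() }
  }

  // Linearization: MySuite -> WithCluster -> WithFilesystem -> Base,
  // so WithCluster's "super" is WithFilesystem, not Base.
  class MySuite extends WithFilesystem with WithCluster

  def main(args: Array[String]): Unit = {
    val suite = new MySuite
    suite.setup()
    suite.teardown()
    log.foreach(println)
  }
}
```

Setup runs base, filesystem, cluster; teardown runs in exactly the reverse order. Swap the order of the `with` clauses and the order of the two mixins swaps with it.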

ignoring tests.

I've ended up doing this in the tests for spark-cloud, writing a conditional ctest() function. The secret there is to know that the test(name) { closure } declaration is not declaring a method, as in JUnit, but simply declaring an invocation of the test() method —an invocation which takes place during the construction of an instance of the test suite. There is nothing to stop you wrapping that call in other mechanisms: extra conditions, dynamic parameters, etc. It's only a closure, you understand.

Maven integration

Needs better wildcard and test suite declarations. I've only just discovered the ability to include a string in the name of the test case to run:

mvt -Dtest=none '-Dsuites=org.apache.spark.deploy.history.HistoryServerSuite incomplete' ; say moo

This command
  • doesn't run any java tests (because -Dtest=none doesn't match anything)
  • runs every test in HistoryServerSuite which has the word "incomplete" in the title
  • says "moo" after, so I can get on with other things while scalac gets round to building things, scalatest to running them. This matters because Scalac is so, so slow.


The test reporter "helpfully" strips off the stack trace on assertion failures, so you don't get the full stack. This seems a good idea, except in the special case "you have the assertions inside a method which your test cases call". Do that and you suddenly lose the ability to detect what's wrong because the stack trace has been stripped: all you get is the name of the utility method. Just give us the full stack and stop trying to be clever.

My new conditional test runner

Here, then, is my ctest() function to declare a test which is conditional on the state of the entire suite, and an optional extra per-test predicate:

/**
 * A conditional test which is only executed when the
 * suite is enabled and the `extraCondition`
 * predicate holds.
 * @param name test name
 * @param detail detailed text for reports
 * @param extraCondition extra predicate which may be
 * evaluated to decide if a test can run.
 * @param testFun function to execute
 */
protected def ctest(
  name: String,
  detail: String = "",
  extraCondition: => Boolean = true)
  (testFun: => Unit): Unit = {
  if (enabled && extraCondition) {
    registerTest(name) {
      // the detail text is logged at info before the test body runs
      logInfo(s"$name\n$detail")
      testFun
    }
  } else {
    registerIgnoredTest(name) {
      testFun
    }
  }
}

What is critical to know is that this ctest() function is not declaring any function/method in the test suite. It's just a function invoked in the suite's constructor, interpreted in the order of listing in the file, expected to register the associated closure for invocation.

The normal test() function registers the test; the new ctest() function takes some extra detail text (logged @ info, and when I do something about reporting, may make it to the HTML), a condition for optional execution and that closure. There's also a suite-wide enabled() predicate. This is something which can be subclassed or mixed-in through a trait for configurable execution. As an example, there are object store tests for both S3A and Azure, which are only enabled if the relevant credentials have been supplied.

Here's a more complex example

private[cloud] class S3aLineCountSuite
  extends CloudSuite with S3aTestSetup {

  override def useCSVEndpoint: Boolean = true

  def init(): Unit = {
    if (enabled) {
      // filesystem setup elided
    }
  }

  // only run when the base suite is enabled and a CSV test file is defined
  override def enabled: Boolean = super.enabled && hasCSVTestFile

  ctest("Execute the S3ALineCount example") {
    val sparkConf = newSparkConf(CSV_TESTFILE.get)
    assert(0 === S3LineCount.action(sparkConf, Seq()))
  }
}

The enabled predicate is true if the S3aTestSetup mixin was happy (that is, I've added credentials), and a CSV test file is defined. Then we run an example application, verifying its exit code is 0.

Do I like this? Mostly.

Things I'm yet to explore

Async testing:

How to get the most out of scalatest 

(at least as far as I've learned so far)

  • Keep a strict ordering of (expected === actual). I use the same ordering as JUnit —I hope others are consistent there, as it's not so obvious in a test failure that things are different.
  • Add extra details in assert() clauses.
  • Embrace eventually()
  • Give every test case in a test suite a string which is at least partially unique. That way, you can run an individual test with the -Dsuites mechanism.
  • Try to make them fast. This really matters on spark, because there are a lot of tests to run.
  • If you are going near traits, have setup/teardown methods tagged as before {} and after{}, then use them through the traits consistently
  • Don't go too overboard in traits, even if you spent enough time in C++ to be perfectly comfortable with mixins and expect the maintainers to be happy with that too. As you can't make that latter expectation: use carefully.
  • Do try all the different spec mechanisms, rather than just stick with one because "it most reminds you of what you used to use". You could be missing out there
  • However: don't try all the different spec mechanisms wildly. Do try to be consistent with the rest of the project
Most of all: read the underlying source code. It's there. Look at what test(name) { clause } does and follow it through. See how tests are executed. Think about how to take best advantage of that in your test suites.


Gardening the commons

It's been a warm-but-damp summer in Bristol, and the vegetation in the local woods has been growing well. This means the bramble and hawthorn branches have been making their way out of the undergrowth and into the light —more specifically the light available in the mountain bike trails.

Being as both these plants' branches have spiky bits on them, the fact that they are growing onto the trails hurts, especially if you are trying to get round corners fast. And if anyone is going round the trail without sunglasses, they run a risk of getting hurt in/near the eye.

I do always wear sunglasses, but the limitations on taking the fast line through the trails hurt, and as there are a lot of families out right now, I don't want the kids to get too scraped.

So on Saturday morning, much to the amazement of my wife, I picked up the gardening shears. Not to do anything in our garden though —to take to the woods with me.


  1. Those Kevlar backed gloves that the Clevedon police like are OK on the outside for working with spiky vegetation, but the fingertips are vulnerable.
  2. Gardening gets boring fast.
  3. When gardening an MTB trail, look towards the direction oncoming riders will take.
  4. A lot of people walking dogs get lost and ask for directions back to the car park.
  5. Someone had already gardened about 1/3 of the Yertiz trail (pronunciation based on: "yeah-it-is").
  6. Nobody appreciates your work.
I appreciate the outcome of my own work: I can now go round at speed, only picking up scrapes on the forearms on the third of the trail that nobody has trimmed yet. I actually cut back on the inside of the corners there for less damage on the racing line, while cutting the outside and face height bits for the families. Now they can have more fun on the weekends, I can do my fast work weekday lunchtimes.

They can live their lives with fewer wailing children, and I've partially achieved my goal of less blood emitted per iteration. I'll probably finish it off with the final 1/3 at some point, maybe mid-August, as I can't rely on anyone else.

There are no trail pixies but what we make

Alsea Trail Pixie sighting

Which brings me to OSS projects, especially patches and bug reports in Hadoop.

I really hate it when my patches get completely ignored, big or small.

Take YARN-679 for example. Wrap up piece of YARN-117, code has celebrated its third birthday. Number of reviewers: zero. I know it's a big patch, but it's designed to produce a generic entry point for YARN services with reflection based loading of config files (you can ask for HiveConfig and HBaseConfig too, see), interrupt handling which even remembers if it's been called once, so on the second control-C it bypasses the shutdown hooks (assumption: they've blocked on some IPC-retry chatter with a now-absent NN) and bails out fast. Everything designed to support exit codes, subclass points for testability. This should be the entry point for every single YARN service, and it hasn't had a single comment by anyone. How does that make me feel? Utterly ignored, even by colleagues. I do, at least, understand what else they are working on... it's not like there is a room full of patch reviewers saying "there are no patches to review, let's just go home early". All the people with the ability to review the patches have so many commitments of their own that the time to review they can allocate is called "weekends".

And as a result, I have a list of patches awaiting review and commit, patches which are not only big diffs, but also trivial ones which fix things like NPEs in reporting errors returned over RPC. That's a 3KB patch, reaching the age where, at least with my own child, we were looking at nursery schools. Nothing strategic, something useful when things go wrong. Ignored.

That's what really frustrates me: small problems, small patches, nobody looks at them.

And I'm as guilty as the rest. We provide feedback on some patch-in-progress, then get distracted and never do the final housekeeping. I feel bad here, precisely because I understand the frustration.

Alongside "old, neglected patches", there are "old, neglected bugs". Take, for example, HADOOP-3733, "s3:" URLs break when Secret Key contains a slash, even if encoded. Stuart Sierra gave a good view of the history from his perspective.

The bug was filed in 2008
  1. it was utterly ignored
  2. Lots of people said they had the same problem
  3. Various hadoop developers said "Cannot reproduce"
  4. It was eventually fixed on 2016-06-16 with a patch by one stevel@apache.
  5. On 2016-06-16 cnauroth@apache filed HADOOP-13287 saying "TestS3ACredentials#testInstantiateFromURL fails if AWS secret key contains '+'.".
  • The Hadoop developers neglect things
  • If we'd fixed things earlier, similar problems wouldn't arise.

I mostly concur. Especially in the S3 support, where historically the only FTEs working on it were Amazon, and they have their own codebase. In ASF Hadoop, Tom White started the code, and it's slowly evolved, but it's generally been left to various end users to work on.

Patch submission is also complicated by the fact that for security reasons, Jenkins doesn't test the stuff. We've had enough problems of people under-testing their patches here that there is a strictly enforced policy of "tell us which infrastructure you tested against". The calling out of "name the endpoint" turns out to be far better at triggering honest responses than "say that you tested this". And yes, we are just as strict with our colleagues. A full test run of the hadoop-aws module takes 10-15 minutes, much better than the 30 minutes it used to take, but still means that any review of a patch is time consuming.

I would normally estimate the time to review an S3 patch to be 1-2 hours. And, until a few of us sat down to work on S3A functionality and performance against Hive and Spark, those 1-2 hours came out of weekend time only. Which is why I didn't do enough reviewing.

Returning to the S3 "/" problem
  1. This whole thing was related to AWS-generated secrets. Those of us whose AWS secrets didn't have a "/" in them couldn't replicate the problem. Thus it was a configuration-space issue rather than something visible to all.
  2. There was a straightforward workaround, "generate new credentials", so it wasn't a blocker.
  3. That related issue, HADOOP-13287, is actually highlighting a regression caused by the fix for HADOOP-3733. In the process of allowing URLs to contain "/" symbols, we managed to break the ability to use "+" in them.
  4. The regression was caught because the HADOOP-3733 patch included some tests which played with the tester's real credentials. Fun problem: writing tests to do insecure things which don't leak secrets in logs and assert statements.
  5. HADOOP-13287 is not an example of "there are nearby problems" so much as "every bug fix moves the bug", something noted in Brooks's "The Mythical Man-Month" in his coverage of IBM OS patches.
  6. And again, this is a c-space problem, it was caught because Chris had + in his secret.
Finally, and this is the reason why it didn't surface with many of us even though we had "/" in the secret: the problem only arises if you put your AWS secrets in the URL itself, as s3a://AWSID:secret-aws-key@bucket

That is: if your filesystem URI contains secrets which, if leaked, threaten the privacy and integrity of your data and put you at risk of running up large bills, then, if the secret has a "/", the URL doesn't authenticate.

This is not actually an action I would recommend. Why? Because throughout the Hadoop codebase we assume that filesystem URIs do not contain secrets. They get logged, they get included in error messages, they make their way into stack traces that can go into bug reports. AWS credentials are too important to be sticking in URLs.

Once I realised people were doing this, I did put aside a morning to fix things. Not so much fixing the encoding of  "/" in the secrets (and accidentally breaking the encoding of "+" in the process), but:
  1. Pulling out the auth logic for s3, s3n and s3a into a new class, S3xLoginHelper.
  2. Having that code strip out user:pass from the FS URL before the S3 filesystems pass it up to their superclass.
  3. Doing test runs and seeing if that is sufficient to keep those secrets out of the logs (it isn't).
  4. Having S3xLoginHelper print a warning whenever somebody tries to use secrets in URLs.
  5. Editing the S3 documentation file to tell people not to do this, and warning that the feature may be removed.
  6. Editing the Hadoop S3 wiki page telling people not to do this.
  7. Finally: fixing the encoding for /, adding tests.
  8. Later, fixing the test for +.
That's not just an idle "may be removed" threat. In HADOOP-13252, you can declare which AWS credential providers to support in S3A, be it your own, conf-file, env var, IAM, and others. If you start doing this, your ability to embed secrets in s3a URLs goes away. Assumption: if people know what they are doing, they shouldn't be doing things so insecure.
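For the curious, the "strip out user:pass from the FS URL" step in the list above can be sketched with nothing more than java.net.URI. This is an illustration under my own assumptions, not the S3xLoginHelper code: LoginStripper and stripLogin are made-up names, and the sketch assumes the URI parses with a proper host, which is precisely what stops being true once an unencoded "/" turns up in the secret.

```scala
import java.net.URI

// Hypothetical helper for illustration; not the actual S3xLoginHelper code.
object LoginStripper {

  /**
   * Rebuild the URI without its user-info section, so that anything
   * which logs or reports the filesystem URI no longer sees the secret.
   * Assumes the URI has a parseable host; other cases are not handled here.
   */
  def stripLogin(fsUri: URI): URI =
    new URI(
      fsUri.getScheme,
      null,               // drop "user:password"
      fsUri.getHost,
      fsUri.getPort,
      fsUri.getPath,
      fsUri.getQuery,
      fsUri.getFragment)
}
```

Given `new URI("s3a://AWSID:secretkey@bucket/data/numbers.csv")`, `stripLogin` hands back a URI that prints without the login section. As step 3 notes, doing this alone was not enough to keep the secrets out of the logs, which is why the warning and the documentation changes went in too.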

Anyway, I'm glad my effort fixing the issue is appreciated. I also share everyone's frustration with neglected patches, as it wastes my effort and leaves the bugs unfixed, features ignored.

We need another bug bash. And I need to give that final third of the Yertiz trail a trim.


Travels abroad

Early in July I did a bike ride with many other parents from Alexander's primary school; bikes loaded into a truck and driven down to Bergerac, us flying out to the airport, then riding south at 60-100 miles/day, ending up in San Sebastian, Spain, skirting the Pyrenees carefully.

Here's me, at the Atlantic coast, July 2016.


This was the first time I'd been at the French/Spanish border of Hendaye/Irun for 26 years. Then we loaded the bikes onto a train in London, then three days later a train+ferry+overnight train to get to Hendaye station, again getting on our bikes at the Atlantic seaside —this time, August 1990.


As usual, note that the combined weight of rider+bike+luggage is constant, even though now the bike is a Ti+Carbon CX machine, and then it was a steel MTB with panniers full of camping equipment and a change of clothes. (*)

Two weeks later, after zig-zagging up the highest roads in the Pyrenees, wobbling in and out of France and Spain, we got to the Mediterranean Sea, somewhat the worse for wear.

Return journey: similar, preload the bikes, overnight train to Paris, onward to London, pick up the bikes and then further trains west. 24+ hours, at least.

In 1990, that was a major expedition. I got a credit card for the first time, changed money into FFr and Spanish Pesetas, took some travellers checks to change money as we went down. Maps: paper things hooked onto the handlebars. Camera: my first 35mm autofocus Canon compact camera. It had a little self timer, hence the various staged team-selfie pics. The bikes were generations behind what I have now, as were the lights, the camping kit, much of the clothing. Communications: postcards.

This time, it's not an expedition, it's just a week on the mainland. Money? Euros left over from my June trip to Berlin, topping up at ATMs. Camera? On the phone, plus a compact digital camera with proper zoom lens. Maps: Garmin GPS and offline google maps. Communications: phone with free phone and data roaming round Europe. Luggage: cards, a bag full of chargers and miscellaneous electronics. Riders: much, much worse. If I'd had Strava then, today's numbers would be so bad I'd give up in despair.

And now: not a multi-train expedition, a quick holiday from the local airport, followed by a few days in San Sebastian and Bilbao, back home from another direct flight, home in two hours, then driving over to Portsmouth to collect a child: a low effort, low stress day. It took as long to drive to Portsmouth as it did from our Bilbao hotel to our Bristol house.

This was not a trip to foreign lands where money is different, where a passport is needed to be handy as you cross the borders, where things are exotic. We are in Europe now, and have moved on from the neighbouring countries being far, far, away.

Except now: things have changed. The rate of decay of the UKP:EUR exchange rate meant that we had to run to the ATMs, fearing that the rate would be worse if we waited 24h. Where today we could roam freely as part of a continent, the decisions of fellow UK voters mean that we're taking a step backwards as a nation.

While that exoticness 26 years ago made for more of a wilder expedition, we have gained so much. Yet now Britain is turning its back on it in exchange for a social and economic fiasco. It's going to be a disaster, and when that happens, the people to blame (the paper proprietors, the politicians who lied) will get away with it. The country will suffer, worst of all those people in the forgotten towns who believed the lies. Well: they voted for it, their fault. If need be, I can walk away from the country, over to the continent which I consider myself a full citizen of.

(*) We had a regime of wash the previous day's clothes in the morning, hang to dry on the bike, wear the fresh stuff in the evening and again on the following day's ride. It worked, though on dusty days things got a bit gritty.


A Small Divided Island on the Edge of Europe

Berlin 2016

One of the things we got to do when we moved back from the US to the UK was to introduce our 3 year old son to his inheritance: Europe. The first trip: a flight to Nice, stay in Antibes, the town on the Cote D'Azur where my father lived for the last 30+ years of his life, where I used to go for holidays in my late teens, my first experience of spending time in France. We drove to Grenoble, stayed with some UK friends who had moved to work there, attended a wedding between an Italian friend and former colleague, now also resident in Grenoble, and a UK Army officer, stationed in Germany.

Later that year, the full road trip: Eurotunnel to France, overnighting in Namur, Belgium. Then down through Luxembourg and on to Germany, ending up south of Munich -two countries a day. Staying with an Irish & German couple, again, friends from Bristol. Introducing Alexander to the German lifestyle, heading into Austria for his first Alpine hut experience. Other holidays, other places: Italy, Berlin, Amsterdam. He, like us, had the whole of Europe waiting for him when he grew up.

And now? The UK has voted to put a wall between itself and the rest of the continent. A lot of the voters think they are putting a wall up to keep people out. Maybe, but the consequence is a wall to keep the children in.

For people struggling to survive in the bits of the country where the heavy industry used to be, that's not exactly something that matters to them. Competition for minimum wage, zero hour jobs, the repeated message that "it's all the fault of Europe", you can't fault them. When your life is fucked -you've got nothing to lose. Unfortunately for them: this isn't going to solve their problems. They were lied to by politicians who knew they were being dishonest and by papers that told them Brexit was the answer. Wherein lies a danger: if a manifesto of xenophobia and cutting yourself from your trading partners doesn't deliver, what's the next policy action going to be? My fear: ramping up the hate.

It doesn't bode well for the UK, and it's a warning for the US: hate, fear and unrealistic promises can attract voters who feel forgotten by what appears a distant political group unconcerned with the problems of vast swathes of a country.

B. and I are off to France & Spain for a cycling holiday next Saturday. Our son will be staying with his (Scottish) grandmother in Portsmouth, on the South Coast of England. Where before he could look out from the seashore and see a continent waiting for him, now the only thing he'll see waiting for him out there is the Isle of Wight.

[Photo: Berlin Wall, June 2016]


Toxic Ideas

Whenever I walk down to the Pop-up Patisserie on Stokes Croft to buy my Pain-aux-Amandes for drinking with my morning Illy Coffee, I cross a lovely little square on the King's Down, above Nine Tree Hill.


This lovely square is actually a war grave. A thirteen gun fort was built there during the English Civil War, a fort captured by the Royalists in 1643. In 1645 it fell to Cromwell's New Model Army, and everyone in the fort was killed.

Looking at the genteel civility of these Georgian houses built 150 years after the war, it's hard to conceive of what happened here, but it did.

That is: by the end of the Civil War, people in the countryside were prepared to kill fellow citizens "without quarter". No expansionist battle for territory between cultures, language speakers, or the like. Simply, initially, differing beliefs, going from abstract details to executions in under five years.

Perhaps the ideas were so fundamentally incompatible that yes, killing the holders of the conflicting beliefs was the only possible way to have peace in the land, the civil war being thus the conflict between ideas instantiated. Or what was initially a metaphorical battle over the notion of sovereignty at the level of Parliament and the King created a conflict which resulted in a war where soldiers on the ground were told what they were fighting for in terms they could understand and kill for, even if those terms bore no relation to the root cause.

Which brings me to the present day.

UKIP are not the National Socialist German Workers’ Party. Farage doesn't appear to have any leanings towards fascism(*). He's just a racist populist who complains about being picked on when everyone denounces his poster which was "unfortunate" to come out the day an elected MP (something Farage has never achieved, despite many attempts) was killed by someone far to the right of UKIP. That alleged killer is unlikely to have been directly radicalized by the runup to the #Brexit vote, though one must suspect he wouldn't have been voting for remain.

Because what the Leave campaigns have come down to, in the end, is anti-European racism: it is for Brexit what anti-Scots advertising was for the Conservative Party in the 2015 UK elections. Except the Brexit message is coming at the end of months and months (years, even) of press hate directed at "migrants" and "refugees", "swarms of them". That's where the hate has been nurtured: not in a few weeks in the summer of 2016, nor even in a poster which was appalling enough before what took place later that day. It's in the front pages of the Daily Mail and the Daily Express every morning. When I nip round to my local shop to buy a baguette, I have to struggle to avoid making eye contact with some half-page-height headline text above a photo, text invariably starting with "MIGRANTS", or, failing that, "MUSLIMS".

That's what's closest to Nazism: the relentless selling of hate against a group of people. Here though, possibly more for profit than any ideology.

Yet that hate they sell seeps into the minds of people: that's what would have radicalized the alleged killer. The Daily Mail.

And what the Vote Leave campaign has done is legitimised that hate: made it OK. And irrespective of the outcome of the referendum, it's going to remain.

If Britain votes to leave, post-whatever economic shocks hit our economy and exchange rate, while the negotiations go on with the EU about how we should still be friends, all those people worked up into a hate of the foreigner will be demanding they get what they were promised. It's going to be fundamentally unpleasant for anyone considered "not one of us". I don't know whether me and my family meet those criteria: I do know I'd be exercising my ancestral right to an Irish passport later on in June, not just for me but for my son. Because I don't know if a joint UK/US citizen of Irish/Scottish/Gujarati descent would be on the UKIP approved list, or if he wants to live in a country where UKIP has influence. And while that US passport gives him one exit option, things don't look good there either.

If, as I hope happens, Britain votes to stay in the EU, all that stirred up hate will remain.

Assuming it's a close vote, the losers aren't going to say "oh, well, never mind": there's going to be a lot of angry resentment. And the "remain" coalition have helped stoke up that anger and resentment, with their dire predictions of doom and devastation. Disappointed Leave voters and activists are going to feel their victory was stolen, and are unlikely to go away. Perhaps, hopefully, Farage himself will, and then be forever remembered for "that poster"; his "rivers of blood" epitaph.

Those in the Conservative Party may form a majority of MPs, as well as members; they aren't going to go away: expect a coup attempt later on in the year, with Boris waiting to graciously accept the crown.

And the rest of the nation? Those papers are going to keep churning out their hate, now with "if only they'd listen", and "we warned you" accompanying each article of lies, exaggeration and hate. People are going to keep reading them —and keep getting angry.

Whatever happens, it's not going to be pleasant.

(*) Do note, however, that his mock attempt to resign as leader of the party and its refusal to be accepted by the party does mimic that of Hitler's threat to resign in 1921 from the then-nascent Nazi party. (The Rise and Fall of the Third Reich, Shirer, 1961, Chapter 2).

[Photo, Freemantle Square, Kingsdown, Site of Prior's Fort, 1642-1645; a small plaque in the centre of the green remembers the dead. Nine-tree hill got its name at the time, as did the nearby Cromwell Road].


No, you didn't want that stack trace

Tail of a test run.

Test failure

What am I doing today? End to end performance testing of the new S3A implementation. Or not.

2016-06-16 18:17:32,339 INFO  s3.S3aIOSuite (Logging.scala:logInfo(58)) -
 Loading configuration from ../cloud.xml
  java.lang.RuntimeException: Unable to load a Suite class that was discovered in the
   runpath: org.apache.spark.cloud.s3.S3aIOSuite
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:84)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

So, a test case wouldn't load. No details. Well, let's paste in the stack trace and see if the IDE will tell me where things went wrong.

catch {
  case e: Exception => {
    // TODO: Maybe include the e.getClass.getName
    // and the message for e in the message
    // cannotLoadDiscoveredSuite,
    // because Jess had the problem
    // That gradle cut off the stack trace so she
    // couldn't see the cause.
    val msg = Resources("cannotLoadDiscoveredSuite", suiteClassName)
    throw new RuntimeException(msg, e)   // *** HERE ***
  }
}

Ah. The inner cause has been lost; whoever wrote the test runner knew this, as someone called "Jess" complained. And the reaction was not to address it, but just to add a TODO note.

I generally automatically -1 any patch whose test cases lose exception data. This test runner isn't quite doing that, but it is cutting the message from the top of the list, relying on the test reporter to display the whole list. Which, clearly, the scalatest reporter is not doing.
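What would acting on that TODO look like? Something along these lines, sketched in plain Scala. SuiteLoadFailure and wrap are hypothetical names of my own, and the message text is illustrative rather than scalatest's resource string; the point is only that the cause's class and message get folded into the wrapper's own message, so they survive a reporter which prints nothing but the top-level exception.

```scala
// Hypothetical helper, not scalatest code: one way to act on the TODO above.
object SuiteLoadFailure {

  /**
   * Wrap a suite-loading failure so that the cause's class name and message
   * appear in the outer message, while the cause itself stays attached for
   * any reporter that does print the full chain.
   */
  def wrap(suiteClassName: String, cause: Throwable): RuntimeException = {
    val detail = s"${cause.getClass.getName}: ${cause.getMessage}"
    new RuntimeException(
      s"Unable to load suite $suiteClassName; caused by $detail",
      cause)
  }
}
```

It costs one extra line in the catch clause, and it means a truncated stack trace still says what actually went wrong.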

Having written a distributed JUnit runner in the distant past, I understand a key issue with test runners: they're hard to test. You essentially have to bootstrap your own testing infrastructure, then generate the failure conditions you see from tests. Which does mean: you get to stare at the stack traces. I think what we are seeing here is evidence that whoever is developing scalatest doesn't use maven.

This also shows what's wrong with TODO notes: they're technical debt you forget about.

At least if you keep adding more and more things to your issue tracker, you've got a nice shared graph of filed vs completed. Whose project tracks the number of TODO line items in a source tree?

(Update: found the problem. The test in question had a base class with an after{} cleanup clause, as did a mixed in trait. Whatever error message scalatest raises at that point, well, it doesn't make it through the testrunner unscathed)


Fear of Dependencies

There are some things to be scared of; some things to view as a challenge and embrace anyway.

Peter hasn't done this climb before. He thinks it will be fun. Peter is wrong

Here, Hardknott Pass falls into the challenge category —at least in summertime. You know you'll get up, the only question is "cycling" or "walking".

Hardknott in Winter is a different game; it's a "should I be trying to get up here at all" kind of issue. Where, for reference, the answer is usually: no. Find another way around.

Upgrading dependencies to Hadoop jitters between the two, depending on what upgrade is being proposed.

And, as the nominal assignee of HADOOP-9991, "upgrade dependencies", I get to see this.

We regularly get people submitting one line patches, "upgrade your dependency so you can work with my project", and they are such tiny diffs people think "what a simple patch, it's easy to apply".

The problem is they are one line patches that can lead to the HBase, Hive or Spark people cornering you and saying things like "why do you make my life so hard?"

Before making a leap to Java 9, we're trapped whatever we do. Upgrade: things downstream break. Don't upgrade: things downstream break when they update something else, or pull in a dependency which has itself updated.

While Hadoop has been fairly good at keeping its own services stable, where it causes problems is in applications that pull in the Hadoop classpath for their own purposes: HBase, Hive, Accumulo, Spark, Flink, ...

Here's my personal view on the risk factor of various updates.

Critical :

We know things will be trouble —and upgrades are full cross-project epics

  • protobuf. This will probably never be updated during the lifespan of Hadoop 2, given how Google broke its ability to link to previously generated code.
  • Guava. Google cut things. Hadoop ships with Guava 11 but has moved off all deleted classes so runs happily against Guava 16+. I think it should be time just to move up, on the basis of Java 8 compatibility alone.
  • Jackson. The last time we updated, everything worked in Hadoop, but broke HBase. This makes everyone very sad.
  • In Hive and Spark: Kryo. Hadoop core avoids that problem; I did suggest adding it purely for the pain it would cause the Hive team (HADOOP-12281) —they knew it wasn't serious but as you can see, others got a bit worried. I suspect it was experience with my other POM patches that made them worry.
I think a Jackson update is probably due, but will need conversations with the core downstream projects. And perhaps bump up Guava, given how old it is.

High Risk

Failures are traumatic enough we're just scared of upgrading unless there's a good reason.
  • jetty/servlets. Jetty has been painful (threads in the Datanodes to perform liveness monitoring of Jetty are an example of workarounds), but it was a known and managed problem. Plan is to move off jetty entirely and -> jersey + grizzly.
  • Servlet API.
  • jersey. HADOOP-9613 shows how hard that's been
  • Tomcat. Part of the big webapp set
  • Netty —again, a long standing sore point (HADOOP-12928, HADOOP-12927)
  • httpclient. There's a plan to move off Httpclient completely, stalled on hadoop-openstack. I'd estimate 2-3 days there, more testing than anything else. Removing a dependency entirely frees downstream projects from having to worry about the version Hadoop comes with.
  • Anything which has JNI bindings. Examples: leveldb, the codecs
  • Java. Areas of trauma: Kerberos, java.net, SASL.

With the move of trunk to Java 8, those servlet/webapp versions all need to be rolled.

Medium Risk

These are things where we have to be very cautious about upgrading, either because of a history of brittleness, or because failures would be traumatic
  • Jets3t. Every upgrade of jets3t moved the bugs. It's effectively frozen as "trouble, but a stable trouble", with S3a being the future
  • Curator 2.x (see HADOOP-11612; HADOOP-11102). I had to do a test rebuild of curator 2.7 with guava downgraded to Hadoop's version to be confident that there were no codepaths that would fail. That doesn't mean I'm excited by Curator 3, as it's an unknown.
  • Maven itself
  • Zookeeper -for its use of guava.
Here I'm for leaving Jets3t alone; and, once that Guava is updated, curator and ZK should be aligned.

Low Risk

Generally happy to upgrade these as later versions come out.
  • SLF4J yes, repeatedly
  • log4j 1.x (2.x is out as it doesn't handle log4j.properties files)
  • avro as long as you don't propose picking up a pre-release.
    (No: Avro 1.7 to 1.8 update is incompatible with generated compiled classes, same as protobuf.)
  • apache commons-lang (minor: yes, major: no)
  • Junit

I don't know which category the AWS and Azure SDKs fall into. Their Jackson dependency flags them as a transitive troublespot.

Life would be much easier if (a) the guava team stopped taking things away and (b) either jackson stopped breaking things or someone else produced a good JSON library. I don't know of any -I have encountered worse.

2016-05-31 Update: ZK doesn't use Guava. That's Curator I'm thinking of. Correction by Chris Nauroth.


Distributed Testing: making use of the metrics

3Dom, St Werburghs


In this article I introduce the concept of Metrics-first Testing, and show how instrumenting the internals of classes, enabling them to be published as metrics, enables better testing of distributed systems, while also offering potential to provide more information in production.

Exporting instrumented classes in the form of remotely accessible metrics permits test runners to query the state of the System Under Test, both to make assertions about its state, and to collect histories and snapshots of its state for post-run diagnostics.

This same observable state may be useful in production —though there is currently no evidence to support this hypothesis.

There are a number of issues with the concept. A key one is that if these metrics do prove useful in production, then they become part of the public API of the system, and must be supported across future versions.

Introduction: Metrics-first Testing

I've been doing more scalatest work, as part of SPARK-7889, SPARK-1537, SPARK-7481. Alongside that, in SLIDER-82, anti-affine work placement across a YARN cluster. And, most recently, wrapping up S3a performance and robustness for Hadoop 2.8, HADOOP-11694, where the cost of an HTTP reconnect appears on a par with reading 800KB of data, meaning: you are better off reading ahead than breaking a connection on any forward seek under ~900KB. (That's transatlantic to an 80MB FTTC connection; setup time is fixed, and TCP slow start also means that the longer the connection is held, the better the bandwidth gets.)

On these projects, I've been exploring the notion of metrics-first testing. That is: your code uses metric counters as a way of exposing the observable state of the core classes, and then tests can query those metrics, either at the API level or via web views.

Here's a test for HADOOP-13047: S3a Forward seek in stream length to be configurable

  @Test
  public void testReadAheadDefault() throws Throwable {
    describe("Verify that a series of forward skips within the readahead" +
        " range do not close and reopen the stream");
    executeSeekReadSequence(32768, 65536);
    // a single open means no seek broke the HTTPS connection
    assertEquals("open operations in " + streamStatistics,
        1, streamStatistics.openOperations);
  }

Here's the output

testReadAheadDefault: Verify that a series of forward skips within the readahead
  range do not close and reopen the stream

2016-04-26 11:54:25,549 INFO  Reading 623 blocks, readahead = 65536
2016-04-26 11:54:29,968 INFO  Duration of Time to execute 623 seeks of distance 32768
 with readahead = 65536: 4,418,524,000 nS
2016-04-26 11:54:29,968 INFO  Time per IOP: 7,092,333 nS
2016-04-26 11:54:29,969 INFO  Effective bandwidth 0.000141 MB/S
2016-04-26 11:54:29,970 INFO  StreamStatistics{OpenOperations=1, CloseOperations=0,
  Closed=0, Aborted=0, SeekOperations=622, ReadExceptions=0, ForwardSeekOperations=622,
  BackwardSeekOperations=0, BytesSkippedOnSeek=20381074, BytesRead=20381697,
  BytesRead excluding skipped=623, ReadOperations=0, ReadsIncomplete=0}

I'm collecting internal metrics of a stream, and using them to make assertions about the correctness of the code. Here, the assertion is that if I set the readahead range to 64K, then a series of seek and read operations streams through the file, rather than breaking and reconnecting the HTTPS link.
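
To make the mechanics concrete, here is a minimal simulation of that behaviour. It is a sketch only: SimulatedSeekStream is a hypothetical stand-in, not the S3AInputStream code, and it assumes the test pattern is "read one byte, seek forward one stride, repeat". With a stride of 32768 and a readahead of 65536 its counters line up with the StreamStatistics in the log above: one open, 622 seeks, 20381074 bytes skipped.

```java
/** Hypothetical stand-in for an instrumented object-store stream; not the S3A code. */
class SimulatedSeekStream {
  // Counters exposed as observable state, named after the fields in the log above.
  long openOperations, closeOperations, seekOperations;
  long bytesSkippedOnSeek, bytesRead;

  private final long readahead;
  private long pos;
  private boolean open;

  SimulatedSeekStream(long readahead) {
    this.readahead = readahead;
  }

  /** Move to an absolute position, skipping forward when that is within the readahead range. */
  void seek(long target) {
    long diff = target - pos;
    if (diff == 0) {
      return;
    }
    seekOperations++;
    if (open && diff > 0 && diff <= readahead) {
      // Short forward seek: read and discard bytes on the existing connection.
      bytesSkippedOnSeek += diff;
      bytesRead += diff;
    } else if (open) {
      // Backward or long seek: drop the connection; the next read reopens it.
      open = false;
      closeOperations++;
    }
    pos = target;
  }

  /** Read one byte, opening the simulated connection lazily. */
  void read() {
    if (!open) {
      open = true;
      openOperations++;
    }
    pos++;
    bytesRead++;
  }

  /** Read one byte per block, seeking forward by a fixed stride between reads. */
  static SimulatedSeekStream run(int blocks, long stride, long readahead) {
    SimulatedSeekStream in = new SimulatedSeekStream(readahead);
    for (int i = 0; i < blocks; i++) {
      in.seek(i * stride);
      in.read();
    }
    return in;
  }

  public static void main(String[] args) {
    SimulatedSeekStream s = run(623, 32768, 65536);
    System.out.println("opens=" + s.openOperations
        + " seeks=" + s.seekOperations
        + " skipped=" + s.bytesSkippedOnSeek);
  }
}
```

Shrink the readahead below the stride, say run(623, 32768, 16384), and every seek closes the connection, so the open count climbs to one per block. That is the regression the assertion on openOperations is there to catch.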

This matters a lot, as shown by one of the other tests, which times an open() call as well as the time to actually read the data.

testTimeToOpenAndReadWholeFileByByte: Open the test file
  s3a://landsat-pds/scene_list.gz and read it byte by byte

2016-04-26 11:54:47,518 Duration of Open stream: 181,732,000 nS
2016-04-26 11:54:51,688 Duration of Time to read 20430493 bytes: 4,169,079,000 nS
2016-04-26 11:54:51,688 Bandwidth = 4.900481  MB/S
2016-04-26 11:54:51,688 An open() call has the equivalent duration of
  reading 890,843 bytes
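
The last line of that log is simple arithmetic: the bandwidth measured during the read, multiplied by the time the open() took. A small sketch of the calculation follows, using the figures as they appear in the log; the class and method names are made up for illustration. Because the log rounds its durations, the result lands within about 0.1% of the logged 890,843 bytes rather than matching it exactly.

```java
/** Break-even arithmetic: how many bytes could have been read in the time of one open()? */
class OpenCostEstimate {

  /** Bytes that could have been read, at the measured bandwidth, while open() was running. */
  static double bytesEquivalentToOpen(long openNanos, long bytesRead, long readNanos) {
    double bytesPerNano = (double) bytesRead / readNanos;
    return openNanos * bytesPerNano;
  }

  /** Bandwidth in MB/s, with 1 MB = 10^6 bytes, which is what the logged figure implies. */
  static double bandwidthMBs(long bytesRead, long readNanos) {
    return bytesRead / (readNanos / 1e9) / 1e6;
  }

  public static void main(String[] args) {
    // Figures copied from the log above.
    long openNanos = 181_732_000L;
    long bytesRead = 20_430_493L;
    long readNanos = 4_169_079_000L;

    System.out.printf("bandwidth = %.3f MB/s%n", bandwidthMBs(bytesRead, readNanos));
    System.out.printf("open() costs about %.0f bytes of reading%n",
        bytesEquivalentToOpen(openNanos, bytesRead, readNanos));
  }
}
```

Any forward seek shorter than that figure is cheaper to serve by reading and discarding bytes than by reconnecting, at least on this particular link.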

Now here's a Spark test using the same source file and s3a connector

ctest("CSVgz", "Read compressed CSV", "") {
    val source = sceneList
    sc = new SparkContext("local", "test", newSparkConf(source))
    val sceneInfo = getFS(source).getFileStatus(source)
    logInfo(s"Compressed size = ${sceneInfo.getLen}")
    val input = sc.textFile(source.toString)
    // the timed body was lost from the listing; counting the rows is assumed,
    // as it is what forces the whole file to be read
    val (count, started, time) = duration2 {
      input.count()
    }
    logInfo(s" size of $source = $count rows read in $time nS")
    assert(ExpectedSceneListLines <= count)
    // the filesystem's toString() includes its statistics and metrics
    logInfo(s"Filesystem statistics ${getFS(source)}")
}

Which produces, along with the noise of a local spark run, some details on what the FS got up to
2016-04-26 12:08:25,901  executor.Executor Running task 0.0 in stage 0.0 (TID 0)
2016-04-26 12:08:25,924  rdd.HadoopRDD Input split: s3a://landsat-pds/scene_list.gz:0+20430493
2016-04-26 12:08:26,107  compress.CodecPool - Got brand-new decompressor [.gz]
2016-04-26 12:08:32,304  executor.Executor Finished task 0.0 in stage 0.0 (TID 0). 
  2643 bytes result sent to driver
2016-04-26 12:08:32,311  scheduler.TaskSetManager Finished task 0.0 in stage 0.0 (TID 0)
  in 6434 ms on localhost (1/1)
2016-04-26 12:08:32,312  scheduler.TaskSchedulerImpl Removed TaskSet 0.0, whose tasks
  have all completed, from pool 
2016-04-26 12:08:32,315  scheduler.DAGScheduler ResultStage 0 finished in 6.447 s
2016-04-26 12:08:32,319  scheduler.DAGScheduler Job 0 finished took 6.560166 s
2016-04-26 12:08:32,320  s3.S3aIOSuite  size of s3a://landsat-pds/scene_list.gz = 464105
  rows read in 6779125000 nS

2016-04-26 12:08:32,324 s3.S3aIOSuite Filesystem statistics
  partSize=104857600, enableMultiObjectsDelete=true,
  statistics {
    20430493 bytes read,
     0 bytes written,
     3 read ops,
     0 large read ops,
     0 write ops},
     metrics {{Context=S3AFileSystem}

What's going on here?

I've instrumented S3AInputStream, instrumentation which is then returned to its S3AFileSystem instance.
This instrumentation can not only be logged, it can be used in assertions.

And, as the FS statistics are actually Metrics2 data, they can be collected from running applications.

By making the observable state of object instances real metric values, I can extend their observability from unit tests to system tests —all the way to live clusters.

  1. This makes assertions on the state of remote services a simple matter of GET /service/metrics/$metric + parsing.
  2. It ensures that the internal state of the system is visible for diagnostics of both test failures and production system problems. Here: how is the file being accessed? Is the spark code seeking too much —especially backwards? Were there any transient IO problems which were recovered from?
    These are things which the ops team may be grateful for in the future, as now there's more information about what is going on.
  3. It encourages developers such as myself to write those metrics early, at the unit test time, because we can get immediate tangible benefit from their presence. We don't need to wait until there's some production-side crisis and then rush to hack in some more logging. Classes are instrumented from the outset. Indeed, in SPARK-11373 I'm actually implementing the metrics publishing in the Spark History server —something the SPARK-7889 code is ready for.
Metrics-first testing, then, is instrumenting the code and publishing it for assertions in unit tests, and for downstream test suites.
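
Point 1 can be sketched with nothing but the JDK. Everything named here is an assumption for illustration: the MetricsEndpointSketch class, the plain-text response format and the openOperations metric name are all made up, and a real service would more likely publish JSON through a metrics servlet. The shape of the test is the point: the service bumps a counter, the test runner does a GET and parses the result, then asserts on it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical service publishing each counter at /service/metrics/NAME as plain text. */
class MetricsEndpointSketch {
  static final String PREFIX = "/service/metrics/";

  final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
  private HttpServer server;

  /** Start on an ephemeral loopback port and return the port number. */
  int start() throws IOException {
    server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    server.createContext(PREFIX, exchange -> {
      String name = exchange.getRequestURI().getPath().substring(PREFIX.length());
      AtomicLong counter = counters.get(name);
      if (counter == null) {
        exchange.sendResponseHeaders(404, -1);   // unknown metric, no body
      } else {
        byte[] body = Long.toString(counter.get()).getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
          out.write(body);
        }
      }
      exchange.close();
    });
    server.start();
    return server.getAddress().getPort();
  }

  void stop() {
    server.stop(0);
  }

  /** The test-runner side: GET the metric, then parse it. */
  static long fetchMetric(int port, String name) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI
        .create("http://127.0.0.1:" + port + PREFIX + name)
        .toURL()
        .openConnection(Proxy.NO_PROXY);
    try (InputStream in = conn.getInputStream()) {
      String text = new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
      return Long.parseLong(text.trim());
    } finally {
      conn.disconnect();
    }
  }

  /** Start the service, bump a counter inside it, and read the value back over HTTP. */
  static long demo() {
    MetricsEndpointSketch service = new MetricsEndpointSketch();
    service.counters.put("openOperations", new AtomicLong());
    try {
      int port = service.start();
      try {
        service.counters.get("openOperations").incrementAndGet();
        return fetchMetric(port, "openOperations");
      } finally {
        service.stop();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long opens = demo();
    if (opens != 1) {
      throw new AssertionError("expected one open, metric says " + opens);
    }
    System.out.println("openOperations = " + opens);
  }
}
```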

I'm just starting to experiment with this metrics-first testing.

I have ambitions to make metric capture and monitoring a more integral part of test runs. In particular, I want test runners to capture those metrics. That's either by setting up the services to feed the metrics to the test runner itself, capturing the metrics directly by polling servlet interfaces, or capturing them indirectly via the cluster management tools.

Initially that'll just be a series of snapshots over time, but really, we could go beyond that and include in test reports the actual view of the metrics: what happened to various values over time? When the YARN timeline server says its average CPU was at 85%, what was the Spark history server saying its cache eviction rate was?

Similarly, those s3a counters are just initially for microbenchmarks under hadoop-tools/hadoop-aws, but they could be extended up the stack, through Hive and spark queries, to full applications. It'll be noisy, but hey, we've got tooling to deal with lots of machine parseable noise, as I call it: Apache Zeppelin.

What are the flaws in this idea?


Relevance of metrics beyond tests.

There's the usual issue: the metrics we developers put in aren't what the operations team need. That's inevitable, but at least we are adding lots of metrics into the internal state of the system, and once you start instrumenting your code, you are more motivated to continue to add the others.


Representing Boolean values

I want to publish a boolean metric: has the slider App Master had a node map update event from the YARN RM? That's a bool, not the usual long value metrics tools like. The fix there is obvious for anyone who has programmed in C:
// Publishes a boolean as a 0/1 gauge (Metric and Gauge are the Coda Hale types)
public class BoolMetric extends AtomicBoolean implements Metric, Gauge<Integer> {

  @Override
  public Integer getValue() {
    return get() ? 1 : 0;
  }
}
It's not much use as a metric, except in the case where you are trying to look at system state and see what's wrong. It actually turns out that you don't get an initial map, something which GETs off the Coda Hale JSON metric servlet did pick up in a minicluster test. It's already paid for itself. I'm happy. It just shows the mismatch between what is needed to monitor a running app (things you can have triggers and graphs of) and a simple bool state view.


Representing Time

I want to track when an update happened, especially relative to other events across the system. I don't see (in the Coda Hale metrics) any explicit notion of time other than histograms of performance. I want to publish a wall time, somehow. Which leaves me with two options. (a) A counter listing the time in milliseconds *when* something happened. (b) A counter listing the time in milliseconds *since* something happened. From a monitoring perspective, (b) is better: you could set an alarm if the counter value went over an hour.

From a developer perspective, absolute values are easier to test with. They also support the value "never" better, with something like -1 being a good choice here. I don't know what value of "never" would be good in a time-since-event value which couldn't be misinterpreted by monitoring tools. A value of -1 could be construed as good, though if it had been in that state for long enough, it becomes bad. Similarly, starting off with LONG_MAX as the value would set alarms off immediately. Oh, and either way, the time isn't displayed as a human readable string. In this case I'm using absolute times.

I'm thinking of writing a timestamp class that publishes an absolute time on one path, and a relative time on an adjacent path. Something for everyone.
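
A rough sketch of what such a class might look like, with no metrics library attached. The EventTimestamp name, its methods and the injectable clock are all assumptions for illustration. It uses -1 as "never" in both views, which does nothing to resolve the worry above about how monitoring tools would read that value in the relative view.

```java
import java.util.function.LongSupplier;

/** Hypothetical event timestamp published two ways: absolute time, and time since the event. */
class EventTimestamp {
  /** Reported by both views until the event has happened (an assumption, see text). */
  static final long NEVER = -1;

  private final LongSupplier clock;          // milliseconds; injectable so tests control time
  private volatile long eventTime = NEVER;

  EventTimestamp(LongSupplier clock) {
    this.clock = clock;
  }

  /** Record that the event has just happened. */
  void mark() {
    eventTime = clock.getAsLong();
  }

  /** Absolute view: clock value at the last event; easy to assert on in tests. */
  long absoluteMillis() {
    return eventTime;
  }

  /** Relative view: milliseconds since the last event; easy to set an alarm on. */
  long millisSince() {
    long t = eventTime;
    return t == NEVER ? NEVER : clock.getAsLong() - t;
  }

  public static void main(String[] args) {
    EventTimestamp nodeMapUpdate = new EventTimestamp(System::currentTimeMillis);
    System.out.println("before: " + nodeMapUpdate.absoluteMillis());
    nodeMapUpdate.mark();
    System.out.println("after:  " + nodeMapUpdate.absoluteMillis()
        + ", age " + nodeMapUpdate.millisSince() + " ms");
  }
}
```

Each of the two getters could then be registered as its own gauge on adjacent paths. Passing the clock in means a test can advance time by hand instead of sleeping.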


The performance of AtomicLongs

Java volatile variables are slightly more expensive than C++ ones, as they act as barrier operations rather than just telling the compiler never to cache them. But they are still simple types.

In contrast, Atomic* are big bits of Java code, with lots of contention if many threads try to update some metric. This is why Coda Hale use a LongAdder class, one that eventually surfaced in Java 8.

But while having reduced contention, that's still a piece of java code trying to acquire and release locks.

It would only take a small change in the JRE for volatile (or perhaps some variant, atomic) to implement atomic ++ and += calls at the machine code level, so the cost of incrementing a volatile would be almost the same as setting it.

We have to assume that Sun didn't do that in 1995-6 as they were targeting 8 bit machines, where even incrementing a 16 bit short value was not something all CPUs could guarantee to do atomically.

Nowadays, even watches come with 32 bit CPUs; phones are 64 bit. It's time for Oracle to look ahead and conclude that it's time for even 64 bit volatile addition to be made atomic.

For now, I'm making some of the counters which I know are only being updated within thread-safe code (or code that says "should only be used in one thread") volatile; querying them won't hold up the system.
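
For comparison, here are the three options side by side, as a sketch. CounterChoices is a made-up class; AtomicLong and LongAdder are the standard java.util.concurrent.atomic classes. It checks correctness under contention only and says nothing about relative cost, which would need a proper benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Three ways to keep a counter, for comparison. */
class CounterChoices {
  // Fine with exactly one writer thread: readers always see the latest value,
  // but ++ on a volatile is a separate read and write, so concurrent writers lose updates.
  private volatile long singleWriter;

  private final AtomicLong atomic = new AtomicLong();  // one shared cell, updated atomically
  private final LongAdder adder = new LongAdder();     // several cells, summed on read

  /** Only to be called from the one thread that owns this counter. */
  void incrementFromOwnerThread() {
    singleWriter++;
  }

  long singleWriterValue() {
    return singleWriter;
  }

  /** Hammer the two thread-safe counters from several threads; returns {atomic, adder}. */
  long[] countConcurrently(int threads, int perThread) {
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          atomic.incrementAndGet();
          adder.increment();
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return new long[] {atomic.get(), adder.sum()};
  }

  public static void main(String[] args) {
    CounterChoices counters = new CounterChoices();
    long[] totals = counters.countConcurrently(4, 100_000);
    System.out.println("atomic=" + totals[0] + " adder=" + totals[1]);
  }
}
```

Both shared counters end up with every increment accounted for. The volatile field only gives that guarantee while the single-writer rule holds, which is why it is restricted to code already known to run in one thread.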


Metrics are part of your public API

This is the troublesome one: If you start exporting information which your ops team depends on, then you can't remove it. (Wittenauer, a reviewer of a draft of this article, made that point quite clearly). And of course, you can't really tell which metrics end up being popular. Not unless you add metrics for that, and, well, you are down a slippery slope of meta-metrics at that point.

The real issue here becomes not exposing more information about the System Under Test, but exposing internal state which may change radically across versions.

What I'm initially thinking of doing here is having a switch to enable/disable registration of some of the more deeply internal state variables. The internal state of the components is not automatically visible in production, but can be turned on with a switch. That should at least make clear that some state is private.

However, it may turn out that the metrics end up being invaluable during troubleshooting; something you may not discover until you take them away.

Keeping an eye on troubleshooting runbooks and being involved in support calls will keep you honest there.


Pressure to align your counters into a bigger piece of work

For the S3a code, this surfaces in HDFS-10175; a proposal to make more of those FS level stats visible, so that at the end of an SQL query run, you can get aggregate stats on what all filesystems have been up to. I do think this is admirable, and with the costs of an S3 HTTP reconnect being 0.1s, it's good to know how many there are.

At the same time, these overreaching goals shouldn't be an excuse to hold up the low level counters and optimisations which can be done at a micro level —what they do say is "don't make this per-class stuff public" until we can do it consistently. The challenge then becomes technical: how to collect metrics which would segue into the bigger piece of work, are useful on their own, and which don't create a long term commitment of API maintenance.



As described by Jakob Homan: "Large distributed systems can overwhelm metrics aggregators. For instance, Samza jobs generated so many metrics LI's internal system blanched and we had to add a feature to blacklist whole metric classes."

These low-level metrics may be utterly irrelevant to most processes, yet, if published and recorded, will add extra load to the monitoring infrastructure.

Again, this argues for making the low-level metrics off by default, unless explicitly enabled by a debugging switch.

In fact, it almost argues for having some metric enabling profile similar to log4J settings, where you could turn on, say, the S3a metrics at DEBUG level for a run, leaving it off elsewhere. That could be something to investigate further.

Perhaps I could start by actually using the log level of the classes as the cue to determine which metrics to register:
if (LOG.isDebugEnabled()) { /* register the low-level metrics here */ }
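
Taken a step further, each metric could carry a level of its own and the registry could decide what to publish. This is a sketch under assumptions: GatedMetricRegistry, its Level enum and the metric names are hypothetical and not part of any metrics library. In real code the enabled level would presumably be derived from the class's logger rather than passed in directly.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Hypothetical registry that only publishes low-level metrics when debugging is on. */
class GatedMetricRegistry {
  /** Ordered from least to most detailed. */
  enum Level { INFO, DEBUG }

  private final Level enabled;
  private final Map<String, LongSupplier> gauges = new TreeMap<>();

  GatedMetricRegistry(Level enabled) {
    this.enabled = enabled;
  }

  /** Register the gauge only if its level is enabled; returns whether it was registered. */
  boolean register(String name, Level level, LongSupplier gauge) {
    if (level.compareTo(enabled) > 0) {
      return false;   // more detailed than the current setting: stays private
    }
    gauges.put(name, gauge);
    return true;
  }

  /** Current values of everything that was registered. */
  Map<String, Long> snapshot() {
    Map<String, Long> values = new TreeMap<>();
    gauges.forEach((name, gauge) -> values.put(name, gauge.getAsLong()));
    return values;
  }

  public static void main(String[] args) {
    GatedMetricRegistry production = new GatedMetricRegistry(Level.INFO);
    production.register("bytes_read", Level.INFO, () -> 20430493L);
    production.register("stream_open_operations", Level.DEBUG, () -> 1L);
    System.out.println("published in production: " + production.snapshot().keySet());
  }
}
```

A metric that was never registered is never scraped, so this addresses both concerns at once: it keeps internal state out of the de facto public API, and it keeps the extra load off the aggregator.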

Related work

I've been trying to find out who else has done this, and what worked/didn't work, but there doesn't seem to be much in published work. There's a lot of coverage of performance testing, but this isn't that. This is about a philosophy of instrumenting code for unit and system tests, using metrics as that instrumentation, and in doing so not only enabling better assertions to be made about the state of the System Under Test, but hopefully providing more information for production monitoring and diagnostics.


In Distributed Testing, knowing more about the state of the System Under Test aids both assertions and diagnostics. By instrumenting the code better, or simply making the existing state accessible as metrics, it becomes possible to query that state during test runs. This same instrumentation may then be useful in the System In Production, though that is something which I currently lack data about.


It's not often (ever?) that I get people to review blog posts before I publish them: this one I did as it's introducing concepts in system testing which impact everything from code to production system monitoring. Thank you to the reviewers: Jakob Homan, Chris Douglas, Jay Kreps, Allen Wittenauer, Martin Kleppmann.

I don't think I've addressed all their feedback, especially Chris's (security, scope, + others), and Jay went into detail on how structured logging would be superior; that's something I'll let him expound on in the Confluent blog.

Interestingly, I am exposing the s3a metrics as log data: it lets me keep those metrics internal, and lets me see their values in Spark tests without changing that code.

AW pointed out that I was clearly pretty naive in terms of what modern monitoring tools could do, and should do more research there: "On first blush, this really feels naïve as to the state of the art of monitoring tools, especially in the commercial space where a lot of machine learning is starting to take shape (e.g., Netuitive, Circonus, probably Rocana, etc, etc)." Clearly I have to do this...

(Artwork: 3Dom in St Werburgh's)