iOS 8.4, the Windows Vista of iPad

I like to listen to music while coding, my normal strategy being "on a Monday, pick an artist, leave them playing all week". It's a low-effort policy. Except iOS 8.4 has ruined my workflow.

The iOS 8.4 music player breaking a playlist by building a sequence out of a single file in the list

Now, while I think Ian Curtis's version of Sister Ray is possibly better than the Velvet Underground's, that doesn't mean I want to listen to it exclusively, yet this is precisely what the player appears to want to do, both when I start the playlist and sometimes even after it has been happily mixing the playlist. Sometimes it just gets into a state where the next (shuffled) track is the same as the current track, forever. And before anyone says "hey, you just hit the repeat-per-track option", here's the fun bit: it switched from repeat playlist to repeat track, all on its own. That implies a state change, either in some local variable (how?) or because the app is persisting state and reloading it, and that persist/reload changed the value. As a developer, I suspect the latter, as it's easier to get a save/restore of an enum wrong.
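To make that suspicion concrete, here is a minimal Java sketch of how a save/restore of an enum can silently change its value. Nothing here is taken from the iOS music app: the enum names, the reordering between versions and the `RepeatModeStore` helper are all hypothetical, and whether anything like this is behind the 8.4 behaviour is pure guesswork. It only shows the general failure mode: persist the position of a constant, reorder the constants in a later release, and "repeat playlist" comes back as "repeat track".

```java
// Illustrative only: every name here is hypothetical, not taken from iOS.

// Order of the constants in the release that saved the setting.
enum RepeatModeV1 { OFF, ALL, ONE }

// A later release reorders the constants.
enum RepeatModeV2 { OFF, ONE, ALL }

class RepeatModeStore {
    // Fragile: persists the position of the constant, which changes on reorder.
    static int saveByOrdinal(RepeatModeV1 mode) {
        return mode.ordinal();
    }

    static RepeatModeV2 restoreByOrdinal(int saved) {
        return RepeatModeV2.values()[saved];
    }

    // Sturdier: persists the constant's name, which survives reordering.
    static String saveByName(RepeatModeV1 mode) {
        return mode.name();
    }

    static RepeatModeV2 restoreByName(String saved) {
        return RepeatModeV2.valueOf(saved);
    }
}

class RepeatModeDemo {
    public static void main(String[] args) {
        RepeatModeV1 chosen = RepeatModeV1.ALL; // the user picked "repeat playlist"

        // Round trip through the ordinal: comes back as a different mode.
        System.out.println(
            RepeatModeStore.restoreByOrdinal(RepeatModeStore.saveByOrdinal(chosen)));

        // Round trip through the name: comes back unchanged.
        System.out.println(
            RepeatModeStore.restoreByName(RepeatModeStore.saveByName(chosen)));
    }
}
```

Run as written, `main` prints `ONE` for the ordinal round trip and `ALL` for the name round trip. Persisting a stable name or an explicit code, rather than a position, is the usual defence.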

The new UI doesn't help. Apparently using Siri helps, as you can just say "shuffle" and let it do the rest. I couldn't get that far, because every time I asked it to play my music, it warned me that this would break the up next list. That's the list that has decided my entire playlist consists of one MP3, Sister Ray covered by Joy Division.

One thing is clear: not only is the UI of iOS 8.4 music a retrograde step, it was shipped either without enough testing or with major known bugs. I don't know which is worse.

Siri (and Cortana, OK Google and Alexa) are all showing how speech recognition has improved over time, and we are starting to see more AI-hard applications running on mobile devices (Google Translate on Android is another classic), but it's moot if the applications that the speech recognition systems are hooked up to are broken.

Which is poignant, isn't it:
  • Cutting edge speech recognition software combining mobile devices & remote server-side processing: working
  • Music player application of a kind which even Apple have been shipping for over a decade, and which AMP and Napster had nailed down earlier: broken.

The usual tactic, rebooting? All the playlists are gone, even after a couple of attempts at resyncing with iTunes.

That's it then: I cannot use the Apple music app on the iPad to listen to my music. Given that a key strategic justification for the 8.4 release is the Apple Music service, that has to be a complete disaster.

This reminds me so much of the Windows Vista experience: an upgrade that was a downgrade. I had Vista on a laptop for a week before sticking Linux on. I don't have that option here, only the promise that iOS 9 will make things "better".

I would go back to the iPod nano, except I can't find the cable for that, so I have switched to Google Play and streaming my in-cloud music backup. Which, from an Apple strategic perspective, can't rate very highly, unless I am the only person abandoning the music player for alternatives that actually work.


Book Review, Hadoop Security, and distributed security in general

I've been reading the new ORA book, Hadoop Security, by Ben Spivey and Joey Echeverria. There aren't many reviews up yet, so here's mine.

  • Reasonable intro to Kerberos on Hadoop clusters
  • Covers the identity -> cluster user mapping problem
  • ACLs in HDFS, YARN &c covered nicely: explanation and configuration
  • Shows you pretty much how to configure every Hadoop service for authenticated and authorized access, audit logging and data & transport encryption
  • Has Sentry coverage, if that matters to you
  • Has some good "putting it all together" articles
  • Index seems OK
  • Avoids delving into the depths of implementation (a strength and a weakness)

Overall: good from an ops perspective; for anyone coding in or against Hadoop, it is background material you should understand. Worth buying.

Securing Distributed Systems

I'd bought a copy of the ebook while it was still a work in progress, so I got to see the original Chapter 2, "securing distributed systems: chapter to come". I actually think they should have left that page as it was, on the basis that Distributed System Security is a Work in Progress. And while it's easy for all of us to say "defence in depth", none of us really practice that properly, even at home. Where is the two-tier network with the fundamentally untrustable IoT layer (TVs, light bulbs, telephones, bittorrent servers) on a separate subnet from the critical household infrastructure: the desktops, laptops and home servers? How many of us keep our ASF, SSH and GitHub credentials on an encrypted USB stick which must be unlocked for use? None of us. Bear that in mind whenever someone talks about security infrastructure: ask them how they lock down their house. (*)

Kerberos is the bit I worry about day to day, so how does it stack up?

I do think it covers the core concepts-as-a-user, and it has a workflow diagram which presents time quite nicely. It avoids going into the details of the protocol, which, as anyone who has ever read Coulouris & Dollimore will note, is mind-numbingly complex and does hit the mathematics layer pretty hard. A good project for learning TLA+ would probably be "specify Kerberos".

ACLs are covered nicely too, while encryption covers HDFS, Linux FS and wire encryption, including the shuffle.

There's coverage of lots of the Hadoop stack: core Hadoop, HBase, Accumulo, ZooKeeper, Oozie & more. There are some specifics on Cloudera bits (Impala, Sentry), but it isn't exclusive to them, and all the example configs are text files rather than management-tool centric: they'll work everywhere.

Overall then: a pretty thorough book on Hadoop security. For a general overview of security, Kerberos, ACLs and configuring Hadoop, it brings everything together in one place.

If you are trying to secure a Hadoop cluster, invest in a copy.


Now, where is it limited?

1. A lot of the book is configuration examples for N+ services & audit logs. It's a bit repetitive, and I don't think anybody would sit down and type those things in. However, there are so many config files in the Hadoop space that it is at least useful to have the configuration of all the many services covered. It just hampers the readability of the book.

2. I'd have liked to have seen the HDFS encryption mechanism illustrated, especially KMS integration. It's not something I've sat down to understand, and the same UML sequence diagram style used for Kerberos would have gone down well.

3. It glosses over precisely how hard it is to get Kerberos working, how your life will be frittered away staring at error messages which make no sense whatsoever, only for you to discover later they mean "java was auto updated and the new version can't do long-key crypto any more". There's nothing serious in this book about debugging a Hadoop/Kerberos integration which isn't working.

4. Its bit on coding against Kerberos is limited to a couple of code snippets around UGI login and doAs. Given how much pain it takes to get Kerberos to work client side, including ticket renewal, delegation token creation, delegation token renewal, debugging, etc, one and a half pages isn't even a start.
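The "long-key crypto" failure in point 3 is one of the few Kerberos problems that can be checked directly. As a hedged sketch, assuming the cause is a JRE whose JCE policy caps AES keys at 128 bits (which leaves AES-256 Kerberos encryption types unusable), the JDK's `Cipher.getMaxAllowedKeyLength` reports the limit. The class and method names `JceKeyLengthCheck`, `maxAesKeyBits` and `supportsAes256` are made up for illustration; only the `Cipher` call is a real JDK API.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper: reports the longest AES key this JVM's crypto policy allows.
class JceKeyLengthCheck {

    // Maximum AES key length in bits permitted by the installed policy files.
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // AES is a mandatory algorithm, so this is not expected in practice.
            throw new IllegalStateException("AES not available in this JVM", e);
        }
    }

    // True when 256-bit AES keys are permitted.
    static boolean supportsAes256() {
        return maxAesKeyBits() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("max AES key (bits) = " + maxAesKeyBits());
        System.out.println("AES-256 permitted  = " + supportsAes256());
    }
}
```

If this reports less than 256 on a freshly updated JVM, a restricted crypto policy is a likely suspect, and the policy needs attention before any Hadoop configuration does. Recent JDKs default to the unlimited policy, so this mostly bites older installations.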

Someone needs to document Hadoop & Kerberos for developers —this book isn't it.

I assume that's a conscious decision by the authors, for a number of valid reasons:
  • It would significantly complicate the book.
  • It's a niche product, being for developers within the Hadoop codebase.
  • It'd make maintenance of the book significantly harder.
  • To write it, you need to have experienced the pain of adding a new Hadoop IPC, writing client tests against in-VM ZooKeeper clusters locked down with MiniKDC instances, or trying to debug why Jersey+SPNEGO was failing after 18 hours on test runs.

The good news is that I have experienced the suffering of getting code to work on a secure Hadoop cluster, and want to spread that suffering more broadly.

For that reason, I would like to announce the work in progress, gitbook-toolchained ebook:

Kerberos and Hadoop: The Madness beyond the Gate

This is an attempt to write down things I've learned, using a Lovecraftian context to make clear this is forbidden knowledge that will drive the reader insane**. Which is true. Unfortunately, if you are trying to write code to work in a Hadoop cluster, especially YARN applications or anything acting as a service for callers, be they REST or IPC, you need to know this stuff.

It's less relevant for anyone else, though the Error Messages to Fear section is one of the things I felt the Hadoop Security book would have benefited from.

As noted, the Madness Beyond the Gate book is a WiP and there's no schedule to extend or complete it; it's just something written during test runs. I may finish it; I may get bored and distracted. But I welcome contributions from others. Together we can have something which will be useful for those people coding in Hadoop, especially those who don't have the luxury of knowing who added Kerberos support to Hadoop, or of having some security experts at the end of an email connection to help debug SPNEGO pain.

I've also put in for a talk on the same topic at ApacheCon EU Data. Let's see if it gets in.

(*) Flash has been removed except on Chrome browsers, which I've had to go round and update this week. The two-tier network is coming in once I set up a Raspberry Pi as the bridge, though with Ethernet-over-power as the core backbone, life is tricky. And with PCs in the "trust zone", I'm still vulnerable to 0-days and the hazard imposed by other household users and my own use of apt-get, homebrew and maven & ivy in builds. I should really move to developing in VMs I destroy at the end of each week.

(**) Plus it'd make for fantastic cover page art in an ORA book.