Developer INFOSEC in a Post-Sony era

There's going to be a dev conference in Bristol in February, Voxxed Days Bristol, where I'll be presenting my Hadoop and Kerberos talk, maybe with a demo of the Slider kdiag command, which after a couple of iterations I intend to move into Hadoop core as part of HADOOP-12649: Improve UGI diagnostics and failure handling. Any version which gets into Hadoop core will be able to use some extra diagnostics I hope to add to UGI itself, such as the ability to query the renew time of a ticket, get the address of the KDC and probe for it, maybe more. Because Kerberos doesn't like me this week, at least not with ZooKeeper.

It should be a fun day and local developers should think about turning up. I was at the last little developer bash in November, where there was a great mix of talks ranging from the history of the Transputer to the challenge of implementing a mobile-app only bank.


What apparently hasn't made the cut is my other proposal, Household INFOSEC in a Post-Sony era. Which is a shame, as I have the outline of a talk there which would be seminal. This wouldn't be a talk about platitudes like "keep Flash up to date"; it'd be looking at my rollout of a two-tier network, with the IoT zone hooked directly up to the internet by the router provided by the ISP, and a DD-WRT router offering a separate trusted-machine subnet and a high-entropy-passworded, no-SSID-published wifi network restricted to devices I can control.

What I was really going to talk about, however, was how we are leaking personal information from all the devices we use, the apps we host on them, the cars we drive, the cameras we use (guess whose camera app considers camera GPS and timestamp information non-personally identifiable and hence uploadable? That's right: Sony). We leak that information, it gets stored elsewhere, and we now depend on the INFOSEC capabilities of those entities to keep that data private. And that's hard to pull off, even with Kerberos everywhere.

One aspect of my homework here is working out what data I have on my computers which I consider sensitive.

There are photos, which are more irreplaceable than anything else. My policy there is Disaster Recovery: Google Photos as the off-site backup, and a local NAS server in the trusted subnet for on-site resilience to HDD failures. That server is powered off except for the weekly backups, reducing its transitive vulnerability to any ransomware that gets into the trusted zone.

There are passwords to web sites, especially those with purchasing rights (Amazon, etc.), and financial institutions (PayPal, banks). In the UK my bank uses its chipped debit cards as the physical credential for login and cash transfer, so it's fairly secure. The others: less so, especially if my browser has been intentionally/unintentionally saving form information with things like CVV numbers.

And what else matters? I've come to the conclusion it is the credentials needed to gain write access to the ASF source code repositories. Not just Hadoop: it's the Ant source, it's Slider, it's anything else where I am creating code and any of the following criteria are met:
  1. The build is executed by a developer or a CI tool on a machine holding information (including their own credentials) to which someone malicious wants access.
  2. The build is executed by a developer or a CI tool on a machine running within a network to which someone malicious wants access.
  3. Generated/published artifacts are executed during a build process on a machine/network to which someone malicious wants access.
  4. The production code is executed somewhere where there is data which someone malicious wants to get at or destroy, or where adverse behaviour of the system is advantageous to someone malicious.
You can point to the Hadoop stack and say: it's going that way. But so is the rest of the OSS codebase. The LAMP stack, Tomcat web servers, Xerces XML parsers, OpenOffice, Linux device drivers, clipboard history savers like glipper, Python statistics libraries, etc. We live in a world where open source is everywhere from the datacentre to the television. If anyone malicious has the opportunity to deliberately insert vulnerabilities into that code, then they get to spread them across the planet. That source code, then, is a juicy target both for anyone looking for 0-day exploits and for anyone wanting to insert new ones.

We've seen attacks on Kernel.org, and the ASF. With the dominance of git as the SCM tool, and its use of SHA-1 checksums, the value of breaking into the servers is diminishing. What you need to do is get the malicious code checked in, that is: committed using the credentials of an authorised developer.
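To make the SHA-1 point concrete: git names each object by the SHA-1 of a short header plus its content, so changing a file on the server changes its id, and with it the ids of every tree and commit that reference it. Here is a minimal Python sketch of how a blob id is derived; the function name `git_blob_id` is mine, for illustration, and not part of git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git assigns to a file with this content.

    A blob id is the SHA-1 of the header "blob <size>" followed by a
    NUL byte and then the raw file content.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    original = b"println('hello')\n"
    tampered = b"println('hello'); exfiltrate()\n"
    # Any change to the content produces a different object id.
    print(git_blob_id(original))
    print(git_blob_id(tampered))
```

Anyone holding a clone has the old ids, so silently rewriting history on the server is noticed at the next fetch. None of that helps if the malicious change arrives as an ordinary commit from a legitimate account.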

That'll be us then.

More succinctly: if the Internet becomes the location of an arms race, we're now targets en route to strategic goals by entities and organisations against whom we don't stand a chance.

How do you defend against nation states happy to give away USB-chargeable bicycle lights at an OSS conference? States with the ability to break through your tier-3 ISP firewall and then the second-level DD-WRT router that you never locked down properly and haven't updated for three weeks? We don't stand a chance, not really.

No doubt I'll come over as excessively paranoid, but it's not as if I view my personal systems as a direct target. It's just that the source code repos to which I do have access are potentially of interest. And with other people in the Hadoop space building those same projects, something injected into the build using my credentials then has transitive access to everyone else who checks out and builds the same codebase. That's what worries me.

WTF do we do?

Short-term I'm switching my release process to a VM that's only ever used for that, so at least the artifacts I generate aren't indirectly contaminated by malware; I also need to automate a final SHA1 audit of every dependent artifact.
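As a rough idea of what that automated audit could look like, here is a minimal Python sketch. It assumes the dependencies are `.jar` files under one directory and that a trusted mapping of file name to SHA-1 already exists from somewhere; the function names, the directory layout and the manifest format are all assumptions of mine, not an existing tool.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(artifact_dir: Path, expected: dict[str, str]) -> list[str]:
    """Compare every .jar under artifact_dir against the expected checksums.

    `expected` maps file name to hex SHA-1 (hypothetical manifest format).
    Returns a list of problems; an empty list means the audit passed.
    """
    problems = []
    seen = set()
    for path in sorted(artifact_dir.rglob("*.jar")):
        name = path.name
        seen.add(name)
        want = expected.get(name)
        if want is None:
            problems.append(f"unexpected artifact: {name}")
        elif sha1_of(path) != want.lower():
            problems.append(f"checksum mismatch: {name}")
    # Anything listed in the manifest but absent on disk is also suspicious.
    for name in sorted(set(expected) - seen):
        problems.append(f"missing artifact: {name}")
    return problems
```

The release would then be aborted if `audit(...)` returns anything at all. The weak point is the manifest itself: it has to come from somewhere more trustworthy than the machine being audited.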

Medium term: I need to come up with a plan for locking down my git ssh/svn credentials and passwords so they aren't trivially accessible to anything malicious running on any laptop of mine. I know GitHub is moving to 2FA and U2F auth, but that's for web and API auth: not git repo access. What the Linux kernel team have is a much better story: 2FA for granting 24h of write access from a specific IP address.

Long term: I have no idea whatsoever.

[photo: two Knog lights you charge up via USB ports. We should all know to be cautious about plugging in untrusted USB sticks —but who would turn down a freebie bike light given away at an OSS developer conference?]
