2016-09-30

s/macOS Sierra/r/macOS vista/

I've been using macOS Sierra for ten or eleven days now, and I've rebooted my laptop at least six times because the system was broken.

Two recurrent problems: failure to wake in the morning, and a gradual lockup of Finder with transitive app failure.

Failure to wake: I go up to the laptop, hit the keyboard and mouse, nothing happens. Only way to fix: hold down the power and wait for a hard restart.

Back in 1999 I worked on a project with HP's laptop group, where we instrumented a set of colleagues' laptops with a simple data collection app, then collected a few months of data. At the time this was considered "a lot of data". The result was the paper The Secret Life of Notebooks. This showed that people tended to have a limited set of contexts, where context was defined as system setup (power, display, IP address) and application (mail, PPT). People were so predictable in their use models that using an HMM to predict contexts wouldn't have been hard.

I ended up writing a little app which essentially did that: based on IP address and which app (PPT, Acroread) was full screen, it could choose power policy, network proxy options and sound settings (mute in meetings, etc). It was fairly popular amongst colleagues, because it would turn proxy stuff on and off for you, know to turn off display timeouts when giving presentations, and crank up the savings when on the move. When I look at Windows 8+ adaptation to network settings, or OSX's equivalent of that and the "When on battery ...", I see the same ideas. You don't get any HMM on the laptops though; for that you have to pick up an Android phone and look at Google Now, something which really is trying to understand you and your habits. And, because it can read your emails, correlate those habits with emailed plans. If it really wanted to be innovative/scary it would work out who you were associated with (family, friends, colleagues, fellow students...) and use their actions to predict yours. Maybe someday.

User-wise, another interesting feature was how differently people viewed mail when online vs offline. Offline, you'd see this workflow of outlook-> word-> outlook-> ppt-> outlook-> acroread-> outlook, ... etc, in very fast cycles. It seemed uncontrolled window tabbing at first, until you realise it's people going through attachments. Online, people's workflow pulled in IE (it was a long time ago), and you'd get a cycle where IE was the most popular app cycled to from outlook. Email was already so full of links that the notion of reading email "offline" was dying. You could do it, but your workflow would be crippled. And that was 15+ years ago. Nowadays things would only be worse.

There was a second paper which was internal, plus circulated/presented to Microsoft. There I looked at system uptime, and especially the common sequence in the log

1998-08-23 18:15 event: hibernate
1998-08-24 09:00 event: boot

or

1998-09-01 11:20 event: suspend
1998-09-01 11:30 event: boot

That is: a lot of the time the laptop thought it was going to sleep, it was really crashing.

My theory was that alongside the official ACPI sleep states S1-S5 there was actually a secret state S6, "the sleep you never awake from". Some more research showed that it was generally on startup that the process failed, and it was strongly correlated with external state changes: networks, power, monitors. It wasn't that the laptop made a mess of suspending, it was that when it came back up it couldn't cope with the changed state.

I don't know if macOS Sierra has that issue; I do know that it has that problem if left attached to an external display overnight. Looking in the system logs, you can see powernap wakeups regularly (that's with all displays off), but come the first user interaction event, where the displays are meant to kick off, they don't come up. This is resulting in system logs not far off from the '99 experiment:

2016-09-27 22:20 powerd: suspend
2016-09-27 23:00 powerd: powernap wake
  ....

2016-09-27 23:30 powerd: powernap wake
..

2016-09-28 00:30 powerd: powernap wake
..
2016-09-28 08:30 powerd: powernap wake
..
2016-09-28 09:30 event: boot


That last one: that's me trying to use it.

I've turned off powernap to see if that makes a difference there.

That's the nightly problem. What's happened 3+ times is the lockup of Finder, with a gradual degradation of other applications as they go near its services.

First Finder goes, and restarts do nothing.
[Image: Finder not responding]
Then the other apps fail, usually when you go near the filesystem dialogs, or the photo collection.
[Image: Safari not responding]
As with Finder, restart does nothing.

If it was my own code, I'd assume a lock is being acquired in the kernel on some filesystem resource and never being released. This is why locks should always have leases. Root cause of that lock/release problem? Who knows. I can't help wondering, though, if it's related to the new iCloud sync feature, as that's the biggest filesystem change. I've also noticed that I usually have a USB stick plugged in; I'm going to go without that to see if it helps.
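As a rough illustration of the lease idea, here is a minimal sketch in Scala. It is an assumption-laden toy, not anything from macOS or the JVM: `LeasedLock` and its methods are invented names, and the current time is passed in explicitly so the behaviour is deterministic. The point is only that ownership lapses by itself, so a holder that hangs cannot block everyone else forever.

```scala
// Hypothetical sketch of a lock with a lease (not a real OS or JVM API).
final class LeasedLock(leaseMillis: Long) {
  // (owner, expiry time in ms) of the current lease, if any
  private var holder: Option[(String, Long)] = None

  /** Try to take the lock at time `now`; succeeds if it is free or the lease has lapsed. */
  def tryAcquire(owner: String, now: Long): Boolean = synchronized {
    holder match {
      case Some((_, expiry)) if now < expiry => false // still validly held
      case _ =>
        holder = Some((owner, now + leaseMillis))
        true
    }
  }

  /** Extend the lease; only the current, unexpired owner may renew. */
  def renew(owner: String, now: Long): Boolean = synchronized {
    holder match {
      case Some((o, expiry)) if o == owner && now < expiry =>
        holder = Some((owner, now + leaseMillis))
        true
      case _ => false
    }
  }

  /** Release early; a no-op if `owner` does not hold the lock. */
  def release(owner: String): Unit = synchronized {
    if (holder.exists(_._1 == owner)) holder = None
  }
}
```

A holder that is alive keeps calling `renew`; one that has deadlocked stops renewing, and the next `tryAcquire` after the expiry time succeeds.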

When I get this slow failure, I don't rush to reboot. It takes about 10 minutes to get my dev environment back up and running again: the IDEs, the terminal windows, 2FA signing in to webapps, etc. I really don't want to have to do it. Instead I end up with bits of the UI keeling over, while I stick to the IDE, Chrome and terminals. I had a bit of a problem on Thursday evening when Calendar locked up to the extent that I couldn't get the URLs for some conf calls; I had to use the phone to get the links and type them in.

Anyway, come the evening, after the conf calls and some S3a Scale tests, I kick off a shutdown.

And here a flaw of the OSX UI comes in: it assumes that whatever reason you are trying to do a shutdown for, it is not because Finder has crashed. And it gives any application the right to veto the shutdown. You can't just select "shut down..." on the menu; you have to wait for any apps to block it, stop them and then continue. And even after doing all of that, I come in this morning and find the laptop, fans spinning away, me logged out but some dialog box about keychain access required. This is not shutting down, this is a half-hearted attempt at maybe shutting down sometimes if your OS hasn't got into a mess.

It's notable that Windows has some hard coded assumptions that a shutdown is caused by the failure of something. It also has, from the world of Windows Server, the concept that the user may not be sitting at the console ready to click OK on dialogs popped up by apps. Thus it has a harsher workflow.

  1. A WM_QUERYENDSESSION message comes out saying "we'd like to shut down, is that OK?" Apps get the opportunity to veto the session end, but not if it's tagged as a critical shutdown. And if you don't service that event, you are considered dead and don't get a veto.
  2. The WM_ENDSESSION event is sent to apps to say "you really are going down, get over it".
  3. There is a registry entry WaitToKillAppTimeout you can use to control how long the OS waits for applications to terminate, WaitToKillServiceTimeout for services, and even HungAppTimeout to control how long an app has to respond to an exit request (WM_CLOSE?) before being considered dead and so killed.
See? Microsoft know that things hang, that even services can hang, and that if you want to shut down then you want to shut down, not find out 12 hours later that it had stopped with a dialog box.

In contrast macOS Sierra has implicit assumptions that apps rarely hang, the OS services never deadlock, and that shutting down is a rare activity where you are happy to wait for all the applications to interact with you —even the ones that have stopped responding.

This may have held for OS X, but for macOS all those assumptions are invalid. And that makes shutdown far more painful and unreliable than it need be.

Now if you go low level and do a "man shutdown", you can see that a similar escalation process is built in there

Upon shutdown, all running processes are sent a SIGTERM followed by a SIGKILL.  The SIGKILL will follow the SIGTERM by an intentionally indeterminate period of time.  Programs are expected to take only enough time to flush all dirty data and exit.  Developers are encouraged to file a bug with the OS vendor, should they encounter an issue with this functionality.
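The same polite-then-forceful escalation can be sketched for a single child process on the JVM. This is an illustration only, not how `shutdown` is implemented: `Escalation.terminate` and `graceMillis` are made-up names, the grace period is fixed rather than "intentionally indeterminate", and the mapping of `destroy()` to SIGTERM and `destroyForcibly()` to SIGKILL is typical on Unix-like JVMs but implementation dependent.

```scala
import java.util.concurrent.TimeUnit

object Escalation {
  /**
   * Ask a child process to exit, wait up to `graceMillis`, then force it.
   * Returns true if the process exited within the grace period.
   */
  def terminate(p: Process, graceMillis: Long): Boolean = {
    p.destroy() // polite request (typically SIGTERM on Unix-like systems)
    val exitedInTime = p.waitFor(graceMillis, TimeUnit.MILLISECONDS)
    if (!exitedInTime) {
      p.destroyForcibly() // no veto at this point (typically SIGKILL)
      p.waitFor()
    }
    exitedInTime
  }
}
```

Either way the caller gets a dead process back after a bounded wait, which is the property the GUI shutdown path lacks.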

I think from now on, it'll be a shutdown command from the console.

Anyway, because of all these problems, I do currently regret installing macOS Sierra. It shipped to meet a deadline, rather than because it was ready.

macOS Sierra is not ready for use unless you are prepared to reboot every day, and are aware that the only way to reboot reliably is from the console.

2016-09-22

macOS Sierra and the rediscovered world of desktop agents

I sort of accidentally upgraded to macOS Sierra on the day it came out.

It wasn't that I'd been awaiting it, holding my breath until it was announced to a shrieking audience. Rather, my laptop had been complaining for 3+ weeks that there were critical security patches to apply, patches which needed machine reboots. I was now two patches behind. Firefox had just updated to 49.0 and wanted a restart, and Chrome was saying "chrome is old". With both browsers needing restarting, I might as well take the hit for the OS patches too, so I went to the App Store for an update. And what should the front page have but a "macOS Sierra" banner. So I went for it.

Installation: slow. I'll take "29 minutes remaining" as the macOS equivalent of Vista's "Step 3 of 3 100% complete" message: a warning that the next operation's completion time is a Halting Problem kind of estimate.

I went out into the sunlight to read about Scala collections and, after doing that long enough to get a headache, checked back upstairs again: "29 minutes remaining". So I went to the household 2009 iMac (not upgradeable) and did some collaborative editing on Google Docs. Which is why Google Docs, for all its lack of offline-ness, is such a good tool. If you have a browser with a network connection, you have a word processor.

After it's there, what do you see?
  • Cuter animations
  • Sound in the UI. I don't like sound; there is a way to turn beeps off.
  • A new look notification bar
  • A messaging app that looks more like the iOS one.
  • The ability to turn on auto delete of backups trash
  • The option to have all your docs backed up to iCloud
  • Safari doesn't tell web sites when you have Flash, Silverlight or Java installed, and makes you go through some minor hoops to run the plugins.
  • Siri

Broken
  • IntelliJ IDEA scrolling (really, all Java Swing apps and scrolling). Very sensitive.
  • Keyboard remapping with Karabiner. I need this as I have a UK keyboard plugged into a US-keyboarded Mac. Even with Karabiner-Elements things aren't right; I fear another reboot is coming.
What don't you get?
  • Time Machine complaints about backups haven't moved to the notification API. This means I can't configure them not to keep telling me off. Probably a sign that Time Machine isn't getting attention, at least not in its current form: presumably they are working on the Apple File System equivalent, so ignoring this one.
  • Any tangible improvements in the UI of mail
  • Any tangible improvements in the UI of OS Calendar. I could go on an extended diatribe against that at some point: why, when I decline an event, does it stay in the calendar so that I have to actually delete it, after which it asks me if I really really want to do that? It's as if the Apple team want me to attend meetings. Or how the event details window is tall and thin, which is the wrong shape for wide screens, and is utterly the wrong shape to display URLs and dial-in details for conference calls. I think Conway's law is telling me about Apple's teamwork model there. Meetings you can't duck out of; a focus on F2F meets with a short roomname "dogcow upper" rather than a URL to a webex event.
  • APFS being production ready. I may reformat my external SSD drive (The one I keep VBox images on), to try it. Not sure I'd gain much from it though. Perhaps I'll start with a 64MB SSD card.

The safari flash/java changes are wonderful. Why?

The lack of header information is good on its own
  1. Stops brokered malware ads bidding to submit adverts to users in europe-but-not-russia with a specific version of Flash installed.
  2. Makes it harder to distinguish users uniquely by their plugins. There's still probably fonts and things, but at least one information point is gone.
As for the hoops, as well as defending from malware, they provide extra motivation for everyone to move off Flash. That includes BBC News, which happily serves up HTML5 videos to iOS devices, yet complains about Flash on the desktop. They are going to have to change that policy fast, which is great for those of us who are trying to move to a 0-flash desktop for household infosec reasons alone.

Now, Siri
 
I asked it where my wife Bina was. Instead it tried to arrange a marriage between myself and a colleague

[Image: Siri arranges a marriage for me]

I did work out the root cause here: she's in my Google contacts but not the Exchange ones, and OSX Contacts is hooked up to Exchange contacts only. I added a record to that contact list and then I could bond. Funny though.

I've tried a few other commands; some work, some don't. A problem here is that I like to work with background music, and I don't feed it out from the laptop; I've got an old iPod plugged into an amplifier across the room. I'd need to embrace iTunes and either choppy-during-builds bluetooth or the long wire to have Siri damping down the music when I talk to it.

I think the thing about Siri is you have to consider the user base of most people who use macOS: they don't use terminals. Their interaction is via the Finder, perhaps Spotlight. In this context, Siri is An Agent, in the terminology of Yoav Shoham: you say something, it gets translated (modern ML algorithms), then that is used to generate The Plan, which is then executed. I'd assume that final execution could be done with AppleScript: instrument your app with state queries and operators, and perhaps Siri could work on it.

Funnily enough, Agents with distributed computation were actually the first thing I worked on at HP Labs, the sole trace of which appears to be some citations, some locally scanned documents of mine, and a printing of a usenet post. There it was typing, not talking: the little Windows/386 / HP NewWave agent sent the message to the Prolog interpreter running on a workstation (TCP? I forget. If so, probably the first time I ever encountered socket error codes). We were actually trying to do multimodal stuff: you could point to something and then ask for actions. I got up to demo state, but, relying on Prolog parsing, it wasn't real natural language, no more than SQL is. Hopefully, Siri does a better job of it. And they could go multimodal once Siri goes into continuous listening: select something in Finder and ask Siri to act on it.

Given that past work then, I guess I should give it a go

2016-09-09

Scalatest: thoughts and ideas

Spark uses Scalatest for its testing.

I alternate between "really liking this" and "bemoaning aspects of it".

Overall: I think the test language is better than JUnit, especially once you embrace more of the Scala language. But the test execution could be improved, especially maven integration, and, to a lesser extent, the test reporting.

[Image: France 2016]

Likes


eventually()


  eventually(timeout, policy) { closure };

This does a busy wait for the closure to complete (i.e. not throw an exception), policies include exponential backoff.

Pro: easy to write clauses in a test case which block until a state is reached or things timeout

Con: there's no way by default for those clauses to declare that they will never complete. That happens, especially if you are waiting for some state such as the number of events to be exactly 3 in a queue. If the queue size goes to 4, it's never going to drop to 3, so the clause could fail fast.

I think the fix there would be to define a specific "unrecoverable assertion failure" exception and raise that, have "eventually()" not swallow them.
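A minimal sketch of that fail-fast idea, written as a hand-rolled retry loop rather than ScalaTest's own `eventually()`. Everything named here is an assumption made for the illustration: `UnrecoverableAssertionFailure`, `Retry.eventuallyFailFast` and `checkQueueSize` are hypothetical, and the fixed retry interval stands in for a real backoff policy.

```scala
import scala.util.control.NonFatal

// Hypothetical marker exception: the condition can never become true.
final class UnrecoverableAssertionFailure(msg: String) extends RuntimeException(msg)

object Retry {
  /**
   * Re-run `probe` until it stops throwing or `timeoutMillis` has elapsed.
   * An UnrecoverableAssertionFailure is rethrown at once instead of retried.
   */
  def eventuallyFailFast[T](timeoutMillis: Long, intervalMillis: Long)(probe: => T): T = {
    val deadline = System.currentTimeMillis() + timeoutMillis
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(probe)
      } catch {
        case e: UnrecoverableAssertionFailure => throw e // fail fast, no retry
        case NonFatal(e) =>
          if (System.currentTimeMillis() >= deadline) throw e // timed out: surface the last failure
          Thread.sleep(intervalMillis)
      }
    }
    result.get
  }

  /** Example probe: a queue that has overshot `expected` will never shrink back. */
  def checkQueueSize(expected: Int, actual: Int): Unit = {
    if (actual > expected) {
      throw new UnrecoverableAssertionFailure(s"queue size $actual already exceeds $expected")
    }
    if (actual != expected) {
      throw new IllegalStateException(s"queue size is $actual, still waiting for $expected")
    }
  }
}
```

With something of this shape, a probe waiting for exactly three events gives up on the first poll that sees four, instead of spinning until the timeout.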


the === check,


 assert(3 === events.queueSize)

This does good reporting on what the values are on an equality failure. It does need rigour about the order; I follow the JUnit assertEquals policy of expected-value-first. Projects really need a style guide/policy here.

Weakness: no typechecking at compile time. Better get those types right. Though as your tests should run through all assertions unless it is somehow nondeterministic, you'll find out soon enough.

Intercept:



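  import java.io.IOException

  val ex = intercept[IOException] {
    doSomethingThatShouldRaiseAnIOE()
  }
  // intercept returns the caught exception, so its message can be checked
  assert(ex.getMessage.contains("host"), s"no 'host' in $ex")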

Here there's a straightforward catch of a specific type (and failure raised if it didn't happen). As the exception is returned, the test case can continue. And with string formatting with the s"" syntax, reasonable groovy-esque diagnostics.


Matchers and the declarative specification syntax


  update("submissions") should be(5)

Ignoring the fact that RFC 2119 means that the syntax ought to be "must" and not "should", this style is fairly similar to JUnit 4 matchers, which is not something I ever do much of.

Pro: looks nicer than assert() clauses
Con: you can add more details in assert() clauses, provided you sit down and do it.


Could try harder

 

the before and after mechanisms, beforeclass and afterclass



While this seems an easy way to declare things to run before tests, it doesn't work well with traits/mixins. In contrast, having setup() and teardown() methods, combined with Scala's strict order of evaluating traits and what "super.setup()" would mean in a trait, allows you to write tests with traits where different traits can execute their setup/teardown clauses in a deterministic manner. Note that JUnit's before/after stuff is also a bit ambiguous, so it's not a denunciation of Scalatest here, more a commentary on the setup problem.
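The ordering argument can be seen in a small, framework-free sketch. `Fixture`, `WithFilesystem`, `WithCluster` and `MySuite` are invented names, and `log` exists only to record the order of calls; none of this is ScalaTest API.

```scala
import scala.collection.mutable.ListBuffer

// Base contract: every layer overrides setup/teardown and chains to super.
trait Fixture {
  val log: ListBuffer[String] = ListBuffer.empty
  def setup(): Unit = { log += "base setup" }
  def teardown(): Unit = { log += "base teardown" }
}

trait WithFilesystem extends Fixture {
  // set up after the layers below, tear down before them
  override def setup(): Unit = { super.setup(); log += "fs setup" }
  override def teardown(): Unit = { log += "fs teardown"; super.teardown() }
}

trait WithCluster extends Fixture {
  override def setup(): Unit = { super.setup(); log += "cluster setup" }
  override def teardown(): Unit = { log += "cluster teardown"; super.teardown() }
}

// Linearization: WithCluster's super is WithFilesystem, whose super is Fixture.
class MySuite extends Fixture with WithFilesystem with WithCluster
```

Calling `setup()` then `teardown()` on a `MySuite` records base, fs, cluster on the way up and cluster, fs, base on the way down. Swapping the order of the `with` clauses swaps the order of the two middle layers, which is what makes the behaviour predictable.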

ignoring tests.


I've ended up doing this in the tests for spark-cloud, writing a conditional ctest() function. The secret there is to know that the test(name) { closure } declaration is not declaring a method, as in JUnit, but simply declaring an invocation of the test() method, an invocation which takes place during the construction of an instance of the test suite. There is nothing to stop you wrapping that call in other mechanisms: extra conditions, dynamic parameters, etc. It's only a closure, you understand.
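To make the "registration during construction" point concrete, here is a toy stand-in for a suite base class. It is a hypothetical mini-framework, not ScalaTest's implementation: `MiniSuite`, `QueueSuite`, `testNames`, `ignored` and `run()` are names made up for this sketch.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical mini-framework: just enough to show the shape of the idea.
abstract class MiniSuite {
  private val registered = ListBuffer.empty[(String, () => Unit)]
  val ignored: ListBuffer[String] = ListBuffer.empty

  // Called from the subclass constructor body: stores the closure, does not run it.
  protected def test(name: String)(body: => Unit): Unit = {
    registered += ((name, () => body))
  }

  // Records the name only; the body is never executed.
  protected def ignore(name: String)(body: => Unit): Unit = {
    ignored += name
  }

  def testNames: List[String] = registered.map(_._1).toList
  def run(): Unit = registered.foreach { case (_, body) => body() }
}

class QueueSuite(cloudEnabled: Boolean) extends MiniSuite {
  var ran: List[String] = Nil

  test("local queue") { ran :+= "local queue" }

  // Ordinary control flow wrapped around the registration call.
  if (cloudEnabled) {
    test("cloud queue") { ran :+= "cloud queue" }
  } else {
    ignore("cloud queue") { ran :+= "cloud queue" }
  }

  // Dynamic parameters: one registered test per value.
  for (n <- Seq(1, 2)) {
    test(s"batch of $n") { ran :+= s"batch of $n" }
  }
}
```

Constructing `new QueueSuite(false)` registers three tests and one ignored name without running anything; the closures only execute when `run()` is called.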

Maven integration



Needs better wildcard and test suite declarations. I've only just discovered the ability to include a string in the name of the test case to run:

mvt -Dtest=none '-Dsuites=org.apache.spark.deploy.history.HistoryServerSuite incomplete' ; say moo

This command
  • doesn't run any java tests (because -Dtest=none doesn't match anything)
  • runs every test in HistoryServerSuite which has the word "incomplete" in the title
  • says "moo" after, so I can get on with other things while scalac gets round to building things, scalatest to running them. This matters because Scalac is so, so slow.

Reporting

The test reporter "helpfully" strips off the stack trace on assertion failures, so you don't get the full stack. This seems a good idea, except in the special case "you have the assertions inside a method which your test cases call". Do that and you suddenly lose the ability to detect what's wrong because the stack trace has been stripped: all you get is the name of the utility method. Just give us the full stack and stop trying to be clever.

My new conditional test runner


Here, then, is my ctest() function to declare a test which is conditional on the state of the entire suite, and an optional extra per-test predicate:

/**
* A conditional test which is only executed when the
* suite is enabled and the `extraCondition`
* predicate holds.
* @param name test name
* @param detail detailed text for reports
* @param extraCondition extra predicate which may be
* evaluated to decide if a test can run.
* @param testFun function to execute
*/
protected def ctest(
  name: String,
  detail: String = "",
  extraCondition: => Boolean = true)
  (testFun: => Unit): Unit = {
  if (enabled && extraCondition) {
    registerTest(name) {
      logInfo(s"$name\n$detail\n---------------")
      testFun
    }
  } else {
    registerIgnoredTest(name) {
      testFun
    }
  }
}


What is critical to know is that this ctest() function is not declaring any function/method in the test suite. It's just a function invoked in the suite's constructor, interpreted in the order of listing in the file, and expected to register the associated closure for invocation.

The normal test() function registers the test; the new ctest() function takes some extra detail text (logged @ info, and when I do something about reporting, may make it to the HTML), a condition for optional execution and that closure. There's also a suite-wide enabled() predicate. This is something which can be subclassed or mixed in through a trait for configurable execution. As an example, there are object store tests for both S3A and Azure, which are only enabled if the relevant credentials have been supplied.

Here's a more complex example


private[cloud] class S3aLineCountSuite
  extends CloudSuite with S3aTestSetup {

  init()

  override def useCSVEndpoint: Boolean = true

  def init(): Unit = {
    if (enabled) {
      setupFilesystemConfiguration(conf)
    }
  }

  override def enabled: Boolean = super.enabled && hasCSVTestFile

  ctest("S3ALineCountReadData",
    "Execute the S3ALineCount example") {
    val sparkConf = newSparkConf(CSV_TESTFILE.get)
    sparkConf.setAppName("S3ALineCountDefaults")
    assert(0 === S3LineCount.action(sparkConf, Seq()))
  }
}


The enabled predicate is true if the S3aTestSetup mixin was happy (that is, I've added credentials), and a CSV test file is defined. Then we run an example application, verifying its exit code is 0.

Do I like this? Mostly.
 

Things I'm yet to explore


Async testing:

How to get the most out of scalatest 

(at least as far as I've learned so far)

  • Keep a strict ordering of (expected === actual). I use the same ordering as JUnit —I hope others are consistent there, as it's not so obvious in a test failure that things are different.
  • Add extra details in assert() clauses.
  • Embrace eventually()
  • Give every test case in a test suite a string which is at least partially unique. That way, you can run an individual test with the -Dsuites mechanism.
  • Try to make them fast. This really matters on spark, because there are a lot of tests to run.
  • If you are going near traits, have setup/teardown methods tagged as before {} and after {}, then use them through the traits consistently.
  • Don't go too overboard in traits, even if you spent enough time in C++ to be perfectly comfortable with mixins. You can't expect the maintainers to be equally happy with them, so use carefully.
  • Do try all the different spec mechanisms, rather than just stick with one because "it most reminds you of what you used to use". You could be missing out there.
  • However: don't try all the different spec mechanisms wildly. Do try to be consistent with the rest of the project.
Most of all: read the underlying source code. It's there. Look at what test(name) { clause } does and follow it through. See how tests are executed. Think about how to take best advantage of that in your test suites.