2017-04-12

Mocking: an enemy of maintenance

Bristol spring

I'm keeping myself busy right now with HADOOP-13786, an O(1) committer for job output into S3 buckets. The classic filesystem relies on rename() for that, but against S3 rename is a file-by-file copy whose time is O(data) and whose failure mode is "a mess", amplified by the fact that an inconsistent FS can create the illusion that destination data hasn't yet been deleted: false conflict. This creates failures like SPARK-18512, "FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A", as well as long commit delays.
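To make the O(data) point concrete, here is a toy in-memory model of a directory "rename" done as list, copy, delete. Everything in it (the ToyObjectStore class, its methods, the byte counter) is hypothetical and invented for illustration; it is not the S3A code, just a sketch of why the cost scales with the data and why a failure partway through leaves a mess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy in-memory object store; a hypothetical model, not the S3A implementation. */
class ToyObjectStore {
  private final TreeMap<String, byte[]> objects = new TreeMap<>();
  private long bytesCopied;

  void put(String key, byte[] data) {
    objects.put(key, data.clone());
  }

  boolean exists(String key) {
    return objects.containsKey(key);
  }

  /** Total bytes copied by all renames so far: the O(data) cost. */
  long bytesCopied() {
    return bytesCopied;
  }

  /**
   * "Rename" a directory: list everything under the source prefix,
   * copy each object to the destination prefix, then delete the sources.
   * Returns the number of objects moved.
   */
  int renamePrefix(String src, String dest) {
    List<String> keys = new ArrayList<>();
    for (String key : objects.keySet()) {
      if (key.startsWith(src)) {
        keys.add(key);
      }
    }
    for (String key : keys) {
      byte[] data = objects.get(key);
      // one full copy per object: time is proportional to the data size
      objects.put(dest + key.substring(src.length()), data.clone());
      bytesCopied += data.length;
    }
    // a failure before this point leaves both source and partial destination
    for (String key : keys) {
      objects.remove(key);
    }
    return keys.size();
  }
}
```

In this model, renaming two objects of 3 and 5 bytes copies all 8 bytes; nothing about the operation is atomic.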

I started this work a while back, making changes to the S3A Filesystem to support it. I've stopped focusing on that committer, and instead pulled in the version which Netflix have been using, which has the advantages of a thought-out failure policy and production testing. I've been busy merging that with the rest of the S3A work, and am now at the stage where I'm switching it over to the operations I've written for the first attempt, the "magic committer". These are in S3A, where they integrate with S3Guard state updates, instrumentation and metrics, retry logic, etc etc. All good.

The actual code to do the switchover is straightforward. What is taking up all my time is fixing the mock tests. These are failing with false positives, "I've broken the code", when really the cause is "these mock tests are too brittle". In particular, I've had to rework how the tracking of operations goes, as a Mock Amazon S3Client is no longer used by the committer; instead it's associated with the FS instance, which is then shared by all operations in a single test method. And the use of S3AFS methods shows up where it's failing due to the mock instance not initing properly. I ended up spending most of Tuesday simply implementing the abort() call; now I'm doing the same on commit(). The production code switches fine, it's just the mock stuff.

This has really put me off mocking. I have used it sporadically in the past, and I've occasionally had to work with other people's. Mocking has some nice features:
  • Can run in unit tests which don't need AWS credentials, so Yetus/Jenkins can run them on patches.
  • Can be used to simulate failures and validate outcomes.
But the disadvantage is that I just think mock tests are too high maintenance. One test I've already migrated to being an integration test against an object store; I retained the original mock one, but deleted it yesterday, as it was going to be too expensive to migrate and, with that IT test, obsolete.

The others, well: the changes for abort() should help, but every new S3A method that gets called triggers new problems which I need to address. This is, well, "frustrating".

It's really putting me off mocking. Ignoring the Jenkins aspect, the key benefit is structured fault injection. I believe I could implement that in the IT tests too, at least in those tests which run in the same JVM. If I wanted to, I could probably even do it in the forked VMs by propagating details on the desired failures to the processes. Or, if I really wanted to be devious, by running an HTTP proxy in the test VM and simulating network failures for the AWS client code itself to hit. That wouldn't catch all real-world problems (DNS, routing), but I could raise authentication failures, transient HTTP failures, and of course, force in listing inconsistencies. This is tempting, because it will help me qualify the AWS SDK we depend on, and could be re-used for testing the Azure storage too. Yes, it would take effort, but given the cost of maintaining those Mock tests after some minor refactoring of the production code, it's starting to look appealing.
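As a rough idea of what in-JVM fault injection without a mocking framework could look like, here is a minimal sketch. The FaultInjector class, its invoke() and retry() methods and the operation names are all hypothetical, made up for this example; nothing here is an S3A or AWS SDK API. It fails the first N calls it wraps with an IOException, then lets the real operation through, so a test can check that retry logic copes with transient failures.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical fault injector: fails the first N wrapped calls, then passes through. */
class FaultInjector {
  private int failuresLeft;
  private int injected;

  FaultInjector(int failures) {
    this.failuresLeft = failures;
  }

  /** Run the operation, unless a failure is still scheduled. */
  synchronized <T> T invoke(String operation, Callable<T> call) throws Exception {
    if (failuresLeft > 0) {
      failuresLeft--;
      injected++;
      throw new IOException("Injected failure in " + operation);
    }
    return call.call();
  }

  /** Number of failures raised so far. */
  synchronized int injectedCount() {
    return injected;
  }

  /** Simple retry loop: retries on IOException only, up to the given attempts. */
  static <T> T retry(int attempts, Callable<T> call) throws Exception {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e; // treated as transient: try again
      }
    }
    if (last == null) {
      throw new IllegalArgumentException("attempts must be > 0");
    }
    throw last;
  }
}
```

A test would wrap the real store call, e.g. `FaultInjector.retry(3, () -> injector.invoke("commit", () -> doCommit()))`, where doCommit() stands in for whatever operation is under test. Because the injector wraps a real call rather than replacing the client, refactoring the production code shouldn't require rewriting the test's expectations.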

(photo: Garage door, Greenbank, Bristol)

2017-04-11

The interruption economy

With the untimely death of a laptop in Boston in February, I've rebuilt two laptops recently.

The first: a replacement for the dead one, a development macbook pro wired up to the various bits of work infra: MS office, VPN, even hipchat. The second: a formerly dead 2009 macbook brought back to life with a 256GB SSD and a boost of its RAM to 8GB (!).

Doing this has brought home to me a harsh truth:

The majority of applications you install on an OSX laptop consider it not just a right, but a duty, to interrupt you while you are trying to work.

It's not just the things where someone actually wants to talk to you (e.g. skype), it's pretty much everything you can install.

For example, iTunes wants to be able to interrupt me, including playing sounds. It's a music player application, and it also wants to make beeping noises? Same for spotify. Why should background music apps or foreground media playback apps think they need to be able to interrupt you when they are running in the background?

iTunes wants to interrupt me

Dropbox. I didn't realise this was doing notifications until it suddenly popped up to tell me the good news that it was keeping itself up to date automatically.

Dropbox interrupting me with a random fact

Keeping your installation up to date is something we should expect all applications to do. It should not be so important that you should pop up a dialog box "good news, you are only at risk from 0-day exploits we haven't found or patched yet!". Once I was aware that dropbox was happy to interrupt me, I went to its settings, only to discover that it also wants to interrupt me on "comments, shares and @mentions", and on synced files.
Dropbox wants to harass me

I hadn't noticed that a tool I used to sync files across machines had evolved into a groupware app where people could @mention me, but clearly it has, and in teams, an interruption whenever someone comments on things is clearly considered good. It also wants to interrupt me on files syncing. Think about that. We have an application whose primary purpose is "synchronising files across machines", and suddenly it wants to start popping up notifications when it is doing its job? What else should we have? Note taking applications sharing the good news that they haven't crashed yet?
Apple Notes wants to interrupt me

Maybe, because amongst the apps which also consider interruption an inalienable right are OneNote and the macOS Notes app. I have no idea what they want to interrupt me about: Notes doesn't specify what it wants to alert me about, only that it wants to notify me on locked screens and make a noise. OneNote? Lets you spec which notebooks can trigger interrupts, but again, the why is missing.

The list goes on. My password manager, text editor, IDE. Everything I install defaults to interrupting me.

Yes, you can turn the features off, but on a newly installed machine, that means that you have to go through every single app and disable every single interruption point. Miss out some small detail and while you are trying to get some work done, something pops up to say "lucky you! Something has happened which Photos thinks is so important you should stop what you are doing and use it instead!". When you are building up two laptops, it means there have been 20+ times I've had to bring up the notifications preference pane, scroll down to whichever app last interrupted me, turn off all its notifications, then continue until something else chooses to break my concentration.

The web browsers want to let web pages interrupt you too.

Firefox: you can't disable it, at least not without delving into about:config.

Firefox doesn't seem to let me utterly disable interrupts

You can block it in the OS notifications settings, which implies it is at least integrated with the OS and the system-wide do-not-disturb feature.


Chrome: you can manage it in the browser, even though google don't want you to stop it, but it doesn't appear to be integrated with the OS.

Google chrome recommends interruptiblity

Without the OS integration, OSX's do-not-disturb feature won't work here, so if you do let Chrome notify you, webapps gain the right to interrupt you during presentations, watching media content, etc.
Safari lets you disable web site notifications, you just have to clear the check box
Safari? Permitted, but OS controlled, completely blockable. This doesn't mean that webapps shouldn't be able to interrupt you: google calendar is a good example, it's just the easier we make it to do this, the more sites will want to.


The OS isn't even consistent itself. There is no way to tell time machine to not annoy you with the fact that it hasn't updated for 11 days. It's not part of the notification system, even though it came from the same building. What kind of example is that to set for others?


Because the default behaviour of every application is to interrupt, I have to go through every single installed app to disable it else my life is a constant noise of popups stating irrelevant facts. You may not notice that as you install one application at a time, turning off the settings individually, but when you build up a new box, the arrogance of all these applications becomes obvious, as it takes some time to actually stop your attention being attacked by the software you install.

Getting users to look at your app, your web site, is summed up as "The attention economy". That certainly applies to things like twitter, facebook, snapchat, etc. But how does that translate into dropbox trying to get my attention to tell me that it's keeping itself up to date? Or whatever itunes or photos wants to interrupt me on? Why does OneNote need to tell me something about a saved workbook? This isn't "the attention economy". This is the "interruption economy": people terrified that users may not be making full use of their features, so trying to keep popping up to encourage you to use the app or whatever new feature they've just installed.

Interrupting people while they are trying to work is not a good use of the life of people whose work depends on "getting things done without interruptions". As my colleagues should know, though some of them forget, I don't run with hipchat on precisely because I hate getting popups "hey Steve, can I just ask...", where the ask is something that I'd google for the answer myself, so why somebody asks me to google for them, I don't know. But even with the workflow interrupts off, things keep trying to stop me getting anything done.

Then there are the apps which interrupt without any warning at all. I got caught out by this at Dataworks summit, where halfway through a presentation GPGMail popped up telling me there was a new version. This was a presentation where I'd explicitly set "do not disturb" on and was running full screen, but the GPGMail checks weren't using it. Lesson: turn off the wifi as well as setting everything to do-not-disturb/offline.

Those update prompts, they are important. But when everything keeps going "update me! now!" they end up being an irritant to ignore, just like the way the "service now!" alert pops up in our car when we use it. It's just another low-level hint, not something which matters like "low pressure in tyres".

What it does really highlight is that having an application keep itself up to date with security patches is still considered, on OSX, to be something worth interrupting the user to let them know about. All I can say is it's a good thing that Linux apps don't feel the same way, or apt-get upgrade would be unbearable.

 
Finally, there's the OS
  • It'd be good if the OS recognised when a full screen media/presentation app was underway and automatically went into silent mode at that point.
  • All the OS's own notifications, "upgrade available", "no time machine backups", should be integrated with the same notification mechanism that apps use. That's to help the users, but also to set an example for all others.

What to really do about it?

I'd really like to be able to tell the OS that the default setting for any newly installed app is "no notifications". Maybe now I've built up the laptops I won't have to go through the torment of disabling it across many apps, so it'll just be that case-by-case irritant. Even so, there's still the pain of being reminded of update options even

What I can do though, is promise not to personally write applications which interrupt people by default.

Here then, is my pledge:
  1. I pledge to give my users the opportunity to live a life free of interruptions, at least from my own code.
  2. I pledge not to write applications which bring up notification boxes to tell you that they have kept themselves up to date automatically, that someone has logged in to another machine, or that someone else is viewing a document a user has co-authored.
  3. Ideally, the update mechanism should integrate with that of the OS, so that the OS can handle the notifications (or not).
  4. If I then add notifications to an application for what I consider to be relevant information, I pledge for the default state to be "don't".
  5. They will all go away when left alone.
  6. Furthermore, I pledge to use the OS supplied mechanism and integrate with any do-not-disturb mechanism the OS implements.
I know, I haven't done client-side code for a long time, but I can assure people, if I did: I'd try to be much less annoying than what we have today. Because I recognise how much pain this causes.