Steve Loughran: Gardening the commons

It's been a warm-but-damp summer in Bristol, and the vegetation in the local woods has been growing well. This means the bramble and hawthorn branches have been making their way out of the undergrowth and into the light; more specifically, the light available on the mountain bike trails.

As both these plants' branches have spiky bits on them, having them grow onto the trails hurts, especially if you are trying to get round corners fast. And anyone going round the trail without sunglasses runs a risk of getting hurt in or near the eye.

I do always wear sunglasses, but the limitations on taking the fast line through the trails hurt, and as there are a lot of families out right now, I don't want the kids to get too scraped.

So on Saturday morning, much to the amazement of my wife, I picked up the gardening shears. Not to do anything in our garden though —to take to the woods with me.

Findings

Those Kevlar-backed gloves that the Clevedon police like are OK on the outside for working with spiky vegetation, but the fingertips are vulnerable.
Gardening gets boring fast.
When gardening an MTB trail, look towards the direction oncoming riders will take.
A lot of people walking dogs get lost and ask for directions back to the car park.
Someone had already gardened about 1/3 of the Yertiz trail (pronunciation based on "yeah-it-is").
Nobody appreciates your work.

I appreciate the outcome of my own work: I can now go round at speed, only picking up scrapes on the forearms on the third of the trail that nobody has trimmed yet. I actually cut back on the inside of the corners there for less damage on the racing line, while cutting the outside and face-height bits for the families. Now they can have more fun at the weekends, and I can do my fast work on weekday lunchtimes.

They can live their lives with fewer wailing children, and I've partially achieved my goal of less blood emitted per iteration. I'll probably finish it off with the final 1/3 at some point, maybe mid-August, as I can't rely on anyone else.

There are no trail pixies but what we make

Which brings me to OSS projects, especially patches and bug reports in Hadoop.

I really hate it when my patches get completely ignored, big or small.

Take YARN-679, for example. It's the wrap-up piece of YARN-117, and the code has celebrated its third birthday. Number of reviewers: zero. I know it's a big patch, but it's designed to produce a generic entry point for YARN services, with reflection-based loading of config files (you can ask for HiveConfig and HBaseConfig too, see) and interrupt handling which even remembers if it's been called once, so that a second control-C bypasses the shutdown hooks (assumption: they've blocked on some IPC-retry chatter with a now-absent NN) and bails out fast. Everything is designed to support exit codes, with subclass points for testability. This should be the entry point for every single YARN service, and it hasn't had a single comment by anyone. How does that make me feel? Utterly ignored, even by colleagues. I do, at least, understand what else they are working on... it's not like there is a room full of patch reviewers saying "there are no patches to review, let's just go home early". All the people with the ability to review the patches have so many commitments of their own that the time they can allocate to reviewing is called "weekends".
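For anyone wondering what "remembers if it's been called once" means in practice, here is a minimal sketch of the idea. It is in Python rather than the Java of the actual patch, and every name in it (InterruptEscalator, install, the exit code of 130) is invented for illustration; it is not the YARN-679 API.

```python
import os
import signal
import sys


class InterruptEscalator:
    """Remembers whether an interrupt has already been seen.

    Hypothetical illustration only; not the YARN-679 code.
    """

    def __init__(self, graceful_exit, hard_exit):
        self._graceful_exit = graceful_exit  # expected to run shutdown hooks
        self._hard_exit = hard_exit          # expected to skip them
        self.interrupted = False

    def handle(self, signum=None, frame=None):
        if self.interrupted:
            # Second interrupt: assume the hooks have blocked, so bail out fast.
            self._hard_exit()
        else:
            self.interrupted = True
            self._graceful_exit()


def install(exit_code=130):
    """Wire an escalator to SIGINT (control-C) for the current process."""
    escalator = InterruptEscalator(
        # sys.exit raises SystemExit, so atexit handlers still run.
        graceful_exit=lambda: sys.exit(exit_code),
        # os._exit terminates immediately without running atexit handlers.
        hard_exit=lambda: os._exit(exit_code),
    )
    signal.signal(signal.SIGINT, escalator.handle)
    return escalator
```

Passing the two exit actions in as callables is the same "subclass points for testability" thinking: a test can substitute harmless callbacks and check the escalation order without killing the test process.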

And as a result, I have a list of patches awaiting review and commit: not only big diffs, but also trivial ones which fix things like NPEs in reporting errors returned over RPC. That's a 3KB patch, reaching the age where, at least with my own child, we were looking at nursery schools. Nothing strategic, just something useful when things go wrong. Ignored.

That's what really frustrates me: small problems, small patches, and nobody looks at them.

And I'm as guilty as the rest. We provide feedback on some patch-in-progress, then get distracted and never do the final housekeeping. I feel bad here, precisely because I understand the frustration.

Alongside "old, neglected patches", there are "old, neglected bugs". Take, for example, HADOOP-3733: "s3:" URLs break when Secret Key contains a slash, even if encoded. Stuart Sierra gave a good view of the history from his perspective.

The bug was filed in 2008

It was utterly ignored
Lots of people said they had the same problem
Various Hadoop developers said "Cannot reproduce"
It was eventually fixed on 2016-06-16 with a patch by one stevel@apache.
On 2016-06-16 cnauroth@apache filed HADOOP-13287 saying "TestS3ACredentials#testInstantiateFromURL fails if AWS secret key contains '+'.".

Conclusions

The Hadoop developers neglect things
If we'd fixed things earlier, similar problems wouldn't arise.

I mostly concur. Especially in the S3 support, where historically the only FTEs working on it were Amazon, and they have their own codebase. In ASF Hadoop, Tom White started the code, and it's slowly evolved, but it's generally been left to various end users to work on.

Patch submission is also complicated by the fact that for security reasons, Jenkins doesn't test the stuff. We've had enough problems of people under-testing their patches here that there is a strictly enforced policy of "tell us which infrastructure you tested against". The calling out of "name the endpoint" turns out to be far better at triggering honest responses than "say that you tested this". And yes, we are just as strict with our colleagues. A full test run of the hadoop-aws module takes 10-15 minutes, much better than the 30 minutes it used to take, but still means that any review of a patch is time consuming.

I would normally estimate 1-2 hours to review an S3 patch. And, until a few of us sat down to work on S3A functionality and performance against Hive and Spark, those 1-2 hours were going to be weekend time only. Which is why I didn't do enough reviewing.

Returning to the S3 "/" problem

This whole thing was related to AWS-generated secrets. Those of us whose AWS secrets didn't have a "/" in them couldn't replicate the problem. Thus it was a configuration-space issue rather than something visible to all.
There was a straightforward workaround, "generate new credentials", so it wasn't a blocker.
That related issue, HADOOP-13287, is actually highlighting a regression caused by the fix for HADOOP-3733. In the process for allowing URLs to contain "/" symbols, we managed to break the ability to use "+" in them.
The regression was caught because the HADOOP-3733 patch included some tests which played with the tester's real credentials. Fun problem: writing tests to do insecure things which don't leak secrets in logs and assert statements.
HADOOP-13287 is not an example of "there are nearby problems" so much as "every bug fix moves the bug", something noted in Brooks's "The Mythical Man-Month" in his coverage of IBM OS patches.
And again, this is a configuration-space problem: it was caught because Chris had "+" in his secret.

Finally, and this is the reason why it didn't surface with many of us even though we had "/" in the secret: the problem only arises if you put your AWS secrets in the URL itself, as s3a://AWSID:secret-aws-key@bucket

That is: if your filesystem URI contains secrets which, if leaked, threaten the privacy and integrity of your data and put you at risk of running up large bills, then, if the secret has a "/", the URL doesn't authenticate.

This is not actually an action I would recommend. Why? Because throughout the Hadoop codebase we assume that filesystem URIs do not contain secrets. They get logged, they get included in error messages, they make their way into stack traces that can go into bug reports. AWS credentials are too important to be sticking in URLs.
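To make the "/" and "+" failure modes concrete, here is a small sketch using Python's standard URL parser. It is not the Hadoop code, and the "+" half is only one plausible way a decoding fix can move the bug (a form-style decoder that turns "+" into a space); treat that as my assumption for illustration. The secret and the helper name parse_secret are made up.

```python
from urllib.parse import quote, unquote, unquote_plus, urlsplit

SECRET = "abc/def+ghi"  # made-up secret, not a real credential


def parse_secret(uri, decode):
    """Return the decoded password from the URI's user:password part, or None."""
    password = urlsplit(uri).password
    return None if password is None else decode(password)


# 1. Raw secret in the URL: the first "/" ends the authority section, so the
#    parser never sees a "user:secret@bucket" login at all.
raw = "s3a://AWSID:" + SECRET + "@bucket/path"
assert urlsplit(raw).netloc == "AWSID:abc"
assert parse_secret(raw, unquote) is None

# 2. Percent-encode only the "/": a plain percent-decoder recovers the secret.
slash_only = "s3a://AWSID:abc%2Fdef+ghi@bucket/path"
assert parse_secret(slash_only, unquote) == SECRET

# 3. The same URL through a form-style decoder: "/" now works, but "+" has
#    silently become a space. The bug has moved rather than gone away.
assert parse_secret(slash_only, unquote_plus) == "abc/def ghi"

# 4. Encoding every reserved character survives either decoder.
fully_encoded = "s3a://AWSID:" + quote(SECRET, safe="") + "@bucket/path"
assert parse_secret(fully_encoded, unquote) == SECRET
assert parse_secret(fully_encoded, unquote_plus) == SECRET
```

Note that cases 2 and 3 only differ for people whose secret happens to contain a "+", which is exactly why this kind of problem stays invisible to most testers.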

Once I realised people were doing this, I did put aside a morning to fix things. Not so much fixing the encoding of "/" in the secrets (and accidentally breaking the encoding of "+" in the process), but:

Pulling out the auth logic for s3, s3n and s3a into a new class, S3xLoginHelper.
Having that code strip out user:pass from the FS URL before the S3 filesystems pass it up to their superclass.
Doing test runs and seeing if that is sufficient to keep those secrets out of the logs (it isn't).
Having S3xLoginHelper print a warning whenever somebody tries to use secrets in URLs.
Editing the S3 documentation file to tell people not to do this, and warning that the feature may be removed.
Editing the Hadoop S3 wiki page to tell people not to do this.
Finally: fixing the encoding for "/", adding tests.
Later, fixing the test for "+".

That's not just an idle "may be removed" threat. In HADOOP-13252, you can declare which AWS credential providers to support in S3A, be it your own, conf-file, env var, IAM, and others. If you start doing this, your ability to embed secrets in s3a URLs goes away. Assumption: if people know what they are doing, they shouldn't be doing things so insecure.
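The "strip out user:pass and print a warning" steps in the list above can be sketched roughly as follows. Again this is Python for illustration, not the Java in S3xLoginHelper; strip_login, the logger name and the warning text are all hypothetical.

```python
import logging
from urllib.parse import urlsplit, urlunsplit

LOG = logging.getLogger("s3x_login_sketch")  # hypothetical logger name


def strip_login(uri):
    """Return (uri_without_login, had_login) for a filesystem URI."""
    parts = urlsplit(uri)
    # Everything before the last "@" in the authority is the user:pass part.
    _userinfo, at, host = parts.netloc.rpartition("@")
    if not at:
        return uri, False
    # Warn, but never include the login details in the message itself.
    LOG.warning("Secrets in filesystem URLs are insecure; "
                "support for this may be removed")
    return urlunsplit(parts._replace(netloc=host)), True


clean, had_login = strip_login("s3a://AWSID:secret%2Fkey@bucket/data/file")
assert clean == "s3a://bucket/data/file"
assert had_login
```

Only the cleaned URI should be handed on to anything that might log it. As the test runs noted in the list showed, stripping at one point is not sufficient on its own, so treat this as one layer rather than a complete fix.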

Anyway, I'm glad my effort fixing the issue is appreciated. I also share everyone's frustration with neglected patches, as it wastes my effort and leaves the bugs unfixed, features ignored.

We need another bug bash. And I need to give that final third of the Yertiz trail a trim.

Steve Loughran

2016-07-19
