Transitive Issues

I am not going to discuss anything sensitive from the hadoop security list, but I do want to post my reply to someone who sent us a list of artifacts with known CVEs, either directly or in their own shaded packaging of dependent libraries (jackson, mainly), and then effectively demanded an immediate fix of them all.

My response is essentially "how do we do that without breaking everything downstream to the extent that nobody will upgrade?". This is not me trying to dismiss their complaint, rather "if anyone has the answer to this problem I would really love to know".

I do regret the failure to get OSGi support into hadoop 0.1x; then we could have had that level of isolation. But OSGi does have its own issues, hence the lack of enthusiasm. Would it have been worse than the state we have today?

The message follows. I am trying to do as much as I can via macOS dictation, which invariably requires a review and fixup afterwards; if things are confused in places, it means I didn't review properly. As to who sent the email, that doesn't matter. It's good that transitive dependency issues are viewed as a significant concern, bad that there's no obvious solution here apart from "I say we dust off and nuke the site from orbit".

(Photo: winter mountaineering in the Brecon Beacons, 1996 (?). Wondering how best to ski down the North Face of Pen y Fan)

Thank you for this list.

Except in the special case of "our own CVEs", all hadoop development is in public, including issue tracking, which can be searched under https://issues.apache.org/jira/

I recommend you search for upgrades of hadoop, hdfs, mapreduce and yarn, identify the dependencies you are worried about, follow the JIRA issues, and, ideally, help get them in by testing, if not actually contributing the code. If there are no relevant JIRAs, please create them.

I have just made a release candidate for hadoop 3.3.4, whose announcement I have attached. Please look at the changelog to see what has changed.

We are not upgrading everything in that list, and there is a fundamental reason for this: many of these upgrades are not compatible. While we can modify the hadoop code itself to support those changes, it means a release becomes transitively incompatible. That is: even if we do everything we can to make sure our own code does not break anything, it is still going to break things because of those dependencies. And as a result people aren't going to upgrade.

Take one example: HADOOP-13386  Upgrade Avro to 1.9.2. 

This is marked as an incompatible update: "Java classes generated from previous versions of avro will need to be recompiled". If we ship that, all your applications are going to break. As will everyone else's.

Jersey updates are another source of constant pain, as the update to v2 breaks all v1 apps, and the two artifacts don't coexist. We had to fix that by using a custom build of jersey-json which doesn't use jackson 1.

HADOOP-15983 Use jersey-json that is built to use jackson2
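Downstream projects hitting the same clash usually work around it by excluding the conflicting transitive artifact from the hadoop dependency and declaring a replacement of their own. A sketch of a Maven `pom.xml` fragment, assuming a project depending on `hadoop-common` (run `mvn dependency:tree` to confirm which artifacts your build actually pulls in):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.3.4</version>
  <exclusions>
    <!-- keep the jackson-1-based jersey-json off our classpath;
         a jackson-2-compatible replacement can then be declared explicitly -->
    <exclusion>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-json</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Exclusions only remove the artifact; anything in your code or your other dependencies which still needs those classes will fail at runtime, which is exactly why these fixes have to be tested, not just compiled.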

So what do we do? 

We upgrade everything and issue an incompatible release? Because if we do that we know that many applications will not upgrade and we will end up having to maintain the old version anyway. I'm 100% confident that this is true because we still have to do releases of Hadoop 2 with fixes for our own CVEs. 

Or, do we try and safely upgrade everything we can and work with the downstream projects to help them upgrade their versions of Apache Hadoop so at least the attack surface is reduced?

This is not hypothetical. Look at two pieces of work I have been involved in recently, or have at least been tracking.

PARQUET-2158. Upgrade Hadoop dependency to version 3.2.0. 

That moves parquet's own dependency from hadoop 2.10 to 3.2.0, so it will actually compile and run against the 3.x releases. People will be able to run it against 3.3.4 too... but at least this way we have set the bare minimum to a branch which gets security fixes.

HIVE-24484. Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

This is an example of a team doing a major update; again it helps bring them more up-to-date with all their dependencies as well as our own CVEs. From the github pull request you can see how things break, both from our own code (generally unintentionally) and from changes in those transitive dependencies. As a result of those breakages, hive and tez have held back a long time.

One of the patches in 3.3.4 is intended to help that team:

HADOOP-18332. Remove rs-api dependency by downgrading jackson to 2.12.7.

This is where we downgraded jackson from the version shipped in Hadoop 3.3.3 to version 2.12.7. This is still up to date with jackson CVE fixes, but by downgrading we can exclude its transitive dependency on the javax.ws.rs-api library, so Tez can upgrade, and thus Hive. Once Hive works against Hadoop 3.3.x, we can get Apache Iceberg onto that version as well. But if the release were incompatible in ways that they considered a blocker, that wouldn't happen.
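Downstream builds that need to stay aligned with a fix like this can pin the jackson version explicitly rather than relying on whatever the transitive graph happens to resolve. A sketch using Maven's `dependencyManagement` (the version here matches the 3.3.4 downgrade described above; treat it as illustrative):

```xml
<dependencyManagement>
  <dependencies>
    <!-- pin jackson-databind so transitive resolution cannot drag in a
         different release with a different dependency set -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.12.7</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A pin like this wins over transitive versions without adding a direct dependency, which is why it is the usual tool for keeping a whole stack on one agreed jackson release.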

It really is a losing battle. Given your obvious concerns in this area, I would love to hear your suggestions as to how the entire Java software ecosystem (for that is what it is) can address the inherent conflict between the need to keep the entire transitive set of dependencies current for security reasons and the need not to break everything downstream when we do.

A key challenge is that these updates often break things two or more steps away, a detail you often do not discover until you ship. The only mitigation which has evolved is shading: keeping your own private, relocated copy of the binaries. Which, as you note, makes it impossible for downstream projects to upgrade those libraries themselves.
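Shading is typically done with the maven-shade-plugin, which rewrites a dependency's packages under a private prefix at build time. A sketch, with the `org.example.shaded` prefix as an illustrative assumption:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- rewrite jackson packages into a private namespace so our
               bundled copy cannot clash with the application's own -->
          <relocation>
            <pattern>com.fasterxml.jackson</pattern>
            <shadedPattern>org.example.shaded.com.fasterxml.jackson</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The cost is exactly the one noted above: once the bytes are relocated inside your jar, nobody downstream can swap in a patched jackson without a new release of your artifact.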

What can you and your employers do to help? 

All open source projects depend on the contributions of developers and users. Anything your company's engineering teams can do to help here will be very welcome. At the very least, know that you have three days to qualify that 3.3.4 release to make sure that it does not break your deployed systems. If it does work, you should update all production systems ASAP. If an incompatibility does turn up during this RC phase, we will hold the build and do our best to address it. If you discover a problem after Thursday, it will not be addressed until the next release, which you cannot expect to see until September, October or later. You can still help then by providing engineering resources to help validate that release. If you have any continuous integration tooling set up: check out and build the source tree, then try to compile and test your own products against those builds of hadoop and any other parts of the Apache open source and big data stack on which you depend.

To conclude then, I'd like to welcome you to the eternal challenge of trying to keep those libraries up to date. Please join in. I would note that we are also looking for people with JavaScript skills, as the yarn UI needs work and that is completely beyond my level of expertise.

If you aren't able to do this and yet still require all dependencies to be up-to-date, I'm going to suggest you build and test your own software stack using Hadoop 3.4.0 as part of it. You would of course need to start with up-to-date versions of Jersey, Jackson, Google Guava, the Amazon AWS SDK and the like before you even get that far. However, the experience you gain in trying to make this all work will again be highly beneficial to everyone.


Steve Loughran.


[VOTE] Release Apache Hadoop 3.3.4

I have put together a release candidate (RC1) for Hadoop 3.3.4

The RC is available at:


The git tag is release-3.3.4-RC1, commit a585a73c3e0

The maven artifacts are staged at


You can find my public key at:


Change log


Release notes


There's a very small number of changes, primarily critical code/packaging issues and security fixes.

See the release notes for details.

Please try the release and vote. The vote will run for 5 days.
