Created 01-12-2022 01:45 PM
Hi experts,
Our Hadoop cluster has an old version of Log4j, and we were wondering how to properly upgrade it.
Can we just replace the log4j jar file with an upgraded version?
Currently, this is one of the Log4j files in our Hadoop cluster:
/usr/hdp/2.6.1.0-129/hadoop/client/log4j-1.2.17.jar
Any help is much appreciated.
Thanks,
Created 01-27-2022 05:22 PM
Hi @ryu
Just from the absolute file path you've called out, it's evident you are running Log4j version 1.
To back up a bit for the sake of other community members who might be reading this: the chief reason we're even talking about this is that Log4j 1 reached its end of life (EOL) and has not been officially supported by the Apache Logging Services Project since 5 August 2015, more than six years ago now. Largely due to the increased scrutiny of Log4j in general in the wake of the CVE-2021-44228 vulnerability (which impacted Log4j 2), a new, lower-severity vulnerability has come to light, CVE-2021-4104, which does affect Log4j 1. But again, Log4j 1 has reached EOL, and the Apache Logging Services Project isn't providing any more releases for Log4j 1, even to remediate serious security vulnerabilities. For these and other reasons, the best practical approach is to upgrade to a more up-to-date data platform that is actively supported.
Cloudera's current Enterprise Data Platform, since the Fall of 2019, is Cloudera Data Platform (CDP), which in its on-premises "form factor" is now called CDP Private Cloud. CDP supersedes HDP as Cloudera's Enterprise Data Platform, and as an aside, HDP 2.6.1 reached its end-of-support date in December 2020 (open that link and then expand the section labeled "Hortonworks Data Platform (HDP)" underneath Current End of Support (EoS) Dates).
As a core part of its business, Cloudera addresses customer needs for vulnerability remediation as one of the benefits of a subscription agreement, even when Apache no longer supports an impacted component.
You can read Cloudera's judgement about how concerned you should be about that Log4j 1 vulnerability here: Cloudera response to CVE-2021-4104
The reason upgrading is the best practical approach is that arguably the proper way to upgrade Log4j is to go through the source code of every affected component that uses the Log4j version you are trying to avoid, become intimately familiar with how each one uses the various logging APIs, and then update (or even totally rewrite) the code that uses those existing, risk-exposed APIs so that it uses the APIs of the replacement version of Log4j 2 that is not exposed to known vulnerabilities (presumably 2.15.x or later). Then you recompile against Log4j 2 exclusively, unit test and release each changed component, and test the entire system as a whole for regressions. Only then do you migrate the completed product, now using Log4j 2 exclusively, to production. As you probably understand, that takes a lot of engineering effort, and it's not something a data platform administrator, or even a data platform team at an enterprise that is using HDP for its internal data management needs, should be expected to complete on their own.
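To make that concrete, here is a minimal before-and-after sketch of the kind of per-component change I'm describing. The class name, method, and log message are purely illustrative and not taken from any actual HDP source:

// Hypothetical sketch only: the class and message are made up for illustration.

// Before: a component written against the Log4j 1 API.
class JobSubmitterV1 {
    private static final org.apache.log4j.Logger LOG =
            org.apache.log4j.Logger.getLogger(JobSubmitterV1.class);

    void submit(String jobId) {
        LOG.info("Submitting job " + jobId); // Log4j 1 style: eager string concatenation
    }
}

// After: the same component migrated to the Log4j 2 API and recompiled
// against the log4j-api / log4j-core 2.x jars exclusively.
class JobSubmitterV2 {
    private static final org.apache.logging.log4j.Logger LOG =
            org.apache.logging.log4j.LogManager.getLogger(JobSubmitterV2.class);

    void submit(String jobId) {
        LOG.info("Submitting job {}", jobId); // Log4j 2 style: parameterized message
    }
}

Multiply that by every class in every component that touches a logging API, plus the recompile-and-retest cycle, and you get a sense of the scope.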
Upgrading the platform to a new, more up-to-date release that is actively being maintained is the next best thing, and as a practical matter, it's better. It allows data platform users to take advantage of the fact that the data platform provider/vendor has those substantial engineering resources and can bring them to bear on the necessary API updates on an ongoing basis and in a timely fashion.
If for whatever reason you aren't able or willing to upgrade and don't have a subscription agreement…well, reasoning a bit from first principles (because I don't have access to an HDP 2.6.1-based cluster at the moment to actually try it), I think the short answer to this portion of your question:
Can we just replace the log4j jar file with an upgraded version?
…is a qualified "No". Some of the critical APIs in Log4j 2 are simply not backward-compatible with Log4j 1, so you should assume that just dropping the Log4j 2 .jar files into an existing HDP installation is not going to work without issues. Other members of the Cloudera Community have reported that even dropping the Log4j 2 .jar file(s) into an installation of CDH 6.3.x, which was built with Log4j 2 specifically, produced less than desirable results.
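To illustrate the kind of incompatibility I mean, here is a hypothetical snippet of typical Log4j 1 usage (the class name and config path are made up). Code like this, already compiled against log4j-1.2.17.jar, will fail at runtime with NoClassDefFoundError if that jar is simply replaced by the Log4j 2 core jars, because the org.apache.log4j classes it references no longer exist on the classpath:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

// Hypothetical example of Log4j 1 API usage; none of these classes ship in
// log4j-api-2.x.jar or log4j-core-2.x.jar, so a bare jar swap breaks this code.
class LegacyLoggingSetup {
    static void init() {
        // Loads a Log4j 1 properties file; there is no PropertyConfigurator in the Log4j 2 core jars.
        PropertyConfigurator.configure("/etc/hadoop/conf/log4j.properties");
        // Log4j 1 lets callers mutate levels directly on the Logger; the Log4j 2 Logger interface does not.
        Logger.getRootLogger().setLevel(Level.INFO);
    }
}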
However, there does exist a Log4j 1.x bridge which reportedly will "forward" all requests for Log4j 1 to Log4j 2, assuming that you have a valid Log4j 2 installation, so you might want to explore that option if you can test it out on a non-production cluster first. It also requires that you do a thorough job of removing any Log4j 1 jars from the application's CLASSPATH for every Hadoop component. It goes without saying that Cloudera doesn't support this, and again, I haven't tried it, so you should only proceed down this path if you are desperate to remove a Log4j 1 installation, don't have or can't obtain a subscription agreement, and have a solid plan to roll back the change if it doesn't work out.
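For completeness, here is a rough, untested sketch of what the bridge approach looks like. The jar names and versions below are examples only, not a tested recipe, and again, this isn't something Cloudera supports:

// Rough, untested sketch of the Log4j 1.x bridge idea. The idea: remove every
// log4j-1.2.17.jar from the component's classpath and put these Log4j 2 jars on it instead:
//   log4j-1.2-api-2.x.jar  (the bridge: re-implements the org.apache.log4j package on top of Log4j 2)
//   log4j-api-2.x.jar
//   log4j-core-2.x.jar
// Existing code written against the Log4j 1 API can then, in principle, keep running
// unchanged, because the bridge forwards its calls to Log4j 2:
import org.apache.log4j.Logger; // resolved by the bridge jar, not by log4j-1.2.17.jar

class BridgedComponent {
    private static final Logger LOG = Logger.getLogger(BridgedComponent.class);

    void run() {
        // Handled by a Log4j 2 core, which needs its own Log4j 2 configuration
        // (e.g. a log4j2.properties/log4j2.xml) rather than the old log4j.properties.
        LOG.warn("Logged through the Log4j 1 API but handled by Log4j 2");
    }
}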