Member since: 07-29-2019
Posts: 640
Kudos Received: 114
Solutions: 48
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 14416 | 12-01-2022 05:40 PM |
| | 3290 | 11-24-2022 08:44 AM |
| | 4949 | 11-12-2022 12:38 PM |
| | 1788 | 10-10-2022 06:58 AM |
| | 2577 | 09-11-2022 05:43 PM |
04-13-2022
11:58 PM
1 Kudo
Hi @san_re
The first error message you included is likely the result of not enabling connections to the MySQL server over the network (or locally, as the case may be). You have to take some administrative action on a newly installed MySQL server to allow outside applications (in this case, the local NiFi) to connect to it. It's difficult to troubleshoot remotely because it could be something else, but in my experience that is the most common root cause.
This error message:
Driver class com.mysql.jdbc.driver is not found - Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.driver: Driver class com.mysql.jdbc.driver is not found
Typically means that the JDBC driver is not being found on the relevant CLASSPATH, which in this case is NiFi's. After you ensure you haven't mistyped the name of the Java driver class (note that class names are case-sensitive, so it must be com.mysql.jdbc.Driver, with a capital "D"), you need to make sure you installed the appropriate .jar file in the location where NiFi expects to find it, and that the file is not corrupted.
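One way to take NiFi out of the picture while you troubleshoot is a tiny standalone Java program that loads the driver class and opens a connection using the same URL and credentials you configured in NiFi. This is just a sketch; the host, port, database name, and credentials below are placeholders you would substitute with your own:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Class names are case-sensitive: note the capital "D" in Driver
        Class.forName("com.mysql.jdbc.Driver");

        // Placeholder host, database, and credentials -- substitute your own
        String url = "jdbc:mysql://localhost:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword")) {
            System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
```

Run it with the Connector/J .jar on the classpath. If it throws a ClassNotFoundException, the problem is the .jar or the class name; if it fails to connect, the problem is on the MySQL server side, independent of NiFi.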
04-13-2022
11:33 PM
1 Kudo
Hi @Saraali, thank you for asking a great question! Allow me to expand a bit on the answer posted earlier by @Azhar_Shaikh.
He's correct that you could write a Python script leveraging the Pandas API to programmatically create an MS Excel file, and then call that script in NiFi using ExecuteStreamCommand, although ExecuteScript might be a better candidate, depending on how your overall flow is designed and what external software you're willing to install or configure.
There's a reasonably well-documented set of classes/methods in the Pandas API that would allow you to read in the data from your .csv file, convert it to a Pandas DataFrame, and then write the DataFrame to an Excel file. If your software development skills are limited to Python, that is a workable approach.
My reading of your question, however, was that you were asking about writing a custom processor, not invoking a script. If you are not limited to Python like the original poster in the above-referenced Stack Overflow thread, you should consider writing a full-on NiFi processor in Java, leveraging a library such as Apache POI or JExcel. You can use either library to programmatically read, write, and modify the content of an Excel spreadsheet from a Java program, but the latter library only supports Excel files in the .xls (1997-2003) format. This approach requires significant software development skills, because it involves not just Java programming but also some familiarity with the associated tooling, principally Maven; telling you how to do it would require a substantial, article-length tutorial (a small taste of the POI route is sketched below). I still recommend Andy LoPresto's session from the 2019 DataWorks Summit, Custom Processor Development with Apache NiFi, to folks new to NiFi processor development who want an overview of what's involved.
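To give a flavor of the POI approach, here is a minimal, untested sketch that reads a .csv file and writes it out as .xlsx. The file names are placeholders, and the naive comma split ignores quoted fields, so a real processor would use a proper CSV parser:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class CsvToXlsx {
    public static void main(String[] args) throws IOException {
        // Naive CSV read: one line per row, no quoting support -- illustration only
        List<String> lines = Files.readAllLines(Paths.get("input.csv"));

        try (Workbook workbook = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("output.xlsx")) {
            Sheet sheet = workbook.createSheet("data");
            for (int r = 0; r < lines.size(); r++) {
                Row row = sheet.createRow(r);
                String[] fields = lines.get(r).split(",");
                for (int c = 0; c < fields.length; c++) {
                    row.createCell(c).setCellValue(fields[c]);
                }
            }
            workbook.write(out);
        }
    }
}
```

In an actual NiFi processor you would read from the incoming flow file's InputStream and write to the outgoing flow file's OutputStream rather than local file paths, but the POI calls would be the same.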
If you don't have those software development skills or the time to obtain them, I would suggest you engage Professional Services to develop the processor you need. If you're a Cloudera Subscription Support customer, we can connect you with your Account team to discuss your potential project. Let me know if you are interested in this path by using the community's private message functionality to transmit your contact information.
This thread will remain open so other community members with greater expertise with custom NiFi processor development can contribute, if they so desire.
04-11-2022
12:00 AM
Hi @fatalprocess
You write that you "have literally used the VM out of the box", but you didn't indicate which VM it was, where you retrieved it from, whose instructions you followed to install it (from Codementor?) or configure it, or which version you installed. You also didn't indicate which virtualization platform you're using. All this makes it quite difficult to troubleshoot your problem remotely.
Judging from the screenshots you posted, I will assume that you are intending to use the HDP Sandbox, but it's still not clear which version you installed. The most recent version of the HDP Sandbox was based on HDP 3.0.1.0, which dates from November 2018. It would probably be better to "start over": download that latest version and carefully follow the Sandbox Deployment and Install Guide and the accompanying tutorial, Learning the Ropes of the HDP Sandbox.
I should mention, though, that the HDP Sandbox is based on legacy products that no longer represent Cloudera's current product offerings (which is why the two web pages I referred to above are difficult to land on directly). Since you say that you don't know what you are doing and have not been trained properly, if you're able to choose, I strongly recommend abandoning your work with the HDP Sandbox and instead using and training yourself on Cloudera Data Platform, which in its on-premises "form factor" can be installed as CDP Private Cloud. CDP supersedes HDP as Cloudera's Enterprise Data Cloud offering.
There is extensive and thorough documentation on installing the CDP Private Cloud Base Edition of Cloudera Data Platform on on-premises hardware, and Cloudera also has an extensive tutorial on installing CDP Private Cloud Base (trial version) on AWS in an infrastructure-as-a-service (IaaS) fashion, which you should consider.
04-04-2022
06:48 AM
@hbenner89 ,
In addition to what @ChethanYM wrote above, you should also share the file size that you are attempting to upload.
As a general matter, you can't expect a web browser to let you upload a file of arbitrarily large size, so the perhaps-unstated reason the Jira issue you pointed to was resolved as "won't fix" is that this is not a limitation specific to Hue.
02-13-2022
02:29 PM
1 Kudo
Hi @cdh
Just to add on to the answer given above by @araujo , I wanted to address the second part of your question:
If not can i install CDP without license or trail versions. if kindly provide links to download and installation document as i never installed CDP.
No, you cannot install a non-trial version of CDP without a valid Cloudera subscription. However, Cloudera does have programs to support those doing a legitimate evaluation/PoC of Cloudera's data platform software for lengths of time beyond those allowed by trial versions. Your best approach, if you're interested in that, would be to contact the Cloudera Sales Team to find out more about your company's options.
If you're honestly looking to evaluate a data platform, you can currently do so without an existing valid Cloudera subscription by downloading and installing the Trial Version of CDP Private Cloud Base Edition of Cloudera Data Platform. A link to the documentation describing in detail how to install this version of CDP Private Cloud Base can be found on that page, labeled CDP Private Cloud Base Trial Installation.
Cheers!
02-13-2022
07:50 AM
Hi @Rashmi22 A version of this question was previously asked and answered here:
Easiest way to do a Cloudera Demo.
There is no "official" CDP-based Sandbox at this time, but the link above includes information you can use to deploy CDP Private Cloud on VirtualBox and Vagrant.
Cheers!
01-29-2022
09:04 AM
1 Kudo
Hi @Lsh
The guide you are referring to is for Altus Director. Altus Director's purpose was to enable reliable self-service deployment of CDH and Cloudera Enterprise Data Hub on infrastructure from cloud service providers such as Azure.
I can't speak authoritatively to the status of the GitHub Repository you mentioned. I would guess that it was intentionally retired because Altus Director has already reached its End of Support (EoS) date. Cloudera's lifecycle support policies are documented here:
Support lifecycle policy (open that link and then expand the section labeled "Cloudera Altus (Platform as a service)" underneath Current End of Support (EoS) Dates).
At that last link (as of this writing) you can also read that Cloudera 6.3 (expand the section labeled "Cloudera Enterprise products") is only supported until the end of March 2022 (the last version of Director was 6.3.1). For eligible customers, Limited Support for CDH versions 6.2 and 6.3 will be provided during the six-month period beginning on April 1, 2022, and ending on September 30, 2022.
So you should understand that in the best-case scenario, any testing you plan to do deploying CDH using Altus Director will essentially become obsolete later this year. For that reason, I would not recommend spending time learning how to deploy CDH using Altus Director or any other tool.
Cloudera's current Enterprise Data Platform, since the Fall of 2019, is Cloudera Data Platform (CDP), which in its cloud-based "form factor" is now called CDP Public Cloud. CDP supersedes CDH as Cloudera's Enterprise Data Platform offering. If you are looking to evaluate a current data platform for use within your company utilizing Azure, then you can follow the instructions for using the latest CDP stack with the public cloud option by consulting the Azure quick start documentation.
01-28-2022
09:51 AM
Hi @rahi
Just reading the download URL you're using, it appears you are aware that last year Cloudera modified its download policies, and the binaries you are seeking are now only available in a private repository. If not, please see the announcement here: Transition to private repositories for CDH, HDP and HDF
If you are aware of this, then you are probably getting the HTTP 401 Authentication Required error because you are not using the correct credentials to access the private repository. The credentials for this private repository are generally not the same ones used to access Cloudera's website or the Cloudera community. Instead, individuals working at companies with a valid Cloudera subscription can generate repository credentials from a CDH license key; there is a full description of how to do this in the Cloudera Enterprise 6.x Release Notes here: Version, Packaging, and Download Information
…under the subheading Obtain Credentials, among other places.
What you should also be aware of, however, is that Altus Director has already reached its EoS date of July 2020. Cloudera's lifecycle support policies are documented here:
Support lifecycle policy (open that link and then expand the section labeled "Cloudera Altus (Platform as a service)" underneath Current End of Support (EoS) Dates).
At that last link (as of this writing) you can read that Cloudera 6.3 is only supported until the end of March 2022 (the last version of Director was 6.3.1). For eligible customers, Limited Support for CDH versions 6.2 and 6.3 will be provided during the six-month period beginning on April 1, 2022, and ending on September 30, 2022. So you should understand that in the best-case scenario, any testing you plan to do deploying CDH using Altus Director will essentially become obsolete later this year. For that reason alone, I would not recommend spending time on that stack.
Cloudera's current Enterprise Data Platform, since the Fall of 2019, is Cloudera Data Platform (CDP), which in its cloud-based "form factor" is now called CDP Public Cloud. CDP supersedes CDH as Cloudera's Enterprise Data Platform offering. If you are looking to evaluate a current data platform for use within your company utilizing AWS, then you can follow the instructions for using the latest CDP stack with the public cloud option by consulting the AWS quick start documentation.
01-27-2022
05:22 PM
Hi @ryu
Just from the absolute file path you've called out, you are evidently running Log4j version 1.
To back up a bit for the sake of other community members who might be reading this: the chief reason we're even talking about this is that Log4j 1 reached its end of life (EOL) and has not been officially supported by the Apache Logging Services Project since 5 August 2015, over six years ago now. But largely due to the increased scrutiny of Log4j in general in the wake of the CVE-2021-44228 vulnerability (which impacted Log4j 2), a new, lower-severity vulnerability has come to light, CVE-2021-4104, which does affect Log4j 1. Again, Log4j 1 has reached EOL, and the Apache Logging Services Project isn't providing any more releases for it, even to remediate serious security vulnerabilities. For these and other reasons, the best practical approach is to upgrade to a more up-to-date data platform that is actively supported.
Cloudera's current Enterprise Data Platform, since the Fall of 2019, is Cloudera Data Platform (CDP), which in its on-premises "form factor" is now called CDP Private Cloud. CDP supersedes HDP as Cloudera's Enterprise Data Platform, and as an aside, HDP 2.6.1 reached its end-of-support date in December 2020 (open that link and then expand the section labeled "Hortonworks Data Platform (HDP)" underneath Current End of Support (EoS) Dates).
As a core part of its business, Cloudera addresses customer needs for vulnerability remediation as part of the benefits of a subscription agreement even when Apache no longer supports an impacted component.
You can read Cloudera's judgement about how concerned you should be about that Log4j 1 vulnerability here: Cloudera response to CVE-2021-4104
The reason upgrading is the best practical approach is that the arguably proper way to upgrade Log4j is to go through the source code of every affected component that uses the Log4j version you are trying to avoid, become intimately familiar with how each one uses the various logging APIs, and then update or even totally rewrite the code that uses those risk-exposed APIs to use the APIs of the replacement version of Log4j 2 that is not exposed to known vulnerabilities (presumably 2.15.x or later). Then recompile against Log4j 2 exclusively, unit test and release each changed component, and test the entire system as a whole for regressions. Then, finally, migrate the completed product, with only Log4j 2, to production. As you probably understand, that takes a lot of engineering effort, and it's not something a data platform administrator, or even a data platform team at an enterprise that is using HDP for its internal data management needs, should be expected to complete on their own.
Upgrading the platform to a newer release that is actively maintained is the next best thing, and as a practical matter, it's better: it lets data platform users take advantage of the fact that the platform vendor has those substantial engineering resources and can bring them to bear on the necessary API updates on an ongoing basis and in a timely fashion.
If for whatever reason you aren't able to or are unwilling to upgrade and don't have a subscription agreement…well, just engaging in a bit of logical deduction from first principles (because I don't have access to an HDP 2.6.1-based cluster at the moment to actually try it) I think the short answer to this portion of your question:
Can we just replace the log4j jar file with an upgraded version?
…is a qualified "No". Some of the critical APIs for Log4j 2 are simply not backward-compatible with Log4j 1, so you should assume that just dropping the Log4j 2 .jar files into an existing HDP installation is not going to work without issues. Other members of the Cloudera Community have reported that even dropping the Log4j 2 .jar file(s) into an installation of CDH 6.3.x, which was built with Log4j 2 specifically, produced less-than-desirable results.
However, there does exist a Log4j 1.x bridge which reportedly "forwards" all Log4j 1 calls to Log4j 2, assuming you have a valid Log4j 2 installation, so you might want to explore that option if you can test it on a non-production cluster first. It also requires that you do a thorough job of removing any Log4j 1 .jars from the application CLASSPATH of every Hadoop component. It goes without saying that Cloudera doesn't support this, however, and again, I haven't tried it, so you should only proceed down this path if you are desperate to remove a Log4j 1 installation, don't have and can't obtain a subscription agreement, and have a solid plan to roll back the change if it doesn't work out.
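To illustrate what the bridge is supposed to buy you (again, I haven't tested this on HDP), code written against the Log4j 1 API should compile and run unchanged, assuming the log4j-1.2-api bridge .jar plus the log4j-api and log4j-core .jars are on the CLASSPATH in place of the old Log4j 1 .jar:

```java
// Application code written against the familiar Log4j 1 API:
import org.apache.log4j.Logger;

public class BridgeDemo {
    private static final Logger LOG = Logger.getLogger(BridgeDemo.class);

    public static void main(String[] args) {
        // With the old log4j-1.x jar removed and log4j-1.2-api, log4j-api,
        // and log4j-core on the CLASSPATH instead, this call is routed to
        // Log4j 2 with no source changes. Using error() here because Log4j 2's
        // default configuration only prints ERROR and above to the console.
        LOG.error("Hello from the Log4j 1 API, handled by Log4j 2");
    }
}
```

The hard part on a real cluster isn't this code; it's reliably swapping the .jars for every affected component, which is exactly why I'd only attempt it on a non-production cluster first.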
01-19-2022
11:49 AM
@kevmac, you and @Eric_B can find the actual target Log4j library version for Cloudera's suggested remediation by consulting the blog post Cloudera Response to CVE-2021-44228. The version of the Log4j library that the aforementioned remediation script is intended for is specified in the very first paragraph, under the subheading Summary.