Member since: 07-29-2019
Posts: 640
Kudos Received: 113
Solutions: 48
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 7326 | 12-01-2022 05:40 PM |
 | 1987 | 11-24-2022 08:44 AM |
 | 2868 | 11-12-2022 12:38 PM |
 | 969 | 10-10-2022 06:58 AM |
 | 1405 | 09-11-2022 05:43 PM |
04-26-2022
01:59 PM
Hi @EmanuelArano
Keeping abreast of the fast-moving, ever-updating requirements for Cloudera Data Platform (CDP), in terms of supported operating systems, database management systems, Java Development Kits, and processor architectures, is quite a challenge, and I doubt any member of the Cloudera Community keeps all of that in their head. Luckily, you don't have to.
I recommend you refer to the section subheaded CDP Private Cloud Base Supported Operating Systems
…in the documentation for the specific release you're interested in (you didn't say which version of CDP Private Cloud Base you want to install, and as of this writing there are eight different "point releases" of CDP Private Cloud Base 7.1). As a point of reference, CDP Private Cloud Base versions 7.1.2 and 7.1.3 were released in the fall of 2020, so hopefully you are not attempting to install 7.1.0 at this point.
In particular, that section features a hyperlink to the very handy Cloudera Support Matrix, where you can click on a product (in your case, Cloudera Manager and CDP Private Cloud Base) to see all the product versions it supports. To narrow your search for supported combinations, click again on the supported product versions highlighted in green, then scroll down to see the supported operating systems (along with databases and JDKs).
Should you find, after consulting that documentation, that "redhat 8.5" is not supported, and you have full backups available to you, the best approach would be to restore your system to the state it was in prior to the yum update and proceed with your installation using 8.4 as the operating system. If for some reason you don't have full backups available, you might want to explore using the yum history command to roll back that last OS update.
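If you take the yum history route, first identify the transaction ID of the offending update, inspect it, and then undo it. The commands below show the general pattern (the transaction ID 42 is purely a placeholder; substitute the ID yum reports for your update). Be aware that undoing a large update transaction doesn't always restore a system cleanly, so treat it as a fallback rather than a substitute for proper backups:
$ sudo yum history list
$ sudo yum history info 42
$ sudo yum history undo 42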
04-21-2022
09:53 AM
@san_re What documentation are you following for what you are attempting here? You're much better off following a specific set of instructions from the site you are downloading MySQL and/or NiFi from.
For NiFi, the canonical instructions can be found here:
NiFi System Administrator's Guide
04-20-2022
03:55 PM
Hi @Data1701
According to the API documentation, one can get a java.net.URISyntaxException when a passed string could not be parsed as a URI reference.
The file you are attempting to read may well be available on your local area network from a shared server drive, but it isn't reachable via a valid URI; or, at the very least, the URI you reference in your Spark code isn't valid and accessible.
What your problem boils down to is that the file isn't available via a web server, so the server running your Spark code can't retrieve it at the time your code executes. That should also shed light on why you previously had to upload your .csv files into CDSW: doing so ensured they could be found at runtime, in a well-known, accessible location.
There are several valid approaches to addressing this, but the easiest solution, if you want to continue to use the code snippet you've written and shared here, is to place the file on some server that is accessible over the web (preferably via HTTPS) and refer to it using a fully-qualified URL. In order to do that, a functioning and secured web server will have to be available to you (you could set this up on your local workstation).
Let's assume you place the file on a web-accessible server somewhere local to your corporate network and the web-accessible directory path you place the file in turns out to be something like Data1701/project/data_folder/. Then you can change the assignment statement in your Spark code to this:
df = spark.read.format('csv').load('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv', header=True)
…and the rest of your code should work, unchanged.
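One caveat, and this is an assumption about your environment rather than something I can verify from here: spark.read resolves paths through Hadoop-compatible filesystems, and not every Spark deployment can open an https:// URL directly. If yours can't, a workaround is to have Spark download the file first via SparkFiles (the URL below is the same illustrative one as above):
from pyspark import SparkFiles
# Distribute the file to the cluster's scratch space, then read it from the local path.
spark.sparkContext.addFile('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv')
df = spark.read.csv('file://' + SparkFiles.get('file.csv'), header=True)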
04-13-2022
11:58 PM
1 Kudo
Hi @san_re
The first error message you included is likely the result of not enabling connections over the network (or locally, as the case may be) to the MySQL server. You have to take some administrative action on a newly installed MySQL server to allow outside applications (in this case, the local NiFi) to connect to it. It's difficult to troubleshoot remotely because it could be something else, but in my experience that is the most common root cause.
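For reference, on a newly installed MySQL server that administrative action usually amounts to creating a dedicated user and granting it privileges on the database NiFi will use. The statements below are only a sketch, with purely illustrative user, password, and database names:
mysql> CREATE USER 'nifi'@'localhost' IDENTIFIED BY 'ChangeMe123!';
mysql> GRANT ALL PRIVILEGES ON nifidb.* TO 'nifi'@'localhost';
mysql> FLUSH PRIVILEGES;
If NiFi runs on a different host than MySQL, create the user for that host rather than 'localhost', and you may also need to adjust the server's bind-address setting.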
This error message:
Driver class com.mysql.jdbc.driver is not found - Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.driver: Driver class com.mysql.jdbc.driver is not found
typically means that the JDBC driver is not being found on the relevant CLASSPATH, which in this case is NiFi's. Note that Java class names are case-sensitive, and the class in your error message is almost certainly mistyped: the driver class for MySQL Connector/J 5.x is com.mysql.jdbc.Driver (capital D), and for Connector/J 8.x it is com.mysql.cj.jdbc.Driver. Beyond that, make sure you installed the appropriate .jar file in the location where NiFi expects to find it, and that the file is not corrupted.
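For illustration, a DBCPConnectionPool controller service in NiFi is typically configured along these lines (the URL, database name, and jar path below are placeholders, and the class name shown is the one for MySQL Connector/J 8.x):
Database Connection URL: jdbc:mysql://localhost:3306/nifidb
Database Driver Class Name: com.mysql.cj.jdbc.Driver
Database Driver Location(s): /path/to/mysql-connector-java-8.0.xx.jar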
04-13-2022
11:33 PM
1 Kudo
Hi @Saraali Thank you for asking a great question! Allow me to expand a bit on the answer posted earlier by @Azhar_Shaikh.
He's correct that you could write a Python script leveraging the Pandas API to programmatically create an MS Excel file, and then call that script from NiFi using ExecuteStreamCommand, although ExecuteScript might be a better candidate, depending on how your overall flow is designed and what external software you're willing to install and configure.
The Pandas API includes a reasonably well-documented set of classes and methods that let you read the data from your .csv file into a Pandas DataFrame and then write that DataFrame out to an Excel file. If your software development skills are limited to Python, that is a workable approach.
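As a minimal sketch of that Pandas approach (the file names are placeholders, and writing .xlsx files with Pandas assumes the openpyxl package is installed alongside it):
import pandas as pd

# Read the incoming CSV into a DataFrame, then write it back out as an Excel workbook.
df = pd.read_csv('input.csv')
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)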
My reading of your question, however, was that you were asking about writing a custom processor, not invoking a script. If you are not limited to Python like the original poster in the above-referenced Stack Overflow thread, you should consider writing a full-on NiFi processor in Java, leveraging a library such as Apache POI or JExcel. Either library lets you programmatically read, write, and modify the content of an Excel spreadsheet from a Java program, but the latter only supports Excel files in the .xls (1997-2003) format. This approach requires significant software development skills, since it involves not just Java programming but some familiarity with the associated tooling, principally Maven. Telling you how to do that would take a substantial, article-length tutorial. I still recommend Andy LoPresto's session from the 2019 DataWorks Summit, Custom Processor Development with Apache NiFi, to folks new to NiFi processor development who want an overview of what's involved.
If you don't have those software development skills or the time to obtain them, I would suggest you engage Professional Services to develop the processor you need. If you're a Cloudera Subscription Support customer, we can connect you with your Account team to discuss your potential project. Let me know if you are interested in this path by using the community's private message functionality to transmit your contact information.
This thread will remain open so other community members with greater expertise with custom NiFi processor development can contribute, if they so desire.
04-13-2022
11:33 AM
1 Kudo
@buzzamus To be completely honest, I have not tried what you're attempting, but based solely on logical deduction, I don't think you are going to get this to work. To understand why, take a look at the section headed macOS System Requirements in the document Cloudera ODBC Driver for Apache Hive. That document covers version 2.6.1 of the driver (in other words, more up to date than the 2.5.0.x version you are running), and it explicitly says it was written for macOS 10.11, 10.12, or 10.13. You're running macOS Monterey (12.1) on a CPU that came out significantly later than the hardware this code was designed for, and I think it's safe to assume the binaries you are trying to install are not Universal 2 binaries that run on both Intel Macs and Macs with Apple silicon.
I can't speak authoritatively on whether this driver is currently supported on ARM-based processors.
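If you want to check the binaries yourself, the stock macOS file command reports which CPU architectures a library was built for. The path below is purely illustrative; substitute wherever the installer actually placed the driver's .dylib:
urmachine:~ usrname$ file /opt/cloudera/hiveodbc/lib/universal/libclouderahiveodbc.dylib
If the output lists only x86_64 and no arm64 slice, the driver can at best run under Rosetta 2 translation, and only if the application loading it is also running as x86_64.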
This thread will remain open so other community members with greater expertise with macOS and ODBC can weigh in, if they so desire.
04-13-2022
06:21 AM
@serg93 Good to hear that you have a valid Cloudera subscription. The absolutely correct approach is still to reach out to your Cloudera Account representative, and they can get you the access you need.
04-12-2022
12:49 PM
Hi @serg93
Assuming you have a valid Cloudera subscription, reach out to your Cloudera Account representative, and they can get you the access you need.
04-11-2022
01:59 PM
Hi @buzzamus
I think it would be helpful to community members inclined to answer your question if you ran the following two commands at the command line and posted the output in a reply, here in this thread.
Open up a terminal window, and at the shell prompt, issue this command to retrieve the installed macOS version:
urmachine:~ usrname$ sw_vers
…and then, at the same prompt, issue this command to retrieve the version number and associated information about the Cloudera ODBC Driver for Apache Hive installed on your machine:
urmachine:~ usrname$ /usr/sbin/pkgutil --info cloudera.hiveodbc
04-11-2022
12:00 AM
Hi @fatalprocess
You write that you "have literally used the VM out of the box", but you didn't indicate which VM it is, where you retrieved it from, whose instructions you followed to install or configure it (Codementor's?), or which version you installed. You also didn't indicate which virtualization platform you're using. All of this makes it quite difficult to troubleshoot your problem remotely.
Judging from the screenshots you posted, I will assume you intend to use the HDP Sandbox, but it's still not clear which version you installed. The most recent version of the HDP Sandbox was based on HDP 3.0.1.0, which dates from November 2018. It would probably be better to "start over": download that latest version and carefully follow the Sandbox Deployment and Install Guide and the accompanying tutorial, Learning the Ropes of the HDP Sandbox.
I should mention, though, that the HDP Sandbox is based on legacy products that no longer represent Cloudera's current product offerings (which is why the two web pages I referred to above are difficult to land on directly). Since you say that you don't know what you are doing and have not been trained properly, and if you're able to choose, I strongly recommend abandoning your work with the HDP Sandbox and instead using and training yourself on Cloudera Data Platform, which in its on-premises "form factor" can be installed as CDP Private Cloud. CDP supersedes HDP as Cloudera's Enterprise Data Cloud offering.
There is extensive and thorough documentation on installing the CDP Private Cloud Base Edition of Cloudera Data Platform on on-premises hardware, and Cloudera also has an extensive tutorial on installing a trial version of CDP Private Cloud Base on AWS in an infrastructure-as-a-service (IaaS) fashion, which you should consider.