About ask_bill_brooks

sparkdeveloper · 05-08-2022

Thanks @ask_bill_brooks. I much appreciate the reply. I did solve the problem using an uber/fat jar. However, I was hoping for a more elegant solution. I did look into shading in Gradle, but it was too confusing and difficult to explain and maintain. sbt didn't seem to have shading (or an easier approach to shading). At the moment, we are stuck with a fat jar. I hope there will be an easier deployment method in the future via the dependencies.scala file.

aval · 05-06-2022

Thanks for that! That's helpful.

kasu · 05-05-2022

Where can I get a copy? I am following this documentation: http://vmwareinsight.com/Articles/2020/6/5803025/How-to-install-Cloudera-On-VirtualBox-In-Windows

Yaniv · 05-03-2022

No, actually I didn't get what I need. I just understood that I need to develop it by myself.

ask_bill_brooks · 05-03-2022

Hi @Freschone, may I ask why you need to download the Quickstart VM based on CDH 5.10? Is this a classroom assignment? As a general matter, Cloudera is no longer updating the Cloudera Quickstart VM or making it available for download (and hasn't since March of 2020) because it was outdated and obsolete: the last version was based on CDH 5.13, which went out of support in the Fall of 2020. The credentials to access the private repository where Cloudera is now distributing previous versions of CDH are not generally the same ones used to access Cloudera's website or the Cloudera community. Employees of organizations with a valid Cloudera subscription can generate repository credentials from a CDH license key, and there is a full description of how to do this in the Cloudera Enterprise 6.x Release Notes here: Version, Packaging, and Download Information.

azg · 04-29-2022

The latest version of NiFi is running (1.16.0). When I say the connection works fine on localhost, I mean that my NiFi service is launched via docker-compose on my computer. When I access NiFi via https://localhost:8443/nifi/ and use a ListenFTP processor on port 2221, the connection via FileZilla works. I can transfer files and process them in NiFi. Localhost: FileZilla connection:

EmanuelArano · 04-26-2022

Thanks sr

ask_bill_brooks · 04-21-2022

@san_re What documentation are you following for what you are attempting to do here? You're much better off following a specific set of instructions from the site where you are downloading MySQL and/or NiFi. For NiFi, the canonical instructions can be found here: NiFi System Administrator's Guide

ask_bill_brooks · 04-20-2022

Hi @Data1701 According to the API documentation, one can get a java.net.URISyntaxException when a passed string could not be parsed as a URI reference. The file you are attempting to read in might very well be available on your local area network from a shared server drive, but it isn't available via a valid URI, or at the very least, the URI you are referencing in your Spark code isn't a valid and accessible URI.

What your problem boils down to is that the file isn't available via a web server, and the server that is running your Spark code can't retrieve it at the time your code executes. And that should shed light on why you had to previously upload your .csv files into CDSW, because that was the way to ensure that they could be found at runtime, since they were in a well-known/accessible location.

There are several valid approaches to addressing this, but the easiest solution, if you want to continue to use the code snippet you've written and shared here, is to place the file on some server that is accessible over the web (preferably via HTTPS) and refer to it using a fully-qualified URL. In order to do that, a functioning and secured web server will have to be available to you (you could set this up on your local workstation).

Let's assume you place the file on a web-accessible server somewhere local to your corporate network and the web-accessible directory path you place the file in turns out to be something like Data1701/project/data_folder/. Then you can change the assignment statement in your Spark code to this:

df = spark.read.format('csv').load('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv', header=True)

…and the rest of your code should work, unchanged.
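To illustrate what "fully-qualified URL" means in the reply above, here is a minimal sketch using only the Python standard library. The helper name is_web_url and the file-share path are hypothetical and exist only for this example; the check looks at the form of the string and does not confirm that the server is reachable or that Spark can read from it.

```python
from urllib.parse import urlparse


def is_web_url(path: str) -> bool:
    """Return True if path has an http(s) scheme and a host part."""
    parts = urlparse(path)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The example URL from the reply above: it has a scheme and a host.
web_path = "https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv"

# A hypothetical shared-drive path: no scheme and no host part.
share_path = r"\\fileserver\Data1701\project\data_folder\file.csv"

print(is_web_url(web_path))
print(is_web_url(share_path))
```

A check like this could run before the spark.read call, so that a shared-drive path is caught early with a clear message instead of surfacing later as a URI parsing error.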

buzzamus · 04-13-2022

@ask_bill_brooks thanks for the information. Everything you said made sense. I will wait and see if anybody else has had better luck than I, but I think you are correct.

Member Since	07-29-2019 03:29 PM
Posts	640
Kudos received	108

Cloudera Community

Re: Vulnerability (Text4Shell) (CVE-2022-42889)

Re: ERROR orm.CompilationManager: Sqoop requires a...

Re: How to enable TEZ UI on CDP 7.1.7

Re: CDH HIVE download

Re: Nifi registry architecture.

Re: spark's HikariCP-2.5.1.jar is eclipsing the de...

Re: CDP private cloud experience cluster requireme...

Re: Download Cloudera Quickstarts

Re: Missing template for Cloudera ansible definiti...

Re: download CDH 5.10

Re: Nifi: Failed to retrieve directory listing whe...

Re: Install cloudera in redhat 8.5

Re: Not able to Connect NIFI to mysql through wind...

Re: How to read in a csv file from server location...

Re: Hive ODBC driver on m1 Mac