Member since: 07-29-2019
Posts: 603
Kudos Received: 104
Solutions: 40
05-24-2022
10:19 AM
Previously asked and answered in this thread:
log4j2 vulnerability (CVE-2021-44228)
05-20-2022
04:55 PM
Hi @Zinway
Assuming that you have a valid Cloudera subscription, you should be able to log into your My Cloudera account, navigate to the support portal, and contact Support to request that specific .tar.gz file.
05-20-2022
04:45 PM
1 Kudo
@Vijay11
I cannot offer you any assurances that Cloudera maintains a repository for Ambari 2.7.6. I say that because, while Cloudera continues to support customers using Ambari in accordance with its associated products' support lifecycle, Cloudera announced it was ending its involvement in the Ambari open source community in 2021. The part of that announcement that is relevant to your questions is this paragraph:
Please note that Cloudera does not support direct installation of Apache Ambari 2.7.6. Although 2.7.6 contains numerous patches provided by Cloudera (some of which were delivered separately as hotfix releases to customers); the 2.7.6 release is not itself a Cloudera release.
You should carefully read the whole thing.
As for how you can obtain paid access to the private repositories and what a Cloudera subscription costs, the only way to get answers to those questions is to get in touch with the Cloudera sales team. Once you make contact with someone there, be sure to tell them that you're looking to gain access to a repository, if any, for Ambari 2.7.6. Note that as of this writing, Cloudera recommends upgrading to the latest version of CDP Private Cloud Base as soon as possible, as it has a richer set of features, is up to date, and remains under standard Cloudera support through August 2024.
05-20-2022
08:55 AM
Hi @DzBoris
May I ask why you are attempting to download CDH 5.11.1? And does your organization have a valid Cloudera subscription?
Cloudera Enterprise 5.11 became generally available in June of 2017. As you no doubt are aware, that was quite a while ago, especially in terms of "internet time". Cloudera Enterprise 5.11 reached its end of support date in April 2020 (open that link and then expand the section labeled "Cloudera Enterprise products" underneath Current End of Support (EoS) Dates).
The current Enterprise Data Platform offered by Cloudera is Cloudera Data Platform (CDP), which in its on-premises "form factor" is offered as CDP Private Cloud. CDP supersedes CDH, and its included components are kept reasonably up to date, which is not the case with CDH 5.11.
As a general matter, the credentials used to access the private repository where Cloudera now distributes previous versions of CDH are not the same ones used to access Cloudera's website or the Cloudera community. Employees of organizations with a valid Cloudera subscription can upgrade Cloudera Manager to a newer version that uses modified repository URLs containing these credentials. You can get started reading about how to do this in the Cloudera Enterprise 5.x Release Notes here: Version, Packaging, and Download Information.
The use of credentials which are not tied to a valid Cloudera subscription is the most common cause of the HTTP 403 Forbidden error message you are encountering.
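If you want to confirm that credentials are the problem, one quick check, assuming you have curl available, is to request a file from the repository directly and look at the returned status code. The URL below is purely illustrative; substitute the actual repository URL you are trying to reach:
curl -s -o /dev/null -w "%{http_code}\n" -u 'username:password' https://archive.cloudera.com/p/<repository-path>
A 403 with one set of credentials and a 200 with another tells you the problem is the credentials themselves, not your network.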
05-17-2022
02:43 PM
@joshtheflame
I just wanted to provide a bit more context. The partial screenshot you've included above appears to show Cloudera Manager running against a CDH 6.1.0 cluster. CDH 6.1.0 was released in December of 2018. As you no doubt are aware, that was quite a while ago, especially in terms of "internet time". Hopefully you are aware that CDH 6.1.x has reached its End of Support (EoS); you can find the most recent official reminder of the 2021 announcement here:
March 2021 Customer Advisory - 2: End of Support for Cloudera Products (CDH/CM 6.x & HDP 3.x).
Cloudera's lifecycle support policies are documented here:
https://www.cloudera.com/legal/policies/support-lifecycle-policy.html
My understanding is that organizations with a valid Cloudera subscription for legacy products such as CDH would have been sent this announcement directly.
If that screenshot represents what your bank is running in production, I would recommend that you reach out to your Cloudera Account team and discuss your upgrade options as soon as possible. You are going to have to upgrade in order to take advantage of any of the offerings mentioned in @steven-matison 's reply earlier.
05-10-2022
05:10 PM
4 Kudos
I just want to address the assertions in the first paragraph above.
I don't think it's quite correct to deem HDF "a frozen project". It would be more accurate to say that, as a Cloudera product, it was superseded by Cloudera DataFlow (CDF) in the Winter of 2019. The legacy HDF product reached its End of Support (EoS) date in March 2022, so it is safe to assume that no new major updates with new versions of NiFi will be going into it.
There are always some differences between the versions released by Cloudera in products such as CDF and the releases of "upstream" component projects such as Apache NiFi. This is analogous to the difference between which mainline kernel is "current" in the open source Linux world and what Red Hat, for example, ships as part of Red Hat Enterprise Linux.
Right now, Apache NiFi is a component in a product called Cloudera Flow Management (CFM). My understanding is that CFM 2.1.4 will be based on NiFi 1.16 when it becomes available a little later this year; to find out more about that release schedule, reach out to your Cloudera Account Team. I haven't personally tried it, but the documentation for CFM 2.1.1, the currently-released version, indicates that you can install CFM on top of HDF 3.5.2. So when the new version comes out, you may still be able to run NiFi 1.16.x on top of that version of HDF, provided CFM maintains that support (I can't offer any assurances that it will, given that HDF has already reached EoS).
Now that brings us to Ambari. While Cloudera continues to support customers using Ambari in accordance with its associated products' support lifecycle, including HDF, Cloudera announced it was ending its involvement in the Ambari open source community in 2021. You are correct that the ASF retired Ambari in January of this year. That means that going forward, there will be fewer people using it, and fewer people with the knowledge of how it works to support it effectively. This is one reason that, as of this writing, Cloudera recommends upgrading to the latest version of CDP Private Cloud Base as soon as possible: it has a richer set of features, is the successor to HDF, and remains under standard Cloudera support through August 2024.
We welcome your questions, and this thread will remain visible here in the hope that some other member of the Cloudera Community will reply with assistance addressing your second paragraph above.
05-06-2022
12:55 AM
Hi @sparkdeveloper
You are encountering the well-known phenomenon of Java class shadowing, and it really doesn't have much to do with Spark or the spark-submit utility. One direct solution is to employ a technique called "shading" the dependencies, which usually winds up being a lot of work, and I personally avoid doing it. If you want to go that route, you can read all about that technique on any number of websites that cover Java software development, or in books on the same topic.
Obviously, I don't know what the code you're running that uses this library does, but if you're looking to resolve this quickly, I think you are better off rewriting your code to use the APIs in the 2.5.1 version that comes pre-packaged with your Spark installation, and simply omitting your .jar file from the CLASSPATH. You'll probably need to download the HikariCP-2.5.1.jar file and install it where you're doing your development work. I realize that is a lazy kind of answer, but you should consider it unless you have a truly compelling need to use the later version.
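If you do conclude that you genuinely need the newer library, one other avenue worth testing is Spark's experimental userClassPathFirst settings, which give your own jars precedence over Spark's bundled ones. I can't promise this behaves well with every library on every distribution (class-loading conflicts can surface elsewhere), so treat it as a sketch; the application file name below is a placeholder:
spark-submit --jars HikariCP-3.2.0.jar --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true your_app.py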
05-06-2022
12:22 AM
@aval
I don't personally have the required expertise (yet) to answer your question, but I did want to attempt to clarify the question for other community members who do.
In paragraph 2, you write:
We want to setup the data experiences so we can move some work between the cdp public cloud and the cdp private cloud experiences since the control planes are similar.
I think, based on the rest of your question, that you intended to write "…so we can move some work between cdp private cloud base and the cdp private cloud experiences since the control planes are similar." It's important to be clear about this because Cloudera does have a product called CDP Public Cloud, but that "form factor" of CDP only works on the infrastructure provided by the so-called hyperscalers, or CSPs, such as AWS, Azure and GCP. The on-premises offering is called CDP Private Cloud. Please update this thread with a new, clarifying post if you really do want to know how to move workloads between CDP Public Cloud and CDP Private Cloud Experiences clusters.
05-03-2022
08:25 AM
Hi @Freschone
May I ask why you need to download Quickstart VM based on CDH 5.10? Is this a classroom assignment?
As a general matter, Cloudera is no longer updating or making the Cloudera Quickstart VM available for download (and hasn't since March of 2020) because it is obsolete: the last version was based on CDH 5.13, which went out of support in the Fall of 2020.
The credentials to access the private repository where Cloudera is now distributing previous versions of CDH are generally not the same ones used to access Cloudera's website or the Cloudera community. Employees of organizations with a valid Cloudera subscription can generate repository credentials from a CDH license key; there is a full description of how to do this in the Cloudera Enterprise 6.x Release Notes here: Version, Packaging, and Download Information.
05-01-2022
04:49 PM
1 Kudo
Hi @Yaniv
The link to the GitHub repo that @GangWar shared earlier in the thread is the canonical location for assets related to Cloudera's Ansible-based automation for the deployment of CDP Private Cloud Base. As it says in the blog post referred to earlier in this thread:
The Ansible playbooks are provided on an as-is basis without any warranty or support.
If you can't find a desired asset there, that means it has not been publicly released by Cloudera. To put it more directly: as of April 2022, Cloudera has not published a definition that includes HA.
If you go ahead and develop an HA template on your own, you are always welcome to raise an issue on the GitHub project and submit your in-house developed template as a pull request, as Cloudera welcomes participation from the community.
If you don't have the Ansible playbook development skills or the time to obtain them, I would suggest you engage Professional Services to develop the assets you need. If you're a Cloudera Subscription Support customer, please do reach out to your Account team to discuss your potential project if you wish to go that route.
04-28-2022
12:38 PM
1 Kudo
Hi @azg ,
Do you know what version of NiFi is included in the docker image you're running?
And when you say "the connection works well on localhost", can you expand on what exactly you did to confirm that the connection "works"?
04-26-2022
01:59 PM
Hi @EmanuelArano
Keeping abreast of the fast-moving and ever-updating collection of requirements, in terms of supported operating systems, database management systems, Java Development Kits and primary processor architectures compatible with Cloudera Data Platform (CDP), is quite a challenge, and I doubt any member of the Cloudera Community keeps track of all of that in their head. Luckily, you don't have to.
I recommend you refer to the section subheaded CDP Private Cloud Base Supported Operating Systems
…in the documentation for the specific release you're interested in (you didn't say which specific version of CDP Private Cloud Base you want to install, and there are as of this writing eight different "point releases" of CDP Private Cloud Base 7.1). As a point of reference, CDP Private Cloud Base versions 7.1.2 and 7.1.3 were released in the Fall of 2020, so hopefully you are not attempting to install 7.1.0 at this point.
In particular, that section features a hyperlink to the very handy Cloudera Support Matrix, where you can click on a product (in your case, Cloudera Manager and CDP Private Cloud Base) to see all the product versions it supports. To narrow down your search for supported combinations, click again on the supported product versions that are highlighted in green. You can then scroll down to see the supported operating systems (along with databases and JDKs).
Should you find, after consulting that documentation, that "redhat 8.5" is not supported, and you have full backups available to you, the best approach would be to restore your system to the state it was in prior to the yum update and proceed with your installation using 8.4 as the operating system. If for some reason you don't have full backups available, you might want to explore using the yum history command to roll back that last OS update.
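For reference, that rollback workflow looks roughly like this; the transaction ID is whatever yum history list shows for the offending update on your system:
yum history list
yum history undo <transaction-id>
Do review what the undo will actually change before confirming it, since rolling back an OS update touches a lot of packages.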
04-21-2022
09:53 AM
@san_re What documentation are you following for what you are attempting to do here? You're much better off following a specific set of instructions from the site where you are downloading MySQL and/or NiFi.
For NiFi, the canonical instructions can be found here:
NiFi System Administrator's Guide
04-20-2022
03:55 PM
Hi @Data1701
According to the API documentation, one can get a java.net.URISyntaxException when a passed string could not be parsed as a URI reference.
The file you are attempting to read in might very well be available on your local area network from a shared server drive, but the URI you are referencing in your Spark code isn't a valid, accessible URI.
What your problem boils down to is that the file isn't available via a web server, and the server that is running your Spark code can't retrieve it at the time your code executes. And that should shed light on why you had to previously upload your .csv files into CDSW, because that was the way to ensure that they could be found at runtime, since they were in a well-known/accessible location.
There are several valid approaches to addressing this, but the easiest solution, if you want to continue to use the code snippet you've written and shared here, is to place the file on some server that is accessible over the web (preferably via HTTPS) and refer to it using a fully-qualified URL. In order to do that, a functioning and secured web server will have to be available to you (you could set this up on your local workstation).
Let's assume you place the file on a web-accessible server somewhere local to your corporate network and the web-accessible directory path you place the file in turns out to be something like Data1701/project/data_folder/. Then you can change the assignment statement in your Spark code to this:
df = spark.read.format('csv').load('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv', header=True)
…and the rest of your code should work, unchanged.
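One caveat worth flagging: stock Spark builds don't ship a Hadoop FileSystem implementation for the http/https schemes, so if the load() call above fails with a "No FileSystem for scheme" error, a common workaround is to have Spark download the file first and then read the local copy. A minimal sketch, reusing the same hypothetical URL:
from pyspark import SparkFiles
url = 'https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv'
spark.sparkContext.addFile(url)  # fetches the file and distributes it to the nodes
df = spark.read.csv('file://' + SparkFiles.get('file.csv'), header=True)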
04-13-2022
11:58 PM
1 Kudo
Hi @san_re
The first error message you included is likely the result of not enabling connections over the network (or locally, as the case may be) to the MySQL server. You have to take some administrative action on a newly-installed MySQL server in order to allow outside applications (in this case, the local NiFi) to connect to it. It's difficult to troubleshoot that remotely because it could be something else, but in my experience that is the most common root cause.
This error message:
Driver class com.mysql.jdbc.driver is not found - Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.driver: Driver class com.mysql.jdbc.driver is not found
typically means that the JDBC driver is not being found on the relevant CLASSPATH, which in this case is NiFi's. Note that Java class names are case-sensitive, and the class name in your error message is all lowercase: for MySQL Connector/J 5.x the driver class is com.mysql.jdbc.Driver, with a capital D. After you ensure you haven't mistyped the name of the Java driver class, you need to make sure you installed the appropriate .jar file in the location where NiFi is expecting to find it, that the file is not corrupted, and so on.
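For reference, a DBCPConnectionPool controller service for a local MySQL database would typically be configured along these lines (the database name and jar path here are hypothetical, so adjust them to match your installation, and use com.mysql.cj.jdbc.Driver if you're on Connector/J 8):
Database Connection URL: jdbc:mysql://localhost:3306/mydb
Database Driver Class Name: com.mysql.jdbc.Driver
Database Driver Location(s): /opt/nifi/drivers/mysql-connector-java-5.1.49.jar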
04-13-2022
11:33 PM
1 Kudo
Hi @Saraali Thank you for asking a great question! Allow me to expand a bit on the answer posted earlier by @Azhar_Shaikh.
He's correct that you could write a Python script leveraging the Pandas API to programmatically create an MS Excel file, and then call that script in NiFi using ExecuteStreamCommand, although perhaps using ExecuteScript might be a better candidate, depending on how your overall flow is designed and what external software you feel like installing or configuring.
There's a reasonably well-documented set of classes/methods in the Pandas API that would allow you, once you have read in the data from your .csv file, to convert it to a Pandas DataFrame and then write the DataFrame to an Excel file. If your software development skills are limited to Python, that would be a workable approach.
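To make that concrete, the core of such a script can be quite small. A minimal sketch, where the file names are placeholders and the openpyxl package is assumed to be installed for .xlsx output:
import pandas as pd
df = pd.read_csv('input.csv')  # read the incoming CSV data
df.to_excel('output.xlsx', index=False)  # write it back out as an Excel workbook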
My reading of your question, however, was that you were asking about writing a custom processor, not invoking a script. If you are not limited to Python like the original poster in the above-referenced Stack Overflow thread, you should consider writing a full-on NiFi processor in Java, leveraging libraries such as Apache POI or JExcel. You can use either library to programmatically read, write and modify the content of an Excel spreadsheet from a Java program, but the latter library only supports Excel files in the .xls (1997-2003) format. This approach requires significant software development skills, because it involves not just Java programming but also a certain amount of familiarity with the associated tools, principally Maven. Telling you how to do that would require a substantial, article-length tutorial. I still recommend Andy LoPresto's session from the 2019 DataWorks Summit, Custom Processor Development with Apache NiFi, to folks new to NiFi processor development who want an overview of what's involved.
If you don't have those software development skills or the time to obtain them, I would suggest you engage Professional Services to develop the processor you need. If you're a Cloudera Subscription Support customer, we can connect you with your Account team to discuss your potential project. Let me know if you are interested in this path by using the community's private message functionality to transmit your contact information.
This thread will remain open so other community members with greater expertise with custom NiFi processor development can contribute, if they so desire.
04-13-2022
11:33 AM
1 Kudo
@buzzamus To be completely honest, I have not tried what you're attempting, but based solely on logical deduction, I don't think you are going to get this to work. To understand why I say that, take a look at the section headed macOS System Requirements in the document Cloudera ODBC Driver for Apache Hive. That document is for version 2.6.1 of the driver (in other words, more up-to-date than the 2.5.0.x version you are running), and it explicitly says that it was written for macOS versions 10.11, 10.12, and 10.13. You're running macOS Monterey (12.1) on a CPU that came out significantly later than the ones this code was designed to run on, and I think it's safe to assume that the binaries you are trying to install are not Universal 2 binaries that work on both Intel Macs and Macs based on Apple silicon.
I can't speak authoritatively on whether or not this driver is currently supported on ARM-based processors.
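One check you can run yourself: point the file utility at the driver's shared library and see which architectures the binary actually contains (the path below is my guess at the default install location; yours may differ):
file /opt/cloudera/hiveodbc/lib/universal/libclouderahiveodbc.dylib
If the output lists only x86_64 and not arm64, the driver can at best run under Rosetta 2 translation.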
This thread will remain open so other community members with greater expertise with macOS and ODBC can weigh in, if they so desire.
04-13-2022
06:21 AM
@serg93 Good to hear that you have a valid Cloudera subscription. The absolutely correct approach is still to reach out to your Cloudera Account representative, and they can get you the access you need.
04-12-2022
12:49 PM
Hi @serg93
Assuming you have a valid Cloudera subscription, reach out to your Cloudera Account representative, and they can get you the access you need.
04-11-2022
01:59 PM
Hi @buzzamus
I think it would be helpful to community members inclined to answer your question if you ran the following two commands at the command line and posted the output in a reply, here in this thread.
Open up a terminal window, and at the shell prompt, issue this command to retrieve the installed macOS version:
urmachine:~ usrname$ sw_vers
…and then, at the same prompt, issue this command, to retrieve the version number and associated information about the Cloudera ODBC Driver for Apache Hive that you've installed on your machine.
urmachine:~ usrname$ /usr/sbin/pkgutil --info cloudera.hiveodbc
04-11-2022
12:00 AM
Hi @fatalprocess
You write that you "have literally used the VM out of the box", but you didn't indicate which "VM" it was, where you retrieved it from, whose instructions you followed to install it (from Codementor?) or configure it, or which version you installed. You also didn't indicate which virtualization platform you're using. All this makes it quite difficult to troubleshoot your problem remotely.
Judging from the screenshots you posted, I will assume that you are intending to use the HDP Sandbox, but it's still not clear which version you installed. The most recent version of the HDP Sandbox was based on HDP 3.0.1.0, which dates from November 2018. It would probably be a better approach to "start over": download that latest version and carefully follow the Sandbox Deployment and Install Guide and the accompanying tutorial Learning the Ropes of the HDP Sandbox.
I should mention, though, that the HDP Sandbox is based on legacy products that no longer represent Cloudera's current offerings (which is why the two web pages I referred to above are difficult to land on directly). Since you say that you don't know what you are doing and have not been trained properly, if you're able to choose, I strongly recommend simply abandoning your work with the HDP Sandbox and training yourself instead on Cloudera Data Platform, which in its on-premises "form factor" can be installed as CDP Private Cloud. CDP supersedes HDP as Cloudera's Enterprise Data Cloud offering.
There is extensive and thorough documentation on installing the CDP Private Cloud Base Edition of Cloudera Data Platform using on-premises hardware and Cloudera also has an extensive tutorial on installing a CDP Private Cloud Base (trial version) leveraging AWS in an infrastructure as a service (IaaS) fashion which you should consider.
04-04-2022
06:48 AM
@hbenner89 ,
In addition to what @ChethanYM wrote above, you should also share the file size that you are attempting to upload.
As a general matter, you can't expect a web browser to let you upload arbitrarily large files, so the perhaps unstated reason the Jira issue you pointed to was resolved as "won't fix" is that this is not a limitation specific to Hue.
04-01-2022
05:42 PM
Hi @JoseCosio
You didn't say where you were attempting to download the Cloudera Quickstart from or what site was denying your access, or what version you were attempting to access. May I ask why you were attempting to download a Cloudera Quickstart? Is this a classroom assignment?
As a general matter, Cloudera is no longer updating or making the Cloudera Quickstart VM available for download (and hasn't since March of 2020) because it is obsolete: it was based on CDH 5.13, which went out of support in the Fall of 2020.
I'm curious as to why anyone would ask you to download a Cloudera Quickstart VM at this point in time, or why you are interested in a data platform distribution which does not include the up-to-date releases of the various Hadoop ecosystem components. Cloudera's current distribution, since the Fall of 2020, is Cloudera Data Platform (or CDP); a Trial Version of CDP Private Cloud Base Edition of Cloudera Data Platform can easily be downloaded and installed from the "downloads" section of Cloudera's website.
03-31-2022
09:26 PM
Hi @humberto5213
It would be helpful to community members inclined to answer your question if you included a link to what set of instructions you are following to "install everything into a docker container." What constitutes "everything"? What data platform are you working with? The specific version of Docker (Docker Desktop for Mac?) you're using would be helpful, as well.
03-28-2022
11:43 AM
Hi @hooneybadger
It appears from the screen shot you've provided that you're using Hue in the Cloudera Quickstart VM on top of Oracle VirtualBox. Can you tell us a bit more about why you're attempting to use the Cloudera Quickstart VM? Can you specify which set of instructions you are following or which Hue tutorial is guiding you?
Cloudera is no longer updating or distributing the CDH 5-based Quickstart for VirtualBox (or any other virtualization platform, for that matter) because it is old and outdated. The last version was based on CDH 5.13, which went out of support in October 2020. Your screen shot indicates that you're using an even older version, 5.12.
As of this writing, I don't believe that there are many people still running the Cloudera Quickstart VM, and for that reason, I think it's unlikely that anyone will be able to help you with the specific error you're encountering while attempting to create your desired dashboard. While we welcome your question, and it will remain visible here in the hope that some member of the Cloudera Community will answer it, you would be much more likely to obtain a suitable solution if you focus your efforts around a more up-to-date data platform.
The new, updated data platform from Cloudera for on-premises use is Private Cloud Base Edition of Cloudera Data Platform. There is extensive and thorough documentation on installing the CDP Private Cloud Base Edition of Cloudera Data Platform (the "free trial" for which is available via Cloudera's web site) on a non-production environment for demonstration and proof-of-concept use cases. That version does not come "pre-packaged" for deployment on a VM platform such as VirtualBox, however.
One alternative you might consider if you're already familiar with Vagrant, is @carrossoni's community article outlining how to create a Centos7 CDP Trial VM for sandbox/learning purposes, which is intended for use on VirtualBox.
03-26-2022
09:26 AM
@Reema
You didn't indicate whether you're planning to run NiFi as an on-premises installation or at a cloud services provider, but in general, yes: someone with a sufficient understanding of how to set up NiFi should be able to migrate data from one RDBMS, such as DB2, to another, such as SQL Server, with a bit of effort. You say you don't know anything about NiFi, so you probably won't be able to achieve your goals without some preliminary learning of the fundamentals on your part, but it will prove to be well worth it.
There are a wealth of resources on the Internet for learning NiFi. I'd personally recommend that you start by downloading the e-book Apache NiFi for Dummies and reading it. Then, you can dive deep by viewing the recording of Apache Nifi Crash Course from the Spring of 2018.
If that's not enough, you can review the links offered the previous time the second part of your question was asked and answered on the Cloudera Community, here:
How to learn NiFi practically for beginners…Please send me any books or links of blogs.
03-16-2022
10:00 AM
Hi @AnasF
Whose course on big data are you taking? Can you supply a link here to the webpage describing the course and who is offering it?
When you write that the course is "using Cloudera", what do you mean? Did the instructor give you specific directions on downloading a specific software package and if so, which one? Can you share a link to a site describing what specific software the course is using? What operating system did you install said software on, and whose instructions did you follow to install it?
What command are you attempting to use and why? The bash shell has a built-in command named 'command', so the message you included above is probably not enough for any members of the Cloudera community inclined to answer your question to offer a helpful response.
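You can see this for yourself at a bash prompt:
type command
command is a shell builtin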
03-14-2022
11:30 AM
2 Kudos
@BigDataAvengers Can you tell us a bit about why you're installing this six-year-old image? Does what you've already installed work with acceptable responsiveness? You didn't say:
where you retrieved the docker quickstart image from
how much RAM your mac m1 has available
whether or not you were actually able to successfully start up the services, beginning with the Cloudera Manager user interface, that were already present in the Cloudera Quickstart VM image you installed
The docker image you've installed appears to be based on CDH 5.7, which went out of support at the end of August 2019 and is even older than the most recent version of the CDH Quickstart, which was based on CDH 5.13 and is itself already out of support. Cloudera Enterprise 5.13 reached its end of support date in October 2020 (open that link and then expand the section labeled "Cloudera Enterprise products" underneath Current End of Support (EoS) Dates). For this reason, Cloudera is no longer distributing the CDH 5-based Quickstart. It is also why this hyperlink:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cloudera_quickstart_vm.html
…to the appropriate documentation results in an HTTP 404 error.
And Kafka was never bundled and delivered with the Cloudera Quickstart; it always required a separate installation.
Next, it's not at all certain that your M1 Mac has enough RAM available in total to run the services you are attempting to run within the container (or containers) without consuming so much of the host machine's memory that macOS starts killing processes to free some up, and Docker is very much one of the processes subject to being killed. IIRC the memory recommendation for the Docker version of the Quickstart VM itself was about 10 GB, so on a 16 GB Mac, with the overhead of Docker and other running applications (such as a web browser), things will be running pretty tight as far as memory goes. Even if you were able to install the docker image that @ckumar pointed you to in another container alongside the Quickstart's container, that does not mean it would provide a usable system, because the HDF NiFi container called for at least 8 GB.
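If you want to check how much memory your Docker installation actually has available to hand out to containers (Docker Desktop on a Mac runs everything inside a fixed-size virtual machine), this should report the VM's total in bytes:
docker info --format '{{.MemTotal}}'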
So while it is possible that someone with the requisite knowledge of the Linux command line and Docker combined with sufficient skill and abilities with the various required development tools could update the last, outdated cloudera quickstart image to add Kudu and then subsequently figure out how to run it alongside containers for some version of NiFi and Kafka (and indeed, some member of the Cloudera Community may have already done so and be willing to share their method in response to your question), it doesn't mean you'd have the necessary hardware resources to run it on your M1 Mac, or that it would produce acceptable results.
You should ask yourself whether or not this would be worthwhile.
The current Enterprise Data Platform offered by Cloudera as of this writing is Cloudera Data Platform (CDP), which in its on-premises "form factor" is now called CDP Private Cloud. CDP supersedes CDH, and its components are at the appropriate versions that enable them to work together.
If you're just looking to evaluate a data platform, you can currently do so without a Cloudera subscription by downloading and installing the Trial Version of CDP Private Cloud Base Edition of Cloudera Data Platform. Cloudera has an extensive tutorial on installing CDP Private Cloud Base in an Infrastructure as a service (IaaS) fashion using AWS on its website. This approach allows you to leverage machines which have hardware resources, such as plentiful RAM, that your laptop might not have available. And CDP Private Cloud Base Edition ships with Kudu.
02-23-2022
08:22 AM
Hi @BalajiS
From the photo you posted, it appears that you are attempting to install version 3.0 of the HDP Sandbox. It would be helpful to those community members inclined to answer your question if you included:
What set of instructions or tutorial you are following to complete the installation
What host OS Version you are attempting to install the sandbox on
What version of Virtualbox you're using
How much memory you've allocated to the virtual machine
Another approach would be to skip installing the HDP 3-based Sandbox because HDP is no longer current. If you're just looking to evaluate a data platform, you can currently do so without a Cloudera subscription by downloading and installing the Trial Version of CDP Private Cloud Base Edition of Cloudera Data Platform. Cloudera has an extensive tutorial on installing CDP Private Cloud Base in an Infrastructure as a service (IaaS) fashion using AWS on its website.
02-22-2022
11:17 AM
1 Kudo
Hi @nivanecu
You didn't say what set of instructions you were following in order to do the installation, so this is going to be somewhat difficult to troubleshoot…but from what I've gathered from the output you've provided, you're attempting to install the HDP Sandbox. It seems like the problem here is not any issue with the HDP Sandbox, but with your use of Docker. Just picking up on the error message you've emphasized here:
docker: Error response from daemon: Conflict. The container name "/sandbox-hdp" is already in use by container "b1c48478c7d48e681a706f86de84fd23978be91001aef9efc6da18a24f99c21f".
It appears that you have a pre-existing container that is causing a conflict. As a first stab at a solution, you could try removing that other container.
First, you're going to want to get a listing of all the containers you have. Issue the command docker ps -a at the command line. The output will give you a list of all the containers you have in use and their status. Confirm for yourself that the container identified by the error message is not in the 'running' status.
Then you can remove the container by issuing a command something like:
MBP15:~ ask_bill_brooks $ docker container rm -v b1c48478c7d48e681a706f86de84fd23978be91001aef9efc6da18a24f99c21f
That will allow you to reuse the name that is causing the conflict you're encountering.
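Two related shortcuts, assuming a reasonably current Docker release: you can remove the container by name instead of by its long ID, and if that existing container is in fact the sandbox you deployed earlier, you may prefer to simply start it rather than re-run the deployment script:
docker rm sandbox-hdp
docker start sandbox-hdp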