Member since: 07-29-2019
Posts: 640
Kudos Received: 113
Solutions: 48
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 7551 | 12-01-2022 05:40 PM |
|  | 2039 | 11-24-2022 08:44 AM |
|  | 2946 | 11-12-2022 12:38 PM |
|  | 997 | 10-10-2022 06:58 AM |
|  | 1437 | 09-11-2022 05:43 PM |
07-15-2021
07:15 AM
Hi @miguelbda21 When I click the first hyperlink you provided, I get an HTTP 404 error. I do not see the text you quoted:

> Downloading a Cloudera QuickStart VM
> Cloudera QuickStart VMs are available as Zip archives in VMware, KVM, and VirtualBox formats. Cloudera recommends that you use 7-Zip to extract these files, when possible. (7-Zip performs well with large files.) To download the latest VM in the required format, see Cloudera QuickStart VM Download.

…anywhere on that page, or indeed on any Cloudera downloads page.

Yes, Cloudera no longer makes the Cloudera QuickStart VM for VirtualBox (or any other virtualization platform, for that matter) available for download. It was based on CDH 5.13, which went out of support in the fall of last year, so it had become outdated. The new, updated distribution from Cloudera for on-premises use is the Private Cloud Base Edition of Cloudera Data Platform (CDP).

There is extensive and thorough documentation on installing CDP Private Cloud Base (the "free trial" for which is available at the second hyperlink you provided) in a non-production environment for demonstration and proof-of-concept use cases. That version does not come pre-packaged for deployment on a VM platform such as VirtualBox, however.

One alternative you might consider, if you're already familiar with Vagrant, is @carrossoni's community article outlining how to create a CentOS 7 CDP-DC Trial VM for sandbox/learning purposes, which is intended for use on VirtualBox.
07-13-2021
11:37 AM
@chams221 The more detail you provide, the better community members can assist with your question. Judging just from the screen capture you posted, it appears that you are working with the HDP Sandbox, not a full-scale installation of HDP on a dedicated cluster. You should also post the specific virtualization platform you're using. Perhaps another member of the community will recognize this specific problem with the Sandbox and respond in this thread.

You didn't indicate which version of the HDP Sandbox you installed. If you don't know how to retrieve the version of the Sandbox you've installed, please read through this tutorial: Learning the Ropes of the HDP Sandbox …then scroll down to the subsection Sandbox Version and follow the instructions there.

You also didn't indicate how you installed NiFi, what version you installed, or how you determined that the installation was successful. If you didn't install a version and distribution designed to work on the HDP Sandbox, your efforts are unlikely to succeed if you are a beginner. Assuming you are fairly new to NiFi, I'd recommend that you start over and install an appropriate version of the Cloudera DataFlow (CDF) Sandbox for the HDP Sandbox you are using.
07-13-2021
06:11 AM
@JatinMalik You can find an answer to your question here: Re: How to download Ambari-repo
07-08-2021
04:27 PM
1 Kudo
@rok Perhaps. I think the place to begin would be to look at this:

STACKTRACE=[java.io.IOException: Server returned HTTP response code: 403 for URL: http://cloudera_manager_ip:7180/cmf/j_spring_security_check

…and figure out why the access request to that URL is being denied with an HTTP 403 Forbidden client error. Is that URL accessing a host on your network, or the host running Cloudera Manager?
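If you want to see exactly how that endpoint responds from the failing host, a quick probe along the lines of the sketch below can help. This is only an illustration: the URL is the placeholder from the stack trace, and the real j_spring_security_check endpoint normally expects a POST with login form fields, so a plain GET here is just a reachability and response check.

```scala
import java.net.{HttpURLConnection, URL}

object ProbeCm {
  def main(args: Array[String]): Unit = {
    // Placeholder host:port copied from the stack trace; substitute your own
    val url = new URL("http://cloudera_manager_ip:7180/cmf/j_spring_security_check")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("GET")           // the real endpoint expects a POST login form;
    conn.setInstanceFollowRedirects(false) // a GET is only a connectivity check
    println(s"HTTP ${conn.getResponseCode} ${conn.getResponseMessage}")
    println("Server header: " + Option(conn.getHeaderField("Server")).getOrElse("(none)"))
    conn.disconnect()
  }
}
```

Seeing which server answers (and with what status) should tell you whether the 403 is coming from Cloudera Manager itself or from something in between.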
07-07-2021
11:46 AM
1 Kudo
Hi @Gcima009 It's not readily apparent to me just from reading the Traceback, but if you're sure that the problem is that Cloudera Manager cannot access the URL https://archive.cloudera.com/cm6/6.2.0/allkeys.asc/ then you are probably being denied access to the server archive.cloudera.com because authentication is required. I am guessing that the last time you attempted this operation, you were not challenged for authentication by this particular host at Cloudera, so you're wondering what changed recently. The answer is probably that your installation of Cloudera Manager isn't set up to supply the authentication credentials.

Earlier this year, Cloudera changed its download policy: downloading CDH parcels and related artifacts from Cloudera's repositories now requires a valid subscription. Please see the announcement here: Transition to private repositories for CDH, HDP and HDF. The same announcement describes new patch releases of Cloudera Manager that are required to access Cloudera's private repositories, which now contain the new and legacy releases and other assets, such as those needed to add a new host to an existing CDH cluster. If you haven't already done so, you'll need to upgrade to one of these patch releases in order to proceed.
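If you want to confirm from the Cloudera Manager host that your paywall credentials are accepted before retrying the operation, a small probe like the sketch below is one way to do it. The username and password values are placeholders, and the URL is the one from your Traceback; substitute whatever your installation is actually requesting.

```scala
import java.net.{HttpURLConnection, URL}
import java.util.Base64

object ProbeArchive {
  def main(args: Array[String]): Unit = {
    // Placeholder paywall credentials -- use the ones tied to your subscription
    val user = "YOUR_PAYWALL_USERNAME"
    val pass = "YOUR_PAYWALL_PASSWORD"
    val token = Base64.getEncoder.encodeToString(s"$user:$pass".getBytes("UTF-8"))

    val url = new URL("https://archive.cloudera.com/cm6/6.2.0/allkeys.asc")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", s"Basic $token")
    // 200 means the credentials are accepted; 401/403 means they are not
    println(s"HTTP ${conn.getResponseCode} ${conn.getResponseMessage}")
    conn.disconnect()
  }
}
```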
07-05-2021
09:11 PM
Hi @tja Other community members may weigh in with their opinions, but I believe the answer to the first and last questions is, of course, "it depends on the job."

The most suitable use cases for Sqoop center on bulk, structured data transfer between RDBMSs and HDFS. Sqoop takes the commands you provide at the CLI and internally generates MapReduce tasks to execute your desired data movement, with HDFS as either the source or the destination. While you can do some of the same (simple) things with either tool, Spark and Sqoop are not interchangeable. You can do a lot more with Spark than you can with Sqoop, because Spark gives you a full-blown programming language (Scala) along with a set of libraries that support a fairly complete distributed processing framework. The "T" part of ETL is going to be a lot easier to tackle in Spark than in Sqoop, and you will probably encounter tasks that are nearly impossible to complete with Sqoop yet fairly straightforward to address in Spark code, assuming you have the requisite software development background.

While I have not done any performance comparisons between a batch job in Sqoop and an equivalent job written in Spark (and I haven't read anybody else's work on that topic), a bit of logical deduction from first principles leads me to expect a considerable performance advantage for Spark over Sqoop import (given a sufficiently large data set), because Spark can leverage in-memory processing, which should, in theory, outperform MapReduce.

Yes, unfortunately the Sqoop PMC voted in June to retire Sqoop and move responsibility for its oversight to the Apache Attic. That does not mean that Apache Sqoop as a tool has lost all value. Cloudera still ships it as part of Cloudera Runtime, still fully supports Sqoop and responds to new feature requests coming from customers, and there's no plan to change this. The change in status at Apache could mean that the software has reached maturity "as is" and still has its uses. But end-user development of a complete new ETL pipeline is probably not one of them.
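To make the comparison concrete, here is a minimal sketch of a Sqoop-style import written against Spark's DataFrame JDBC source. Every connection detail here (URL, table, credentials, partition bounds, output path) is a made-up placeholder, not something from your environment:

```scala
import org.apache.spark.sql.SparkSession

object JdbcImportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-import-sketch").getOrCreate()

    // Read a table over JDBC in parallel, roughly what `sqoop import -m 8` does
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales") // placeholder connection
      .option("dbtable", "orders")                      // placeholder table
      .option("user", "etl_user")                       // placeholder credentials
      .option("password", "secret")
      .option("partitionColumn", "order_id")            // split reads across executors
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()

    // The "T" in ETL: arbitrary transformations are plain DataFrame code here,
    // which is where Spark pulls ahead of what Sqoop can express
    val shipped = orders.filter("status = 'SHIPPED'")

    // Land the result on HDFS, as Sqoop would
    shipped.write.mode("overwrite").parquet("hdfs:///data/orders_shipped")
    spark.stop()
  }
}
```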
07-05-2021
04:54 PM
@Rbcc What you should do next is post the specific version of the sandbox you are running in this thread, so that members of the community who are inclined to do so can address your question. You should also post the specific virtualization platform you're using: your question mentions both VirtualBox and Docker, and I have to assume you are actually using only one or the other. If you don't know how to obtain the sandbox version, see this tutorial: Learning the Ropes of the HDP Sandbox …then scroll down to the subsection Sandbox Version and follow the instructions there.
07-04-2021
07:47 PM
@jiyo The explanation provided earlier by @Shelton isn't wrong, but I thought I would follow up and provide some context I think will be helpful.

In your original question, you wrote that you "got an email saying that the username is my Google email address." This email was likely referring to the username you would use to log in to the Cloudera Community, not the username you would use to access Cloudera's private repositories, where the binaries for Cloudera's distributions of Hadoop and/or Ambari are now located. As @Shelton pointed out, and hopefully you are now aware, Cloudera modified its download policies, and the binaries you are seeking to download are now only available in a private repository. If not, please see the announcement here: Transition to private repositories for CDH, HDP and HDF. The credentials for this private repository are generally not the same ones used to access Cloudera's website or the Cloudera Community. The same announcement describes new patch releases of Ambari that are required to access Cloudera's private repositories, which now contain these new and existing releases.

The reason you're getting the error message from the wget command is that the credentials for the aforementioned private repositories do not depend on your Google email address, and the command never actually gets to the point of accessing the host archive.cloudera.com. The HTTP 301 redirects you're seeing are responses from one of Google's web servers, not Cloudera's.
07-04-2021
07:11 PM
Hi @roshanbi I think there are really two questions here:

1. For each row of my data set, can I mask the last 5 digits of each data element in the pri_identity column using Ranger?
2. Is this possible to achieve while using Kudu?

I'll restrict myself to addressing the first question. Your second question is a good one, though, because most of the documentation I've read about this simply doesn't mention Kudu, so I'll leave that part of your question to another community member who has more experience with Apache Kudu as a storage option.

You didn't provide the versions of Impala, Ranger, or Kudu you're using, or which distribution, but I will attempt to point you in the right direction nonetheless. You can see a quick demonstration of why and how to use a mask in Ranger on CDP in the first two minutes of this video: How to use Column Masking and Row Filtering in CDP. You can see a slightly longer demonstration of how to do something similar on HDP 3.1.x in this video: How to mask Hive columns using Atlas tags and Ranger.

Neither quite shows how to establish a custom masking expression, though, which is what I think you'll need to satisfy your requirements. To suppress the display of the last 5 digits in the pri_identity column, you will likely need a custom masking expression in Ranger. Ranger includes several "out of the box" masking types, but a cursory look at the documentation indicates that the masking policy you've described is not one of them. If that's true, you can always write a custom masking expression using the UDF syntax, which you can read about at the Apache.org site here: Hive Operators and User-Defined Functions (UDFs)

Hope this helps
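As a rough illustration of the expression logic involved (not the Ranger policy screen itself), the sketch below uses Spark SQL to apply the same Hive-style functions — concat, substr, length — that a custom masking expression would typically combine. The column name matches your question, but the sample values are invented:

```scala
import org.apache.spark.sql.SparkSession

object MaskSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mask-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample values; the real column lives in your Impala/Kudu table
    val ids = Seq("920017331000", "880244120555").toDF("pri_identity")

    // Keep the prefix, overwrite the last 5 characters with 'x'
    ids.selectExpr(
      "concat(substr(pri_identity, 1, length(pri_identity) - 5), 'xxxxx') AS pri_identity_masked"
    ).show(false)

    spark.stop()
  }
}
```

An expression shaped like the one inside selectExpr is the kind of thing you would paste into Ranger's "Custom" masking type, adjusted to whatever syntax your Ranger/Impala versions accept.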
06-25-2021
12:53 PM
3 Kudos
What the Trial Version of CDP Private Cloud Base Edition includes is an installer package. You can view the documentation on how to complete the installation here: INSTALLING CDP PRIVATE CLOUD BASE. Other members of this community have previously reported success using this approach.

Alternatively, if you're already familiar with VirtualBox and Vagrant, you might consider closely reading @carrossoni's community article outlining how to create a CentOS 7 CDP-DC Trial VM for sandbox/learning purposes (CDP Private Cloud was formerly known as CDP Data Center).

Other than that, I have not personally seen any publicly available VM images that can be deployed using virtualization tools, but that certainly doesn't mean they don't exist. Cloudera's distributions have a vast ecosystem built up around them, and it's close to impossible to keep up with everything anyone is producing. But participating in and contributing to this community helps a lot. 😀