Member since 07-12-2013
435 Posts · 117 Kudos Received · 82 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2340 | 11-02-2016 11:02 AM |
|  | 3630 | 10-05-2016 01:58 PM |
|  | 8291 | 09-07-2016 08:32 AM |
|  | 8909 | 09-07-2016 08:27 AM |
|  | 2521 | 08-23-2016 08:35 AM |

04-28-2015 07:12 PM · 1 Kudo

The reason is that CDH is installed in the VM using Linux packages, not parcels (so that using Cloudera Manager to manage the services is optional). If you'd like to install the Kafka parcel, you'll first need to migrate CDH to a parcel-based install. The documentation for doing this can be found here: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_migrating_packages_to_parcels.html
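
If it's not obvious which kind of install a given VM has, a rough check from the shell might look like this (a sketch, not an official procedure; the paths are the usual CDH defaults):

```
# Package-based installs show hadoop-* packages in the RPM database
rpm -qa | grep '^hadoop'

# Parcel-based installs unpack CDH under /opt/cloudera/parcels
ls /opt/cloudera/parcels
```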

03-24-2015 03:51 PM

Can you confirm in Cloudera Manager that the HDFS service is running and healthy? If the service is marked in any color other than green, there should be a little warning icon that you can click on to get more information about what may be wrong.

If the service is healthy, can you tell me what happens when you run `hadoop fs -ls /user/examples/sqoop_import_order_items.avsc` from the command line on a machine in your cluster?
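
For reference, a quick health check from the shell might look like the following (assuming you run it on a cluster node; `sudo -u hdfs` is only needed if your own user lacks HDFS superuser rights):

```
# List the example file; "No such file or directory" and a connection
# error point to very different problems.
hadoop fs -ls /user/examples/sqoop_import_order_items.avsc

# Summarize overall HDFS state: live DataNodes, capacity, missing blocks.
sudo -u hdfs hdfs dfsadmin -report
```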

03-17-2015 11:35 AM

I'm afraid I'm not very familiar with R and running it against Hadoop. My first thought is that perhaps the program that creates the files and the program that looks for the files are running as different users. /user/cloudera is the default working directory for the cloudera user, but other users will default to other directories, e.g. if 'root' asks for a file called '0' without giving an absolute path, it means /user/root/0. Is it possible these files exist under a different user's home directory?
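
As a quick illustration of how relative paths resolve in HDFS (the file name '0' here is just an example):

```
# As the cloudera user, a relative path resolves under /user/cloudera:
hadoop fs -ls 0                  # looks for /user/cloudera/0

# As root, the same relative path resolves under /user/root:
sudo -u root hadoop fs -ls 0     # looks for /user/root/0

# An absolute path behaves the same for every user:
hadoop fs -ls /user/cloudera/0
```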

03-13-2015 12:07 PM · 1 Kudo

I believe this procedure should get you switched over from YARN / MR2 to MR1. After running it I was able to compute pi using MR1:

```
# Stop and disable the MR2 / YARN services
for service in mapreduce-historyserver yarn-nodemanager yarn-proxyserver yarn-resourcemanager; do
    sudo service hadoop-${service} stop
    sudo chkconfig hadoop-${service} off
done

# Swap the pseudo-distributed configuration from MR2 to MR1
sudo yum remove -y hadoop-conf-pseudo
sudo yum install -y hadoop-0.20-conf-pseudo

# Start and enable the MR1 services
for service in 0.20-mapreduce-jobtracker 0.20-mapreduce-tasktracker; do
    sudo service hadoop-${service} start
    sudo chkconfig hadoop-${service} on
done
```

It stops and disables the MR2 / YARN services, swaps the configuration files, then starts and enables the MR1 services. Again, the tutorial is not written to be used (or tested) with MR1, so it's possible you'll run into some other issues. I can't think of any specific incompatibilities - just recommending that if you want to walk through the tutorial, you do it with an environment as close to the original VM as possible - otherwise who knows what differences may be involved.
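
For reference, the pi check mentioned above can be run against MR1 with something like this (the jar path is the usual CDH location for the MR1 examples; verify it on your VM):

```
# Estimate pi with 10 maps of 100 samples each, via the MR1 JobTracker
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 100
```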

03-13-2015 11:15 AM

To answer Morgan's question: port 8020 is the HDFS NameNode, and port 8021 is the JobTracker in MR1, which is where you would have submitted jobs in CDH 4. It can still be used in CDH 5, but as it is not the default, you'll need to switch around some configuration and services (and understand that the rest of the tutorial may not work exactly as expected because of the switch - I'd suggest perhaps starting with a fresh copy of the VM to be sure everything in the tutorial will work and not conflict with what you've been doing in R).
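
If you want to verify which daemon actually owns which of those ports on the VM, something like this may help (netstat flag availability can vary by distro):

```
# Show listening TCP sockets and the processes bound to ports 8020 / 8021
sudo netstat -tlnp | grep -E ':(8020|8021) '
```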

03-13-2015 11:13 AM

After reviewing the blog post, I noticed that it is written for the CDH 4.1.1 VM. I'm afraid there have been a number of changes since then that might be complicating things. The primary change, and the one that I think is complicating Sqoop for you, is that in CDH 4 we recommended MR1 for production, whereas in CDH 5 YARN has stabilized and we now recommend MR2 for production because of its superior resource management.

I believe the following line is responsible for setting up your environment such that Sqoop is trying to use MR1 when it is not running:

```
ln -s /etc/default/hadoop-0.20-mapreduce /etc/profile.d/hadoop.sh
```

You could either try getting rid of that symlink and anything else that's telling the system to use MR1, or you could stop YARN / MR2 and use MR1 instead. I'll try to post some instructions for doing the latter shortly...
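
If you go the first route, undoing that line might look like the following (a sketch; confirm that /etc/profile.d/hadoop.sh really is the symlink created above before deleting it, then start a new login shell):

```
# Verify the file is the symlink pointing at the MR1 defaults
ls -l /etc/profile.d/hadoop.sh

# Remove it so new shells stop picking up the MR1 environment
sudo rm /etc/profile.d/hadoop.sh
```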

03-13-2015 09:59 AM · 1 Kudo

>> In the very first tutorial on cloudera, it reads "You should first log in to the Master Node of your cluster using SSH - you can get the credentials using the instructions on Your Cloudera Cluster."

It's a little confusing whether you're running these commands on your host machine or on the VM. If you're reading the tutorial hosted on a website somewhere, it's written with you running this on a fully-distributed cluster in mind and SSH'ing in to the machine. There's a modified copy hosted on the VM itself (just go to localhost in the web browser in the VM, or on your host, as port-forwarding should work for VirtualBox) that (in my copy at least) just tells you to click on the terminal icon on the VM's desktop and enter commands there. Which version of the VM are you using, and where do you see that text? It should be possible to SSH into the VM, and even to run these commands from your host machine, but doing so requires a lot of network configuration to be set up correctly - it won't be set up that way by default, and it can be complicated to get it working consistently on different hosts - which is why I recommend just using the terminal on the VM's desktop.

The root cause of your connection refused error appears to be that Sqoop is trying to use MR1. The VM is set up to use MR2 / YARN by default, so that is probably why MR1 is not running and you can't connect. Cloudera supports running both MR1 and MR2, but you can't have a machine configured as a client to both at the same time. When I run this on my copy of the VM (and in all recent versions) Sqoop is definitely using MR2 / YARN. Have you changed any other configurations before running Sqoop? Is it possible you've got Sqoop installed on your host machine and it's configured differently than Sqoop in the VM?
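
One way to check which framework the Hadoop client on the VM is pointed at (assuming the standard /etc/hadoop/conf location) is to look for the mapreduce.framework.name property, which should be "yarn" for MR2 and "classic" for MR1:

```
# Search the active client configuration for the MapReduce framework setting
grep -r -A1 'mapreduce.framework.name' /etc/hadoop/conf/
```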

02-13-2015 09:25 AM

I hear you - I prefer .tar.gz myself, but we found that with most formats (.tar.gz included) the ability to extract large archives (>2GB) was very inconsistent between different tools, and it caused a lot of confusion among users about what the problem actually was.

02-13-2015 09:03 AM

You beat me to it! I was just downloading the file to confirm its integrity. I downloaded cloudera-quickstart-vm-4.7.0-0-vmware.7z, the SHA-1 checksum matched, and I was able to extract it. If you're not doing so already, I recommend using a download manager for large files (I use DownThemAll! for Firefox) - it will deal with network failures more gracefully than the one built into most browsers, and you're less likely to end up with a corrupted or interrupted download.
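
Verifying and extracting the download from the shell might look like this (assuming the p7zip tools are installed; compare the digest against the checksum published on the download page):

```
# Compute the SHA-1 digest of the archive
sha1sum cloudera-quickstart-vm-4.7.0-0-vmware.7z

# Extract the archive, preserving its directory structure
7z x cloudera-quickstart-vm-4.7.0-0-vmware.7z
```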

01-30-2015 06:54 AM

ZooKeeper is now required for some of the features that allow Solr to scale reliably ("SolrCloud"). You need to provide the address of your ZooKeeper ensemble as `--zk (host1),(host2):(port)` (the port is usually 2181).
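
For example, listing instance directories through solrctl might look like the following (the hostnames are placeholders, and the /solr chroot is the usual Cloudera Search default - verify both for your install):

```
# Point solrctl at the ZooKeeper ensemble that backs SolrCloud
solrctl --zk zk1.example.com:2181,zk2.example.com:2181/solr instancedir --list
```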