Member since 07-12-2013
435 Posts · 117 Kudos Received · 82 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2340 | 11-02-2016 11:02 AM |
|  | 3630 | 10-05-2016 01:58 PM |
|  | 8291 | 09-07-2016 08:32 AM |
|  | 8909 | 09-07-2016 08:27 AM |
|  | 2521 | 08-23-2016 08:35 AM |

04-28-2015 07:12 PM · 1 Kudo

The reason is that CDH is installed in the VM using Linux packages, not parcels (so that using Cloudera Manager to manage the services is optional). If you'd like to install the Kafka parcel, you'll first need to migrate CDH to a parcel-based install. The documentation for doing this can be found here: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_migrating_packages_to_parcels.html
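
If it's not obvious which kind of install a given VM has, a rough check from the shell might look like this (a sketch, not an official procedure; the paths are the usual CDH defaults):

```
# Package-based installs show hadoop-* packages in the RPM database
rpm -qa | grep '^hadoop'

# Parcel-based installs unpack CDH under /opt/cloudera/parcels
ls /opt/cloudera/parcels
```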

03-24-2015 03:51 PM

Can you confirm in Cloudera Manager that the HDFS service is running and healthy? If the service is marked in any color other than green, there should be a little warning icon that you can click on to get more information about what may be wrong.

If the service is healthy, can you tell me what happens when you run `hadoop fs -ls /user/examples/sqoop_import_order_items.avsc` from the command line on a machine in your cluster?
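
For reference, a quick health check from the shell might look like the following (assuming you run it on a cluster node; `sudo -u hdfs` is only needed if your own user lacks HDFS superuser rights):

```
# List the example file; "No such file or directory" and a connection
# error point to very different problems.
hadoop fs -ls /user/examples/sqoop_import_order_items.avsc

# Summarize overall HDFS state: live DataNodes, capacity, missing blocks.
sudo -u hdfs hdfs dfsadmin -report
```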

03-17-2015 11:35 AM

I'm afraid I'm not very familiar with R and running it against Hadoop. My first thought is that perhaps the program that creates the files and the program that looks for the files are running as different users. /user/cloudera is the default working directory for the cloudera user, but other users will default to other directories, e.g. if 'root' asks for a file called '0' without giving an absolute path, it means /user/root/0. Is it possible these files exist under a different user's home directory?
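
As a quick illustration of how relative paths resolve in HDFS (the file name '0' here is just an example):

```
# As the cloudera user, a relative path resolves under /user/cloudera:
hadoop fs -ls 0                  # looks for /user/cloudera/0

# As root, the same relative path resolves under /user/root:
sudo -u root hadoop fs -ls 0     # looks for /user/root/0

# An absolute path behaves the same for every user:
hadoop fs -ls /user/cloudera/0
```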

03-13-2015 12:07 PM · 1 Kudo

I believe this procedure should get you switched over from YARN / MR2 to MR1. After running it I was able to compute pi using MR1:

```
# Stop and disable the MR2 / YARN services
for service in mapreduce-historyserver yarn-nodemanager yarn-proxyserver yarn-resourcemanager; do
    sudo service hadoop-${service} stop
    sudo chkconfig hadoop-${service} off
done

# Swap the pseudo-distributed configuration from MR2 to MR1
sudo yum remove -y hadoop-conf-pseudo
sudo yum install -y hadoop-0.20-conf-pseudo

# Start and enable the MR1 services
for service in 0.20-mapreduce-jobtracker 0.20-mapreduce-tasktracker; do
    sudo service hadoop-${service} start
    sudo chkconfig hadoop-${service} on
done
```

It stops and disables the MR2 / YARN services, swaps the configuration files, then starts and enables the MR1 services. Again, the tutorial is not written to be used (or tested) with MR1, so it's possible you'll run into some other issues. I can't think of any specific incompatibilities - just recommending that if you want to walk through the tutorial, you do it with an environment as close to the original VM as possible - otherwise who knows what differences may be involved.
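
For reference, the pi check mentioned above can be run against MR1 with something like this (the jar path is the usual CDH location for the MR1 examples; verify it on your VM):

```
# Estimate pi with 10 maps of 100 samples each, via the MR1 JobTracker
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 100
```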

03-13-2015 11:15 AM

To answer Morgan's question: port 8020 is the HDFS NameNode, and port 8021 is the JobTracker in MR1, which is where you would have submitted jobs in CDH 4. It can still be used in CDH 5, but as it is not the default, you'll need to switch around some configuration and services (and understand that the rest of the tutorial may not work exactly as expected because of the switch - I'd suggest perhaps starting with a fresh copy of the VM to be sure everything in the tutorial will work and not conflict with what you've been doing in R).
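
If you want to verify which daemon actually owns which of those ports on the VM, something like this may help (netstat flag availability can vary by distro):

```
# Show listening TCP sockets and the processes bound to ports 8020 / 8021
sudo netstat -tlnp | grep -E ':(8020|8021) '
```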

03-13-2015 11:13 AM

After reviewing the blog post, I noticed that it is written for the CDH 4.1.1 VM. I'm afraid there have been a number of changes since then that might be complicating things. The primary change, and the one that I think is complicating Sqoop for you, is that in CDH 4 we recommended MR1 for production, whereas in CDH 5 YARN has stabilized and we now recommend MR2 for production because of its superior resource management.

I believe the following line is responsible for setting up your environment such that Sqoop is trying to use MR1 when it is not running:

```
ln -s /etc/default/hadoop-0.20-mapreduce /etc/profile.d/hadoop.sh
```

You could either try getting rid of that symlink and anything else that's telling the system to use MR1, or you could stop YARN / MR2 and use MR1 instead. I'll try to post some instructions for doing the latter shortly...
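
If you go the first route, undoing that line might look like the following (a sketch; confirm that /etc/profile.d/hadoop.sh really is the symlink created above before deleting it, then start a new login shell):

```
# Verify the file is the symlink pointing at the MR1 defaults
ls -l /etc/profile.d/hadoop.sh

# Remove it so new shells stop picking up the MR1 environment
sudo rm /etc/profile.d/hadoop.sh
```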

03-13-2015 09:59 AM · 1 Kudo

>> In the very first tutorial on cloudera, it reads "You should first log in to the Master Node of your cluster using SSH - you can get the credentials using the instructions on Your Cloudera Cluster."

It's a little confusing whether you're running these commands on your host machine or on the VM. If you're reading the tutorial hosted on a website somewhere, it's written with you running this on a fully-distributed cluster in mind and SSH'ing in to the machine. There's a modified copy hosted on the VM itself (just go to localhost in the web browser in the VM, or on your host, as port-forwarding should work for VirtualBox) that (in my copy at least) just tells you to click on the terminal icon on the VM's desktop and enter commands there. Which version of the VM are you using, and where do you see that text? It should be possible to SSH into the VM, and even to run these commands from your host machine, but doing so requires a lot of network configuration to be set up correctly - it won't be set up that way by default, and it can be complicated to get it working consistently on different hosts - which is why I recommend just using the terminal on the VM's desktop.

The root cause of your connection refused error appears to be that Sqoop is trying to use MR1. The VM is set up to use MR2 / YARN by default, so that is probably why MR1 is not running and you can't connect. Cloudera supports running both MR1 and MR2, but you can't have a machine configured as a client to both at the same time. When I run this on my copy of the VM (and in all recent versions) Sqoop is definitely using MR2 / YARN. Have you changed any other configurations before running Sqoop? Is it possible you've got Sqoop installed on your host machine and it's configured differently than Sqoop in the VM?
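
One way to check which framework the Hadoop client on the VM is pointed at (assuming the standard /etc/hadoop/conf location) is to look for the mapreduce.framework.name property, which should be "yarn" for MR2 and "classic" for MR1:

```
# Search the active client configuration for the MapReduce framework setting
grep -r -A1 'mapreduce.framework.name' /etc/hadoop/conf/
```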

02-13-2015 09:25 AM

I hear you - I prefer .tar.gz myself, but we found that with most formats (.tar.gz included) the ability to extract large archives (>2GB) was very inconsistent between different tools, and it caused a lot of confusion among users about what the problem actually was.

02-13-2015 09:03 AM

You beat me to it! I was just downloading the file to confirm its integrity. I downloaded cloudera-quickstart-vm-4.7.0-0-vmware.7z, the SHA-1 checksum matched, and I was able to extract it. If you're not doing so already, I recommend using a download manager for large files (I use DownThemAll! for Firefox) - it will deal with network failures more gracefully than the one built into most browsers, and you're less likely to end up with a corrupted or interrupted download.
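
Verifying and extracting the download from the shell might look like this (assuming the p7zip tools are installed; compare the digest against the checksum published on the download page):

```
# Compute the SHA-1 digest of the archive
sha1sum cloudera-quickstart-vm-4.7.0-0-vmware.7z

# Extract the archive, preserving its directory structure
7z x cloudera-quickstart-vm-4.7.0-0-vmware.7z
```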

01-30-2015 06:54 AM

ZooKeeper is now required for some of the features that allow Solr to scale reliably ("SolrCloud"). You need to provide the address of your ZooKeeper ensemble as `--zk (host1),(host2):(port)` (the port is usually 2181).
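
For example, listing instance directories through solrctl might look like the following (the hostnames are placeholders, and the /solr chroot is the usual Cloudera Search default - verify both for your install):

```
# Point solrctl at the ZooKeeper ensemble that backs SolrCloud
solrctl --zk zk1.example.com:2181,zk2.example.com:2181/solr instancedir --list
```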