Member since 07-12-2013
      
435 Posts
117 Kudos Received
82 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2330 | 11-02-2016 11:02 AM |
|  | 3618 | 10-05-2016 01:58 PM |
|  | 8274 | 09-07-2016 08:32 AM |
|  | 8884 | 09-07-2016 08:27 AM |
|  | 2520 | 08-23-2016 08:35 AM |
			
    
	
		
		
06-02-2016 12:31 PM

Also, note that there's a script that tries to detect a public IP and set up the hosts file for you on boot. If you're going to edit it manually, you probably want to comment out the line in /etc/init.d/cloudera-quickstart-init that calls /usr/bin/cloudera-quickstart-ip. I don't remember which version that was added in; it might have been 5.5. So if your VM doesn't have /usr/bin/cloudera-quickstart-ip, you can ignore this post and safely edit the hosts file anyway.
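Commenting out that call can be done with a one-line sed. This is a minimal sketch run against a stand-in copy of the script; only the /usr/bin/cloudera-quickstart-ip path comes from the post above, the rest of the demo file's contents are made up:

```shell
# Demo on a throwaway copy, NOT the real /etc/init.d/cloudera-quickstart-init.
# The file contents below are a stand-in for illustration.
cat > /tmp/quickstart-init.demo <<'EOF'
#!/bin/sh
# other startup steps
/usr/bin/cloudera-quickstart-ip
exit 0
EOF

# Prefix the line that calls cloudera-quickstart-ip with '#' to disable it.
sed -i '/cloudera-quickstart-ip/ s/^/#/' /tmp/quickstart-init.demo

cat /tmp/quickstart-init.demo
```

On the real VM you would point the same sed at /etc/init.d/cloudera-quickstart-init (with sudo), or just open it in an editor and comment the line by hand.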
			
    
	
		
		
06-01-2016 09:56 AM

intermediate_access_logs was created as part of the ETL process in the tutorial. That process is done via Hive because it uses Hive SerDes and other Hive-only features. The final table created in that process (tokenized_access_logs, if I remember correctly) is the one you should be able to query in Impala. Also, don't forget to run 'invalidate metadata' when the ETL process is finished, since Impala caches metadata and won't automatically pick up tables created through Hive.
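A quick way to run that from the VM's terminal, assuming impala-shell is on the PATH as it is in the QuickStart VM (this needs a running Impala daemon, so it's shown here untested):

```shell
# Tell Impala to reload table metadata from the metastore.
impala-shell -q 'INVALIDATE METADATA'
```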
			
    
	
		
		
06-01-2016 09:53 AM

I don't know enough about Spark internals to give much intelligent advice here, but it's possible it's a matter of resources. You still have the problem in your hosts file that I described above. The hosts file you posted maps both 127.0.0.1 and your public IP to quickstart.cloudera. You should remove quickstart and quickstart.cloudera from the 127.0.0.1 line and have only your public IP map to those names (as shown below). You'll need to restart all services after you make this change.

127.0.0.1 localhost localhost.localdomain
<your public IP> quickstart.cloudera quickstart
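The same edit can be scripted. This is a sketch against a throwaway copy of the file; the IP 203.0.113.10 is a documentation-range placeholder standing in for your actual public IP:

```shell
# Demo on a copy, NOT the real /etc/hosts.
cat > /tmp/hosts.demo <<'EOF'
127.0.0.1 localhost localhost.localdomain quickstart.cloudera quickstart
EOF

# Strip the quickstart names from the loopback line, keeping the localhost entries.
sed -i 's/^\(127\.0\.0\.1.*localhost\.localdomain\).*/\1/' /tmp/hosts.demo

# Map the quickstart names to the public IP instead (placeholder address).
echo '203.0.113.10 quickstart.cloudera quickstart' >> /tmp/hosts.demo

cat /tmp/hosts.demo
```

Applied to the real /etc/hosts (with sudo and your real IP), this produces the two-line layout shown in the post above.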
			
    
	
		
		
05-20-2016 01:51 PM

The VirtualBox Guest Additions are installed in the VM, which should enable drag & drop of files, but perhaps it's having issues with the size of the files? SSH should also be running, so scp is another option, as is a Shared Folder. You'll need to get the files visible from the VM's filesystem, perhaps unzip them at that point, and then you can use 'hadoop fs -copyFromLocal' to put them in HDFS.
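The scp route might look like the following. Every hostname, filename, and path here is an example for illustration, and the commands need the actual VM, so they're shown untested:

```shell
# From the host machine: copy the archive into the VM over SSH
# (the IP and paths are placeholders; the QuickStart VM's user is 'cloudera').
scp data.zip cloudera@192.168.56.101:/home/cloudera/

# Inside the VM: unzip, then load the extracted files into HDFS.
unzip /home/cloudera/data.zip -d /home/cloudera/data
hadoop fs -copyFromLocal /home/cloudera/data /user/cloudera/data
```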
			
    
	
		
		
05-02-2016 02:43 PM

When you try to stop a service, it will warn you which running services depend on it. When you try to start a service, it will warn you which services it depends on that are not yet running.

I believe ZooKeeper, HDFS, and YARN are the only other services you need running for Spark, HBase, and Hive.
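On the QuickStart VM those services are started with the usual init scripts. The service names below are the ones I believe CDH installs, but confirm them on your VM first; since this needs the VM itself, it's an untested sketch:

```shell
# Confirm the actual init-script names on your VM before running:
ls /etc/init.d | grep -E 'zookeeper|hadoop'

# Start the dependencies in order: ZooKeeper, then HDFS, then YARN.
sudo service zookeeper-server start
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
```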
			
    
	
		
		
04-29-2016 07:04 AM

I don't have a ton of experience with Llama, but I think the misunderstanding here is that Impala manages the execution of its own queries, while the MapReduce framework manages the execution of Hive queries. YARN manages resources for individual MapReduce jobs, and it can manage the Impala daemons via Llama. The YARN application for Llama will run as long as Impala does; that's by design, to keep the latency of Impala queries very low. In the case of Hive, YARN will manage the job's resources only until that job (a single query) is finished.

Not sure why your Hive queries would not be running. If this is in the QuickStart VM, my first guess would be that Llama is still running and there aren't enough executors / slots for your Hive queries. YARN in the QuickStart VM is not configured with a lot of capacity, and it's not tested with Llama.

I know of no other way to manage Impala resources via YARN, though.
			
    
	
		
		
04-13-2016 07:40 AM
1 Kudo

If you're in the QuickStart VM, it sounds like the browser you're talking about is looking at the native Linux filesystem. You can find the file in that filesystem at /opt/examples/log_files/access.log.2 (or something like that). The Hive warehouse directory is in HDFS, which is a separate filesystem.
			
    
	
		
		
04-13-2016 07:21 AM
1 Kudo

The 2 tables that are created are called 'intermediate_access_logs' and 'tokenized_access_logs' when shown in Hive or Impala. The intermediate_access_logs table is backed by the raw 'original_access_logs' file, which is copied into HDFS. If you want to view it as a table, it should still be queryable in Hive at the end of the tutorial. The underlying data should still be in /user/hive/warehouse/original_access_logs in HDFS, or /opt/examples/log_files/ on your local filesystem.
			
    
	
		
		
04-11-2016 07:51 AM
1 Kudo

Looks like the YARN ResourceManager process is not running. I would restart it with:

sudo service hadoop-yarn-resourcemanager restart

If you continue to have issues, other services may have failed to come up as a result of this, or as a result of the same root cause. The easiest way to restart everything in order on the VM is to simply reboot. If you have sufficient memory for the VM, running one of the Cloudera Manager options on the desktop makes it a lot easier to see the health of all the services.

You might also want to look at the log files in /var/log/hadoop-yarn to see what kinds of exceptions are being thrown as the service dies.
			
    
	
		
		
04-11-2016 07:09 AM

I apologize for the confusion. The service got a bit backed up over the weekend because too many people abandoned clusters mid-deployment improperly. I've cleared out everything that looks abandoned, so it should work better now. Note that access codes can't be reused, however, so if you deleted your previous stack you'll need to register for a new access code to try again.
        













