Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 1966 | 07-09-2019 12:53 AM |
|  | 11824 | 06-23-2019 08:37 PM |
|  | 9111 | 06-18-2019 11:28 PM |
|  | 10069 | 05-23-2019 08:46 PM |
|  | 4510 | 05-20-2019 01:14 AM |

05-23-2019 08:46 PM
1 Kudo

For HBase MOBs, this can serve as a good starting point, as most of the changes are administrative and the writer API remains the same as for regular cells: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hbase_mob.html

For SequenceFiles, a good short snippet can be found here: https://github.com/sakserv/sequencefile-examples/blob/master/test/main/java/com/github/sakserv/sequencefile/SequenceFileTest.java#L65-L70 and for Parquet here: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/example/ExampleParquetWriter.java

More general reading on the file formats: https://blog.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/ and https://parquet.apache.org/documentation/latest/
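
In case a concrete starting point helps, here is a minimal sketch of writing images into a SequenceFile with BytesWritable values (the output path, key naming, and local image source are hypothetical; the SequenceFile.Writer API shown is the standard Hadoop one used in the linked example):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageSequenceFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical output location; an s3a:// URI would work here too.
        Path output = new Path("hdfs:///user/example/images.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            // Hypothetical local image; in practice, loop over your image set.
            byte[] imageBytes = Files.readAllBytes(Paths.get("/tmp/example.jpg"));
            // Key = image name, value = raw image bytes packed into the container file.
            writer.append(new Text("example.jpg"), new BytesWritable(imageBytes));
        }
    }
}
```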
						
					
05-20-2019 01:14 AM

You can apply the queries directly on that external table. Hive will use HDFS for any transient storage it requires as part of the query stages.

Of course, if it is a set of queries overall, you can also store all the intermediate temporary tables on HDFS in the way you describe, but the point I am trying to make is that you do not need to copy the original data as-is; just allow Hive to read from S3 and write into S3 at the points that matter.
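
Purely as an illustration of that flow (not from the original thread), here is a rough Java/JDBC sketch against HiveServer2; the connection URL, credentials, bucket, columns, and table name are all hypothetical, and the Hive JDBC driver would need to be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOnS3Example {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2.example.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External table pointing directly at data already sitting in S3.
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (ip STRING, url STRING, ts STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                + "LOCATION 's3a://my-bucket/raw/web_logs/'");

            // The query reads from S3 directly; transient stages use HDFS as Hive needs.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
                }
            }
        }
    }
}
```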
						
					
05-19-2019 08:48 PM

- Do you observe this intermittency from only specific client/gateway hosts?
- Does your cluster apply firewall rules between the cluster hosts?

One probable reason behind the intermittent 'Connection refused' from KMS could be that it is frequently (auto)restarting. Check its process stdout messages and service logs to confirm whether a kill is causing it to be restarted by the CM Agent supervisor.
						
					
05-19-2019 06:17 PM

You can do this via two methods: container files, or HBase MOBs. Which is the right path depends on your eventual, dominant read pattern for this data.

If your analysis will require loading only a small range of images out of the total dataset, or individual images, then HBase is a better fit with its key-based access model, columnar storage, and caches.

If instead you will require processing these images in bulk, then large container files, such as SequenceFiles (with BytesWritable or equivalent) or Parquet files (with BINARY/BYTE_ARRAY types), are the better fit: they can store multiple images in a single file and allow for fast, sequential reads of all images in bulk.
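
For the HBase MOB route, writing looks the same as writing any other cell; here is a minimal sketch (the 'images' table, MOB-enabled column family 'i', row key, and image source are hypothetical, and the table/family would be created and MOB-enabled separately as per the MOB documentation):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MobImageWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             // Hypothetical table created beforehand with a MOB-enabled family 'i'.
             Table table = connection.getTable(TableName.valueOf("images"))) {

            // Hypothetical source image loaded from the local filesystem.
            byte[] imageBytes = java.nio.file.Files.readAllBytes(
                    java.nio.file.Paths.get("/tmp/example.jpg"));

            // Writing a MOB cell uses exactly the same Put API as a regular cell.
            Put put = new Put(Bytes.toBytes("image-0001"));
            put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("data"), imageBytes);
            table.put(put);
        }
    }
}
```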
						
					
05-19-2019 06:06 PM

Would you be able to attach the contents of /tmp/scm_prepare_node.vQZe0yDf/scm_prepare_node.log (or any/all '/tmp/**/scm_prepare_node.log' files) from the host the install failed on (node5 in this case)?
						
					
05-19-2019 06:04 PM

You do not need to pull files into HDFS as a step in your processing, as CDH provides built-in connectors to read input from and write output directly to S3 storage (s3a:// URIs, backed by configuration that provides credentials and targets).

This page is a good starting reference for setting up S3 access on cloud installations: https://www.cloudera.com/documentation/director/latest/topics/director_s3_object_storage.html - make sure to check out the links from the opening paragraph too.
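
As a small illustration of reading directly from S3 with the Hadoop FileSystem API (the bucket, object path, and inline credentials are hypothetical; fs.s3a.access.key and fs.s3a.secret.key are the standard s3a credential properties, though in a real cluster they are usually supplied via configuration or instance roles rather than in code):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical credentials; prefer cluster-level configuration or IAM roles.
        conf.set("fs.s3a.access.key", "AKIA...");
        conf.set("fs.s3a.secret.key", "...");

        // Hypothetical bucket and object read directly, without copying into HDFS first.
        Path input = new Path("s3a://my-bucket/raw/data.csv");
        FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(input)))) {
            System.out.println("First line: " + reader.readLine());
        }
    }
}
```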
						
					
05-15-2019 06:52 PM
1 Kudo

The Disk Balancer sub-system is local to each DataNode and can be triggered on distinct hosts in parallel.

The only time you should receive that exception is if the targeted DataNode's hdfs-site.xml does not carry the property that enables the disk balancer, or when the DataNode is mid-shutdown/restart.

How have you configured the disk balancer for your cluster? Did you follow the configuration approach presented at https://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/? What is your CDH and CM version?
						
					
05-14-2019 07:37 PM

Look for an exception in the logs preceding the "Failed open of region=" failure message on your RegionServer. One possibility is that an HFile under the region is un-openable (for varied reasons) and will need to be sidelined (moved aside) to bring the region back online.
						
					
05-12-2019 07:47 PM

Thank you for sharing that output. The jar does appear to carry one class set in the right package directory, but it also carries another set under a different directory.

Perhaps there is a versioning/work-in-progress issue here, where the incorrect build is the one that ends up running.

Can you try to build your jar again from a clean working directory?

If the right driver class runs, you should not be seeing the following log message:

> 19/05/11 02:43:49 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
						
					
05-09-2019 09:17 PM

Can you describe the steps used for building the jar from your compiled program?

Use the 'jar tf' command to check if all 3 of your class files are within it, and not just the WordCountDriver.class file.
						
					