Member since: 05-16-2016

785 Posts · 114 Kudos Received · 39 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2328 | 06-12-2019 09:27 AM |
| | 3578 | 05-27-2019 08:29 AM |
| | 5724 | 05-27-2018 08:49 AM |
| | 5243 | 05-05-2018 10:47 PM |
| | 3113 | 05-05-2018 07:32 AM |
03-13-2017 11:08 AM · 1 Kudo
1) Since Snappy is not very strong at compression (on disk), what would the difference in disk space be for a 1 TB table stored as Parquet only versus Parquet with Snappy compression?

I created three tables with different scenarios; please take a peek. It should give you some idea.

TABLE 1 - Parquet format, no compression

+-------+--------+--------+---------+
| #Rows | #Files | Size   | Format  |
+-------+--------+--------+---------+
| -1    | 4      | 3.73MB | PARQUET |
+-------+--------+--------+---------+

TABLE 2 - TEXT format with default compression (Snappy)

+-------+--------+---------+--------+
| #Rows | #Files | Size    | Format |
+-------+--------+---------+--------+
| 0     | 8      | 22.04MB | TEXT   |
+-------+--------+---------+--------+

TABLE 3 - Parquet with compression enabled (Snappy)

+-------+--------+--------+---------+
| #Rows | #Files | Size   | Format  |
+-------+--------+--------+---------+
| -1    | 4      | 3.71MB | PARQUET |
+-------+--------+--------+---------+

2) Is it possible to compress a non-compressed Parquet table later with Snappy?

ALTER TABLE is a logical operation that only updates the table metadata in the metastore database, so it cannot recompress existing data. However, you can fire a CTAS to perform the compression and then rename if you want, using:

alter table d1.X rename to Y;
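For illustration, a minimal sketch of that CTAS-and-rename flow, assuming Impala (where COMPRESSION_CODEC is a session query option); the table names d1.X, d1.X_snappy and d1.X_old are placeholders:

-- assumes Impala; d1.X is the existing uncompressed Parquet table
SET COMPRESSION_CODEC=snappy;

-- CTAS rewrites the data, so the new files come out Snappy-compressed
CREATE TABLE d1.X_snappy STORED AS PARQUET AS SELECT * FROM d1.X;

-- swap the tables; drop d1.X_old once you have verified the data
ALTER TABLE d1.X RENAME TO d1.X_old;
ALTER TABLE d1.X_snappy RENAME TO d1.X;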
						
					
    
	
		
		
03-13-2017 10:39 AM
There is a typo in the configuration:

agent.sinks.agent-sink.channels = agent-chan

Change it to:

agent.sinks.agent-sink.channel = agent-chan
						
					
    
	
		
		
03-13-2017 10:35 AM · 1 Kudo
There are a few things that need to be taken care of in a Flume configuration.

When you define sources:  agent.sources = sr1
When you define sinks:    agent.sinks = sink1 sink2 ...
When you define channels: agent.channels = ch1 ch2 ...

In your configuration there is a typo:

agent.sinks.agent-sink.channels = agent-chan

Change it to:

agent.sinks.agent-sink.channel = agent-chan

You can configure an agent with zero or more sinks, but each sink reads events from exactly one channel - hence the singular "channel" key. You also have to configure a channel for every sink; if not, the sink will be removed.
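As a sketch, a minimal single-agent configuration with that wiring spelled out; the agent/source/channel/sink names, bind address, and port are placeholders (it assumes a netcat source and a logger sink):

agent.sources = sr1
agent.channels = ch1
agent.sinks = sink1

agent.sources.sr1.type = netcat
agent.sources.sr1.bind = localhost
agent.sources.sr1.port = 44444
# a source can feed several channels, so this key is plural
agent.sources.sr1.channels = ch1

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000

agent.sinks.sink1.type = logger
# a sink reads from exactly one channel, so this key is singular
agent.sinks.sink1.channel = ch1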
						
					
    
	
		
		
03-08-2017 06:43 PM
Indeed. To sum up, these are the default compression codecs:

Hive   - DeflateCodec
Impala - Snappy

Thanks, mate.
						
					
    
	
		
		
03-08-2017 12:05 PM
I think it is Snappy by default. Refer to this link: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_parquet.html

Please correct me if I am wrong. Thanks.
						
					
    
	
		
		
03-08-2017 10:50 AM
1) If we create a table (in both Hive and Impala) and just specify STORED AS PARQUET, will it be Snappy-compressed by default in CDH?

Currently the default compression for Impala Parquet tables is Snappy.

2) If not, how do I identify a Parquet table with Snappy compression versus one without?

describe formatted tableName

Note - you will always see the compression as NO, because the compression format is not stored in the table metadata. The best way is to run dfs -ls -R on the table location and inspect the files for the compression format.

3) Also, how do I specify Snappy compression at table level while creating the table, and at global level so that all Parquet tables are Snappy-compressed even if nobody specifies it at table level?

CREATE TABLE external_parquet (c1 INT, c2 STRING)
  STORED AS PARQUET LOCATION ' '

or, on a session basis:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Globally - i.e. a file that is executed when you launch the Hive shell: put the above in a .hiverc file under /etc/hive/conf.cloudera.hive1 in CDH (if you don't find one, you can always create it).

Please refer to this link for more CREATE TABLE properties: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_create_table.html
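For question 2, a quick way to confirm the codec actually used on disk; this sketch assumes the parquet-tools utility is installed, and the table location and file name below are placeholders:

# list the data files under the table's HDFS location
hdfs dfs -ls -R /user/hive/warehouse/mydb.db/external_parquet

# pull one file locally and inspect it; the row-group summary printed by
# parquet-tools meta shows the codec (e.g. SNAPPY) per column chunk
hdfs dfs -get /user/hive/warehouse/mydb.db/external_parquet/000000_0 /tmp/
parquet-tools meta /tmp/000000_0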
						
					
    
	
		
		
03-07-2017 08:42 PM
You may not have the appropriate JAR on your classpath; that is the reason it is throwing java.lang.NoClassDefFoundError. I believe you are missing httpclient-4.2.jar in your Java application classpath. When you extract that JAR you can see the class below:

org.apache.http.client.utils.URIUtils.class
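To double-check, you can list the JAR's contents and confirm the class is packaged in it (the JAR path is a placeholder):

# confirm the class is present in the JAR on your classpath
jar tf httpclient-4.2.jar | grep org/apache/http/client/utils/URIUtils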
						
					
    
	
		
		
03-02-2017 08:28 PM
I believe the problem might be in this configuration file. Did you change localhost to your hostname in server_host in the configuration below?

/etc/cloudera-scm-agent/config.ini

server_host=localhost

Change it to:

server_host=<the host where you installed CM>

Then:

sudo service cloudera-scm-server-db start
sudo service cloudera-scm-server start

This should help you connect to CM via the browser.
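Once the services are up, a quick way to confirm the server is reachable (Cloudera Manager's web UI listens on port 7180 by default; the hostname is a placeholder):

sudo service cloudera-scm-server status
curl -I http://<cm-server-host>:7180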
						
					
    
	
		
		
03-01-2017 09:46 PM
In mapred-site.xml:

mapreduce.map.memory.mb =
mapreduce.task.io.sort.mb =
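For reference, a sketch of how these properties would look in mapred-site.xml; the values here are illustrative placeholders only, not recommendations (note that mapreduce.task.io.sort.mb must fit inside the map task's memory):

<!-- illustrative placeholder values only; tune for your workload -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>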
						
					
    
	
		
		
02-24-2017 05:21 AM
Use an event deserializer.

You can use BlobDeserializer - if you want to parse a whole file into one event.
Or you can use LINE - one event per line of text input.

Refer to the link: https://flume.apache.org/FlumeUserGuide.html#event-deserializers
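As a sketch, here is how the deserializer is set on a spooling-directory source; the agent/source/channel names and the directory are placeholders:

# spooling-directory source; LINE is the default deserializer (one event per line)
agent.sources.spool1.type = spooldir
agent.sources.spool1.spoolDir = /var/log/incoming
agent.sources.spool1.deserializer = LINE
agent.sources.spool1.deserializer.maxLineLength = 2048
agent.sources.spool1.channels = ch1

# for whole-file events, point deserializer at the BlobDeserializer builder
# class instead, per the Flume user guide:
# agent.sources.spool1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder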
						
					