Member since: 05-20-2017

12 Posts
1 Kudos Received
0 Solutions
10-25-2018 08:16 AM
The Secondary NameNode in Hadoop is a dedicated node in an HDFS cluster whose main function is to take checkpoints of the file system metadata held by the NameNode. It is not a backup NameNode; it only checkpoints the NameNode's file system namespace. The Secondary NameNode is a helper to the primary NameNode, not a replacement for it, so the NameNode remains the single point of failure in HDFS. Ref: http://hadooptutorial.info/tag/secondary-namenode-functions/
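For context, the checkpoint frequency is controlled in hdfs-site.xml. A minimal sketch using the stock Hadoop property names (the values shown are just the usual defaults, not recommendations):

    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>3600</value>
    </property>

    <property>
      <name>dfs.namenode.checkpoint.txns</name>
      <value>1000000</value>
    </property>

The Secondary NameNode starts a checkpoint when either the period (in seconds) elapses or the number of uncheckpointed transactions is reached, whichever comes first.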
						
					
05-15-2018 03:59 PM
1 Kudo
Unfortunately, the "--hive-overwrite" option destroys the Hive table structure and re-creates it afterwards, which is not acceptable. The only way is:

1. hive> truncate table sample;
2. sqoop import --connect jdbc:mysql://yourhost/test --username test --password test01 --table sample --hcatalog-table sample
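If this needs to be re-run regularly, the two steps can also be wrapped in a small shell script. A sketch only, reusing the hypothetical connection details from the example above; hive -e is used so the truncate runs non-interactively:

#!/bin/bash
# Empty the existing Hive table without touching its definition
hive -e 'truncate table sample;'
# Reload it from MySQL through HCatalog, keeping the table structure intact
sqoop import --connect jdbc:mysql://yourhost/test --username test --password test01 \
    --table sample --hcatalog-table sample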
						
					
10-11-2017 12:38 PM
							 @Aditya Sirna That's it. Thank you so much.   
						
					
10-11-2017 11:34 AM
@Jay SenSharma OK, this looks strange to me.

1. This one is correct:

$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p admin
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+--------------------------------------------------------+--+
|                          set                           |
+--------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=2600000  |
+--------------------------------------------------------+--+

2. But here is the one I would like to change. As I understand it, this is the LLAP server:

$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" -n hive -p admin
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
|                           set                            |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744  |
+----------------------------------------------------------+--+
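If the goal is only to confirm that the LLAP instance will honor the smaller value, it can also be overridden for the current session directly at the beeline prompt (a sketch; 2600000 is simply the value configured via Ambari in this thread):

set hive.auto.convert.join.noconditionaltask.size=2600000;
set hive.auto.convert.join.noconditionaltask.size;

The second statement just echoes the effective value back, so it should now report 2600000, but only for this session; it does not change the server-side default.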
 
						
					
10-11-2017 10:34 AM
screenshot-from-2017-10-11-11-55-06.png

Hi,

Parameters configured via Ambari are not applied. Why?

Via Ambari, hive.auto.convert.join.noconditionaltask.size was configured to 2600000, but the server reports:

Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
|                           set                            |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744  |
+----------------------------------------------------------+--+ 
     <property>
      <name>hive.auto.convert.join.noconditionaltask</name>
      <value>true</value>
    </property>
   
    <property>
      <name>hive.auto.convert.join.noconditionaltask.size</name>
      <value>858783744</value>
    </property>
   
  Thank you, 
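For what it's worth, one way to dump every configuration value the running HiveServer2 session actually resolves and then filter for the property in question is sketched below; the JDBC URL is left as a placeholder:

$ beeline -u "<jdbc-url>" -n hive -p admin --outputformat=tsv2 -e "set -v" | grep noconditionaltask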
						
					
Labels:
- Apache Ambari
- Apache Hive

10-10-2017 03:34 PM
Finally, I found the solution:

set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 2000000;

By playing with hive.auto.convert.join.noconditionaltask.size I got adequate performance; setting it too low degrades performance.

The following parameters might also be helpful:

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
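To avoid retyping these for every beeline session, one option is to keep them in a small init script and pass it with beeline's -i option. A sketch only; the file name is arbitrary and the JDBC URL is a placeholder:

$ cat join-tuning.sql
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=2000000;

$ beeline -u "<jdbc-url>" -n hive -i join-tuning.sql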
						
					
10-09-2017 02:36 PM
Hi @bkosaraju,

Thank you for your answer, but I don't believe that is the case. Anyway, I tested a few cases today:

1. Renamed the column from `date` to dt.
2. Changed the column (partition key) type from date to timestamp.
3. Changed the column type from date to string.
4. Changed the ORC partitioned table storage properties to 'orc.compress'='SNAPPY'.

Nothing has helped.

Meanwhile, on a non-partitioned table with a "`date` date" column the queries also fail, until I change the column type to timestamp. With timestamp on the non-partitioned table it works.

Thank you,
Yevgen
						
					
10-08-2017 08:16 PM
q3.tar.gz

Hello,

I am taking part in a PoC project where we are looking for a solution for interactive analytics (Tableau client).

1. Apache Hive (version 2.1.0.2.6.1.0-129), Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129).
2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.
3. The fact table is PARTITIONED BY (`date` date) with dynamic partitions.
4. Column statistics were collected for all tables.

Unfortunately, some of our test queries have failed with this error:

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

The query runs with the following parameters specified explicitly:

set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set optimize.sort.dynamic.partitioning=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;
set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.compute.query.using.stats=true;
set stats.reliable=true;
set hive.merge.tezfiles=true;
Our findings:

1. The query works well on non-partitioned tables.
2. The query works fine with Tez or MR configured, but fails with LLAP.
3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working.

Attached are the following files:

q3.sql - the original query that failed
q3.err - the full execution log from the beeline client

Any ideas?

Thank you,
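Given finding 2 above, one interim workaround (a sketch only; it sidesteps LLAP rather than fixing the underlying error) is to push this particular query onto plain Tez containers at session level:

set hive.execution.engine=tez;
set hive.llap.execution.mode=none;

With hive.llap.execution.mode=none the operators run in ordinary Tez containers instead of the LLAP daemons, which matches the configuration under which the query succeeds.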
						
					
Labels:
- Apache Hive

05-20-2017 09:04 PM
							 Understood.  Thank you.  
						
					
05-20-2017 12:51 PM
I am preparing for the HDPCA exam, going through the list of exam objectives, and have a few questions:

1) When I click on "Add a new node to an existing cluster", it refers to http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#ref-d745870f-2b0a-47ad-9307-8c01b440589b. Is this reference correct? I believe it should instead refer somewhere around here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-4303e343-9aee-4e70-b38a-2837ae976e73.1.html

2) It is not clear whether "Manually Adding Slave Nodes to an HDP Cluster" is part of the HDPCA exam, or whether it is enough to be familiar with adding nodes via Ambari.

Thank you,
						
					
Labels:
- Hortonworks Data Platform (HDP)