Member since: 05-20-2017
Posts: 12
Kudos Received: 1
Solutions: 0
10-25-2018
08:16 AM
The Secondary NameNode in Hadoop is a specially dedicated node in an HDFS cluster whose main function is to take checkpoints of the file system metadata held by the NameNode. It is not a backup NameNode; it only checkpoints the NameNode's file system namespace. The Secondary NameNode is a helper to the primary NameNode, not a replacement for it, so the NameNode remains the single point of failure in HDFS. Ref: http://hadooptutorial.info/tag/secondary-namenode-functions/
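Besides the periodic checkpoints taken by the Secondary NameNode, a checkpoint can also be forced by hand; a minimal sketch using the standard hdfs CLI (run as the HDFS superuser):
hdfs dfsadmin -safemode enter    # block namespace modifications
hdfs dfsadmin -saveNamespace     # persist the in-memory namespace to a new fsimage
hdfs dfsadmin -safemode leave    # return to normal operation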
05-15-2018
03:59 PM
1 Kudo
Unfortunately "--hive-overwrite" option destroy hive table structure and re-create it after that which is not acceptable way. The only way is: 1. hive> truncate table sample; 2. sqoop import --connect jdbc:mysql://yourhost/test --username test --password test01 --table sample --hcatalog-table sample
10-11-2017
12:38 PM
@Aditya Sirna That's it. Thank you so much.
10-11-2017
11:34 AM
@Jay SenSharma OK, but it looks strange to me.
1. This one is correct:
$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p admin
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+--------------------------------------------------------+--+
| set |
+--------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=2600000 |
+--------------------------------------------------------+--+
2. But this is the one I would like to change; as I understand it, this is the LLAP server:
$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" -n hive -p admin
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
| set |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744 |
+----------------------------------------------------------+--+
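To double-check which HiveServer2 instances sit behind each namespace, they can be listed directly from ZooKeeper; a sketch assuming the standard HDP zookeeper-client path:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server ip-172-31-35-100.us-west-2.compute.internal:2181 ls /hiveserver2
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server ip-172-31-35-100.us-west-2.compute.internal:2181 ls /hiveserver2-hive2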
10-11-2017
10:34 AM
screenshot-from-2017-10-11-11-55-06.png Hi, a parameter configured via Ambari is not being applied. Why? Via Ambari I configured hive.auto.convert.join.noconditionaltask.size=2600000, but beeline reports:
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
| set |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744 |
+----------------------------------------------------------+--+
<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>858783744</value>
</property>
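In the meantime I can override the value per session in beeline (a standard Hive session-level set), although the Ambari-configured value should of course apply by itself:
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size=2600000;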
Thank you,
Labels:
- Apache Ambari
- Apache Hive
10-10-2017
03:34 PM
Finally, I found the solution.
set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 2000000;
By playing with hive.auto.convert.join.noconditionaltask.size I got adequate performance; a low value causes performance degradation. The following parameters might also be helpful (see the sketch after this list):
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
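Note that the sort-merge-bucket settings only take effect when both join sides are bucketed and sorted on the join key; a minimal sketch with hypothetical table and column names:
CREATE TABLE orders_b (order_id BIGINT, customer_id BIGINT, amount DOUBLE)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS STORED AS ORC;
CREATE TABLE customers_b (customer_id BIGINT, name STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS STORED AS ORC;
-- with the settings above, this equi-join on the bucket/sort key is eligible for an SMB map join
SELECT o.order_id, c.name FROM orders_b o JOIN customers_b c ON o.customer_id = c.customer_id;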
10-09-2017
02:36 PM
Hi @bkosaraju, thank you for your answer, but I don't believe that is the case. Anyway, I tested a few cases today:
1. Renamed the column from `date` to dt.
2. Changed the column (partition key) type from date to timestamp.
3. Changed the column type from date to string.
4. Changed the ORC partitioned table storage properties to 'orc.compress'='SNAPPY'.
Nothing helped. Meanwhile, on a non-partitioned table the queries also failed when the column was declared as "`date` date", until I changed the column type to timestamp; with timestamp on a non-partitioned table it works. Thank you, Yevgen
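For reference, the non-partitioned test mentioned above used DDL along these lines (hypothetical names); the queries only started working after the column type switch:
CREATE TABLE funnel_flat (`date` date, visits BIGINT) STORED AS ORC;
-- queries failed with the date type; switching the column to timestamp fixed them
ALTER TABLE funnel_flat CHANGE `date` `date` timestamp;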
10-08-2017
08:16 PM
q3.tar.gz Hello, I am taking part in a PoC project where we are looking for a solution for interactive analytics (Tableau client).
1. Apache Hive (version 2.1.0.2.6.1.0-129), Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129).
2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.
3. The fact table is PARTITIONED BY (`date` date) with dynamic partitions.
4. Column statistics were collected for all tables.
Unfortunately, some of our test queries fail with this error:
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
The query runs with the following parameters specified explicitly:
set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;
set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.stats.reliable=true;
set hive.merge.tezfiles=true;
Our findings:
1. The query works well on non-partitioned tables.
2. The query works fine with Tez or MR configured, but fails with LLAP.
3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working.
In the attachment you will find these files: q3.sql (the original failing queries) and q3.err (the full execution log from the beeline client).
Any ideas? Thank you,
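For what it's worth, here is a hypothetical minimal reproduction of finding 3, assuming only the fact table and its `date` partition column (the expression rounds each date down to the Sunday that starts its week):
SELECT CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE) AS week_start, COUNT(*)
FROM f_daily_funnel_report
GROUP BY CAST(DATE_ADD(NEXT_DAY(`date`, 'SU'), -7) AS DATE);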
Labels:
- Apache Hive
05-20-2017
09:04 PM
Understood. Thank you.
05-20-2017
12:51 PM
I am preparing for the HDPCA exam, going through the list of exam objectives, and have a few questions:
1) When I click on "Add a new node to an existing cluster" it refers to http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#ref-d745870f-2b0a-47ad-9307-8c01b440589b. Is this reference correct? I believe it should refer somewhere around here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-4303e343-9aee-4e70-b38a-2837ae976e73.1.html
2) It is not clear whether "Manually Adding Slave Nodes to an HDP Cluster" is part of the HDPCA exam, or whether it is enough to be acquainted with adding nodes through Ambari. Thank you,
Labels:
- Hortonworks Data Platform (HDP)