Member since: 05-20-2017

12 Posts
1 Kudos Received
0 Solutions
10-25-2018 08:16 AM
The Secondary NameNode in Hadoop is a dedicated node in an HDFS cluster whose main function is to take checkpoints of the file system metadata held by the NameNode. It is not a backup NameNode; it only checkpoints the NameNode's file system namespace. The Secondary NameNode is a helper to the primary NameNode, not a replacement for it, so the NameNode remains the single point of failure in HDFS. Ref: http://hadooptutorial.info/tag/secondary-namenode-functions/
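For context, the checkpoint frequency is controlled in hdfs-site.xml. A minimal sketch using the stock Hadoop property names (the values shown are just the usual defaults, not recommendations):

    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>3600</value>
    </property>

    <property>
      <name>dfs.namenode.checkpoint.txns</name>
      <value>1000000</value>
    </property>

The Secondary NameNode starts a checkpoint when either the period (in seconds) elapses or the number of uncheckpointed transactions is reached, whichever comes first.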
						
					
05-15-2018 03:59 PM
1 Kudo
Unfortunately, the "--hive-overwrite" option destroys the Hive table structure and re-creates it afterwards, which is not acceptable. The only way is:

1. hive> truncate table sample;
2. sqoop import --connect jdbc:mysql://yourhost/test --username test --password test01 --table sample --hcatalog-table sample
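If this needs to be re-run regularly, the two steps can also be wrapped in a small shell script. A sketch only, reusing the hypothetical connection details from the example above; hive -e is used so the truncate runs non-interactively:

#!/bin/bash
# Empty the existing Hive table without touching its definition
hive -e 'truncate table sample;'
# Reload it from MySQL through HCatalog, keeping the table structure intact
sqoop import --connect jdbc:mysql://yourhost/test --username test --password test01 \
    --table sample --hcatalog-table sample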
						
					
10-11-2017 12:38 PM
							 @Aditya Sirna That's it. Thank you so much.   
						
					
10-11-2017 11:34 AM
@Jay SenSharma OK, this looks strange to me.

1. This one is correct:

$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p admin
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+--------------------------------------------------------+--+
|                          set                           |
+--------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=2600000  |
+--------------------------------------------------------+--+

2. But here is the one I would like to change. As I understand it, this is the LLAP server:

$ beeline -u "jdbc:hive2://ip-172-31-35-100.us-west-2.compute.internal:2181,ip-172-31-34-50.us-west-2.compute.internal:2181,ip-172-31-34-136.us-west-2.compute.internal:2181/bench_mtu_p;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" -n hive -p admin
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
|                           set                            |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744  |
+----------------------------------------------------------+--+
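If the goal is only to confirm that the LLAP instance will honor the smaller value, it can also be overridden for the current session directly at the beeline prompt (a sketch; 2600000 is simply the value configured via Ambari in this thread):

set hive.auto.convert.join.noconditionaltask.size=2600000;
set hive.auto.convert.join.noconditionaltask.size;

The second statement just echoes the effective value back, so it should now report 2600000, but only for this session; it does not change the server-side default.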
 
						
					
10-11-2017 10:34 AM
screenshot-from-2017-10-11-11-55-06.png

Hi,

Parameters configured via Ambari are not applied. Why?

Via Ambari, hive.auto.convert.join.noconditionaltask.size was configured to 2600000, but the server reports:

Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://ip-172-31-35-100.us-west-2.co> set hive.auto.convert.join.noconditionaltask.size;
+----------------------------------------------------------+--+
|                           set                            |
+----------------------------------------------------------+--+
| hive.auto.convert.join.noconditionaltask.size=858783744  |
+----------------------------------------------------------+--+ 
     <property>
      <name>hive.auto.convert.join.noconditionaltask</name>
      <value>true</value>
    </property>
   
    <property>
      <name>hive.auto.convert.join.noconditionaltask.size</name>
      <value>858783744</value>
    </property>
   
  Thank you, 
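For what it's worth, one way to dump every configuration value the running HiveServer2 session actually resolves and then filter for the property in question is sketched below; the JDBC URL is left as a placeholder:

$ beeline -u "<jdbc-url>" -n hive -p admin --outputformat=tsv2 -e "set -v" | grep noconditionaltask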
						
					
Labels:
- Apache Ambari
- Apache Hive

10-10-2017 03:34 PM
Finally, I found the solution:

set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 2000000;

By playing with hive.auto.convert.join.noconditionaltask.size I got adequate performance; setting it too low degrades performance.

The following parameters might also be helpful:

set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
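To avoid retyping these for every beeline session, one option is to keep them in a small init script and pass it with beeline's -i option. A sketch only; the file name is arbitrary and the JDBC URL is a placeholder:

$ cat join-tuning.sql
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=2000000;

$ beeline -u "<jdbc-url>" -n hive -i join-tuning.sql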
						
					
10-09-2017 02:36 PM
Hi @bkosaraju,

Thank you for your answer, but I don't believe that is the case. Anyway, I tested a few cases today:

1. Renamed the column from `date` to dt.
2. Changed the column (partition key) type from date to timestamp.
3. Changed the column type from date to string.
4. Changed the ORC partitioned table storage properties to 'orc.compress'='SNAPPY'.

Nothing has helped.

Meanwhile, on a non-partitioned table with a "`date` date" column the queries also fail, until I change the column type to timestamp. With timestamp on the non-partitioned table it works.

Thank you,
Yevgen
						
					
10-08-2017 08:16 PM
q3.tar.gz

Hello,

I am taking part in a PoC project where we are looking for a solution for interactive analytics (Tableau client).

1. Apache Hive (version 2.1.0.2.6.1.0-129), Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129).
2. We have configured a 3-node HDP cluster with Hive + LLAP. All our test tables are created in ORC format with the "orc.compress"="ZLIB" option.
3. The fact table is PARTITIONED BY (`date` date) with dynamic partitions.
4. Column statistics were collected for all tables.

Unfortunately, some of our test queries have failed with this error:

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1507032990279_0050_1_11, diagnostics=[Task failed, taskId=task_1507032990279_0050_1_11_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1507032990279_0050_1_11_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

The query runs with the following parameters specified explicitly:

set tez.queue.name=llap;
set hive.llap.execution.mode=all;
set hive.execution.engine=tez;
set mapred.reduce.tasks=-1;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.exec.max.dynamic.partitions.pernode=256;
set hive.exec.max.dynamic.partitions=10000;
set hive.optimize.sort.dynamic.partition=true;
set hive.enforce.sorting=true;
set optimize.sort.dynamic.partitioning=true;
set hive.tez.exec.print.summary=true;
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.max.partition.factor=20;
set hive.exec.reducers.bytes.per.reducer=128000000;
set hive.optimize.index.filter=true;
set hive.exec.orc.skip.corrupt.data=true;
set hive.exec.compress.output=true;
set tez.am.container.reuse.enabled=TRUE;
set hive.compute.query.using.stats=true;
set stats.reliable=true;
set hive.merge.tezfiles=true;
Our findings:

1. The query works well on non-partitioned tables.
2. The query works fine with Tez or MR configured, but fails with LLAP.
3. If I remove "CAST(DATE_ADD(NEXT_DAY(`f_daily_funnel_report`.`date`,'SU'),-7) AS DATE) AS `twk_calculation_1485062019336982529_ok`" from the select list and the group by list, the query starts working.

Attached are the following files:

q3.sql - the original query that failed
q3.err - the full execution log from the beeline client

Any ideas?

Thank you,
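Given finding 2 above, one interim workaround (a sketch only; it sidesteps LLAP rather than fixing the underlying error) is to push this particular query onto plain Tez containers at session level:

set hive.execution.engine=tez;
set hive.llap.execution.mode=none;

With hive.llap.execution.mode=none the operators run in ordinary Tez containers instead of the LLAP daemons, which matches the configuration under which the query succeeds.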
						
					
Labels:
- Apache Hive

05-20-2017 09:04 PM
							 Understood.  Thank you.  
						
					
05-20-2017 12:51 PM
I am preparing for the HDPCA exam, going through the list of exam objectives, and have a few questions:

1) When I click on "Add a new node to an existing cluster", it refers to http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.0.0/Ambari_Doc_Suite/ADS_v200.html#ref-d745870f-2b0a-47ad-9307-8c01b440589b. Is this reference correct? I believe it should instead refer somewhere around here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-4303e343-9aee-4e70-b38a-2837ae976e73.1.html

2) It is not clear whether "Manually Adding Slave Nodes to an HDP Cluster" is part of the HDPCA exam, or whether it is enough to be familiar with adding nodes via Ambari.

Thank you,
						
					
Labels:
- Hortonworks Data Platform (HDP)