Member since
06-23-2016
13
Posts
2
Kudos Received
0
Solutions
09-07-2017
08:32 PM
@Eugene Koifman That helped reduce the spilled_rows from 11 billion to 5 billion. I was under the impression that inserting data into a partition is faster with a distribute by. This was useful. Also, I heard compressing the intermediary files helps reduce the spilled_rows. Is that correct? set
mapreduce.map.output.compress = true set
mapreduce.output.fileoutputformat.compress = true Or anything else we can do to optimize the query?
... View more
09-06-2017
09:51 PM
Currently, we are running a Hive job which inserts around 2 billion rows into an acid table which is partitioned and clustered. I see a huge number of 'SPILLED_RECORDS' and I'm not exactly sure how to fix/improve. I think more the spilled_records higher the io and processing times. Any inputs are appreciated. Some TEZ stats of the job inline: org.apache.tez.common.counters.TaskCounter REDUCE_INPUT_GROUPS 1653127325 org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS 11490401485 org.apache.tez.common.counters.TaskCounter PHYSICAL_MEMORY_BYTES 6732617089024 org.apache.tez.common.counters.TaskCounter VIRTUAL_MEMORY_BYTES 13973924044800 org.apache.tez.common.counters.TaskCounter COMMITTED_HEAP_BYTES 6732617089024 org.apache.tez.common.counters.TaskCounter ADDITIONAL_SPILLS_BYTES_WRITTEN 572880403808 org.apache.tez.common.counters.TaskCounter ADDITIONAL_SPILLS_BYTES_READ 1540736899809 org.apache.tez.common.counters.TaskCounter ADDITIONAL_SPILL_COUNT 6965 HIVE RECORDS_IN_Map_1 1941777885 TaskCounter_Map_1_OUTPUT_Reducer_2 SPILLED_RECORDS 3739831692 TaskCounter_Reducer_2_INPUT_Map_1 SPILLED_RECORDS 1941777885 TaskCounter_Reducer_2_OUTPUT_Reducer_3 SPILLED_RECORDS 3867014023 TaskCounter_Reducer_2_OUTPUT_Reducer_3 ADDITIONAL_SPILLS_BYTES_READ 387364743478 TaskCounter_Reducer_2_OUTPUT_Reducer_3 ADDITIONAL_SPILLS_BYTES_WRITTEN 320256756650 TaskCounter_Reducer_3_INPUT_Reducer_2 ADDITIONAL_SPILLS_BYTES_WRITTEN 11229906959 TaskCounter_reducer_3_INPUT_Reducer_2 SPILLED_RECORDS 1941777885
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Tez
07-10-2017
04:14 PM
@Daniel Kozlowski We have Ranger (LDAP) for authorization and also LDAP/AD for Hive authentication. So the "-n admin" option doesn't work
... View more
07-08-2017
05:03 AM
Yes, I tried both the approach with double quotes and without quotes.
... View more
07-07-2017
10:42 PM
When I directly try to connect to hive using Beeline -u "jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" -n username CLI throws the following error "Connecting to jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/
17/07/07 18:11:35 [main]: WARN jdbc.HiveConnection: Failed to connect to node03.comp.net:2181
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node03.comp.net:2181/: null (state=08S01,code=0)
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive" While I just run beeline <enter> and use same jdbc url I'm able to connect. !connect jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Connecting to jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Enter username for jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2: username
Enter password for jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2: *********
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ Any idea why this is happening. How to fix it? A bit about our env: We have Ranger (LDAP) for authorization and also LDAP/AD for Hive authentication. I plan to use the above commands to launch HQL scripts with parameters like beeline -u jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2 -n xxxx -w pfile -f /home/hdfs/hive_llap/script/test.hql --hivevar db=$dbname Any help is appreciated, thanks.
... View more
Labels:
- Labels:
-
Apache Hive