Member since: 06-23-2016
Posts: 13
Kudos Received: 2
Solutions: 0
09-07-2017 08:32 PM
@Eugene Koifman That helped reduce the spilled records from 11 billion to 5 billion. I was under the impression that inserting data into a partition is faster with a DISTRIBUTE BY, so this was useful. I also heard that compressing the intermediate files helps reduce spilled records. Is that correct?

set mapreduce.map.output.compress=true
set mapreduce.output.fileoutputformat.compress=true

Is there anything else we can do to optimize the query?
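For reference, a minimal sketch of how these settings could be applied ahead of the insert, collected into a script and run through beeline. The JDBC URL is a placeholder, the Snappy codec is an assumed choice, and hive.exec.compress.intermediate is the Hive-level counterpart of the MapReduce flags:

cat > /tmp/compress.hql <<'HQL'
-- Hive-level flag for compressing intermediate files between stages
set hive.exec.compress.intermediate=true;
-- compress map-side spill/shuffle output (codec below is an assumed choice)
set mapreduce.map.output.compress=true;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- compresses the job's final output files; this one does not reduce spills
set mapreduce.output.fileoutputformat.compress=true;
-- the INSERT being tuned would follow here in the same script
HQL
beeline -u "$HIVE_JDBC_URL" -f /tmp/compress.hql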
09-06-2017 09:51 PM
Currently, we are running a Hive job that inserts around 2 billion rows into an ACID table that is partitioned and clustered. I see a huge number of SPILLED_RECORDS and I'm not exactly sure how to fix or improve this. My understanding is that the more records spilled, the higher the I/O and processing times. Any input is appreciated. Some Tez stats of the job:

org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS               1653127325
  SPILLED_RECORDS                  11490401485
  PHYSICAL_MEMORY_BYTES          6732617089024
  VIRTUAL_MEMORY_BYTES          13973924044800
  COMMITTED_HEAP_BYTES           6732617089024
  ADDITIONAL_SPILLS_BYTES_WRITTEN 572880403808
  ADDITIONAL_SPILLS_BYTES_READ   1540736899809
  ADDITIONAL_SPILL_COUNT                  6965

HIVE
  RECORDS_IN_Map_1                  1941777885

TaskCounter_Map_1_OUTPUT_Reducer_2
  SPILLED_RECORDS                   3739831692

TaskCounter_Reducer_2_INPUT_Map_1
  SPILLED_RECORDS                   1941777885

TaskCounter_Reducer_2_OUTPUT_Reducer_3
  SPILLED_RECORDS                   3867014023
  ADDITIONAL_SPILLS_BYTES_READ    387364743478
  ADDITIONAL_SPILLS_BYTES_WRITTEN 320256756650

TaskCounter_Reducer_3_INPUT_Reducer_2
  ADDITIONAL_SPILLS_BYTES_WRITTEN  11229906959
  SPILLED_RECORDS                   1941777885
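For context, a hedged sketch of the general shape of such an insert (table and column names here are hypothetical, not the actual job). hive.optimize.sort.dynamic.partition sorts rows by partition key so each reducer writes whole partitions, and DISTRIBUTE BY keeps rows for the same partition together, both of which tend to cut re-sorting and spill volume:

cat > /tmp/acid_insert.hql <<'HQL'
-- route each dynamic partition's rows through a sorted reducer path
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- hypothetical table/columns standing in for the real 2-billion-row insert
insert into table target_acid partition (event_date)
select id, payload, event_date
from staging_raw
distribute by event_date;
HQL
beeline -u "$HIVE_JDBC_URL" -f /tmp/acid_insert.hql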
Labels:
- Apache Hive
- Apache Tez
08-29-2017 07:01 PM
@Dinesh Chitlangia I am aware of DistCp. As described in the question, I have to parse each row and do reverse geocoding using Solr, after which each record is enriched with geolocation information. I want to write the updated flow files directly into the other cluster rather than writing them into the HDF cluster and then running DistCp; I'm trying to avoid that unnecessary overhead.
08-29-2017 05:03 PM
I want to read a CSV file containing lat/long data through NiFi, with each record hitting a SolrCloud instance for reverse geocoding. After that, the enriched data needs to be loaded into a different HDP cluster for further Hive processing. I understand I can load the data directly into Hive tables using NiFi, which is also an option. However, I'm not sure how to load data directly into HDFS on a different cluster. Can anyone point me in the right direction? Any documents or blogs I can refer to?
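One hedged way to sketch this flow (processor choices and paths here are assumptions, not a verified design): PutHDFS can be pointed at the other cluster simply by handing it that cluster's client configs instead of the local ones.

GetFile (read the CSV)
  -> SplitText (one record per flow file)
  -> InvokeHTTP (call the Solr reverse-geocoding endpoint per record)
  -> MergeContent (re-batch the enriched records)
  -> PutHDFS, configured with:
       Hadoop Configuration Resources = /etc/remote-hdp-conf/core-site.xml,/etc/remote-hdp-conf/hdfs-site.xml
         (core-site.xml and hdfs-site.xml copied over from the target HDP cluster)
       Directory = /data/geo_enriched   (hypothetical target path)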
Labels:
- Apache NiFi
08-18-2017 02:29 PM
@Sridhar Reddy Can you please share the link?
08-17-2017 11:36 PM
Currently, I have an HDP 2.6.1 cluster with Ranger (LDAP). I disabled the Hive plugin, deleted the Hive service from the Ranger UI, and reset the Ranger admin password by following the instructions here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/updating_ranger_admin_passwords.html. I updated the Ranger UI admin password first, then set the same password for the following two values:
- Ranger Admin user: the credentials for this user are set in Configs > Advanced ranger-env, in the fields labeled admin_username (default value: admin) and admin_password (default value: admin).
- Admin user used by Ambari to create repo/policies: the user name for this user is set in Configs > Admin Settings, in the field labeled "Ranger Admin username for Ambari" (default value: amb_ranger_admin). The password for this user is set in the field labeled "Ranger Admin user's password for Ambari"; this password is specified during the Ranger installation.

Now, when I enable the Hive plugin and restart the Hive services, I get the following error:

2017-08-17 18:51:18,877 - Error getting HDP_Cluster2_hive repository for component hive. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}
2017-08-17 18:51:53,664 - Error creating repository. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}
2017-08-17 18:52:58,553 - Error getting HDP_Cluster2_hive repository for component hive. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}
2017-08-17 18:53:33,505 - Error creating repository. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}
2017-08-17 18:53:33,505 - Hive Repository creation failed in Ranger admin

Log file:

2017-08-17 18:44:05,039 - call returned (0, 'hive-server2 - 2.6.1.0-129')
2017-08-17 18:44:05,039 - RangeradminV2: Skip ranger admin if it's down !
2017-08-17 18:44:05,621 - amb_ranger_admin user already exists.
2017-08-17 18:44:06,160 - Will retry 4 time(s), caught exception: Error getting HDP_Cluster2_hive repository for component hive. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}. Sleeping for 8 sec(s)
2017-08-17 18:44:14,741 - Will retry 3 time(s), caught exception: Error getting HDP_Cluster2_hive repository for component hive. Http status code - 401.
{"statusCode":401,"msgDesc":"Authentication Failed"}. Sleeping for 8 sec(s)

If I understand correctly, I need to update the Ranger plugin properties in Hive, but I'm not sure which one. Please let me know how I should go about fixing this. Thanks.
Labels:
- Apache Ranger
07-10-2017 04:14 PM
@Daniel Kozlowski We have Ranger (LDAP) for authorization and also LDAP/AD for Hive authentication, so the "-n admin" option doesn't work.
07-08-2017 05:03 AM
Yes, I tried both approaches, with double quotes and without.
07-07-2017 10:42 PM
When I try to connect to Hive directly using

beeline -u "jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" -n username

the CLI throws the following error:

Connecting to jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/
17/07/07 18:11:35 [main]: WARN jdbc.HiveConnection: Failed to connect to node03.comp.net:2181
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node03.comp.net:2181/: null (state=08S01,code=0)
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive

But when I just run beeline, press Enter, and use the same JDBC URL, I am able to connect:

!connect jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Connecting to jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Enter username for jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2: username
Enter password for jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2: *********
Connected to: Apache Hive (version 2.1.0.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ

Any idea why this is happening and how to fix it? A bit about our environment: we have Ranger (LDAP) for authorization and also LDAP/AD for Hive authentication. I plan to use commands like the one above to launch HQL scripts with parameters, for example:

beeline -u jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2 -n xxxx -w pfile -f /home/hdfs/hive_llap/script/test.hql --hivevar db=$dbname

Any help is appreciated, thanks.
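One detail worth sketching out: run unquoted, the shell treats each ';' in that URL as a command separator, so beeline only ever sees the URL up to the first one. A quoted form of the planned command (all values taken from the post above):

beeline -u "jdbc:hive2://node03.comp.net:2181,node02.comp.net:2181,node01.comp.net:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2" \
  -n xxxx -w pfile -f /home/hdfs/hive_llap/script/test.hql --hivevar db="$dbname"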
Labels:
- Apache Hive