Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5183 | 08-12-2016 01:02 PM
 | 2147 | 08-08-2016 10:00 AM
 | 2524 | 08-03-2016 04:44 PM
 | 5346 | 08-03-2016 02:53 PM
 | 1369 | 08-01-2016 02:38 PM
01-25-2016
11:36 AM
2 Kudos
Wrote this as an answer because of the comment character limit: yes, first go into Ambari, or perhaps better the OS, and look for the tez.lib.uris property in the Tez configuration file:

less /etc/tez/conf/tez-site.xml

You should find something like this:

<value>/hdp/apps/${hdp.version}/tez/tez.tar.gz</value>

If this is not available, you may have a different problem (Tez client not installed, or some configuration issue). You can then check whether these files exist in HDFS:

hadoop fs -ls /hdp/apps/

Find the version number, for example 2.3.2.0-2950:

[root@sandbox ~]# hadoop fs -ls /hdp/apps/2.3.2.0-2950/tez
Found 1 items
-r--r--r--   3 hdfs hadoop   56926645 2015-10-27 14:40 /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz

You can check whether this file is somehow corrupted:

hadoop fs -get /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz

and then try to untar it to see if that works. If the file doesn't exist in HDFS, you can find it in the installation directory of HDP (/usr/hdp/2.3.2.0-2950/tez/lib/tez.tar.gz on the local filesystem) and put it into HDFS.
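If you do need to put it back, here is a minimal sketch, assuming HDP version 2.3.2.0-2950 and the standard /hdp/apps layout; adjust the version and permissions for your cluster:

# run as a user that can write to /hdp/apps, e.g. hdfs
su - hdfs -c "hadoop fs -mkdir -p /hdp/apps/2.3.2.0-2950/tez"
su - hdfs -c "hadoop fs -put /usr/hdp/2.3.2.0-2950/tez/lib/tez.tar.gz /hdp/apps/2.3.2.0-2950/tez/"
su - hdfs -c "hadoop fs -chmod -R 555 /hdp/apps/2.3.2.0-2950/tez"
su - hdfs -c "hadoop fs -chmod 444 /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz"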
01-25-2016
10:20 AM
1 Kudo
There are different possibilities. Normally this means the Tez libraries are not present in HDFS. Are you using the sandbox? You should check whether the Tez client is installed on your Pig client node, whether tez-site.xml contains the tez.lib.uris property, and whether the Tez libraries are actually in HDFS and valid (download them and untar them to check): /hdp/apps/<hdp_version>/tez/tez.tar.gz

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html
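A quick way to run those checks (paths assume a default HDP layout; replace <hdp_version> with the version on your cluster):

# is the property set on the client node?
grep -A1 tez.lib.uris /etc/tez/conf/tez-site.xml

# is the archive in HDFS and readable?
hadoop fs -ls /hdp/apps/<hdp_version>/tez/tez.tar.gz

# download and test-extract it to make sure it is not corrupted
hadoop fs -get /hdp/apps/<hdp_version>/tez/tez.tar.gz /tmp/tez.tar.gz
tar -tzf /tmp/tez.tar.gz > /dev/null && echo "archive looks OK"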
01-25-2016
10:00 AM
Hmmm, weird, the order shouldn't really make a difference. I assume he added a reducer by doing that; that's the only explanation I have. Adding a DISTRIBUTE BY would most likely also have helped. But sorting is good for predicate pushdown, so as long as everything works ... 🙂
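For reference, a minimal sketch of an insert that combines both (table and column names are hypothetical):

hive -e "
-- hypothetical tables and columns, for illustration only
INSERT OVERWRITE TABLE sales_orc PARTITION (sale_date)
SELECT id, amount, sale_date
FROM sales_staging
DISTRIBUTE BY sale_date   -- sends each partition's rows to the same reducer
SORT BY id;               -- sorted rows help ORC predicate pushdown on id
"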
01-14-2016
06:23 PM
1 Kudo
Apart from "apreduce.reduce.java.opts=-Xmx4096m" missing an m (which I don't think is the problem): how many days are you loading? You are essentially doing dynamic partitioning, so the task needs to keep memory for every day (partition) it loads into. If you have a lot of days, this might be the reason. Possible solutions (rough sketches below):

a) Try to load one day and see if that makes it better.
b) Use dynamic sorted partitioning (slide 16); this should theoretically fix the problem if this is the cause.
c) Use manual distribution (slide 19).

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
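Option (b) boils down to one Hive setting; a minimal sketch, assuming a hypothetical table partitioned by day (hive.optimize.sort.dynamic.partition is the setting behind dynamic sorted partitioning; table and column names are placeholders):

hive -e "
SET hive.exec.dynamic.partition.mode=nonstrict;
-- (b) sorted dynamic partitioning: rows arrive sorted by partition key,
--     so each task keeps only one ORC writer open at a time
SET hive.optimize.sort.dynamic.partition=true;
INSERT INTO TABLE events_orc PARTITION (day)
SELECT id, payload, day FROM events_staging;
"

Option (c), manual distribution, would roughly look like this:

hive -e "
SET hive.exec.dynamic.partition.mode=nonstrict;
-- (c) manual distribution: route each day's rows to its own reducer
INSERT INTO TABLE events_orc PARTITION (day)
SELECT id, payload, day FROM events_staging
DISTRIBUTE BY day;
"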
01-13-2016
12:25 PM
1 Kudo
That is very curious. I have seen lots of stripes being created because of memory problems, but normally the writer only gets down to 5000 rows and then runs out of memory. Which version of Hive are you using? What are your memory settings for the Hive tasks? And if the file is small, is it possible that the table is partitioned and the task is writing into a large number of partitions at the same time? Can you share the LOAD command and the table layout?
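If it helps, a sketch of how to pull that information together (the table name and ORC file path are placeholders, and hive --orcfiledump assumes your Hive version ships the ORC file dump service):

hive -e "SHOW CREATE TABLE mytable;"       # table layout, including partition columns
hive -e "DESCRIBE FORMATTED mytable;"      # storage format and table properties
hive --orcfiledump /apps/hive/warehouse/mytable/000000_0   # stripe count and sizes of one ORC file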
01-11-2016
05:32 PM
ah nice undercover magic. I will try and see what happens if I switch the active off.
01-11-2016
05:29 PM
I have seen the question for HA NameNodes, however HA Resource Managers still confuse me. In Hue, for example, you are told to add a second resource manager entry with the same logical name, i.e. Hue supports adding two resource manager URLs and will manually try both.

How does that work in Falcon? How can I enter an HA Resource Manager entry into the interfaces of the cluster entity document? For NameNode HA I would use the logical name, and the program would then read the hdfs-site.xml. I have seen the other, similar questions for Oozie, but I am not sure it was answered, or I didn't really understand it.

https://community.hortonworks.com/questions/2740/what-value-should-i-use-for-jobtracker-for-resourc.html

So assuming my active resource manager is mycluster1.com:8050 and the standby is mycluster2.com:8050 ...
01-07-2016
02:05 PM
2 Kudos
You could use a shell action, add the keytab to the Oozie files (file tag), and do the kinit yourself before running the java command. Obviously not that elegant, and you have a keytab sitting somewhere in HDFS, but it should work. I did something similar with a shell action running a Scala program and doing a kinit before (not against Hive, but running kinit and then connecting to HDFS).

Ceterum censeo, I would always suggest using a Hive server with LDAP/PAM authentication. Beeline and the hive2 action have a password file option now, and it makes life so much easier. As a database guy, Kerberos for a JDBC connection just always makes problems. Here is the Oozie shell action, by the way:

<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>runJavaCommand.sh</exec>
<file>${nameNode}/scripts/runJavaCommand.sh#runJavaCommand.sh</file>
<file>${nameNode}/securelocation/user.keytab#user.keytab</file>
</shell>
Then just add a kinit into the script before running Java:
kinit -kt user.keytab user@EXAMPLE.COM
java org.apache.myprogram
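A minimal sketch of what runJavaCommand.sh could look like, using the keytab and principal from the example above (the jar name and classpath handling are placeholders):

#!/bin/bash
# sketch only - adjust names and classpath for your environment
set -e

# obtain a Kerberos ticket from the keytab shipped via the <file> tag
kinit -kt user.keytab user@EXAMPLE.COM

# run the program; myprogram.jar is a placeholder
java -cp myprogram.jar:$(hadoop classpath) org.apache.myprogram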
01-04-2016
01:59 PM
It looks like a very useful command for debugging. Never used it before. Shame it seems to be broken.