Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
02-11-2016
08:23 AM
2 Kudos
Hi @Herman Yu, on my Sandbox, also HDP-2.3.2, it works; I only changed the table name:
hive> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", "isEmpty"), reflect("java.lang.Math", "max", 2, 3), reflect("java.lang.Math", "min", 2, 3), reflect("java.lang.Math", "round", 2.5), reflect("java.lang.Math", "exp", 1.0), reflect("java.lang.Math", "floor", 1.9) FROM st2 LIMIT 1;
OK
1 true 3 2 3 2.718281828459045 1.0
Time taken: 1.97 seconds, Fetched: 1 row(s)
Do other Hive commands and scripts work? Can you try the reflect calls one by one? Or, if you copy/pasted the command from somewhere, try typing the first reflect by hand.
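To narrow it down, each call can be tested on its own; a minimal sketch from the shell, reusing the st2 table above (any existing table will do):
hive -e "SELECT reflect('java.lang.Math', 'max', 2, 3) FROM st2 LIMIT 1;"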
02-11-2016
05:57 AM
3 Kudos
@Swapnil Prabhu Do you have the /user/admin directory created in HDFS, with the right permissions? Try, as the hdfs user:
su - hdfs
hdfs dfs -ls -d /user/admin
hdfs dfs -mkdir /user/admin    ... if /user/admin doesn't exist
If admin is not the owner of the directory, try hdfs dfs -chown -R admin:hdfs /user/admin and then retry your Hive view action.
02-11-2016
12:08 AM
1 Kudo
Hi @Krishna Srinivas, using multiple mappers is good practice for free-form queries too, but you have to keep in mind what your free-form query is doing. Each mapper runs a copy of the query with additional WHERE conditions that split the table on the "--split-by" column. So, in your case each mapper will return 100k records per split, for a total of 400k. If you want 100k in total, you should use "TOP 25000 ..." instead. For the majority of free-form queries, like "WHERE a>100 and b>300", you don't have to worry about the number of records.
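A minimal sketch of such an import, assuming a SQL Server source with a numeric id column; connection details, table, and column names are placeholders. Each of the 4 mappers runs the query with its own split range substituted for $CONDITIONS, so TOP applies per mapper:
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=testdb" \
  --username sqoop -P \
  --query "SELECT TOP 100000 * FROM mytable WHERE \$CONDITIONS" \
  --split-by id \
  --num-mappers 4 \
  --target-dir /user/sqoop/mytable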
02-10-2016
11:12 AM
5 Kudos
I just completed my first Express Upgrade (EU) using Ambari-2.2.0, from HDP-2.2.8 to HDP-2.3.4, and here are my observations and the issues I encountered. The cluster has 12 nodes, 2 masters and 10 workers, with NameNode HA and RM HA configured, running on RHEL-6.5 with Java-7. Installed Hadoop components: HDFS, MR2, YARN, Hive, Tez, HBase, Pig, Sqoop, Oozie, ZooKeeper, and Ambari Metrics. About 2 weeks before this EU, the cluster was upgraded from HDP-2.1.10 and Ambari-1.7.1. Please use this only as a reference: depending on cluster settings and previous history (previous upgrade or fresh install), the issues will differ, and the problems I had should by no means be considered representative or expected during every EU.
It's a good idea to back up all cluster-supporting databases in advance; in my case Ambari, the Hive metastore, Oozie and Hue (although Hue cannot be upgraded by Ambari). There is no need to prepare or download the HDP.repo file in advance: Ambari will create the file, now called HDP-2.3.4.repo, and distribute it to all nodes. The upgrade consists of registering the new HDP version, installing that new version on all nodes, and after that starting the upgrade itself.
After starting the upgrade, Ambari found that we could also do a Rolling Upgrade by enabling yarn.timeline-service.recovery.enabled (currently false), but we decided to do the Express Upgrade (EU) instead. There was only one warning for EU, that some *-env.sh files would be overwritten. That was fine, but I backed up all those files for easier comparison with the new files after the upgrade.
The upgrade started well, and everything was looking great: ZooKeeper, HDFS NameNodes and DataNodes, and ResourceManagers were all successfully upgraded and restarted. And then, when it looked like it would be effortless on my part, came the first setback: the NodeManagers, all 6 of them, could not start after the upgrade. Before starting the upgrade I had chosen to ignore all failures (on both master and worker components), so I decided to keep going and fix the NMs later. The upgrade and restart of MapReduce2 and HBase were successful, and then the upgrade wizard ran service checks of the components upgraded up to that point. As expected, the ZK, HDFS and HBase checks passed, but the YARN and MR2 checks failed.
At that point I decided to see whether I could fix the NMs. A cool feature of EU is that one can pause the upgrade at any time, inspect the Ambari dashboard, do manual fixes and resume the EU when ready. Back to the failed NMs: the wizard log was only saying (for every NM) that it could not find it in the list of started NMs, which was not very helpful. So I checked the log on one of the NMs; it was saying:
2016-02-05 13:16:52,503 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(540)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/000035.sst
And indeed, in that directory I had 000040.sst but sadly no 000035.sst. I realized that this is my yarn.nodemanager.recovery.dir and, because YARN NM recovery was enabled, the NM tried to recover its state to the one before it was stopped. All our jobs were stopped and we didn't care about recovering NM state, so after backing up the directory I deleted all files in it and tried to start the NM manually. Luckily, that worked! The command to start an NM manually, as done by Ambari, as the yarn user:
$ ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh --config /usr/hdp/current/hadoop-client/conf start nodemanager
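Roughly, the cleanup itself amounts to something like the following sketch (the path comes from the log above; the backup location is arbitrary; run as root or the yarn user):
cp -r /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state /tmp/yarn-nm-state.bak
rm -f /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/*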
After that the EU was smooth: it upgraded Hive, Oozie, Pig, Sqoop and Tez and passed all service checks. At the very end, one can finalize the upgrade or "Finalize later". I decided to finalize later and inspect the cluster. I noticed that the ZKFCs were still running on the old version 2.2.8 and tried to restart HDFS, hoping the ZKFCs would be started on the new version. They weren't, and on top of that I couldn't start the NNs! I realized that because the HDFS upgrade was not finalized I needed the "-rollingUpgrade started" flag, so I started the NNs manually, as the hdfs user (note: this is only required if you want to restart NNs before finalizing the upgrade):
$ ulimit -c unlimited; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode -rollingUpgrade started
After finalizing the upgrade and restarting HDFS, everything was running on the new HDP version. In addition, I did the following checks to make sure the old version is not used any more:
hdp-select status | grep 2\.2\.8        ... returns nothing
ls -l /usr/hdp/current | grep 2\.2\.8   ... returns nothing
ps -ef | grep java | grep 2\.2\.8       ... returns nothing, or only processes not related to HDP
After finalizing the upgrade, the Oozie service check was failing. I realized that the Oozie share lib in HDFS is now in /user/oozie/share/lib_20160205182129, where the date/time in the directory name is derived from the time of creation. However, permissions were insufficient: all jars had 644 permissions instead of 755. So, as the hdfs user, I changed the permissions and after that the Oozie service check was all right:
$ hdfs dfs -chmod -R 755 /user/oozie/share/lib_20160205182129
The Pig service check was also failing. I found that pig-env.sh was wrong, still having HCAT_HOME, HIVE_HOME, PIG_CLASSPATH and PIG_OPTS pointing to jars in the now non-existent /usr/lib/hive and /usr/lib/hive-catalog directories. I commented out everything, leaving only:
JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
HADOOP_HOME=${HADOOP_HOME:-/usr}
if [ -d "/usr/lib/tez" ]; then
PIG_OPTS="$PIG_OPTS -Dmapreduce.framework.name=yarn"
fi
Finally, I fixed templeton.libjars, which got scrambled during the upgrade:
templeton.libjars=/usr/hdp/${hdp.version}/zookeeper/zookeeper.jar,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar
At this point all service checks were successful, and additional tests running Pi, Teragen/Terasort, and simple Hive and Pig jobs were completing without issues. And so, my first EU was over! Despite these minor setbacks it was much faster than doing it all manually. Give it a try when you have a chance.
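For the smoke tests, something like the following is enough (a sketch; the examples jar path follows HDP's /usr/hdp/current layout and may differ on other versions):
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 4 100
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000 /tmp/teragen-out
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /tmp/teragen-out /tmp/terasort-out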
02-09-2016
10:25 AM
1 Kudo
Hi @Krishna Srinivas, that's correct. See this for a workaround to do incremental import into a Hive external table by way of HDFS.
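Roughly, such a workaround looks like the following sketch (connection details, table, columns and paths are placeholders, not the exact recipe from the linked post): Sqoop appends only new rows into an HDFS directory, and a Hive external table is defined over that directory.
sqoop import \
  --connect "jdbc:mysql://dbhost/testdb" \
  --username sqoop -P \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 100000 \
  --target-dir /apps/hive/warehouse/ext/orders
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (id INT, amount DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/apps/hive/warehouse/ext/orders';"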
02-09-2016
02:18 AM
If your transport is "binary" then your issue is not related to transport mode. "http" mode, required for example by Knox, is not supported by Ambari versions older than 2.2.0. By the way, what's your version of Ambari? I found a related question, but it is unresolved. However, I haven't seen this in Ambari-2.1.2 and higher. Also, can you post a screenshot of your Ambari view setup and your Hive thrift port?
02-09-2016
01:45 AM
Hi @J. David, I'm not sure, but it looks like something related to your Hive transport setting. What's your hive.server2.transport.mode? All versions of the Ambari Hive view support binary mode, but only Ambari-2.2 and higher support http mode.
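To check quickly, something like this on the HiveServer2 node (assuming the usual HDP client config location):
grep -A 1 "hive.server2.transport.mode" /etc/hive/conf/hive-site.xml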
02-08-2016
04:07 AM
2 Kudos
Hi Ben, I'm not an Oozie expert but you look lonely on this page, nobody to help you 🙂 So, the simplest solution is to have a special queue for Oozie apps and set maximum-capacity=capacity (sketched below); then preemption will not happen even if it is enabled. But you can say I'm avoiding the question 🙂 Otherwise, I think in all cases (A) - (D), if preemption happens the respective tasks will be retried, but then we can have some orphaned tasks. So we'd like to avoid it in particular for A, B and C. A and C are AMs, which can be controlled with maximum-am-resource-percent per queue, and the Scheduler is not supposed to kill AMs, I guess. That leaves only B exposed, and by doing some searching I found that we can run the Oozie action launcher in uber mode so that A and B run together; the property is called oozie.action.launcher.mapreduce.job.ubertask.enable, which by default is false (in spite of this jira) and can be set per action. HTH.
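A minimal sketch of the dedicated-queue idea in capacity-scheduler terms (the queue name "oozie" and the percentages are just placeholders): with maximum-capacity equal to capacity the queue never grows beyond its guarantee, so there is nothing for the preemption monitor to claw back.
yarn.scheduler.capacity.root.queues=default,oozie
yarn.scheduler.capacity.root.default.capacity=80
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.oozie.capacity=20
yarn.scheduler.capacity.root.oozie.maximum-capacity=20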
02-07-2016
11:19 PM
Since Oozie is stopped, you can use ij, the Derby interactive SQL tool.
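Roughly like this (a sketch: the classpath must contain Derby's derby.jar and derbytools.jar, which provides ij, and the database path below is just an example of where an embedded Oozie Derby database might live; point it at your actual Oozie data directory):
java -cp derby.jar:derbytools.jar org.apache.derby.tools.ij
ij> connect 'jdbc:derby:/hadoop/oozie/data/oozie-db';
ij> show tables;
ij> exit;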
02-07-2016
09:40 AM
Okay, as root do this:
su -l hdfs -c "hdfs dfs -chown -R ec2-user:hdfs /user/ec2-user"
Then retry first from the command line as ec2-user, and if that works try the view again, logging in to Ambari as ec2-user. Edit: Sorry, I forgot the "-c". The problem is that ec2-user doesn't have permissions on his home directory in HDFS.
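For the command-line retry, any simple write into the home directory will do, for example (the test file name is arbitrary):
su -l ec2-user -c "hdfs dfs -put /etc/hosts /user/ec2-user/chown_test"
su -l ec2-user -c "hdfs dfs -ls /user/ec2-user"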