About webbd

webbd · ‎11-09-2021

Installing and configuring Livy on CDH 6.x.x Livy is a preferred way to run Spark jobs on several Hadoop installations, but not on CDH. While preparing for a CDP migration, one of our use-cases switched to Apache Airflow to run jobs without requiring an edge node or "bastion node" and they wanted to begin using Airflow before the CDP migration, so they asked me to install Livy on a CDH edge node. A search online for Livy on CDH returned little helpful information, but I did find information on how to download and install it at https://livy.apache.org/ Step 1: Determine which account will be used to run Livy Linux security will allow an application to access or execute any program or file the executing account can access unless you configure selinux or another access management software. Pick an account to run Livy from. Step 2: Set up a keytab You'll need a Kerberos principal. If you use Active Directory principals with your CDH deployment, then this account will be outside of your Hadoop platform. You can use a tool like ktutil to create a keytab for your Kerberos principal. Step 3: Set up your server to run Livy Livy requires the basic Hadoop and Spark environment variables. export JAVA_HOME=/usr/java/default/jre export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop export HADOOP_CONF_DIR=/etc/hadoop/conf export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark Step 4: Download and install Livy Download the Livy package zip file from https://livy.apache.org/download/ using wget. Example: cd /var/tmp wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip Unzip the resulting zip file unzip apache-livy-0.7.1-incubating-bin.zip Deploy the package and link the default symlink directory mkdir /opt/livy mv /var/tmp/apache-livy-0.7.1-incubating-bin /opt/livy/ ln -s /opt/livy/apache-livy-0.7.1-incubating-bin /opt/livy/default Step 5: Set up livy.conf Livy places template files that you'll need to copy to "real" files. You need to configure livy.conf. cp /opt/livy/default/conf/livy.conf.template /opt/livy/default/conf/livy.conf Edit the newly created livy.conf file and add two lines in the commented-out Kerberos section. livy.server.launch.kerberos.principal=${KERBEROS_PRINCIPAL} livy.server.launch.kerberos.keytab=${KERBEROS KEYTAB} Specify the full Kerberos principal name and the full path to the keytab. Step 6: Run the Livy server Livy server runs as a background process. This article doesn't discuss how to run it as a service that starts automatically. /opt/livy/default/bin/livy-server start Step 7: Test Livy You can use one of the recommended test commands from another node: curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" http://<LIVY_HOST>:8998/sessions You can also test from a web browser: http://<LIVY_HOST>:8998 Disclaimer: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all use cases and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.

webbd · ‎11-07-2019

I'm currently running CDH 5.15.0.

webbd · ‎11-04-2019

I'm running a cluster with 8 worker nodes configured with 160GB Impala Daemon Memory Limit. The worker nodes each have 370GB RAM and based on a look at the standard Host Memory Usage graph from Cloudera Manager for the nodes, it looks like I have capacity for additional query space. My question: Does it look like I have room to increase my Impala values to meet my needs? From my viewpoint, I think I have at least another 100GB of headroom, but I don't want to impact Hive or Spark processing that may occur during the same time windows. I'd like to accomplish the following: I'd like to allow some queries that tend to overreach on Impala RAM additional capacity to do what they need to do. These queries read some big tables, sometimes with thousands of partitions, and they have a tendency to run out of RAM. Reduce Impala usage of scratch directories on these large queries. My cluster is storage-constrained, so when Impala goes heavy into the scratch directories, it not only takes a long time for the queries to finish, but the cluster's health starts to show issues. Currently, I don't have any admission control settings enabled. Any query can use all the available resources. I'd like to increase the available RAM for all of Impala while limiting the RAM for individual queries. Over the past week, The nodes' host memory usage graph contains the following example peaks: Peak 1: Physical Memory Buffers: 2.6G Physical Memory Caches: 203.5G Physical Memory Capacity: 370G Physical Memory Used: 172G Swap Used: 0K Peak 2: Physical Memory Buffers: 2.6G Physical Memory Caches: 135G Physical Memory Capacity: 370G Physical Memory Used: 232G Swap Used: 768K During a quiet time, the numbers look like: Physical Memory Buffers: 70.7M Physical Memory Caches: 2.8gG Physical Memory Capacity: 370G Physical Memory Used: 19.3G Swap Used: 768K

webbd · ‎07-06-2017

I'm reading through the installation instructions for CDSW at https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html and found the section to set up a Wildcard DNS domain. In this section, it references a master IP address. I need to understand if this master IP address is the IP of the Cloudera Manager master node, or if it is the IP of the CDSW master. I'm assuming it's the CDSW master, but I don't want to set up a Wildcard DNS domain incorrectly. Thanks, David Webb

webbd · ‎05-24-2016

That fixed it. I used alternatives to install a new alternative to javac, then used it again to configure javac to the new alternative. alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_67-cloudera/bin/javac 1 # alternatives --config javac There are 2 programs which provide 'javac'. Selection Command ----------------------------------------------- * 1 /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/javac + 2 /usr/java/jdk1.7.0_67-cloudera/bin/javac Enter to keep the current selection[+], or type selection number: 2 Thanks for your help! DaveW

webbd · ‎05-24-2016

Sorry, # javac -version javac 1.6.0_38

webbd · ‎05-24-2016

Sure. # java -version java version "1.7.0_95" OpenJDK Runtime Environment (rhel-2.6.4.0.el6_7-x86_64 u95-b00) OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

webbd · ‎05-23-2016

I'm new to maven, so when I ran mvn install from /root, I got an error: [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.130s [INFO] Finished at: Mon May 23 17:00:31 EDT 2016 [INFO] Final Memory: 7M/964M [INFO] ------------------------------------------------------------------------ [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/root). Please verify you invoked Maven from the correct directory. -> [Help 1] The POM file was in the /tmp/cm_ext/validator directory. [root ~]# cd /tmp/cm_ext/validator [root validator]# ll total 5048 -rw-r--r--. 1 root root 5144659 Feb 19 2013 apache-maven-3.0.5-bin.tar.gz -rw-r--r--. 1 root root 9409 May 23 16:58 pom.xml -rw-r--r--. 1 root root 476 May 19 14:07 README.md drwxr-xr-x. 5 root root 4096 May 19 14:07 src I was able to biuld the cm-schema package as suggested. Once I did, I got a new error when I tried to build validator: [ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] /tmp/cm_ext/validator/src/main/java/com/cloudera/cli/validator/ApplicationConfiguration.java:[30,20] package java.nio.file does not exist [ERROR] /tmp/cm_ext/validator/src/main/java/com/cloudera/cli/validator/ApplicationConfiguration.java:[31,20] package java.nio.file does not exist [ERROR] /tmp/cm_ext/validator/src/main/java/com/cloudera/cli/validator/ApplicationConfiguration.java:[72,18] cannot find symbol symbol : variable Paths location: class com.cloudera.cli.validator.ApplicationConfiguration [ERROR] /tmp/cm_ext/validator/src/main/java/com/cloudera/cli/validator/ApplicationConfiguration.java:[71,14] cannot find symbol symbol : variable Files location: class com.cloudera.cli.validator.ApplicationConfiguration [INFO] 4 errors [ I assume that I need to install java.nio.file or to make sure that it's in a path that can be accessed. I'll look into this, but if anyone has any clues that I can follow, they will be greatly appreciated. Thanks, - DaveW

webbd · ‎05-19-2016

I've installed apache nifi on one of my CDH 5.7 clusters (linux version CentOS 6.7), but I'd like to manage it from within Cloudera Manger. I did some research on parcels and on CSDs. It looks like this is something I can do, and it doesn't look like it should be too difficult. I came across the github page https://github.com/prateek/nifi-parcel, which gives step-by-step instructions for creating a nifi parcel for Cloudera. Unfortunately, I'm running into errors. The steps instruct me to execute the command to download cloudera/cm_ext and then build it. cd /tmp git clone https://github.com/cloudera/cm_ext cd cm_ext/validator mvn install When I execute maven to install the validator, I ran into a build failure. [WARNING] The POM for com.cloudera.cmf.schema:cloudera-manager-schema:jar:5.5.0 is missing, no dependency information available [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE I assumed that maybe 5.5.0 stands for CDH 5.5.0, so I updated the pom.xml to 5.7.0. Downloading: http://repo.maven.apache.org/maven2/com/cloudera/cmf/schema/cloudera-manager-schema/5.7.0/cloudera-manager-schema-5.7.0.jar [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE ... [ERROR] Failed to execute goal on project schema-validator: Could not resolve dependencies for project com.cloudera.enterprise:schema-validator:jar:5.7.0: Could not find artifact com.cloudera.cmf.schema:cloudera-manager-schema:jar:5.7.0 in cloudera-external (https://repository.cloudera.com/artifactory/ext-release-local/) -> [Help 1] I searched https://repository.cloudera.com/artifactory/ext-release-local/ and found that there's nothing there under the ./com/cloudera directory. Is there a better way to do this?

webbd · ‎12-15-2015

I was able to rebuild the Oozie job and make it work, although I really don't know what is different. I built the job in sequence this time, so that the steps are listed in-sequence in the XML file. I also built the job steps to reference the lib directory in the job's path. I had previously had success with explicit references, but these didn't seem necessary. I moved the prepare steps to a point right before they were needed instead of all on the first step. I eliminated the output directory definition for TeraValidate because it doesn't seem to be used. Finally, I let Hue/Oozie choose the defaults for Master and Mode. I played around with trying to use YARN and cluster, but these didn't work. My resulting XML (that works) looks like this: <workflow-app name="TeraGen-TeraSort-TeraValidate" xmlns="uri:oozie:workflow:0.5"> <start to="spark-27f0"/> <kill name="Kill"> <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> </kill> <action name="spark-27f0"> <spark xmlns="uri:oozie:spark-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <prepare> <delete path="${nameNode}/user/davidw/terasort-benchmark.in"/> </prepare> <master>local[*]</master> <mode>client</mode> <name>TeraGen</name> <class>com.github.ehiggs.spark.terasort.TeraGen</class> <jar>lib/spark-terasort.jar</jar> <arg>1g</arg> <arg>/user/davidw/terasort-benchmark.in</arg> </spark> <ok to="spark-94fc"/> <error to="Kill"/> </action> <action name="spark-94fc"> <spark xmlns="uri:oozie:spark-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <prepare> <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/> </prepare> <master>local[*]</master> <mode>client</mode> <name>TeraSort</name> <class>com.github.ehiggs.spark.terasort.TeraSort</class> <jar>lib/spark-terasort.jar</jar> <arg>/user/davidw/terasort-benchmark.in</arg> <arg>/user/davidw/terasort-benchmark.out</arg> </spark> <ok to="spark-bcf9"/> <error to="Kill"/> </action> <action name="spark-bcf9"> <spark xmlns="uri:oozie:spark-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <master>local[*]</master> <mode>client</mode> <name>TeraValidate</name> <class>com.github.ehiggs.spark.terasort.TeraValidate</class> <jar>lib/spark-terasort.jar</jar> <arg>/user/davidw/terasort-benchmark.out</arg> </spark> <ok to="End"/> <error to="Kill"/> </action> <end name="End"/> </workflow-app>

Online	Offline
Last Visited	‎11-22-2021 11:47 AM

Member Since	‎12-15-2015 05:33 AM
Last Visited	‎11-22-2021 11:47 AM
Posts	15
Kudos received	1

Cloudera Community

Re: Error "tried to access method com.google.commo...

Livy server on CDH 6.x.x

Re: Impala Daemon Memory Limit

Impala Daemon Memory Limit

Data Science Workbench installation instructions -...

Re: Creating a CSD / Parcel for Nifi - CDH 5.7.0

Re: Creating a CSD / Parcel for Nifi - CDH 5.7.0

Re: Creating a CSD / Parcel for Nifi - CDH 5.7.0

Re: Creating a CSD / Parcel for Nifi - CDH 5.7.0

Creating a CSD / Parcel for Nifi - CDH 5.7.0

Re: Error "tried to access method com.google.commo...