Created 11-21-2017 04:33 AM
Hi!
I am trying to create a simple workflow with a hive action. I'm using Cloudera Quickstart VM (CDH 5.12). The following are the components of my workflow:
1) top_n_products.hql
create table instacart.top_n as ( select * from ( select row_number() over (order by no_of_times_ordered desc)as num_rank, product_id, product_name, no_of_times_ordered from ( select A.product_id, B.product_name, count(*) as no_of_times_ordered from instacart.order_products__train as A left outer join instacart.products as B on A.product_id=B.product_id group by A.product_id, B.product_name )C )D where num_rank <= ${N} );
2) hive-config.xml
I have basically copied the hive-site.xml from /etc/hive/conf into my workflow workspace folder and renamed it to hive-config.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files --> <!-- that are implied by Hadoop setup variables. --> <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive --> <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized --> <!-- resource). --> <!-- Hive Execution Parameters --> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>cloudera</value> </property> <property> <name>hive.hwi.war.file</name> <value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value> <description>This is the WAR file with the jsp content for Hive Web Interface</description> </property> <property> <name>datanucleus.fixedDatastore</name> <value>true</value> </property> <property> <name>datanucleus.autoCreateSchema</name> <value>false</value> </property> <property> <name>hive.metastore.uris</name> <value>thrift://127.0.0.1:9083</value> <description>IP address (or fully-qualified domain name) and port of the metastore host</description> </property> </configuration>
3) Workflow properties
In the hive action, I set the following:
- set HIVE XML, Job XML paths to my hive-config.xml
- Also added hive-config.xml to Files
- In the workflow properties, set the path to my workspace
- Defined the parameter N in my query
When I try to run te workflow it fails, and the stderr throws following error:
Log Type: stderr Log Upload Time: Mon Nov 20 19:49:04 -0800 2017 Log Length: 2759 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/130/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Nov 20, 2017 7:47:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information. Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class Nov 20, 2017 7:47:35 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton" Nov 20, 2017 7:47:36 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton" Nov 20, 2017 7:47:38 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest" log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Below are the workflow.xml and job.properties that are generated:
1) Workflow XML:
<workflow-app name="Top_N_Products" xmlns="uri:oozie:workflow:0.5"> <global> <job-xml>hive-config.xml</job-xml> </global> <start to="hive-87ac"/> <kill name="Kill"> <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> </kill> <action name="hive-87ac" cred="hcat"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <job-xml>hive-config.xml</job-xml> <script>top_n_products.hql</script> <param>N={N}</param> <file>hive-config.xml#hive-config.xml</file> </hive> <ok to="End"/> <error to="Kill"/> </action> <end name="End"/> </workflow-app>
2) job.properties
security_enabled=False send_email=False dryrun=False nameNode=hdfs://quickstart.cloudera:8020 jobTracker=localhost:8032 N=10
Please note that the hive query runs perfectly fine through the Hive query editor. Am I missing something while configuring the workflow? Any help is appreciated!
Thanks,
Deb