Community Articles

Objective

Use Oozie's Hive 2 action to create a workflow that connects to HiveServer2 in a Kerberized environment. The workflow executes a Hive query script that sums the number of crimes in the crime table for a particular year, passed in as the parameter queryYear in the job.properties file, and writes the results to a new Hive table, crimenumbers.

Procedure

Log into the edge server containing the Oozie client. Change to the oozie user.

[root@jyoung-hdp234-1 ~]# su - oozie

Authenticate to the KDC using the oozie service account Kerberos keytab.

[oozie@jyoung-hdp234-1 ~]$ kinit -kt /etc/security/keytabs/oozie.service.keytab oozie/jyoung-hdp234-1.openstacklocal@EXAMPLE.COM

Download the City of Chicago crime data in CSV form.

[oozie@jyoung-hdp234-1 ~]$ mkdir -p /tmp/crime
[oozie@jyoung-hdp234-1 ~]$ cd /tmp/crime
[oozie@jyoung-hdp234-1 crime]$ curl -o crime -L https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD

Put the crime CSV into HDFS in the /tmp/crime folder.

[oozie@jyoung-hdp234-1 crime]$ hdfs dfs -mkdir -p /tmp/crime
[oozie@jyoung-hdp234-1 crime]$ hdfs dfs -copyFromLocal crime /tmp/crime/
[oozie@jyoung-hdp234-1 crime]$ hdfs dfs -chmod -R 777 /tmp/crime
[oozie@jyoung-hdp234-1 crime]$ hdfs dfs -ls /tmp/crime
Found 1 items
drwxrwxrwx   - oozie hdfs          0 2016-12-19 08:32 /tmp/crime/crime

Log into the Hive server. Change users to the hive user.

[root@jyoung-hdp234-2 ~]# su - hive

Authenticate to the KDC using the hive service account Kerberos keytab.

[hive@jyoung-hdp234-2 ~]$ kinit -kt /etc/security/keytabs/hive.service.keytab hive/jyoung-hdp234-2.openstacklocal@EXAMPLE.COM

Create the SQL DDL script that defines the crime Hive table as an external table over the crime CSV located in HDFS.

[hive@jyoung-hdp234-2 ~]$ cat << 'EOF' > /tmp/load_crime_table.ddl
CREATE EXTERNAL TABLE IF NOT EXISTS crime(
    ID STRING,
    Case_Number STRING,
    Case_Date STRING,
    Block STRING,
    IUCR INT,
    Primary_Type STRING,
    Description STRING,
    Location_Description STRING,
    Arrest BOOLEAN,
    Domestic BOOLEAN,
    Beat STRING,
    District STRING,
    Ward STRING,
    Community_Area STRING,
    FBI_Code STRING,
    X_Coordinate INT,
    Y_Coordinate INT,
    Case_Year INT,
    Updated_On STRING,
    Latitude DOUBLE,
    Longitude DOUBLE,
    Location STRING)
COMMENT 'This is crime data for the city of Chicago.'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/tmp/crime'
TBLPROPERTIES("skip.header.line.count"="1");
EOF

Use beeline to execute the DDL and create the external Hive table.

[hive@jyoung-hdp234-2 ~]$ beeline -u "jdbc:hive2://jyoung-hdp234-2.openstacklocal:10000/default;principal=hive/_HOST@EXAMPLE.COM" -f "/tmp/load_crime_table.ddl"

On the Oozie server / edge node, create an Oozie workflow directory for this and future Oozie demo workflows.

[oozie@jyoung-hdp234-1 crime]$ hdfs dfs -mkdir -p /user/oozie/ooziedemo

Create a local folder to hold development copies of your Oozie workflow project files.

[oozie@jyoung-hdp234-1 crime]$ cd ~/
[oozie@jyoung-hdp234-1 ~]$ mkdir -p ooziedemo/hivedemo/app/lib
[oozie@jyoung-hdp234-1 ~]$ cd ooziedemo/hivedemo

Create the job.properties file that will contain the configuration properties and variables for the workflow.

[oozie@jyoung-hdp234-1 hivedemo]$ cat << 'EOF' > job.properties
nameNode=hdfs://jyoung-hdp234-1.openstacklocal:8020
jobTracker=jyoung-hdp234-2.openstacklocal:8050
exampleDir=${nameNode}/user/${user.name}/ooziedemo/hivedemo
oozie.wf.application.path=${exampleDir}/app
oozie.use.system.libpath=true
# Hive2 action
hivescript=${oozie.wf.application.path}/crime_per_year.hql
outputHiveDatabase=default
jdbcURL=jdbc:hive2://jyoung-hdp234-2.openstacklocal:10000/default
jdbcPrincipal=hive/_HOST@EXAMPLE.COM
queryYear=2008
EOF
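Oozie resolves ${...} references in job.properties against the other properties (and against values such as ${user.name} supplied at submission time) before running the workflow, which is how exampleDir and oozie.wf.application.path end up as full HDFS paths. A minimal Python sketch of that substitution, simplified and using the property values from above (not Oozie's actual resolver):

```python
import re

def resolve(props: dict) -> dict:
    """Repeatedly expand ${name} references against the property map
    (a simplified sketch of Oozie's job.properties variable substitution)."""
    resolved = dict(props)
    for _ in range(len(resolved)):  # enough passes for nested references
        changed = False
        for key, value in resolved.items():
            new = re.sub(r"\$\{([^}]+)\}",
                         lambda m: resolved.get(m.group(1), m.group(0)),
                         value)
            if new != value:
                resolved[key] = new
                changed = True
        if not changed:
            break
    return resolved

props = {
    "nameNode": "hdfs://jyoung-hdp234-1.openstacklocal:8020",
    "user.name": "oozie",  # supplied by Oozie at submission time
    "exampleDir": "${nameNode}/user/${user.name}/ooziedemo/hivedemo",
    "oozie.wf.application.path": "${exampleDir}/app",
}
print(resolve(props)["oozie.wf.application.path"])
# hdfs://jyoung-hdp234-1.openstacklocal:8020/user/oozie/ooziedemo/hivedemo/app
```

The same mechanism is why ${hivescript} in job.properties can reference ${oozie.wf.application.path} even though the latter is itself built from two other variables.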

Create the workflow.xml file, which will execute an HQL script on HiveServer2.

[oozie@jyoung-hdp234-1 hivedemo]$ cat << 'EOF' > app/workflow.xml
<workflow-app name="hivedemo" xmlns="uri:oozie:workflow:0.4">
  <global>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
  </global>
  <credentials>
    <credential name="hs2-creds" type="hive2">
      <property>
        <name>hive2.server.principal</name>
          <value>${jdbcPrincipal}</value>
      </property>
      <property>
       <name>hive2.jdbc.url</name>
         <value>${jdbcURL}</value>
      </property>
    </credential>
  </credentials>
  <start to="hive2"/>
    <action name="hive2" cred="hs2-creds">
      <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <jdbc-url>${jdbcURL}</jdbc-url>
        <script>${hivescript}</script>
        <param>queryYear=${queryYear}</param>
      </hive2>
      <ok to="End"/>
      <error to="Kill"/>
    </action>
    <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="End"/>
</workflow-app>
EOF

Create the HQL script containing the parameterized Hive query to be executed by the workflow.

[oozie@jyoung-hdp234-1 hivedemo]$ cat << 'EOF' > app/crime_per_year.hql
CREATE TABLE IF NOT EXISTS crimenumbers(year INT, number_of_crimes INT);
INSERT INTO crimenumbers SELECT ${queryYear} as year, count(*) as number_of_crimes FROM crime WHERE case_date LIKE '%${queryYear}%';
EOF

Copy the hivedemo folder to HDFS.

[oozie@jyoung-hdp234-1 hivedemo]$ cd ~/ooziedemo
[oozie@jyoung-hdp234-1 ooziedemo]$ hdfs dfs -copyFromLocal hivedemo /user/oozie/ooziedemo/
[oozie@jyoung-hdp234-1 ooziedemo]$ hdfs dfs -ls -R /user/oozie/ooziedemo/
drwxr-xr-x   - oozie hdfs          0 2016-12-19 09:09 /user/oozie/ooziedemo/hivedemo
drwxr-xr-x   - oozie hdfs          0 2016-12-19 09:09 /user/oozie/ooziedemo/hivedemo/app
-rw-r--r--   3 oozie hdfs        206 2016-12-19 09:09 /user/oozie/ooziedemo/hivedemo/app/crime_per_year.hql
drwxr-xr-x   - oozie hdfs          0 2016-12-19 08:54 /user/oozie/ooziedemo/hivedemo/app/lib
-rw-r--r--   3 oozie hdfs        968 2016-12-19 09:09 /user/oozie/ooziedemo/hivedemo/app/workflow.xml
-rw-r--r--   3 oozie hdfs        452 2016-12-19 09:09 /user/oozie/ooziedemo/hivedemo/job.properties

Set and export the OOZIE_URL environment variable so that we don't have to specify -oozie http://jyoung-hdp234-1.openstacklocal:11000/oozie every time we run the oozie command.

[oozie@jyoung-hdp234-1 hivedemo]$ export OOZIE_URL=http://jyoung-hdp234-1.openstacklocal:11000/oozie

Run the Oozie job.

[oozie@jyoung-hdp234-1 ooziedemo]$ cd hivedemo
[oozie@jyoung-hdp234-1 hivedemo]$ oozie job -run -config job.properties -verbose -debug -auth kerberos
...
job: 0000099-161213015814745-oozie-oozi-W

Watch the job info and progress.

[oozie@jyoung-hdp234-1 hivedemo]$ watch -d "oozie job -info 0000099-161213015814745-oozie-oozi-W"
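Instead of wrapping the CLI in watch, the same status check can be scripted against Oozie's REST API, where GET /v1/job/&lt;id&gt;?show=info returns the job info as JSON. A hedged Python sketch, reusing the OOZIE_URL and job id from above; note that on a Kerberized cluster this endpoint requires SPNEGO authentication, which plain urlopen does not provide, so the helper names here are illustrative only:

```python
import json
from urllib.request import urlopen

OOZIE_URL = "http://jyoung-hdp234-1.openstacklocal:11000/oozie"
JOB_ID = "0000099-161213015814745-oozie-oozi-W"

def parse_status(info_json: str) -> str:
    """Pull the workflow status field out of the job-info JSON payload."""
    return json.loads(info_json)["status"]

def job_status(oozie_url: str, job_id: str) -> str:
    # Rough equivalent of `oozie job -info <id>` via the v1 REST endpoint
    # (would need SPNEGO/Kerberos auth on a secured cluster).
    with urlopen(f"{oozie_url}/v1/job/{job_id}?show=info") as resp:
        return parse_status(resp.read().decode("utf-8"))

# Offline demonstration with a trimmed sample payload:
sample = '{"id": "0000099-161213015814745-oozie-oozi-W", "status": "SUCCEEDED"}'
print(parse_status(sample))  # SUCCEEDED
```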

Verification

Check the results in HiveServer2 via beeline.

[hive@jyoung-hdp234-2 hive]$ beeline -u "jdbc:hive2://jyoung-hdp234-2.openstacklocal:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "select * from crimenumbers;"
+--------------------+--------------------------------+--+
| crimenumbers.year  | crimenumbers.number_of_crimes  |
+--------------------+--------------------------------+--+
| 2008               | 426960                         |
+--------------------+--------------------------------+--+
1 row selected (0.286 seconds)
Comments

We followed the same steps, but the job is failing. Could you help us here?

I'm pretty sure this is no longer correct. Oozie supports Kerberos delegation for Hive2 actions, though I have yet to get it all working on the latest release of HDP 2.6.