How do I set up Pig to use HCatalog?

I want Pig to use HCatalog in my Oozie workflow.

1 ACCEPTED SOLUTION

Here's another possibility, from the HDP 2.4.2 Release Notes:

Configuring Pig Scripts to Use HCatalog in Oozie Workflows

To access HCatalog with a Pig action in an Oozie workflow, you need to modify configuration information to point to the Hive metastore URIs.

There are two methods for providing this configuration information. Which method to use depends on how many of your Pig actions access HCatalog.

Configuring Individual Pig Actions to Access HCatalog

If only a few individual Pig actions access HCatalog, do the following:

1. Identify the URI (host and port) for the Thrift metastore server.

   a. In Ambari, click Hive > Configs > Advanced.

   b. Make note of the URI in the hive.metastore.uris field in the General section.

   This value is also stored in the Hive client configuration, typically hive-site.xml.

2. Add the following two properties to the <configuration> element of each Pig action that accesses HCatalog.

   Note: Replace [host:port(default:9083)] in the example below with the host and port of the Thrift metastore server. The default port is 9083.

<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://[host:port(default:9083)]</value>
        <description>A comma separated list of metastore uris the client can use to contact the metastore server.</description>
    </property>
    <property>
        <name>oozie.action.sharelib.for.pig</name>
        <value>pig,hive,hcatalog</value>
        <description>A comma separated list of libraries to be used by the Pig action.</description>
    </property>
</configuration>
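
For context, here is a minimal sketch of where that <configuration> block sits inside a complete Pig action. The workflow and node names, the script name (script.pig), the ${jobTracker} and ${nameNode} parameters, the schema version, and the host metastore.example.com are all assumptions for illustration, not values from the release notes. Substitute whatever your cluster and Oozie server actually use.

```xml
<!-- Hypothetical workflow.xml; all names and the schema version are illustrative. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="pig-hcat-wf">
    <start to="pig-hcat"/>

    <action name="pig-hcat">
        <pig>
            <!-- Assumed to be defined in job.properties -->
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>hive.metastore.uris</name>
                    <!-- Hypothetical host; use the URI noted in step 1 -->
                    <value>thrift://metastore.example.com:9083</value>
                </property>
                <property>
                    <name>oozie.action.sharelib.for.pig</name>
                    <value>pig,hive,hcatalog</value>
                </property>
            </configuration>
            <!-- Hypothetical Pig script that reads or writes HCatalog tables -->
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pig action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Note that <configuration> comes before <script> inside the <pig> element. The Pig script itself would then typically access tables through org.apache.hive.hcatalog.pig.HCatLoader or HCatStorer.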

Configuring All Pig Actions to Access HCatalog

If all of your Pig actions access HCatalog, do the following:

1. Add the following line to the job.properties file in your working directory:

# A comma-separated list of libraries to be used by the Pig action.
oozie.action.sharelib.for.pig=pig,hive,hcatalog

2. Identify the URI (host and port) for the Thrift metastore server.

   a. In Ambari, click Hive > Configs > Advanced.

   b. Make note of the URI in the hive.metastore.uris field in the General section.

   This value is also stored in the Hive client configuration, typically hive-site.xml.

3. Add the following property to the <configuration> element of each Pig action.

   Note: Replace [host:port(default:9083)] in the example below with the host and port of the Thrift metastore server. The default port is 9083.

<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://[host:port(default:9083)]</value>
        <description>A comma separated list of metastore uris the client can use to contact the metastore server.</description>
    </property>
</configuration>
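
Because the property takes a comma-separated list, a cluster that runs more than one metastore can list every URI in the same value. The sketch below shows what the filled-in placeholder might look like in that case. Both hostnames are hypothetical.

```xml
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <!-- Hypothetical hosts; replace with the URIs shown in Ambari -->
        <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
    </property>
</configuration>
```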
