Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Labels (2)
Super Guru

Starting My Hadoop Tools

NiFi can interface directly with Hive, HDFS, HBase, Flume and Phoenix. And I can also trigger Spark and Flink through Kafka and Site-To-Site. Sometimes I need to run some Pig scripts. Apache Pig is very stable and has a lot of functions and tools that make for some smart processing. You can easily augment and add this piece to a larger pipeline or part of the process.

Pig Setup

I like to use Ambari to install the HDP 2.5 clients on my NiFi box to have access to all the tools I may need.

Then I can just do:

yum install pig

Pig to Apache NiFi 1.0.0

9098-pig1.png

9099-pig2.png

ExecuteProcess

We call a shell script that wraps the Pig script.

Output of script is stored to HDFS: hdfs dfs -ls /nifi-logs

Shell Script

export JAVA_HOME=/opt/jdk1.8.0_101/ 
pig -x local -l /tmp/pig.log -f /opt/demo/pigscripts/test.pig

You can run in different Pig modes like local, mapreduce and tez. You can also pass in parameters or the script.

Pig Script

messages = LOAD '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log';
warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
DUMP warns
store warns into 'warns.out'

This is a basic example from the internet, with the NIFI 1.0 log used as the source.

As an aside, I run a daily script with the schedule 1 * * * * ? to clean up my logs.

Simply: /bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*

PutHDFS

Hadoop Configuration: /etc/hadoop/conf/core-site.xml

Pick a directory and store away.

Results

HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures
2.7.3.2.5.0.0-12450.16.0.2.5.0.0-1245root2016-11-03 19:53:572016-11-03 19:53:59FILTER
Success!
Job Stats (time in seconds):
JobIdMapsReducesMaxMapTimeMinMapTimeAvgMapTimeMedianMapTimeMaxReduceTimeMinReduceTimeAvgReduceTimeMedianReducetimeAliasFeatureOutputs
job_local72884441_000110n/an/an/an/a0000messages,warnsMAP_ONLYfile:/tmp/temp1540654561/tmp-600070101,
Input(s):
Successfully read 30469 records from: "/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log"
Output(s):
Successfully stored 1347 records in: "file:/tmp/temp1540654561/tmp-600070101"
Counters:
Total records written : 1347
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local72884441_0001

Reference:


pig1.png
1,331 Views
Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 08:33 AM
Updated by:
 
Contributors
Top Kudoed Authors