Starting My Hadoop Tools
NiFi can interface directly with Hive, HDFS, HBase, Flume, and Phoenix, and it can also trigger Spark and Flink jobs through Kafka and Site-to-Site. Sometimes I need to run Pig scripts as well. Apache Pig is very stable and ships with a rich set of built-in functions and tools for data processing, so it is easy to drop a Pig step into a larger pipeline or run it as one part of an existing process.
Pig Setup
I like to use Ambari to install the HDP 2.5 clients on my NiFi box to have access to all the tools I may need.
Then I can just do:
yum install pig
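Before wiring Pig into NiFi, it is worth confirming the client works from the command line; a quick check like the following is enough (the path /tmp is just an example):
pig -version
# run a trivial command in local mode to confirm the client starts
pig -x local -e 'fs -ls /tmp'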
Pig to Apache NiFi 1.0.0
ExecuteProcess
We call a shell script that wraps the Pig script.
The output of the script is stored in HDFS; you can check it with: hdfs dfs -ls /nifi-logs
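A minimal ExecuteProcess configuration for this flow might look like the following; the wrapper name runpig.sh is a made-up example (the article does not name the script), and everything else is left at its default:
Command: /bin/bash
Command Arguments: /opt/demo/pigscripts/runpig.sh
Redirect Error Stream: true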
Shell Script
export JAVA_HOME=/opt/jdk1.8.0_101/
pig -x local -l /tmp/pig.log -f /opt/demo/pigscripts/test.pig
You can run Pig in different execution modes such as local, mapreduce, and tez. You can also pass parameters in to the script, as shown below.
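For example, here is a sketch of the same wrapper running on Tez and passing the log directory in as a parameter; the parameter name LOG_DIR and the /var/log/nifi path are only for illustration, and the script would reference it as '$LOG_DIR':
# same wrapper as above, but on Tez and with a parameter (LOG_DIR is a made-up name)
export JAVA_HOME=/opt/jdk1.8.0_101/
pig -x tez -l /tmp/pig.log -param LOG_DIR=/var/log/nifi -f /opt/demo/pigscripts/test.pig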
Pig Script
messages = LOAD '/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log';
warns = FILTER messages BY $0 MATCHES '.*WARN+.*';
DUMP warns;
STORE warns INTO 'warns.out';
This is a basic example from the internet, with the NiFi 1.0 log used as the source.
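If you would rather get a count of the warnings than a full dump, a small (untested) extension of the same script is:
-- group all warnings into a single bag and count them
warn_group = GROUP warns ALL;
warn_count = FOREACH warn_group GENERATE COUNT(warns);
DUMP warn_count;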
As an aside, I run a daily script with the schedule 1 * * * * ? to clean up my logs.
Simply: /bin/rm -rf /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/*2016*
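A more cautious variant, which I only suggest as an alternative, uses find to delete rotated logs older than a day instead of everything matching *2016*:
# remove only rotated NiFi logs older than one day
find /opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs -name '*2016*' -mtime +1 -delete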
PutHDFS
Hadoop Configuration: /etc/hadoop/conf/core-site.xml
Pick a directory and store away.
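A minimal PutHDFS configuration along those lines might look like this; the /nifi-logs directory matches the listing command earlier, and the conflict strategy is just one reasonable choice:
Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml
Directory: /nifi-logs
Conflict Resolution Strategy: replace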
Results
HadoopVersion: 2.7.3.2.5.0.0-1245
PigVersion: 0.16.0.2.5.0.0-1245
UserId: root
StartedAt: 2016-11-03 19:53:57
FinishedAt: 2016-11-03 19:53:59
Features: FILTER

Success!

Job Stats (time in seconds):
JobId: job_local72884441_0001  Maps: 1  Reduces: 0  MaxMapTime: n/a  MinMapTime: n/a  AvgMapTime: n/a  MedianMapTime: n/a  MaxReduceTime: 0  MinReduceTime: 0  AvgReduceTime: 0  MedianReduceTime: 0  Alias: messages,warns  Feature: MAP_ONLY  Outputs: file:/tmp/temp1540654561/tmp-600070101,

Input(s):
Successfully read 30469 records from: "/opt/demo/HDF/centos7/tars/nifi/nifi-1.0.0.2.0.0.0-579/logs/nifi-app.log"

Output(s):
Successfully stored 1347 records in: "file:/tmp/temp1540654561/tmp-600070101"

Counters:
Total records written : 1347
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local72884441_0001
Reference:
- http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#...
- http://hortonworks.com/apache/pig/#section_2
- http://hortonworks.com/blog/jsonize-anything-in-pig-with-tojson/
- https://github.com/dbist/pig
- https://github.com/sudar/pig-samples
- http://hortonworks.com/hadoop-tutorial/how-to-use-basic-pig-commands/
- http://hadooptutorial.info/built-in-load-store-functions-in-pig/
- https://cwiki.apache.org/confluence/display/PIG/PigTutorial
- https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/validat...
- http://pig.apache.org/docs/r0.16.0/start.html
- http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig
- https://github.com/alanfgates/programmingpig/tree/master/examples/ch2