I'm very new to Big Data. I'm trying to understand some basic concepts in Pig, and I need some clarification about transferring data from local storage to HDFS for analytics.
1. I have Excel files in my local directory, e.g. /bdata
2. For the data transfer I use the command hadoop dfs -copyFromLocal /bdata hdfs://192.168.1.xxx:8020/hbdata (a fuller command sequence is sketched below)
3. After that, the files have been moved to HDFS.
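For reference, a minimal sketch of step 2 (using the /bdata and /hbdata paths from the question; hdfs dfs is the current replacement for the deprecated hadoop dfs):

# create the target directory on HDFS if it does not exist yet
hdfs dfs -mkdir -p hdfs://192.168.1.xxx:8020/hbdata

# copy all files from the local /bdata directory into HDFS
hdfs dfs -copyFromLocal /bdata/* hdfs://192.168.1.xxx:8020/hbdata/

# verify the upload
hdfs dfs -ls hdfs://192.168.1.xxx:8020/hbdata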
You have two modes for running these commands. One option is to wrap the Pig script in an Oozie workflow with a Pig action, for example:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.2">
    ...
    <action name="[NODE-NAME]">
        <pig>
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                ...
            </prepare>
            <script>[PIG-SCRIPT]</script>
            <param>[PARAM-VALUE]</param>
            ...
        </pig>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
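The workflow definition is placed in HDFS next to a job.properties file and started with the Oozie command line. A rough sketch (the property values, paths, and the Oozie server URL below are assumptions, not values from this thread):

# job.properties (example values - adjust to your cluster)
nameNode=hdfs://192.168.1.xxx:8020
jobTracker=192.168.1.xxx:8032
oozie.wf.application.path=${nameNode}/user/${user.name}/pig-workflow

# submit and start the workflow
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run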
@Iyappan Gopalakrishnan you can schedule your Pig job through Falcon. It will help you handle the entire data pipeline management. It supports scheduling Pig/Hive/Oozie workflows and comes with email and SNMP notifications.
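A minimal Falcon process entity for that could look roughly like the sketch below (cluster name, validity window, frequency, and script path are illustrative placeholders, not values from this thread):

<process name="daily-pig-load" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="primary-cluster">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>days(1)</frequency>
    <timezone>UTC</timezone>
    <workflow engine="pig" path="/apps/pig/load-bdata.pig"/>
    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
</process>

It is then submitted and scheduled with the Falcon CLI: falcon entity -type process -submit -file daily-pig-load.xml followed by falcon entity -type process -name daily-pig-load -schedule.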
Thanks @Abdelkrim Hadjidj, it's working fine for mail.
Some of the shell commands are not accepted inside the Pig scripts. For example, I need to compare first and only then copy the files from local to HDFS, so I use a shell command like:
sh for i in `cat compare.txt`; do hadoop dfs -copyFromLocal Bdata1/$i hdfs://192.168.1.xxx:8020/hbdata; done
Please suggest how I should write the scripts and complete the full flow.
Note: I'm not able to directly compare local and HDFS directories, so I have to use several commands to achieve the comparison.
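One possible workaround, sketched here with the directory and file names from this thread plus a hypothetical script name copy_new_files.sh: the Grunt sh command only executes a single program and does not interpret shell syntax such as loops or back-quotes, so the loop can be kept in its own shell script and that one script invoked from Pig.

#!/bin/bash
# copy_new_files.sh - copy every file listed in compare.txt from Bdata1/ into HDFS
while read -r f; do
    hdfs dfs -copyFromLocal "Bdata1/$f" hdfs://192.168.1.xxx:8020/hbdata/
done < compare.txt

From the Grunt shell this can then be run as a single command: sh ./copy_new_files.sh

For the comparison itself there is no single command that diffs a local directory against an HDFS directory, but the two listings can be compared by file name to build compare.txt (again a sketch; the helper files local_files.txt and hdfs_files.txt are made up for illustration):

# file names present locally
ls Bdata1 | sort > local_files.txt
# file names already in HDFS (skip the "Found n items" header, keep only the base name)
hdfs dfs -ls hdfs://192.168.1.xxx:8020/hbdata | awk 'NR>1 {print $8}' | sed 's#.*/##' | sort > hdfs_files.txt
# names that exist locally but not yet in HDFS
comm -23 local_files.txt hdfs_files.txt > compare.txt

The resulting compare.txt can then be fed to the copy loop above, and the whole script can be scheduled through Oozie (as a shell action) or Falcon as discussed earlier.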