I am trying to follow the tutorial-100 for Apache Pig. When I run the script, the results tab does not show the script's output, which makes it very hard to understand what the script is doing.
In the results I get the following:
Apache Pig version 0.16.0.2.6.0.3-8 (rexported)
compiled Apr 01 2017, 21:50:35

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            ConstantCalculator - Calculate constants at compile time
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" statement
            PartitionFilterOptimizer - Pushdown partition filter conditions to loader implementing LoadMetaData
            PredicatePushdownOptimizer - Pushdown filter predicates to loader implementing LoadPredicatePushDown
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce|tez, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -N, -no_fetch - Turn fetch optimization off; default is on
    -P, -propertyFile - Path to property file
    -printCmdDebug - Overrides anything else and prints the actual command used to run Pig, including any environment variables that are set by the pig command.
and in the log I see this:
WARNING: Use "yarn jar" to launch YARN applications.
17/07/07 06:16:36 INFO pig.Main: Pig script completed in 196 milliseconds (196 ms)
The script I am running is below. Please advise whether the output in the results is normal. If it is, how can I see the output of the script at each step? Thanks.
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
c = FOREACH b GENERATE driverid, event, (int)1 as occurance;
d = GROUP c BY driverid;
e = FOREACH d GENERATE group as driverid, sum(c.occurance) as t_occ;
g = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
dump h;
The "yarn jar" warning is nothing to worry about. The output you received is Pig's usage message, which suggests that the script was never launched. My guess is that the command-line invocation was incorrect. It should have been something like the following.
pig -useHCatalog yourscript.pig
You can see some examples of this at https://martin.atlassian.net/wiki/x/AgCfB (including running via Tez). If you are running from the command line, please show the exact command you ran. If you are running from the Ambari View, be sure to add the -useHCatalog argument as shown in Step 5.4 of https://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/.
@Lester Martin I am running the script from Ambari and I have added the -useHCatalog argument. But when I run the script below:
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
dump b;
instead of getting the output of the script, I get what you can see in the attached txt file. I want to know whether this is normal and how I can see what the script is doing. Thanks.
Gotcha; you are using the Ambari View. It still seems that the script is not being invoked properly. Can you provide a screenshot of the Ambari View, especially the section with the -useHCatalog argument? Did you try it both with and without the "use Tez" checkbox selected? While this code looks good, it is often a good idea to try the code from the CLI just to eliminate one more variable (again, the code looks simple and direct enough that I don't think this would provide much value other than showing you that it can run).
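If you do try it from the CLI, a minimal sketch along these lines may help you see what each step produces. It assumes you are on a cluster node where the pig command is on the PATH and the geolocation table is visible through HCatalog. The file name debug_geolocation.pig and the row limit of 10 are just illustrative choices, not something taken from the tutorial.

```shell
# Write a small Pig script to a file (hypothetical file name).
# The quoted 'EOF' keeps the shell from touching the script text.
cat > debug_geolocation.pig <<'EOF'
-- Load the Hive table through HCatalog (needs the -useHCatalog flag).
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE a;             -- print the schema of relation a

b = FILTER a BY event != 'normal';
b_sample = LIMIT b 10;  -- arbitrary small sample so the dump stays readable
DUMP b_sample;          -- print the sampled rows
EOF

# Run the script with HCatalog support if Pig is available on this machine.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog debug_geolocation.pig
else
  echo "pig not found on PATH; run this on a node with the Pig client installed" >&2
fi
```

Adding a DESCRIBE or a limited DUMP after each relation is a simple way to inspect intermediate results. If the CLI run prints rows while the Ambari View still shows only the usage text, that would point to the way the View passes its arguments rather than to the script itself.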