Apache PIG not displaying results

Explorer

Hello Guys,

I am trying to follow tutorial-100 for Apache Pig. When I run the script, the Results tab does not show the script's output, so it is very hard to understand what the script is doing.

In the results I get the following:

Apache Pig version 0.16.0.2.6.0.3-8 (rexported)
compiled Apr 01 2017, 21:50:35
USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            ConstantCalculator - Calculate constants at compile time
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" statement
            PartitionFilterOptimizer - Pushdown partition filter conditions to loader implementing LoadMetaData
            PredicatePushdownOptimizer - Pushdown filter predicates to loader implementing LoadPredicatePushDown
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce|tez, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -N, -no_fetch - Turn fetch optimization off; default is on
    -P, -propertyFile - Path to property file
    -printCmdDebug - Overrides anything else and prints the actual command used to run Pig, including
                     any environment variables that are set by the pig command.

and under the log, I see this

 WARNING: Use "yarn jar" to launch YARN applications.
 17/07/07 06:16:36 INFO pig.Main: Pig script completed in 196 milliseconds (196 ms)

The script I am running is below. Please advise whether this output in the results is normal. If it is normal, how can I see the output of the script at each step? Thanks

a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
c = FOREACH b GENERATE driverid, event, (int)1 as occurance;
d = GROUP c BY driverid;
e = FOREACH d GENERATE group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid,g by driverid;
dump h;
Explorer

@slachterman - I saw that you answered a similar question before. Can you please help me out?

The "yarn jar" warning is nothing to worry about. The output you received is Pig's usage text, which suggests that the script was never launched. My guess is that your command-line invocation was incorrect. It should have been something like the following.

pig -useHCatalog yourscript.pig

You can see some examples of this at https://martin.atlassian.net/wiki/x/AgCfB (including running via Tez). If you are doing this, please show the exact command you ran. If running from the Ambari View, be sure to add the -useHCatalog argument as shown in Step 5.4 of https://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/.

Explorer

@Lester Martin I am running the script from Ambari and I have the -useHCatalog argument added. But when I run the script below:

a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
dump b;

Instead of getting the output of the script, I get what you can see in the attached txt file. I want to know whether this is normal and how I can see what the script is doing. Thanks

results.txt

Gotcha; you are using the Ambari View. It still seems that the script is not getting invoked properly. Can you provide a screenshot of the Ambari View, especially the section with the -useHCatalog argument? Did you try it both with and without the "use Tez" checkbox selected? While this code looks good, it is often a good idea to try the code out from the CLI just to remove one more variable (again, the code looks simple and direct enough that I don't think this would provide much value other than showing you it can run).
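For the CLI check suggested above, a minimal sketch might look like the following. It assumes a shell on a node where the pig client is installed; the file name filter_test.pig is hypothetical, and the script body is the three-line test from the post above.

```shell
# Write the three-line test script to a file (hypothetical file name).
cat > filter_test.pig <<'EOF'
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
dump b;
EOF

# Run it only where the pig client is available; otherwise just say so.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog filter_test.pig
else
  echo "pig not found on PATH; run this on a node with the Pig client"
fi
```

If this runs from the CLI but not from the Ambari View, the problem is likely in how the View passes its arguments rather than in the script. Per the usage text above, adding -x tez to the command selects Tez instead of the default mapreduce mode.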

Explorer

Thanks @Lester Martin. I removed -useHCatalog and re-added it, and now the results of the script are displayed. It was very hard to learn without seeing the output of the script at each step.
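On seeing the output at each step: one common approach, sketched here rather than taken from the tutorial, is to add DESCRIBE statements (which print a relation's schema) and to DUMP a small LIMIT sample of intermediate relations. The file name inspect_steps.pig and the alias b_sample are hypothetical; the table and column names come from the script in the original post.

```shell
# Write a version of the script with inspection statements added
# (hypothetical file name; lines starting with -- are Pig comments).
cat > inspect_steps.pig <<'EOF'
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE a;               -- print the schema read from HCatalog
b = FILTER a BY event != 'normal';
b_sample = LIMIT b 10;    -- hypothetical alias: a small sample of b
DUMP b_sample;            -- show a few filtered rows
c = FOREACH b GENERATE driverid, event, (int)1 AS occurance;
d = GROUP c BY driverid;
DESCRIBE d;               -- show the nested schema produced by GROUP
EOF

# Run it only where the pig client is available; otherwise just say so.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog inspect_steps.pig
else
  echo "pig not found on PATH; run this on a node with the Pig client"
fi
```

Each DUMP launches a job, so sampling with LIMIT keeps the feedback loop short while learning; the inspection lines can be removed once the script behaves as expected.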