Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3663 | 05-03-2017 05:13 PM |
| | 3018 | 05-02-2017 08:38 AM |
| | 3280 | 05-02-2017 08:13 AM |
| | 3223 | 04-10-2017 10:51 PM |
| | 1690 | 03-28-2017 02:27 AM |
02-05-2016
11:38 AM
@John Smith I think it crashed on me when I dumped the whole dataset, so there might be a problem with your dataset further down.
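One way to look for a bad record further down the file is to count the fields on every line. The following is only a minimal sketch, assuming the file has six ';'-separated fields per row, as in the load statement quoted elsewhere in this thread. The function name `find_malformed_rows` and the sample rows are made up for illustration. It splits on the delimiter without interpreting quotes.

```python
def find_malformed_rows(lines, expected=6, delimiter=";"):
    """Return (line_number, field_count) for each row whose field count is not `expected`.

    Splits on the raw delimiter and ignores quoting, so a ';' inside a
    quoted value counts as a separator.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip the line terminator before splitting so it is not counted as data.
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected:
            bad.append((number, len(fields)))
    return bad


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    "a;b;c;d;e;f\n",    # well formed: 6 fields
    "a;b;c\n",          # too few fields
    "a;b;c;d;e;f;g\n",  # too many fields
]
print(find_malformed_rows(sample))
```

To check a real file, pass an open file handle instead of the sample list, for example `find_malformed_rows(open("CustomerData.csv"))`. Any reported line numbers are the places to inspect first.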
02-05-2016
11:14 AM
@nejm hadj NiFi is a visual tool; Flume is not. You can use NiFi and Flume together, as NiFi has Flume processors. Once you try NiFi, you won't need to look at Flume.
02-05-2016
05:23 AM
@bikas this question is closed. Are you having the same problem? If so, please open a new thread.
02-05-2016
02:49 AM
@John Smith something is wrong with your environment; I was able to execute your statements in both mapred and tez modes:
grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp2063345867/tmp1865027526/_temporary/0/task__0001_m_000001
2016-02-05 02:43:33,333 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)
And the output of illustrate in mapred mode:
("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| sourceData | nullname:chararray | customerId:chararray | VIN:chararray | Birthdate:chararray | Mileage:chararray | Fuel_Consumption:chararray |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | "Leonel Bullen" | 50258523 | "WBA23530058599244" | "1986-08-26" | 27 | 8.673 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
And in tez mode:
grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp-785652652/tmp136925164/_temporary/0/task__0001_m_000001
2016-02-05 02:46:18,782 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)
There is a bug where the illustrate command doesn't work in tez mode yet. Bottom line: I tested the load with 'CustomerData.csv', with '/user/root/CustomerData.csv', and with 'hdfs://fqdn:8020/user/root/CustomerData.csv'.
02-05-2016
01:39 AM
@keerthana gajarajakumar sure thing, let me know how it works out.
02-05-2016
01:07 AM
@John Smith really great job, I was not aware of the second AvroStorage package. @Predrag Minovic @Benjamin Leonhardi you might find this useful.
02-05-2016
01:00 AM
Awesome find, @John Smith. I re-accepted the answer to give you credit.
02-05-2016
12:43 AM
If you use LocalCluster, you can run the topology in IntelliJ or any other IDE. Just make sure the Storm dependency in your pom has scope compile. When you are ready to run on the Sandbox, scp the jar to the Sandbox and submit the job with the storm jar command. Here's more info: Link. When you build for submission to a real cluster rather than LocalCluster, set the scope of the Storm dependency back to provided. The documentation in the link I gave you is comprehensive; reading through it will save you a lot of time. @keerthana gajarajakumar
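As a rough illustration of the scope switch described above, the Storm dependency in the pom might look like the fragment below. This is a sketch only: `${storm.version}` is a placeholder property that is not from the original post, so substitute the version that matches your cluster.

```xml
<!-- Storm dependency in pom.xml. ${storm.version} is a placeholder, not from the original post. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>${storm.version}</version>
  <!-- compile: Storm is on the classpath when running with LocalCluster in the IDE.
       provided: use this when building the jar submitted with "storm jar",
       because the cluster already supplies the Storm classes. -->
  <scope>compile</scope>
</dependency>
```

Change `compile` to `provided` before packaging the jar for the Sandbox or a real cluster.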
02-05-2016
12:35 AM
@Raja Sekhar Chintalapati did that solve your issue?