About aervits

aervits · ‎02-05-2016

I think it crashed on me when I dumped the whole dataset, so there might be a problem further down in your dataset. @John Smith

aervits · ‎02-05-2016

@nejm hadj NiFi is a visual tool, Flume is not. You can use NiFi and Flume together, as NiFi has Flume processors. Once you try NiFi you won't need to look at Flume.

aervits · ‎02-05-2016

@bikas this question is closed. Are you having the same problem? If so, please open a new thread.

aervits · ‎02-05-2016

@John Smith something is wrong with your environment, I was able to execute your statements in mapred and tez modes

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp2063345867/tmp1865027526/_temporary/0/task__0001_m_000001
2016-02-05 02:43:33,333 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

and output of illustrate in mapred mode

("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
-------------------------------------------------------------------------------------------------------------------------------------------------------
| sourceData | nullname:chararray | customerId:chararray | VIN:chararray       | Birthdate:chararray | Mileage:chararray | Fuel_Consumption:chararray |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|            | "Leonel Bullen"    | 50258523             | "WBA23530058599244" | "1986-08-26"        | 27                | 8.673                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------

and in tez mode

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp-785652652/tmp136925164/_temporary/0/task__0001_m_000001
2016-02-05 02:46:18,782 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

There was a bug where illustrate command doesn't work in tez mode yet. Bottom line, I tested it with 'CustomerData.csv' also with '/user/root/CustomerData.csv' and 'hdfs://fqdn:8020/user/root/CustomerData.csv'.

aervits · ‎02-05-2016

@keerthana gajarajakumar sure thing, let me know how it works out.

aervits · ‎02-05-2016

@John Smith really great job, I was not aware of the second AvroStorage package. @Predrag Minovic @Benjamin Leonhardi you might find this useful.

aervits · ‎02-05-2016

Awesome find, @John Smith. I re-accepted the answer to give you credit.

aervits · ‎02-05-2016

@John Smith upload your dataset and I will try it out in a bit.

aervits · ‎02-05-2016

If you use LocalCluster, you can run the topology in IntelliJ or any other IDE. Just make sure the Storm dependency in your pom has scope compile. When you are ready to run on the Sandbox, scp the jar to the Sandbox and submit the job with the storm jar command. Here's more info: Link. When you compile for submission to a real cluster rather than LocalCluster, set the scope back to provided for the Storm dependency. The documentation in the link I gave you is comprehensive; reading through it will save you a lot of time. @keerthana gajarajakumar
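
For reference, the scope switch described above might look roughly like this in the pom. This is only a sketch: the storm-core coordinates are the standard ones, but the version number is an assumption and should match whatever Storm version your Sandbox or cluster runs.

```xml
<!-- Sketch of the Storm dependency in pom.xml; the version is an assumed placeholder. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.10.0</version>
  <!-- compile: Storm classes are on the classpath when running LocalCluster from the IDE. -->
  <!-- Change to provided before building the jar you submit with "storm jar", -->
  <!-- because the cluster supplies its own Storm classes at runtime. -->
  <scope>compile</scope>
</dependency>
```

Forgetting to switch back to provided typically leaves a second copy of Storm inside the submitted jar, which can conflict with the copy already on the cluster.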

aervits · ‎02-05-2016

@Raja Sekhar Chintalapati did that solve your issue?

Last Visited	‎08-15-2019 06:35 AM

Member Since	‎10-01-2015 11:46 AM
Posts	3,933
Kudos received	1074
