Support Questions

mohan221213 · 09-11-2016

I have been facing this issue for a long time.

I tried to solve it but couldn't.

I need some expert advice to solve this.

I am trying to load a sample JSON file of tweets.

sample.json:

{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":"FilmFan","truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":689085590822891521,"in_reply_to_user_id_str":"6048122","timestamp_ms":"1453125782100","in_reply_to_status_id":null,"created_at":"Mon Jan 18 14:03:02 +0000 2016","favorite_count":0,"place":null,"coordinates":null,"text":"@filmfan hey its time for you guys follow @acadgild To #AchieveMore and participate in contest Win Rs.500 worth vouchers","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}],"user_mentions":[{"id":6048122,"name":"Tanya","indices":[0,8],"screen_name":"FilmFan","id_str":"6048122"},{"id":2649945906,"name":"ACADGILD","indices":[42,51],"screen_name":"acadgild","id_str":"2649945906"}]},"is_quote_status":false,"source":"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck<\/a>","favorited":false,"in_reply_to_user_id":6048122,"retweet_count":0,"id_str":"689085590822891521","user":{"location":"India ","default_profile":false,"profile_background_tile":false,"statuses_count":86548,"lang":"en","profile_link_color":"94D487","profile_banner_url":"https://pbs.twimg.com/profile_banners/197865769/1436198000","id":197865769,"following":null,"protected":false,"favourites_count":1002,"profile_text_color":"000000","verified":false,"description":"Proud Indian, Digital Marketing Consultant,Traveler, Foodie, Adventurer, Data Architect, Movie Lover, Namo Fan","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Bahubali","profile_background_color":"000000","created_at":"Sat Oct 02 17:41:02 +0000 2010","default_profile_image":false,"followers_count":4467,"profile_image_url_https":"https://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_offset":19800,"time_zone":"Chennai","notifications":null,"profile_use_background_image":false,"friends_count":810,"profile_sidebar_fill_color":"000000","screen_name":"Ashok_Uppuluri","id_str":"197865769","profile_image_url":"http://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","listed_count":50,"is_translator":false}}

I have tried to load this JSON file using Elephant Bird.

Script:

REGISTER json-simple-1.1.1.jar 
REGISTER elephant-bird-2.2.3.jar 
REGISTER guava-11.0.2.jar 
REGISTER avro-1.7.7.jar
REGISTER piggybank-0.12.0.jar


twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

B = foreach twitter generate
    (chararray)$0#'created_at' as created_at,
    (chararray)$0#'id' as id,
    (chararray)$0#'id_str' as id_str,
    (chararray)$0#'text' as text,
    (chararray)$0#'source' as source,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,
    (boolean)$0#'favorited' as favorited;

describe B;

Output:

B: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,entities: map[chararray],favorited: boolean}

But when I tried to DUMP B, the following error occurred:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B

Here are the complete logs:

2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-09-11 14:07:57,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1473583077199-0
2016-09-11 14:07:57,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-09-11 14:07:57,207 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,208 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-09-11 14:07:57,212 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-09-11 14:07:57,216 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local360376249_0009
2016-09-11 14:07:57,267 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2016-09-11 14:07:57,267 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2016-09-11 14:07:57,271 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2016-09-11 14:07:57,272 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local360376249_0009_m_000000_0
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2016-09-11 14:07:57,278 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 2416
Input split[0]:
   Length = 2416
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:
-----------------------
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/root/PIG/PIG/sample.json:0+2416
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,288 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-11 14:07:57,290 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,291 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-09-11 14:07:57,296 [Thread-214] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local360376249_0009
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
	at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
	at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
	at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:130)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local360376249_0009
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases B,twitter
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-09-11 14:07:57,468 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local360376249_0009 has failed! Stop running all dependent jobs
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-09-11 14:07:57,470 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.7.1.2.3.4.7-4	0.15.0.2.3.4.7-4	root	2016-09-11 14:07:57	2016-09-11 14:07:57	UNKNOWN

Failed!

Failed Jobs:
JobId	Alias	Feature	Message	Outputs
job_local360376249_0009	B,twitter	MAP_ONLY	Message: Job failed!	file:/tmp/temp252944192/tmp-470484503,

Input(s):
Failed to read data from "file:///root/PIG/PIG/sample.json"

Output(s):
Failed to produce result in "file:/tmp/temp252944192/tmp-470484503"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local360376249_0009

Also, please clarify how to use the jar files.

And which versions should I use?

There is so much confusion for me.

Some people say to use Elephant Bird, and others say to use Avro.

Please help.

Mohan.V

gkeys · 09-11-2016

@Mohan V

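This is a jar version incompatibility. The error "Found interface org.apache.hadoop.mapreduce.Counter, but class was expected" means elephant-bird-2.2.3.jar was compiled against the Hadoop 1 API, where Counter is a class, but your cluster runs Hadoop 2.7.1, where Counter is an interface.

Use the following newer Elephant Bird jars instead of elephant-bird-2.2.3.jar. The hadoop-compat jar bridges the differences between the Hadoop 1 and Hadoop 2 APIs: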

REGISTER elephant-bird-core-4.1.jar 
REGISTER elephant-bird-pig-4.1.jar 
REGISTER elephant-bird-hadoop-compat-4.1.jar

I tested it with your code and sample and it works.

You can get the jars at:

http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-core-4.1.jar.zip
http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-pig-4.1.jar.zip
http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-hadoop-compat-4.1.jar.zip
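For reference, here is a sketch of the complete script from the question with only the jar list changed. It assumes all jars sit in the directory Pig is started from. json-simple and guava are kept from the original list; avro-1.7.7.jar and piggybank-0.12.0.jar are left out because none of these statements use them (add them back if other parts of your script need them).

```pig
-- Old elephant-bird-2.2.3.jar replaced by the three Elephant Bird 4.1 jars.
REGISTER json-simple-1.1.1.jar
REGISTER guava-11.0.2.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar

-- JsonLoader reads one JSON object per line and returns one map per record ($0).
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

-- Nested objects such as 'entities' come back as JSON text,
-- so JsonStringToMap turns that text into a map[chararray].
B = FOREACH twitter GENERATE
    (chararray)$0#'created_at' AS created_at,
    (chararray)$0#'id'         AS id,
    (chararray)$0#'id_str'     AS id_str,
    (chararray)$0#'text'       AS text,
    (chararray)$0#'source'     AS source,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') AS entities,
    (boolean)$0#'favorited'    AS favorited;

DESCRIBE B;
DUMP B;
```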

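Regarding DESCRIBE working but DUMP failing: DESCRIBE only works out the schema and does not run anything, while DUMP launches the MapReduce job. The incompatible class is only hit at run time, when JsonLoader.getNext() calls Elephant Bird's counter helper (see the stack trace), so the problem only shows up with DUMP.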
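To make the difference concrete, a minimal sketch assuming the aliases twitter and B from the script are already defined in the Grunt shell. EXPLAIN is included as a second compile-only command that prints the plans without submitting a job.

```pig
-- Compile time only: Pig builds the schema and plans but submits no job,
-- so JsonLoader.getNext() is never called and a bad jar goes unnoticed.
DESCRIBE B;
EXPLAIN B;

-- Run time: DUMP (like STORE) submits a MapReduce job. The map task calls
-- JsonLoader.getNext(), which is where the IncompatibleClassChangeError
-- in the log is thrown when elephant-bird-2.2.3.jar is registered.
DUMP B;
```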

mohan221213 · 09-12-2016

Thank you, gkeys!

You are the best.

Cloudera Community

ERROR 1066: Unable to open iterator for alias- PIG