Member since: 06-03-2016
Posts: 66
Kudos Received: 21
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3297 | 12-03-2016 08:51 AM
 | 1767 | 09-15-2016 06:39 AM
 | 1972 | 09-12-2016 01:20 PM
 | 2278 | 09-11-2016 07:04 AM
 | 1889 | 09-09-2016 12:19 PM
09-12-2016
04:12 AM
Thank you gkeys... you are the best!
09-11-2016
12:37 PM
I would like to know how we can consume Kafka topic messages using Pig. Which jar files does it require? Any suggestions? Mohan.V
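As far as I know, Pig has no built-in Kafka loader, so a common pattern is to land the topic's messages into HDFS first (for example with a Flume Kafka source and HDFS sink, or Camus) and then load the landed files with Pig. Below is a minimal sketch under that assumption; the landing path and the JSON field names are hypothetical.
-- Assumes each Kafka message was already landed into HDFS as one JSON document per line
-- (the path below is hypothetical).
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
messages = LOAD '/data/kafka-landing/mytopic/*' USING com.twitter.elephantbird.pig.load.JsonLoader();
-- Each record is a map[]; project and cast the fields you need (field names are hypothetical).
fields = FOREACH messages GENERATE (chararray)$0#'id' AS id, (chararray)$0#'text' AS text;
DUMP fields;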
Labels:
- Apache Hadoop
- Apache Kafka
- Apache Pig
09-11-2016
08:43 AM
1 Kudo
I have been facing this issue for a long time and have not been able to solve it on my own; I need some expert advice. I am trying to load a sample tweets JSON file, sample.json:
{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":"FilmFan","truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":689085590822891521,"in_reply_to_user_id_str":"6048122","timestamp_ms":"1453125782100","in_reply_to_status_id":null,"created_at":"Mon Jan 18 14:03:02 +0000 2016","favorite_count":0,"place":null,"coordinates":null,"text":"@filmfan hey its time for you guys follow @acadgild To #AchieveMore and participate in contest Win Rs.500 worth vouchers","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}],"user_mentions":[{"id":6048122,"name":"Tanya","indices":[0,8],"screen_name":"FilmFan","id_str":"6048122"},{"id":2649945906,"name":"ACADGILD","indices":[42,51],"screen_name":"acadgild","id_str":"2649945906"}]},"is_quote_status":false,"source":"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck<\/a>","favorited":false,"in_reply_to_user_id":6048122,"retweet_count":0,"id_str":"689085590822891521","user":{"location":"India ","default_profile":false,"profile_background_tile":false,"statuses_count":86548,"lang":"en","profile_link_color":"94D487","profile_banner_url":"https://pbs.twimg.com/profile_banners/197865769/1436198000","id":197865769,"following":null,"protected":false,"favourites_count":1002,"profile_text_color":"000000","verified":false,"description":"Proud Indian, Digital Marketing Consultant,Traveler, Foodie, Adventurer, Data Architect, Movie Lover, Namo Fan","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Bahubali","profile_background_color":"000000","created_at":"Sat Oct 02 17:41:02 +0000 2010","default_profile_image":false,"followers_count":4467,"profile_image_url_https":"https://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_offset":19800,"time_zone":"Chennai","notifications":null,"profile_use_background_image":false,"friends_count":810,"profile_sidebar_fill_color":"000000","screen_name":"Ashok_Uppuluri","id_str":"197865769","profile_image_url":"http://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","listed_count":50,"is_translator":false}}
I have tried to load this JSON file with the following Elephant Bird script:
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-2.2.3.jar
REGISTER guava-11.0.2.jar
REGISTER avro-1.7.7.jar
REGISTER piggybank-0.12.0.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
B = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited;
describe B;
Output: B: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,entities: map[chararray],favorited: boolean}
But when I tried to DUMP B, the following error occurred: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B. I am providing the complete log here.
2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-09-11 14:07:57,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1473583077199-0
2016-09-11 14:07:57,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-09-11 14:07:57,207 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,208 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-09-11 14:07:57,212 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-09-11 14:07:57,216 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local360376249_0009
2016-09-11 14:07:57,267 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2016-09-11 14:07:57,267 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2016-09-11 14:07:57,271 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2016-09-11 14:07:57,272 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local360376249_0009_m_000000_0
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2016-09-11 14:07:57,278 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 2416
Input split[0]:
Length = 2416
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/root/PIG/PIG/sample.json:0+2416
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,288 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-11 14:07:57,290 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,291 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-09-11 14:07:57,296 [Thread-214] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local360376249_0009
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:130)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local360376249_0009
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases B,twitter
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-09-11 14:07:57,468 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local360376249_0009 has failed! Stop running all dependent jobs
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-09-11 14:07:57,470 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion | PigVersion | UserId | StartedAt | FinishedAt | Features
2.7.1.2.3.4.7-4 | 0.15.0.2.3.4.7-4 | root | 2016-09-11 14:07:57 | 2016-09-11 14:07:57 | UNKNOWN
Failed!
Failed Jobs:
JobId | Alias | Feature | Message | Outputs
job_local360376249_0009 | B,twitter | MAP_ONLY | Job failed! | file:/tmp/temp252944192/tmp-470484503,
Input(s):
Failed to read data from "file:///root/PIG/PIG/sample.json"
Output(s):
Failed to produce result in "file:/tmp/temp252944192/tmp-470484503"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local360376249_0009
Please also clarify how to use the jar files and which versions to use. There is so much confusion for me: some say use Elephant Bird, and some say use Avro. Please help. Mohan.V
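The IncompatibleClassChangeError in the trace ("Found interface org.apache.hadoop.mapreduce.Counter, but class was expected") is the classic symptom of jars compiled against Hadoop 1 running on a Hadoop 2 cluster: Counter changed from a class to an interface between the two. elephant-bird-2.2.3 predates Hadoop 2 support, so a likely fix, sketched here with the Hadoop-2-compatible Elephant Bird 4.x jars that also appear later in this thread, is:
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.3.jar
-- the hadoop-compat jar supplies the Hadoop 2 counter shims that 2.2.3 lacks
REGISTER elephant-bird-hadoop-compat-4.3.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();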
Labels:
- Apache Hadoop
- Apache Pig
09-11-2016
07:04 AM
1 Kudo
I think I got it on my own. As gkeys said, I made it too complex. I finally realized that I don't need the third step, the grouping, and the data is successfully stored into HBase. Here is the script:
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {(tweet::created_at: chararray,tweet::id: chararray,tweet::user_id: chararray,tweet::text: chararray)}');
-- note: '==' tests literal string equality; a regex with MATCHES is needed for a prefix match
A = FILTER data BY pattern MATCHES 'google_.*';
STORE A into 'hbase://tablename' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tweets:tweets');
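A quick way to verify the store, as a sketch assuming the table name and column family above: HBaseStorage can read the table back, and its -loadKey option returns the row key as the first field.
-- Read the rows back; the row key (the pattern field) comes first because of -loadKey.
result = LOAD 'hbase://tablename' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tweets:tweets', '-loadKey true') AS (rowkey:chararray, tweets:chararray);
DUMP result;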
09-10-2016
02:24 PM
Thank you for your valuable suggestions, gkeys. I didn't expect it would become a complex script like this. As I said, I am just a beginner in Pig, so please suggest how to solve this.
09-10-2016
08:12 AM
1 Kudo
I am new to Pig. I am trying to filter a text file and store the result in HBase. Here is the sample input file, sample.txt:
{"pattern":"google_1473491793_265244074740","tweets":[{"tweet::created_at":"18:47:31 ","tweet::id":"252479809098223616","tweet::user_id":"450990391","tweet::text":"rt @joey7barton: ..give a google about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ..."}]}
{"pattern":"facebook_1473491793_265244074740","tweets":[{"tweet::created_at":"11:33:16 ","tweet::id":"252370526411051008","tweet::user_id":"845912316","tweet::text":"@maarionymcmb facebook mere ta dit tu va resté chez toi dnc tu restes !"}]}
Script:
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray,tweet::id: chararray,tweet::user_id: chararray,tweet::text: chararray)}');
A = FILTER data BY pattern == 'google_*';
grouped = foreach (group A by pattern){tweets1 = foreach data generate tweets.(created_at),tweets.(id),tweets.(user_id),tweets.(text); generate group as pattern1,tweets1;}
But I got this error when I ran grouped:
2016-09-10 13:38:52,995 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 41, column 57> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
Please correct me on what I am doing wrong. Thank you.
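The parse error comes from the inner foreach iterating over data, an outer relation: inside a nested FOREACH you may only project from the bag being grouped (here A). A hedged sketch of what the grouping step could look like under the schema above; note that the accepted solution later in this thread drops the grouping entirely.
-- Inside the nested block, A refers to the bag of tuples in the current group,
-- so project from A rather than from the outer relation data.
grouped = FOREACH (GROUP A BY pattern) {
    tweets1 = FOREACH A GENERATE tweets;
    GENERATE group AS pattern1, tweets1;
};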
Labels:
- Apache Hadoop
- Apache HBase
- Apache Pig
09-09-2016
12:19 PM
1 Kudo
I think I got it on my own. I had actually forgotten the credentials and entered the wrong password. In the end it worked once I entered the right credentials.
09-09-2016
11:54 AM
Thanks for your reply, jk. As you suggested, I tried to disable Kerberos, but I got stuck at the very first step: an admin session expiration error asking for the admin principal and admin password. When I entered the credentials, it gave the error. Please suggest how to solve this. I am attaching a screenshot; please look into it.
09-09-2016
10:22 AM
1 Kudo
Hello everyone. I would like to disable Kerberos on my cluster completely, and for that I need expert guidance. What steps do I need to follow to avoid any issues? Any suggestions?
Labels:
- Apache Ambari
- Apache Hadoop
09-09-2016
03:15 AM
Thanks for your reply, Predrag Minovic. I have tried using the Elephant Bird JsonLoader. Script:
REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json
The output is: Schema for json unknown.
Please suggest a fix.
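As far as I know, Elephant Bird's JsonLoader always produces a single map[] per record and does not accept a field-list schema string in its constructor, which would explain why describe reports the schema as unknown. A sketch of the usual pattern, projecting and casting fields out of the map as in the earlier script in this thread:
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
-- JsonLoader yields one map[] per input line; project fields out of the map and cast them.
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
B = FOREACH json GENERATE (chararray)$0#'created_at' AS created_at, (chararray)$0#'id_str' AS id_str, (chararray)$0#'text' AS text;
describe B;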