Member since: 11-06-2016
Posts: 42
Kudos Received: 25
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7168 | 05-17-2017 01:38 PM
 | 5891 | 02-07-2017 01:06 PM
 | 4110 | 03-08-2016 07:25 PM
02-22-2016
07:19 PM
1 Kudo
@Neeraj Sabharwal Sure, I was able to convince my team lead 🙂 to hold off on the POC for now. Thanks for your response.
02-09-2016
06:58 PM
1 Kudo
It's Atlas 0.5.0, the version that shipped with HDP 2.3.0. Can you paste the link to the demo here? Thanks, Jagdish Saripella
02-09-2016
06:44 PM
3 Kudos
I am looking to delete tags from an existing Atlas instance. Neither Apache nor Hortonworks has comprehensive documentation on how to delete tags using the REST API (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_data_governance/content/ch_app_metadata_store_ref.html). Can you please help?
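For reference, a sketch of what I'm planning to try, based on my reading of the v1 entities/traits REST resource; I'm not certain this endpoint is available in 0.5.0, and the host, entity GUID, and tag name below are placeholders:

    # list the tags (traits) currently attached to an entity
    curl 'http://atlas-host:21000/api/atlas/entities/<entity-guid>/traits'
    # delete one tag from that entity
    curl -X DELETE 'http://atlas-host:21000/api/atlas/entities/<entity-guid>/traits/<tag-name>'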
Labels:
- Apache Atlas
01-27-2016
04:56 AM
1 Kudo
@Artem Ervits I'm using one of the online examples; attached is the text data that is being uploaded.

    raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (
        custno:chararray,
        firstname:chararray,
        lastname:chararray,
        age:int,
        profession:chararray);
    STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'test_data:firstname test_data:lastname test_data:age test_data:profession');

HBase table:

    hbase(main):002:0> describe 'test'
    DESCRIPTION                                                                          ENABLED
    'test', {NAME => 'test_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', R true
    EPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0'
    , TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY =
    > 'false', BLOCKCACHE => 'true'}
    1 row(s) in 0.1910 seconds

Attachment: hbase-test.txt
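Thinking about the YARN log in my earlier post below: the NumberFormatException on "age" suggests the file still contains its header row, and a line that doesn't split into all five fields would hand HBaseStorage a short tuple, which would match the "Index: 1, Size: 1" error. A minimal sketch of the filter I plan to try, assuming the header's first token is literally 'custno':

    raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (
        custno:chararray, firstname:chararray, lastname:chararray,
        age:int, profession:chararray);
    -- drop the header row and any line that didn't split into all five fields
    clean = FILTER raw_data BY custno != 'custno' AND firstname IS NOT NULL;
    STORE clean INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'test_data:firstname test_data:lastname test_data:age test_data:profession');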
01-27-2016
04:06 AM
1 Kudo
I'm using a Pig script to upload the data. Below is the YARN app log:

2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1
Total Length = 739
Input split[0]:
Length = 739
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C: R:
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException <For input string: "age"> field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Labels:
- Apache HBase
- Apache Pig
01-27-2016
04:01 AM
Thanks, I will try setting those MR properties through Hive. Below is the MR framework counters screenshot.
01-26-2016
09:28 PM
1 Kudo
@Neeraj Sabharwal this best-practice document is really helpful, thank you.
01-26-2016
09:27 PM
1 Kudo
@Joseph Niemiec The longest-running mapper took 2 minutes 54 seconds. Compression is set to false. As suggested above, it makes sense to compress the intermediate data and send compressed data across the network. Since this is a Hive job, I believe the property that needs to be enabled is hive.exec.compress.intermediate=true. I don't see an option to change the compression codec for the intermediate stage in Hive; it looks like it picks up the Hadoop default codec, which has not been defined in our environment, so I will have to set that property and test it out. Is there a way to pass the MapReduce intermediate-compression settings from the job instead of making a global change?
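For reference, a minimal per-session sketch of what I plan to try, assuming MRv2 property names and that the Snappy codec is installed on the cluster:

    -- set only for this session/job, no global change needed
    SET hive.exec.compress.intermediate=true;
    SET mapreduce.map.output.compress=true;
    SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;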
01-26-2016
03:31 PM
1 Kudo
@Joseph Niemiec: Thanks for the reply. The current setting is as below:

    hive> set mapreduce.job.reduce.slowstart.completedmaps;
    mapreduce.job.reduce.slowstart.completedmaps=0.8

So I believe 80% of the mappers should be completed before the reducers kick in.
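For reference, a sketch of overriding this per session instead of cluster-wide; 0.5 is just an illustrative value, meaning reducers would start fetching after half the maps finish:

    hive> SET mapreduce.job.reduce.slowstart.completedmaps=0.5;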
01-26-2016
02:37 PM
@Artem Ervits: I guess I provided incomplete information, my apologies. This is a Hive job that selects data from a few tables, unions them, and then INSERT OVERWRITEs into another table (I don't have info on the amount of data in the original tables; they were probably dropped). The job actually spun up 28 mappers and 12 reducers; 10 of the reducers completed in under 3 minutes, while 2 took approximately 2 hours. This job runs from cron and has been running for quite a few days; no config changes were made on the infrastructure end. One other odd thing is that some days it spins up reducers and other days it doesn't (does this depend on the amount of data being parsed?). The YARN logs in the initial post show at what point it took a long time:

2016-01-25 00:10:27,135 INFO [fetcher#22] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: XXXXXXXXXXXXXXXXXXXXXXXXXX:XXXX freed by fetcher#22 in 5ms
2016-01-25 02:18:46,599 INFO [fetcher#12] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 296787239 bytes from map-output for attempt_1450565638170_43987_m_000010_0
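On the varying reducer count: my understanding is that Hive estimates the number of reducers from the input size when it isn't fixed explicitly, which would explain the day-to-day variation. A sketch of the settings I can inspect in the session to confirm; property names assume Hive on MRv2:

    hive> set hive.exec.reducers.bytes.per.reducer;  -- input bytes handled per reducer
    hive> set mapreduce.job.reduces;                 -- -1 lets Hive estimate the count from input size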