Member since: 11-06-2016
Posts: 42
Kudos Received: 25
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7168 | 05-17-2017 01:38 PM
 | 5891 | 02-07-2017 01:06 PM
 | 4110 | 03-08-2016 07:25 PM
02-22-2016
07:19 PM
1 Kudo
@Neeraj Sabharwal Sure, I was able to convince my team lead 🙂 to hold off on the POC for now. Thanks for your response.
02-09-2016
06:58 PM
1 Kudo
It's Atlas 0.5.0, the version that shipped with HDP 2.3.0. Can you paste the link to the demo here? Thanks, Jagdish Saripella
02-09-2016
06:44 PM
3 Kudos
I am looking to delete tags from an existing Atlas instance. Neither Apache nor Hortonworks has comprehensive documentation on how to delete tags using the REST API (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_data_governance/content/ch_app_metadata_store_ref.html). Can you please help?
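For reference, a sketch of what I'm planning to try, based on my reading of the v1 entities/traits REST resource; I'm not certain this endpoint is available in 0.5.0, and the host, entity GUID, and tag name below are placeholders:

    # list the tags (traits) currently attached to an entity
    curl 'http://atlas-host:21000/api/atlas/entities/<entity-guid>/traits'
    # delete one tag from that entity
    curl -X DELETE 'http://atlas-host:21000/api/atlas/entities/<entity-guid>/traits/<tag-name>'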
Labels:
- Apache Atlas
01-27-2016
04:56 AM
1 Kudo
@Artem Ervits I'm using one of the online examples; attached is the text data that is being uploaded.

    raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (
        custno:chararray,
        firstname:chararray,
        lastname:chararray,
        age:int,
        profession:chararray);
    STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'test_data:firstname test_data:lastname test_data:age test_data:profession');

HBase table:

    hbase(main):002:0> describe 'test'
    DESCRIPTION                                                                          ENABLED
    'test', {NAME => 'test_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', R true
    EPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0'
    , TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY =
    > 'false', BLOCKCACHE => 'true'}
    1 row(s) in 0.1910 seconds

Attachment: hbase-test.txt
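Thinking about the YARN log in my earlier post below: the NumberFormatException on "age" suggests the file still contains its header row, and a line that doesn't split into all five fields would hand HBaseStorage a short tuple, which would match the "Index: 1, Size: 1" error. A minimal sketch of the filter I plan to try, assuming the header's first token is literally 'custno':

    raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (
        custno:chararray, firstname:chararray, lastname:chararray,
        age:int, profession:chararray);
    -- drop the header row and any line that didn't split into all five fields
    clean = FILTER raw_data BY custno != 'custno' AND firstname IS NOT NULL;
    STORE clean INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'test_data:firstname test_data:lastname test_data:age test_data:profession');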
01-27-2016
04:06 AM
1 Kudo
I'm using a Pig script to upload the data. Below is the YARN app log:

2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1
Total Length = 739
Input split[0]:
Length = 739
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C: R:
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException <For input string: "age"> field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Labels:
- Apache HBase
- Apache Pig
01-27-2016
04:01 AM
Thanks, I will try setting those MR properties through Hive. Below is the MR framework counters screenshot.
01-26-2016
09:28 PM
1 Kudo
@Neeraj Sabharwal this best-practice document is really helpful, thank you.
01-26-2016
09:27 PM
1 Kudo
@Joseph Niemiec The longest-running mapper took 2 minutes 54 seconds. Compression is set to false. As suggested above, it makes sense to compress the intermediate data and send compressed data across the network. Since this is a Hive job, I believe the property that needs to be enabled is hive.exec.compress.intermediate=true. I don't see an option to change the compression codec for the intermediate stage in Hive; it looks like it picks up the Hadoop default codec, which has not been defined in our environment, so I will have to set that property and test it out. Is there a way to pass the MapReduce intermediate-compression settings from the job instead of making a global change?
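For reference, a minimal per-session sketch of what I plan to try, assuming MRv2 property names and that the Snappy codec is installed on the cluster:

    -- set only for this session/job, no global change needed
    SET hive.exec.compress.intermediate=true;
    SET mapreduce.map.output.compress=true;
    SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;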
01-26-2016
03:31 PM
1 Kudo
@Joseph Niemiec: Thanks for the reply. The current setting is as below:

    hive> set mapreduce.job.reduce.slowstart.completedmaps;
    mapreduce.job.reduce.slowstart.completedmaps=0.8

So I believe 80% of the mappers should be completed before the reducers kick in.
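For reference, a sketch of overriding this per session instead of cluster-wide; 0.5 is just an illustrative value, meaning reducers would start fetching after half the maps finish:

    hive> SET mapreduce.job.reduce.slowstart.completedmaps=0.5;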
01-26-2016
02:37 PM
@Artem Ervits: I guess I provided incomplete information, my apologies. This is a Hive job that selects data from a few tables, unions them, and then INSERT OVERWRITEs into another table (I don't have info on the amount of data in the original tables; they were probably dropped). The job actually spun up 28 mappers and 12 reducers; 10 of the reducers completed in under 3 minutes, while 2 took approximately 2 hours. This job runs from cron and has been running for quite a few days; no config changes were made on the infrastructure end. One other odd thing is that some days it spins up reducers and other days it doesn't (does this depend on the amount of data being parsed?). The YARN logs in the initial post show at what point it took a long time:

2016-01-25 00:10:27,135 INFO [fetcher#22] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: XXXXXXXXXXXXXXXXXXXXXXXXXX:XXXX freed by fetcher#22 in 5ms
2016-01-25 02:18:46,599 INFO [fetcher#12] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 296787239 bytes from map-output for attempt_1450565638170_43987_m_000010_0
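On the varying reducer count: my understanding is that Hive estimates the number of reducers from the input size when it isn't fixed explicitly, which would explain the day-to-day variation. A sketch of the settings I can inspect in the session to confirm; property names assume Hive on MRv2:

    hive> set hive.exec.reducers.bytes.per.reducer;  -- input bytes handled per reducer
    hive> set mapreduce.job.reduces;                 -- -1 lets Hive estimate the count from input size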