Created 01-27-2016 04:06 AM
Using a Pig script to upload data into HBase. Below is the YARN app log:
2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1 Total Length = 739 Input split[0]: Length = 739 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: -----------------------
2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C: R:
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException <For input string: "age"> field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Created 01-27-2016 04:19 AM
Please post your Pig script.
Created 01-27-2016 04:56 AM
I'm using one of the online examples. Attached is the text data being uploaded:
raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS ( custno:chararray, firstname:chararray, lastname:chararray, age:int, profession:chararray);
STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'test_data:firstname
test_data:lastname
test_data:age
test_data:profession');
HBase table:
hbase(main):002:0> describe 'test'
DESCRIPTION                                                          ENABLED
 'test', {NAME => 'test_data', DATA_BLOCK_ENCODING => 'NONE',        true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1910 seconds
Created 01-27-2016 06:51 AM
Okay, I tried to run your script on my sandbox and found that you need commas in your STORE raw_data INTO 'hbase://...' command, like this:
STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');
You also have to pre-create the table, for example from the HBase shell: create 'test1', 'test_data'. Also note that if you keep the header line in your input file, it will be loaded as well, as a row with rowkey='Custno'; most likely that's not what you want.
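A minimal way to drop that header before storing would be something like the sketch below (assuming, from your rowkey above, that the header's first field is the literal string 'Custno'):
raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',')
    AS (custno:chararray, firstname:chararray, lastname:chararray, age:int, profession:chararray);
-- drop the header line so it doesn't become a bogus HBase row
no_header = FILTER raw_data BY custno != 'Custno';
STORE no_header INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'test_data:firstname,test_data:lastname,test_data:age,test_data:profession');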
Hint: next time you have trouble with Pig, switch debug mode on by running SET debug 'on'. That's how I discovered that HBaseStorage was trying to create a single column from all of the text between the quotes, because the commas were missing. That one-element column list is also why your map task died with java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 in HBaseStorage.putNext: the tuple has four data fields after the rowkey, but only one column descriptor was parsed. With commas it correctly creates 4 columns.
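Once the job succeeds, you can sanity-check the load from the HBase shell (using my test1 table from above):
hbase(main):003:0> scan 'test1', {LIMIT => 5}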