HBase table data upload fails

Expert Contributor

I am using a Pig script to upload data into an HBase table, and the job fails. Below is the YARN application log:

2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1
Total Length = 739
Input split[0]:
   Length = 739
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C:  R: 
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException <For input string: "age"> field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
1 ACCEPTED SOLUTION

Master Guru

Hi @Jagdish Saripella

Okay, I ran your script on my sandbox and found that you need commas between the columns in your "STORE raw_data INTO 'hbase..." command, like this:

STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');

You also have to pre-create the table, for example from the HBase shell: create 'test1', 'test_data'. If you keep the header line in the input file, it will be loaded as well, as a row with rowkey='Custno'. Most likely that's not what you want.
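If the header row should not end up in HBase, one option is to drop it from the text file before the LOAD. Below is a minimal Python sketch of that idea, assuming the header is the first line and its first comma-separated field is 'Custno'. The function name strip_header and the sample rows are made up for illustration; they are not taken from the attached file.

```python
def strip_header(lines, header_key="Custno"):
    """Return the input lines without a leading header row.

    Assumption: the header, if present, is the first line, and its first
    comma-separated field equals header_key.
    """
    if lines and lines[0].split(",")[0].strip() == header_key:
        return lines[1:]
    return list(lines)


if __name__ == "__main__":
    # Hypothetical sample rows, only for demonstration.
    sample = [
        "Custno, firstname, lastname, age, profession",
        "1001,Jane,Doe,30,Engineer",
    ]
    for row in strip_header(sample):
        print(row)
```

The cleaned lines can then be written back to a file and uploaded to HDFS as before.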

Hint: The next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that HBaseStorage was trying to add a single column using all the text in the brackets, because there were no commas. With commas it correctly creates 4 columns.
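To see why the missing commas matter, here is a simplified Python model of the behaviour described above. It is not HBaseStorage's actual code. It only assumes that the column list is split on commas, which matches what the debug output showed. The names parse_columns and column_for_field are hypothetical.

```python
def parse_columns(spec, delimiter=","):
    """Split a column spec on the delimiter and trim surrounding whitespace."""
    return [c.strip() for c in spec.split(delimiter) if c.strip()]


def column_for_field(columns, index):
    """Return the column for the index-th non-rowkey field.

    Raises IndexError when there are fewer columns than fields.
    """
    return columns[index]


# Column spec as posted in the failing script: newlines, no commas.
broken = """test_data:firstname
test_data:lastname
test_data:age
test_data:profession"""

# Column spec with commas, as in the corrected STORE command.
fixed = "test_data:firstname,test_data:lastname,test_data:age,test_data:profession"

if __name__ == "__main__":
    print(len(parse_columns(broken)))  # 1
    print(len(parse_columns(fixed)))   # 4
```

With only one parsed column but four value fields per tuple, looking up the second column fails, which is consistent with the "Index: 1, Size: 1" in the stack trace.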


3 REPLIES

Master Mentor

@Jagdish Saripella

Please post your Pig script.

Expert Contributor

@Artem Ervits

I am using one of the online examples. Attached is the text data that is being uploaded.

raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS ( custno:chararray, firstname:chararray, lastname:chararray, age:int, profession:chararray);

STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'test_data:firstname
test_data:lastname
test_data:age
test_data:profession');

hbase tables:

hbase(main):002:0> describe 'test'
DESCRIPTION                                                                   ENABLED
 'test', {NAME => 'test_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}   true
1 row(s) in 0.1910 seconds

hbase-test.txt
