Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

hbase table data upload fails

Expert Contributor

I'm using a Pig script to upload data into an HBase table. Below is the YARN application log:

2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1
Total Length = 739
Input split[0]:
   Length = 739
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C:  R: 
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException <For input string: "age"> field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
1 ACCEPTED SOLUTION

Master Guru

Hi @Jagdish Saripella

I tried running your script on my sandbox and found that the column list in your "STORE raw_data INTO 'hbase..." command needs commas, like this:

STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');

You also have to pre-create the table, for example from the HBase shell: create 'test1', 'test_data'. Note that if you keep the header row in the input file, it will be loaded as well, with rowkey='Custno'. Most likely that's not what you want.

Hint: next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that, without commas, HBaseStorage was trying to create a single column from all the text between the quotes. With commas it correctly creates four columns.
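The mismatch behind the IndexOutOfBoundsException can be sketched with a toy Python model. To be clear, `parse_columns` and `put_next` below are hypothetical stand-ins for illustration, not Pig's actual HBaseStorage code; they just mirror the behaviour observed in the log (one parsed column, four data fields, index 1 out of bounds for size 1):

```python
# Toy model of the HBaseStorage column-spec mismatch described above.
# These functions are hypothetical stand-ins, not Pig's real code.

def parse_columns(spec):
    """Split the HBaseStorage argument on commas, as observed in the thread."""
    return [c.strip() for c in spec.split(",") if c.strip()]

# Column spec without commas (the original script): parses as ONE column.
broken = "test_data:firstname test_data:lastname test_data:age test_data:profession"
# Column spec with commas (the fix): parses as FOUR columns.
fixed = "test_data:firstname,test_data:lastname,test_data:age,test_data:profession"

def put_next(fields, columns):
    """Mimic putNext: field 0 is the row key, field i is written to columns[i-1]."""
    rowkey, values = fields[0], fields[1:]
    # Raises IndexError when there are more data fields than parsed columns.
    return {columns[i]: v for i, v in enumerate(values)}

row = ("1001", "John", "Smith", "45", "engineer")  # custno + 4 data fields

print(len(parse_columns(broken)))    # 1
print(len(parse_columns(fixed)))     # 4
put_next(row, parse_columns(fixed))  # four columns, four values: fine

try:
    put_next(row, parse_columns(broken))
except IndexError:
    # Analogous to the job's java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    print("index 1 out of range for a single parsed column")
```

With one parsed column and four data fields, the writer runs off the end of the column list at the second field, which matches the "Index: 1, Size: 1" in the stack trace.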


3 REPLIES

Master Mentor

@Jagdish Saripella

Please post your Pig script.

Expert Contributor

@Artem Ervits

I'm using one of the online examples. Attached is the text data being uploaded.

raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (custno:chararray, firstname:chararray, lastname:chararray, age:int, profession:chararray);

STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'test_data:firstname
     test_data:lastname
     test_data:age
     test_data:profession');

HBase table:

hbase(main):002:0> describe 'test'
DESCRIPTION                                                          ENABLED
 'test', {NAME => 'test_data', DATA_BLOCK_ENCODING => 'NONE',        true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1910 seconds

hbase-test.txt
