<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: hbase table data upload fails in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113249#M16498</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2364/jagdishsaripella.html" nodeid="2364"&gt;@Jagdish Saripella&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Okay, I tried to run your script on my sandbox and found that you need commas in your "STORE raw_data INTO 'hbase..." command, like&lt;/P&gt;&lt;PRE&gt;STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');&lt;/PRE&gt;&lt;P&gt;You also have to pre-create your table, for example from the hbase shell: create 'test1', 'test_data'. If you keep the header, it will be loaded as well, with rowkey='Custno'. Most likely that's not what you want.&lt;/P&gt;&lt;P&gt;Hint: next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that HBaseStorage was trying to add a single column using all the text in the brackets, since there were no commas. With commas, it correctly creates four columns.&lt;/P&gt;</description>
    <pubDate>Wed, 27 Jan 2016 14:51:34 GMT</pubDate>
    <dc:creator>pminovic</dc:creator>
    <dc:date>2016-01-27T14:51:34Z</dc:date>
    <item>
      <title>hbase table data upload fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113246#M16495</link>
      <description>&lt;P&gt;Using a Pig script to upload data. Below is the YARN app log:&lt;/P&gt;&lt;PRE&gt;2016-01-26 10:57:59,797 INFO [main-SendThread(am2rlccmrhdn04.r1-core.r1.aig.net:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server am2rlccmrhdn04.r1-core.r1.aig.net/10.175.68.14:2181, sessionid = 0x251c236ef7b0093, negotiated timeout = 30000
2016-01-26 10:57:59,924 INFO [main] org.apache.hadoop.hbase.mapreduce.TableOutputFormat: Created table instance for test
2016-01-26 10:57:59,951 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2016-01-26 10:58:00,413 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Number of splits :1
Total Length = 739
Input split[0]:
   Length = 739
   ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
   Locations:

-----------------------

2016-01-26 10:58:00,443 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://dr-gfat/user/u1448739/hbase_text.txt:0+739
2016-01-26 10:58:00,570 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2016-01-26 10:58:00,657 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: raw_data[1,11],raw_data[-1,-1] C:  R: 
2016-01-26 10:58:00,713 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.Utf8StorageConverter(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to interpret value [32, 97, 103, 101] in field being converted to int, caught NumberFormatException &amp;lt;For input string: "age"&amp;gt; field discarded
2016-01-26 10:58:00,730 INFO [main] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251c236ef7b0093
2016-01-26 10:58:00,733 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x251c236ef7b0093 closed
2016-01-26 10:58:00,733 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-01-26 10:58:00,735 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:947)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-01-26 10:58:00,747 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task&lt;/PRE&gt;</description>
      <pubDate>Wed, 27 Jan 2016 12:06:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113246#M16495</guid>
      <dc:creator>jagdish</dc:creator>
      <dc:date>2016-01-27T12:06:28Z</dc:date>
    </item>
    <item>
      <title>Re: hbase table data upload fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113247#M16496</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2364/jagdishsaripella.html" nodeid="2364"&gt;@Jagdish Saripella&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;Please post your Pig script.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 12:19:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113247#M16496</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-01-27T12:19:58Z</dc:date>
    </item>
    <item>
      <title>Re: hbase table data upload fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113248#M16497</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Using one of the online examples. Attached is the text data that is being uploaded.&lt;/P&gt;&lt;P&gt;raw_data = LOAD '/user/u1448739/hbase_text.txt' USING PigStorage(',') AS (
 custno:chararray,
 firstname:chararray,
 lastname:chararray,
 age:int,
 profession:chararray); &lt;/P&gt;&lt;P&gt; STORE raw_data INTO 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( &lt;/P&gt;&lt;P&gt;'test_data:firstname &lt;/P&gt;&lt;P&gt; test_data:lastname &lt;/P&gt;&lt;P&gt; test_data:age &lt;/P&gt;&lt;P&gt; test_data:profession');&lt;/P&gt;&lt;P&gt;hbase tables:&lt;/P&gt;&lt;P&gt;hbase(main):002:0&amp;gt; describe 'test'
DESCRIPTION                                                                           ENABLED
 'test', {NAME =&amp;gt; 'test_data', DATA_BLOCK_ENCODING =&amp;gt; 'NONE', BLOOMFILTER =&amp;gt; 'ROW', R true
 EPLICATION_SCOPE =&amp;gt; '0', VERSIONS =&amp;gt; '1', COMPRESSION =&amp;gt; 'NONE', MIN_VERSIONS =&amp;gt; '0'
 , TTL =&amp;gt; 'FOREVER', KEEP_DELETED_CELLS =&amp;gt; 'false', BLOCKSIZE =&amp;gt; '65536', IN_MEMORY =
 &amp;gt; 'false', BLOCKCACHE =&amp;gt; 'true'}
1 row(s) in 0.1910 seconds&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/1576-hbase-test.txt"&gt;hbase-test.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 12:56:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113248#M16497</guid>
      <dc:creator>jagdish</dc:creator>
      <dc:date>2016-01-27T12:56:56Z</dc:date>
    </item>
    <item>
      <title>Re: hbase table data upload fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113249#M16498</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2364/jagdishsaripella.html" nodeid="2364"&gt;@Jagdish Saripella&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Okay, I tried to run your script on my sandbox and found that you need commas in your "STORE raw_data INTO 'hbase..." command, like&lt;/P&gt;&lt;PRE&gt;STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');&lt;/PRE&gt;&lt;P&gt;You also have to pre-create your table, for example from the hbase shell: create 'test1', 'test_data'. If you keep the header, it will be loaded as well, with rowkey='Custno'. Most likely that's not what you want.&lt;/P&gt;&lt;P&gt;Hint: next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that HBaseStorage was trying to add a single column using all the text in the brackets, since there were no commas. With commas, it correctly creates four columns.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 14:51:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/hbase-table-data-upload-fails/m-p/113249#M16498</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-01-27T14:51:34Z</dc:date>
    </item>
  </channel>
</rss>

