Created 08-03-2018 07:34 AM
I found the following article about how to fill an HBase table with data from Hive: https://community.hortonworks.com/articles/2745/creating-hbase-hfiles-from-an-existing-hive-table.ht...
I followed the steps myself, and they seem to work. The problem occurs when I run the following HiveQL:
set hfile.family.path=/tmp/test_hbase/cf;
set hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE testdb.test_hbase
SELECT distinct concat_ws("_", name, number, test, step, cast(starttime as STRING)) as k, hashValue, valuelist
from testdb.test_orc
order by k, hashvalue
limit 1000;
I need to combine 4 columns to get a unique row key for my HBase table. Another problem is that my valueList column can contain huge strings, between 0 and 1 MB.
When I run the query, Tez creates 100 containers for the map tasks. This takes a few minutes to complete, which is slow for 1000 rows, but acceptable. A reduce step follows the map step, and in my opinion this could be the problem, because there is only 1 reducer for this large amount of data. That seems to be too few, as the job has now been running for hours (and still has not completed!).
My questions here:
Thank you!
Created 08-03-2018 09:47 AM
I found the following article: http://www.openkb.info/2017/05/hive-on-tez-how-to-control-number-of.html
That helped me. I now set the number of tasks to a fixed amount.
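For readers who want a starting point, here is a minimal sketch of the kind of session settings Hive on Tez offers for controlling the reducer count. The property names are standard Hive settings; the values are placeholders, and it is an assumption that these are the settings that were actually changed in this case.

```sql
-- Sketch only: standard Hive properties with placeholder values.
-- These are not necessarily the settings used in this thread.

-- Option 1: request a fixed number of reduce tasks.
set mapred.reduce.tasks=10;

-- Option 2: let Hive estimate the count from the input size instead.
-- (-1 is the default and means "estimate".)
set mapred.reduce.tasks=-1;
-- Target amount of input data per reducer, in bytes (64 MB here).
set hive.exec.reducers.bytes.per.reducer=67108864;
-- Upper bound on the estimated reducer count.
set hive.exec.reducers.max=100;
-- Keep Tez from adjusting the reducer count at runtime.
set hive.tez.auto.reducer.parallelism=false;
```

Note that Hive normally runs a global order by in a single reducer regardless of these settings, so a query that ends in order by ... limit may still finish in one reducer. Using distribute by together with sort by is the usual way to sort within several reducers.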
Created 08-03-2018 02:36 PM
When you are generating HFiles for HBase, the typical pattern is that you have one reducer per region, because each HFile must only contain data for a single region. As such, changing the number of reducers you get is mostly a matter of pre-splitting your table to increase the number of reducers (or merging regions to reduce it).
Created 08-06-2018 12:40 PM
@Josh Elser Thank you for that information. I have now changed my HBase table creation to the following command:
create 'hbase_1m_10r', {NAME => 'cf'}, {SPLITS => ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']}
When running the following query:
INSERT OVERWRITE TABLE dmueller.hbase_1m_10r
SELECT concat_ws(":", cast((hashvalue % 10) as String), concat_ws("_", name, number, test, step, cast(starttime as STRING))) as k, valuelist
from (select * from testdb.test_orc limit 1000000) a
distribute by split(k, ":")[0]
sort by k
I still have only 1 reducer... Any idea why?
Created 08-06-2018 03:42 PM
Does your data actually span all of the regions you created split points for? Or, when this finishes generating the HFile, does the client end up having to split the HFiles (and not just load them)?
The only thing I can guess is that the HBaseStorageHandler isn't doing something right. Generating only one HFile when you have 10 regions is definitely suboptimal.
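One way to check the first question is to count how many rows fall under each key prefix before generating any HFiles. The following is a rough sketch that reuses the table and the prefix expression from the query above; the aliases prefix and row_count are made up for illustration, and it assumes hashvalue is a numeric column.

```sql
-- Sketch only: counts rows per key prefix, using the same prefix
-- expression as the INSERT above. Assumes hashvalue is numeric.
SELECT cast((hashvalue % 10) as String) as prefix,
       count(*) as row_count
from testdb.test_orc
group by cast((hashvalue % 10) as String)
order by prefix;
```

If the result shows roughly equal counts for the prefixes 0 through 9, the keys do span the split points. If hashvalue can be negative, the % operator in Hive returns a negative remainder, so prefixes such as -3 would appear; those sort before '0' and would all land in the first region. In that case pmod(hashvalue, 10) gives a non-negative prefix instead.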