Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 14021 | 11-14-2017 01:11 AM |
|  | 60486 | 11-03-2017 06:53 AM |
|  | 4292 | 11-03-2017 06:18 AM |
|  | 13492 | 09-12-2017 05:51 AM |
|  | 1981 | 09-08-2017 02:50 AM |
08-16-2016
02:03 AM
The "multiplier" is not a parameter. You directly set the number of vcpu you want yarn to use. I guess you have to do the math yourself before setting the value.
08-11-2016
06:36 AM
Ok, since the default behaviour is inefficient, I have searched for a way to make the "bulk load" more efficient. I think I found a more efficient way, but there seems to be a blocker bug on it (referenced here: https://issues.apache.org/jira/browse/HIVE-13539).

1- The point is to set these two properties before running the insert command:
   SET hive.hbase.generatehfiles=true;
   SET hfile.family.path=/<a_path>/<thecolumn_family_name>;
2- Then run the insert query, which will prepare the HFiles at the designated location (instead of directly loading the HBase table).
3- And then only, perform a bulk load on HBase using the prepared HFiles:
   export HADOOP_CLASSPATH=`hbase classpath`
   yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /<a_path>/<thecolumn_family_name>

Problem: the query creating the HFiles is failing because it "found" multiple column families, since it looks at the wrong folder. I'm doing my tests on CDH 5.7.1. Has someone already tested this method? If yes, are there some properties to set that I have forgotten? Or is this really a blocker issue? Then I'll raise this with the support. Regards, mathieu
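For readers who want to reproduce the three steps above end to end, a minimal sketch (database, table and path names are placeholders; note that completebulkload usually also takes the target table name as a second argument):

```
#!/usr/bin/env bash
# Steps 1+2: generate HFiles from Hive instead of writing to the live HBase table.
hive -e "
SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/hfiles/cf;
INSERT INTO TABLE hbase_backed_table SELECT * FROM source_table;
"

# Step 3: hand the prepared HFiles to HBase in one shot.
export HADOOP_CLASSPATH=$(hbase classpath)
yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar \
  completebulkload /tmp/hfiles/cf my_hbase_table
```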
08-09-2016
05:53 AM
Did you specify the POST parameter "execute" with the query? https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive#WebHCatReferenceHive-URL
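If it helps, a minimal call against that endpoint might look like this (host, query and output directory are placeholders; execute and statusdir are the parameters documented at the link above):

```
# Submit a Hive query through WebHCat; "execute" carries the query itself.
curl -s -d 'user.name=hive' \
  --data-urlencode 'execute=SELECT count(*) FROM mytable' \
  --data-urlencode 'statusdir=/tmp/webhcat.out' \
  'http://webhcat-host:50111/templeton/v1/hive'
```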
08-09-2016
01:00 AM
In the HDFS role there is an "NFS gateway" service that lets you mount an NFS image of HDFS. That is one way (you can directly copy files to it; check the performance). Hue (the web UI) also lets you upload files into HDFS (this is a more manual approach). In our enterprise, for an automated process, we are using a custom Java application that uses the HCatWriter API for writing into Hive tables. But you can also use HttpFS or WebHDFS.
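If WebHDFS fits your case, a minimal sketch (NameNode host, port, user and file names are placeholders; op=CREATE is the standard WebHDFS operation):

```
# Upload a local file to HDFS through the WebHDFS REST API.
# -L follows the NameNode's redirect to the DataNode that actually receives the data.
curl -i -L -X PUT -T localfile.csv \
  "http://namenode-host:50070/webhdfs/v1/user/etl/localfile.csv?op=CREATE&user.name=etl&overwrite=true"
```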
08-09-2016
12:23 AM
Thank you for this explanation. This will help me a lot with the next steps.
08-08-2016
07:59 AM
Hi, we are facing a performance issue while loading data into HBase (using Hive queries). The Hive query is quite simple:

INSERT INTO TABLE <hive_table_name_targeting_hbase_table> SELECT * FROM <hive_table>

The table <hive_table_name_targeting_hbase_table> is a Hive table using the HBaseStorageHandler (so there is an HBase table as the storage). The table <hive_table> is a regular Hive table. There are millions of rows in <hive_table>, and <hive_table_name_targeting_hbase_table> is empty.

When running the query we can see that the YARN job generates 177 mappers (more or less depending on the data size in <hive_table>). This part is quite normal. But when I check the execution log of each mapper, I can see that some mappers take A LOT MORE TIME than others. Some mappers can take up to an hour (whereas the normal time of a mapper is around 10 minutes). In the log file of the "slow" mappers I can see a lot of retries on HBase operations (and finally some exceptions about NotServingRegionException). After some time (and a lot of retries) it's OK. But unfortunately, this slows down the processing a lot.

Has someone already encountered this (while loading an HBase table using Hive queries)? Could it be related to regions being split during the write? If yes, why? Is there some bug in the HBaseStorageHandler with too much data?

Of course, the HBase table is online and can be accessed normally after loading the data, so there is no HBase configuration issue here (at least not a basic one). HBase compaction is set to 0 (and is launched manually).

Log sample:

2016-08-08 10:18:25,962 INFO [htable-pool1-t31] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=prd_piste_audit_gsie_traite_001, attempt=13/35 failed=28ops, last exception: null on <a_host>,60020,1467474218569, tracking started null, retrying after=20126ms, replay=28ops
2016-08-08 10:18:46,091 INFO [htable-pool1-t31] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=prd_piste_audit_gsie_traite_001, attempt=14/35 failed=28ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region prd_piste_audit_gsie_traite_001,15a55dd4-5c6e-41b3-9d2e-304015aae5e9,1470642880612.e8868eaa5ac33c4612632c2c89474ecc. is not online on <a_host>,60020,1467474218569
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2786)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:922)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1893)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
    on <a_host>,60020,1467474218569, tracking started null, retrying after=20099ms, replay=28ops
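If mid-load region splits turn out to be the cause, one common mitigation (an assumption on my part, not something confirmed in this thread) is to pre-split the HBase table so the regions already exist before the mappers start writing. Table name, column family and split points below are placeholders, chosen for hex-prefixed row keys like the one in the log:

```
# Create the table pre-split on the first hex character of the row key,
# so the heavy initial load does not trigger splits while mappers write.
echo "create 'my_table', 'cf', SPLITS => ['1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']" \
  | hbase shell
```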
Labels:
- Apache HBase
05-19-2016
12:41 AM
I'm not seeing the same issue here. Check the YARN application logs; they will surely contain information about the issue.
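If it helps, the aggregated logs can be fetched from the command line (the application id below is a placeholder):

```
# Retrieve the YARN container logs for a finished application.
yarn logs -applicationId application_1234567890123_0042
```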
04-27-2016
07:53 AM
Not sure this is the latest documentation for Impala, but the Hive DATE type is not supported in Impala; use TIMESTAMP instead, for example. Search for "Impala supported data types" on Google. (Sorry, I can't paste the URL, I don't know why.)
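A minimal illustration of the workaround (table and column names are made up):

```
# DATE is not a supported type here; declare the column as TIMESTAMP instead.
impala-shell -q "CREATE TABLE events (id INT, event_ts TIMESTAMP) STORED AS PARQUET"
```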
03-03-2016
02:28 AM
Why not create a Hive table on top of the text file and then simply use a Hive query to load the data into the Avro table?
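A minimal sketch of that approach (table names, columns, delimiter and path are placeholders):

```
# 1) Expose the raw text file as an external Hive table.
# 2) Let Hive convert it by inserting into the existing Avro-backed table.
hive -e "
CREATE EXTERNAL TABLE staging_txt (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/staging/';

INSERT INTO TABLE target_avro SELECT * FROM staging_txt;
"
```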
12-18-2015
02:29 AM
That's great to know. Best regards.