Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9547 | 11-14-2017 01:11 AM |
| | 54681 | 11-03-2017 06:53 AM |
| | 3557 | 11-03-2017 06:18 AM |
| | 11722 | 09-12-2017 05:51 AM |
| | 1379 | 09-08-2017 02:50 AM |
01-19-2017
03:23 AM
I don't think Impala has such a feature (but I could be wrong). If I were you, I would try to answer these questions:
- "Why do I need this kind of output?"
- "What do I use it for?"
- "Can't I achieve my goal with another output?"
Maybe you will find another approach that is better adapted. By the way, I guess something like this would be better (but it will not make a huge difference):
SELECT a AS col FROM tmp
UNION ALL SELECT b AS col FROM tmp
UNION ALL SELECT c AS col FROM tmp
UNION ALL SELECT d AS col FROM tmp
... View more
01-17-2017
08:27 AM
Hive is not Oracle. You should not expect the same processing capabilities. Hive is designed to run long and heavy queries, whereas it performs poorly on small queries like the one you are trying to optimize. Also note that Hive runs on top of YARN, and by design YARN takes time to instantiate containers and the JVMs inside those containers. This should answer your question "why it takes so much time to start the Query Job". If you want a quick reply for a basic count(*) without any filter/condition, you might want to read about Hive statistics.
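As a minimal sketch (the table name and connection string are placeholders), computing statistics once lets Hive answer an unfiltered count(*) from the metastore instead of launching a full job:
# Hypothetical table/connection; hive.compute.query.using.stats makes Hive serve
# a bare count(*) from the stored statistics.
beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default" -e "
  ANALYZE TABLE my_table COMPUTE STATISTICS;
  SET hive.compute.query.using.stats=true;
  SELECT count(*) FROM my_table;"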
... View more
01-10-2017
07:58 AM
What kind of web-based interface do you need from HiveServer2? If it is a user interface for querying Hive, then HiveServer2 does not provide one OOTB. But know that Hue uses HiveServer2 when you submit Hive queries inside the "Hive Editor".
... View more
12-19-2016
06:06 AM
Thanks, that was interesting to know!
... View more
12-19-2016
05:51 AM
Hi, If it's not working only on the edge nodes then there might be some configuration issue leading to that. What difference do you make between "cluster nodes" and "edge nodes"? Meaning: what roles are assigned to your edge nodes?
- For example, did you assign the HDFS & YARN "gateway" role to your edge nodes?
- If not, try doing it.
- If yes, try redeploying the client configuration.
It might be something else.
... View more
12-19-2016
05:30 AM
You are right, I just tested it and there is no need for additional settings (other than having initialized the Kerberos ticket). From what I read in your first post, it seems the same job does run successfully for users whose home folder in HDFS is not encrypted (for the same Kerberos realm)? If that is the case, I would open a support ticket in your shoes. It would be the quickest way to obtain feedback from Cloudera on the matter (whether there is an incompatibility or some particular setting for this particular use case).
... View more
12-19-2016
05:09 AM
2 Kudos
If you want to "drop" the categories table you should run a Hive query like this: DROP TABLE categories; If you want to "delete" the content of the table only, then try TRUNCATE TABLE categories; It should work, or try deleting the table content in HDFS directly. As for your use of "hadoop fs", you should know that "hadoop fs -ls rm" does not exist. For deleting HDFS files or folders it is simply "hadoop fs -rm".
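As a small sketch (connection string and warehouse path are assumptions about your layout), the three options look like this:
# Drop the table definition (and its data, if it is a managed table).
beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default" -e "DROP TABLE categories;"
# Keep the table but empty it.
beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default" -e "TRUNCATE TABLE categories;"
# Or remove the files directly in HDFS; -r is recursive, -skipTrash bypasses the trash.
hadoop fs -rm -r -skipTrash /user/hive/warehouse/categories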
... View more
12-19-2016
03:33 AM
Hi, I don't know if the map/reduce job you are submitting is Kerberos compatible. That is the first check to do. Then, if that job is Kerberos compatible, it might need some settings such as supplying a JAAS configuration. The kinit of a ticket is sometimes not enough. For example, when running the map/reduce job "MapReduceIndexerTool", you need to supply a JAAS configuration: HADOOP_OPTS="-Djava.security.auth.login.config=/home/user/jaas.conf" \
hadoop jar MapReduceIndexerTool See: https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_sg_search_security.html
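For reference, a ticket-cache-based jaas.conf along the lines the linked Cloudera Search security page describes could be written like this (the principal is a placeholder):
# Write a minimal JAAS file that reuses the ticket obtained with kinit.
cat > /home/user/jaas.conf <<'EOF'
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  principal="user@EXAMPLE.COM";
};
EOF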
... View more
12-09-2016
08:30 AM
2 Kudos
Yes there is. In Cloudera Manager:
- Go to the Key-Value Store Indexer configuration > service wide > advanced
- Add in the "Key-Value Store Indexer Service Environment Advanced Configuration Snippet (Safety Valve)" the following information: HBASE_INDEXER_CLASSPATH=<your_classpath>
Restart the service. Regards, Mathieu
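For example (the jar paths are hypothetical; adjust them to wherever your custom classes live), the safety-valve entry would look like:
# Hypothetical jar locations for custom morphline commands or indexer mappers.
HBASE_INDEXER_CLASSPATH=/opt/custom/libs/my-morphline-commands.jar:/opt/custom/libs/my-indexer-mapper.jar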
... View more
10-11-2016
12:25 AM
Hi, You're right, this is not something you will want to automate. This is only a workaround for when you can't afford a restart of Hive (for example, when other queries not related to the particular lock are being processed). And yes, I was referring to the professional support, where you can "submit" an issue you're facing for Cloudera to analyze. Good luck.
... View more
10-10-2016
12:45 AM
Hi, When it happens again you can work around the issue by deleting the lock inside ZooKeeper. This will be easier and quicker than restarting Hive, but it will not solve the underlying issue. For this kind of tricky issue I would open a ticket with Cloudera support. Regards, Mathieu
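As a rough sketch (host, znode path and lock name are placeholders; the namespace comes from hive.zookeeper.namespace and the path encodes database/table), deleting a stuck lock looks like this. Inspect before removing anything:
# Connect with the ZooKeeper CLI shipped with CDH.
zookeeper-client -server <zk_host>:2181
# Inside the CLI: list the lock znodes of the blocked table, then delete the stale one.
ls /hive_zookeeper_namespace/default/my_locked_table
rmr /hive_zookeeper_namespace/default/my_locked_table/LOCK-EXCLUSIVE-0000000000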
... View more
10-05-2016
05:37 AM
1 Kudo
Hi, Maybe you could share the Java source code doing the connection? Here is a really small working sample:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.log4j.Logger;

public class ManageHive {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static Logger logger = Logger.getLogger(ManageHive.class);

    // Builds a JDBC connection to HiveServer2 for the given user.
    public static Connection getConnection(LoadProperties prop, String user) throws ClassNotFoundException, SQLException {
        String hiveJdbc = prop.getPropertyByName("hive_jdbc");
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            throw e;
        }
        Connection conn2 = DriverManager.getConnection(hiveJdbc + "/extraction", user, "");
        return conn2;
    }

    // Runs a DDL/DML statement and fails loudly if the execution is rejected.
    public static void execSql(LoadProperties prop, String user, String sql) throws SQLException, ClassNotFoundException {
        Connection maConn = getConnection(prop, user);
        Statement stmt = maConn.createStatement();
        int result = stmt.executeUpdate(sql);
        if (result == Statement.EXECUTE_FAILED) {
            throw new SQLException("Execution error.");
        }
    }
}
... View more
09-08-2016
03:03 AM
After some more testing I found that the following command works:
split '<namespace>:<table_name>', 'NEW_SPLIT_VALUE'
I just need to call it once per "pre-split" value I need.
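To apply several split points in one go (namespace, table and values are placeholders), the calls can be piped into the hbase shell, one per boundary:
# Hypothetical table and split boundaries; one split call per value.
for key in value1 value2 value3; do
  echo "split 'my_namespace:my_table', '${key}'" | hbase shell
done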
... View more
09-08-2016
02:36 AM
Hi, I'm using CDH 5.5.2, so my version of HBase is not the same. I did try the | thing in the hbase shell command. I'm trying to "pre-split" an existing empty table, but the following command seems not to be correct for this version of HBase:
alter '<namespace>:<table_name>',{ SPLITS => ['value1','value2'] }
I get the following message, so I guess the "SPLITS" part is not taken into account:
Unknown argument ignored: SPLITS
Updating all regions with the new schema...
1/2 regions updated.
2/2 regions updated.
Done.
0 row(s) in 2.4780 seconds
Does someone know the syntax for pre-splitting an already existing (empty) table? CDH 5.5.2 uses HBase 1.0.0, I think.
... View more
09-06-2016
03:02 AM
Ok, I managed to make an HBase bulk load using Hive. There is a wiki article on that: https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad The procedure described there does not work as-is; I guess it was made for older versions of Hive and HBase. With some work to adapt the procedure I managed to load an HBase table using completebulkload. Here is a working sample on that matter: sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-client.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-common.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-protocol.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar /user/hive/
# These JARs need to be added to HiveServer2 with the property hive.aux.jars.path
sudo -u hdfs hdfs dfs -chmod 554 /user/hive/*.jar
sudo -u hdfs hdfs dfs -chown hive:hive /user/hive/*.jar
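# Count the rows of the source Hive table; the total drives how many evenly spaced split keys are sampled below.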
total=`beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" --outputformat=csv2 --silent=true -e "SELECT count(*) FROM default.operation_client_001;"`
total=`echo $total | cut -d ' ' -f 2- `
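# Sample evenly spaced row keys into the hb_range_keys table; these become the split boundaries for the TotalOrderPartitioner.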
hdfs dfs -rm -r /tmp/hb_range_keys
hdfs dfs -mkdir /tmp/hb_range_keys
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "CREATE EXTERNAL TABLE IF NOT EXISTS default.hb_range_keys(transaction_id_range_start string) row format serde 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe' stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat' location '/tmp/hb_range_keys';"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "add jar /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar; create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'; INSERT OVERWRITE TABLE default.hb_range_keys SELECT a.id FROM ( SELECT row_sequence() as num, t.id FROM default.operation_client_001 t order by t.id) a WHERE ( a.num % ( round( ${total} / 12) ) ) = 0;"
hdfs dfs -rm -r /tmp/hb_range_key_list;
hdfs dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;
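# Create a Hive table backed by HiveHFileOutputFormat and fill it in sorted order; this writes HFiles under /tmp/hbsort/ti instead of writing to HBase directly.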
hdfs dfs -rm -r /tmp/hbsort;
hdfs dfs -mkdir /tmp/hbsort;
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "set mapred.reduce.tasks=12; set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner; set total.order.partitioner.path=/tmp/hb_range_key_list; set hfile.compression=gz; CREATE TABLE IF NOT EXISTS default.hbsort (id string, id_courtier string, cle_recherche string, cle_recherche_contrat string, nom_sous string, nom_d_usage string, prenom_sous string, date_naissance_sous string, id_contrat string, num_contrat string, produit string, fiscalite string, dt_maj string, souscription timestamp, epargne double, dt_ope_ct timestamp, type_ope_ct string, montant string, frais string, dt_ope_ct_export string, souscription_export string, montant_export string, frais_export string, montant_encours_gbl_ct_export string ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat' TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/ti');"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "INSERT OVERWRITE TABLE hbsort select t.* from default.operation_client_001 t cluster by t.id;"
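# Hand the generated HFiles over to the hbase user and run completebulkload to load them into the target table.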
sudo -u hdfs hdfs dfs -chgrp -R hbase /tmp/hbsort
sudo -u hdfs hdfs dfs -chmod -R 775 /tmp/hbsort
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar completebulkload /tmp/hbsort default_operation_client_001 c
... View more
08-17-2016
12:26 AM
2 Kudos
Hi, I think Sentry checks whether your user has the specific permission on the "LOCATION" URI you have provided (and this is not related to HDFS ACLs). Try to grant, in Sentry, that permission too. For example: GRANT ALL ON URI 'hdfs://hdfscluster/user/testuser/part' TO ROLE <a_role>; Regards, Mathieu
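As a fuller sketch (role and group names are hypothetical; Sentry grants are issued as SQL statements through HiveServer2):
# The URI grant complements the usual database/table grants for the role.
beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default" -e "
  CREATE ROLE load_role;
  GRANT ROLE load_role TO GROUP testgroup;
  GRANT ALL ON URI 'hdfs://hdfscluster/user/testuser/part' TO ROLE load_role;"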
... View more
08-16-2016
02:05 AM
For those interested: the issue was confirmed by the support team, with no workaround until the JIRA ticket listed is fixed.
... View more
08-16-2016
02:03 AM
The "multiplier" is not a parameter. You directly set the number of vcpu you want yarn to use. I guess you have to do the math yourself before setting the value.
... View more
08-11-2016
06:36 AM
Ok, since the default behaviour is inefficient I have searched for a way to make the "bulk load" more efficient. I think I found a more efficient way, but there seems to be a blocker bug on that (referenced here: https://issues.apache.org/jira/browse/HIVE-13539 )
1- The point is to set these two properties before running the insert command:
SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/<a_path>/<thecolumn_family_name>;
2- Then run the insert query, which will prepare HFiles at the designated location (instead of directly loading the HBase table).
3- And only then, perform a bulk load on HBase using the HFiles prepared:
export HADOOP_CLASSPATH=`hbase classpath`
yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /<a_path>/<thecolumn_family_name>
Problem: the query creating the HFiles fails because it "finds" multiple column families, since it looks at the wrong folder. I'm doing my tests on CDH 5.7.1. Has someone already tested this method? If yes, are there some properties to set that I have forgotten? Or is this really a blocker issue? Then I'll raise it to the support. Regards, Mathieu
... View more
08-09-2016
05:53 AM
Did you specify the POST parameter "execute" with the query? https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive#WebHCatReferenceHive-URL
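As a rough sketch (host, user and query are placeholders; 50111 is the usual WebHCat port), a request against the /templeton/v1/hive endpoint would look like:
# "execute" carries the HiveQL string; "statusdir" is where WebHCat writes stdout/stderr.
curl -s -d user.name=myuser \
     -d execute="SELECT count(*) FROM default.my_table;" \
     -d statusdir=/tmp/webhcat_out \
     "http://<webhcat_host>:50111/templeton/v1/hive"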
... View more
08-09-2016
01:00 AM
In the HDFS role there is an "NFS Gateway" service that lets you mount an NFS image of HDFS. That is one way (you can directly copy files to it; check the performance). Hue (the web UI) also lets you upload files into HDFS (this is a more manual approach). In our enterprise, for an automated process, we are using a custom Java application that uses the HCatWriter API for writing into Hive tables. But you can also use HttpFS or WebHDFS.
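As a minimal WebHDFS sketch (host, port and paths are placeholders; the two-step PUT is how the REST API works, first against the NameNode and then against the DataNode it redirects to):
# Step 1: ask the NameNode where to write; the reply is a 307 redirect with a Location header.
curl -i -X PUT "http://<namenode_host>:50070/webhdfs/v1/user/myuser/data.csv?op=CREATE&user.name=myuser"
# Step 2: send the file content to the DataNode URL returned in that Location header.
curl -i -X PUT -T data.csv "<location_url_from_step_1>"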
... View more
08-09-2016
12:23 AM
Thank you for this explanation. This will help me a lot for the next steps.
... View more
08-08-2016
07:59 AM
Hi, We are facing some performance issues while loading data into HBase (using Hive queries). The Hive query is quite simple:
INSERT INTO TABLE <hive_table_name_targeting_hbase_table> SELECT * FROM <hive_table>
The table "<hive_table_name_targeting_hbase_table>" is a Hive table using the HBaseStorageHandler (so there is an HBase table as the storage). The table "<hive_table>" is a regular Hive table. There are millions of rows in <hive_table> and <hive_table_name_targeting_hbase_table> is empty. When running the query we can see that the YARN job generates 177 mappers (more or less depending on the data size in <hive_table>). This part is quite "normal". But when I check the execution log of each mapper, I can see that some mappers take A LOT MORE TIME than others. Some mappers can take up to an hour (whereas the normal time of a mapper is around 10 minutes). In the log file of the "slow" mappers I can see a lot of retries on HBase operations, and finally some exceptions about NotServingRegionException. After some time (and a lot of retries) it's OK, but unfortunately this slows down the processing a lot. Has anyone already encountered this (while loading an HBase table using Hive queries)? Could it be related to regions being split during the write? If yes, why? Is there some bug in the HBaseStorageHandler with too much data? Of course the HBase table is online and can be accessed normally after loading the data, so no HBase configuration issue here (at least not a basic one). HBase compaction is set to 0 (and is launched manually). Log sample:
2016-08-08 10:18:25,962 INFO [htable-pool1-t31] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=prd_piste_audit_gsie_traite_001, attempt=13/35 failed=28ops, last exception: null on <a_host>,60020,1467474218569, tracking started null, retrying after=20126ms, replay=28ops
2016-08-08 10:18:46,091 INFO [htable-pool1-t31] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=prd_piste_audit_gsie_traite_001, attempt=14/35 failed=28ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region prd_piste_audit_gsie_traite_001,15a55dd4-5c6e-41b3-9d2e-304015aae5e9,1470642880612.e8868eaa5ac33c4612632c2c89474ecc. is not online on <a_host>,60020,1467474218569
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2786)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:922)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1893)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
on <a_host>,60020,1467474218569, tracking started null, retrying after=20099ms, replay=28ops
... View more
Labels:
- Apache HBase
05-19-2016
12:41 AM
I'm not seeing the same issue here. Check the YARN application logs; they will surely contain information about the issue.
... View more
04-27-2016
07:53 AM
Not sure this is the latest documentation for Impala, but the Hive "Date" type is not supported in Impala. Use TIMESTAMP instead, for example. Check "Impala supported data types" on Google. (Sorry, I can't paste the URL, I don't know why.)
... View more
03-03-2016
02:28 AM
Why not create a Hive table on top of the text file and then simply use a Hive query to load the data into the Avro table?
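As a small sketch (table names, columns, delimiter and paths are hypothetical; the Avro-backed table is assumed to exist already):
# An external text table over the raw files, then a plain INSERT...SELECT into the Avro table.
beeline -u "jdbc:hive2://<hiveserver2_host>:10000/default" -e "
  CREATE EXTERNAL TABLE staging_txt (col1 STRING, col2 INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/myuser/staging/';
  INSERT INTO TABLE my_avro_table SELECT col1, col2 FROM staging_txt;"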
... View more
12-18-2015
02:29 AM
That's great to know. Best regards.
... View more
12-17-2015
07:43 AM
1 Kudo
First of all, we are using the exact same pattern (create a new index, index into it and then switch the alias). But I think your question is not really related to this, and more to how Solr behaves (with its JVM). As such, I'm not sure there is a particular problem; I have seen this behavior with a lot of other products. The JVM tends not to release memory until the garbage collector reclaims it (because there is a need to re-use the memory). If you don't get an answer here, you may want to try to fine-tune the GC parameters (this is a real science). But I guess the support might help you also. Best of luck! Mathieu
... View more
11-09-2015
08:09 AM
Thank you for your answer. I can guess that in this particular case the replica is DOWN because of the incident we encountered (HDFS not available for a short period + restart of the whole cluster). But the question is: how to fix that after the problem is encountered? Checking the Solr log files did not really help on "why the replica is down". We can just observe that this particular shard does not "log" that it became active after the restart (and thus stays down). Of course, we might have missed something. I have opened a support ticket to help us; they might find the problem from the logs. Isn't there any "manual" way to restore it? Replace it? Regards.
... View more