Member since 07-16-2015
177 Posts
28 Kudos Received
19 Solutions
My Accepted Solutions
Views | Posted
---|---
14303 | 11-14-2017 01:11 AM
60724 | 11-03-2017 06:53 AM
4342 | 11-03-2017 06:18 AM
13595 | 09-12-2017 05:51 AM
2008 | 09-08-2017 02:50 AM
03-22-2017
03:00 PM
Mathieu is correct. You can use:
solrctl instancedir --update collection1 /path/to/collection1
solrctl collection --reload collection1
03-17-2017
04:01 AM
Well, the lock is stored in ZooKeeper, so you can check in ZooKeeper whether the lock exists and delete it if it does. But I would advise against doing so: locks exist for data integrity, and removing one when you should not can lead to odd results. Maybe you could add some error handling to the workflow so that it retries X times before failing the whole workflow? You could also communicate better with the users to reduce the likelihood of your scheduled queries running concurrently with user queries.
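The "retry X times" idea could look roughly like this in a shell step of the workflow. This is only a sketch: the function name retry, the attempt count, and the delay are assumptions for illustration and are not taken from the original workflow.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (hypothetical query file and connection URL):
#   retry 3 60 beeline -u "jdbc:hive2://hive_node:10000/default" -f scheduled_query.sql
```

A fixed delay keeps the sketch short. If lock contention lasts longer than the total retry window, the workflow still fails, which is usually preferable to deleting the lock by hand.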
02-23-2017
12:49 PM
Digging into the cluster, I found that one of the applications running outside the Hadoop cluster has clients that run hdfs dfs -put against the cluster. These clients did not have an hdfs-site.xml, so they fell back to the default replication factor. To confirm this, I ran hdfs dfs -put from a client server inside my cluster and from a client outside the cluster, and noticed that the outside client wrote files with replication factor 3. To solve the issue, I added an hdfs-site.xml to each client outside the cluster and overrode the default replication factor in that file.
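As a rough illustration of that fix, the sketch below writes a minimal client-side hdfs-site.xml that overrides dfs.replication. The function name, the target directory, and the value 2 are assumptions for the example. It also assumes the client has no other HDFS settings to preserve, since an existing file would be overwritten.

```shell
# write_client_hdfs_site CONF_DIR REPLICATION
# Writes a minimal hdfs-site.xml into CONF_DIR that sets dfs.replication.
# Assumption: nothing else in an existing hdfs-site.xml needs to be kept.
write_client_hdfs_site() {
  local conf_dir=$1
  local replication=$2
  mkdir -p "$conf_dir"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${replication}</value>
  </property>
</configuration>
EOF
}

# Example (hypothetical path and value):
#   write_client_hdfs_site /etc/hadoop/conf 2
# The effective setting and the replication of a newly written file can then
# be checked on the client with:
#   hdfs getconf -confKey dfs.replication
#   hdfs dfs -stat %r /path/to/uploaded/file
```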
02-14-2017
05:30 AM
Give us your indexer_def.xml and morphline conf. There should be an "id" field somewhere, and I guess you will find it in the indexer_def.xml file. For example:
<indexer table="<hbase_table_name>" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper"
unique-key-field="id">
01-17-2017
11:50 AM
On the setting changes: stats, as stated, help with counts because that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache had more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for more consistent performance: set hive.execution.engine=spark;
On the times: the big impact between job submission and start is the scheduler. That is a deep topic. It is best to read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic.
The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies depending on the client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI, and you can increase the client heap if needed. Otherwise use HS2 connections such as beeline or HUE.
01-11-2017
02:19 AM
Hi cpluplus1,
To log into HiveServer2 from the command line you need this:
$ beeline -u "jdbc:hive2://hive_node:10000/;principal=hive/_HOST@ad_domain"
To log into the HiveServer2 web UI: http://hive_node:10002/
To run queries from HUE against Hive: https://hue_node:8888/notebook/editor?type=hive
Which user are you logging into HUE with? Maybe you don't have enough privileges to access the Hive query editor. Can you log in with an administration user and check?
Marc.
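If the same connection string is needed in several scripts, it can be built in one place. The helper name hs2_kerberos_url is made up for this sketch. The host, port, and realm are the placeholders from the post, and a valid Kerberos ticket (for example from kinit) is assumed to exist already.

```shell
# hs2_kerberos_url HOST PORT REALM
# Prints a HiveServer2 JDBC URL that uses the hive/_HOST service principal.
hs2_kerberos_url() {
  local host=$1
  local port=$2
  local realm=$3
  printf 'jdbc:hive2://%s:%s/;principal=hive/_HOST@%s\n' "$host" "$port" "$realm"
}

# Example, using the placeholders from the post:
#   beeline -u "$(hs2_kerberos_url hive_node 10000 ad_domain)"
```

Quoting the command substitution matters here, because the unquoted semicolon in the URL would otherwise be treated by the shell as a command separator.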
12-21-2016
06:54 AM
1 Kudo
I was able to resolve the issue. It occurs because of public-only network access from the client (edge node) to a multi-homed cluster environment (Oracle Big Data Appliance in my case), and it is also related to the bug MAPREDUCE-6484. A patch is available for it, and in my case it was already included in CDH 5.7.1 (CDH 5.7.1 Release Notes). However, an additional setting was needed on YARN to make it work:
1. Change the token service naming behavior via core-site.xml. Under CM > YARN > Configuration > Scope: YARN (Service-Wide) > Category: Advanced > "YARN Service Advanced Configuration Snippet (Safety Valve) for core-site.xml", add the property below:
<property>
<name>hadoop.security.token.service.use_ip</name>
<value>false</value>
</property>
2. Save the configuration change.
3. Deploy Client Configurations for YARN. Restart YARN services as needed.
The details of this setting and the related discussion can be found in HADOOP-7510.
10-05-2016
05:46 AM
Hi, thanks for the reply. I solved the issue. I used the following driver:
public static String driverName = "com.cloudera.hive.jdbc41.HS1Driver";
and added the other required JARs too.
09-08-2016
03:03 AM
After some more testing, I found that the following command works:
split '<namespace>:<table_name>', 'NEW_SPLIT_VALUE'
I just need to call it once per pre-split value I need.
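Calling split once per value is easy to script. The sketch below only generates the HBase shell commands, and its output can be piped into hbase shell. The function name and the example table and split values are hypothetical, and split values containing single quotes are not escaped.

```shell
# gen_split_commands TABLE SPLIT_VALUE...
# Prints one HBase shell "split" command per split value.
gen_split_commands() {
  local table=$1
  shift
  local value
  for value in "$@"; do
    printf "split '%s', '%s'\n" "$table" "$value"
  done
}

# Example (hypothetical table and split points):
#   gen_split_commands 'my_ns:my_table' 1000 2000 3000 | hbase shell
```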
09-06-2016
03:02 AM
OK, I managed to do an HBase bulk load using Hive. There is a wiki article on that: https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad
The procedure described there does not work as written. I guess it was written for older versions of Hive and HBase. After some work adapting the procedure, I managed to load an HBase table using completebulkload. Here is a working sample:
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-client.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-common.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-protocol.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar /user/hive/
# These JARs need to be added to HiveServer2 with the property hive.aux.jars.path
sudo -u hdfs hdfs dfs -chmod 554 /user/hive/*.jar
sudo -u hdfs hdfs dfs -chown hive:hive /user/hive/*.jar
total=`beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" --outputformat=csv2 --silent=true -e "SELECT count(*) FROM default.operation_client_001;"`
total=`echo $total | cut -d ' ' -f 2- `
hdfs dfs -rm -r /tmp/hb_range_keys
hdfs dfs -mkdir /tmp/hb_range_keys
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "CREATE EXTERNAL TABLE IF NOT EXISTS default.hb_range_keys(transaction_id_range_start string) row format serde 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe' stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat' location '/tmp/hb_range_keys';"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "add jar /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar; create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'; INSERT OVERWRITE TABLE default.hb_range_keys SELECT a.id FROM ( SELECT row_sequence() as num, t.id FROM default.operation_client_001 t order by t.id) a WHERE ( a.num % ( round( ${total} / 12) ) ) = 0;"
hdfs dfs -rm -r /tmp/hb_range_key_list;
hdfs dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;
hdfs dfs -rm -r /tmp/hbsort;
hdfs dfs -mkdir /tmp/hbsort;
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "set mapred.reduce.tasks=12; set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner; set total.order.partitioner.path=/tmp/hb_range_key_list; set hfile.compression=gz; CREATE TABLE IF NOT EXISTS default.hbsort (id string, id_courtier string, cle_recherche string, cle_recherche_contrat string, nom_sous string, nom_d_usage string, prenom_sous string, date_naissance_sous string, id_contrat string, num_contrat string, produit string, fiscalite string, dt_maj string, souscription timestamp, epargne double, dt_ope_ct timestamp, type_ope_ct string, montant string, frais string, dt_ope_ct_export string, souscription_export string, montant_export string, frais_export string, montant_encours_gbl_ct_export string ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat' TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/ti');"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "INSERT OVERWRITE TABLE hbsort select t.* from default.operation_client_001 t cluster by t.id;"
sudo -u hdfs hdfs dfs -chgrp -R hbase /tmp/hbsort
sudo -u hdfs hdfs dfs -chmod -R 775 /tmp/hbsort
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar completebulkload /tmp/hbsort default_operation_client_001 c
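The sample repeats the same beeline connection options on every call and strips the header from the count output by hand. A possible tidy-up is sketched below. The function names hive_query and hive_count are made up for this example, and the defaults are simply the user and URL from the sample above. It assumes that, with csv2 output, the count is the only line made up entirely of digits.

```shell
# Connection defaults taken from the sample; override via the environment.
: "${HIVE_USER:=sp35517}"
: "${HIVE_URL:=jdbc:hive2://dn060001:10000/default}"

# hive_query [BEELINE_ARGS...]
# Runs beeline with the shared connection options plus any extra arguments.
hive_query() {
  beeline -n "$HIVE_USER" -p "" -u "$HIVE_URL" "$@"
}

# hive_count TABLE
# Prints the row count of TABLE, dropping the CSV header line.
hive_count() {
  hive_query --outputformat=csv2 --silent=true -e "SELECT count(*) FROM $1;" \
    | grep -E '^[0-9]+$' | tail -n 1
}

# Example:
#   total=$(hive_count default.operation_client_001)
#   hive_query -e "INSERT OVERWRITE TABLE hbsort select t.* from default.operation_client_001 t cluster by t.id;"
```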