Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
| 14170 | 11-14-2017 01:11 AM
| 60613 | 11-03-2017 06:53 AM
| 4321 | 11-03-2017 06:18 AM
| 13554 | 09-12-2017 05:51 AM
| 1991 | 09-08-2017 02:50 AM
03-22-2017
03:00 PM
Mathieu is correct, you can use:

```shell
solrctl instancedir --update collection1 /path/to/collection1
solrctl collection --reload collection -pd
```
03-17-2017
04:01 AM
Well, the lock is stored in ZooKeeper, so you can search ZooKeeper for the lock and delete it if it exists. But I would advise you not to do so: locks exist for data integrity, and removing them when you should not can lead to some "odd results". Instead, maybe you could add some error handling to the workflow so that it retries X times before failing the whole workflow? You could also communicate better with your users in order to reduce the likelihood of your scheduled queries running concurrently with ad-hoc user queries.
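The "retry X times before failing" idea can be sketched as a plain wrapper (in a real Oozie workflow this would be expressed with error transitions instead; names and delays here are illustrative):

```python
import time

def with_retries(action, attempts=3, delay_seconds=60):
    """Run `action`; on failure, wait and retry up to `attempts` times.

    Only after the last attempt fails is the error propagated, which is
    the point at which the whole workflow would be marked as failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:  # e.g. a lock-acquisition failure
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Wrapping the query submission this way lets a scheduled job survive a transient lock held by a user query.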
02-23-2017
12:49 PM
Digging into the cluster, I found that one of the applications running outside the Hadoop cluster has clients that do `hdfs dfs -put` to the cluster. These clients did not have an hdfs-site.xml, so they picked up the cluster's default replication factor. What I did: I tested `hdfs dfs -put` from a client server inside my cluster and from the client outside the cluster, and noticed that the outside client put files with replication factor 3. To solve the issue, I added an hdfs-site.xml to each of the clients outside the cluster and overrode the default replication factor in that file.
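A minimal client-side hdfs-site.xml override might look like this (the value 2 is illustrative; use whatever replication factor your cluster expects):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

Alternatively, the replication factor can be set per command with the generic option, e.g. `hdfs dfs -D dfs.replication=2 -put file /path`.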
02-14-2017
05:30 AM
Give us your indexer_def.xml and morphline config. There should be an "id" field somewhere, and I would guess you will find it in the indexer_def.xml file. For example:

```xml
<indexer table="<hbase_table_name>"
         mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper"
         unique-key-field="id">
  <!-- morphline parameters go here -->
</indexer>
```
01-17-2017
11:50 AM
On the setting changes: stats, as stated, will help with counts, since that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache has more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for better, more consistent performance: `set hive.execution.engine=spark;`

On the times: the big factor between job submission and start is the scheduler. That is a deep topic; it is best if you read up on it, review your settings, and ask any specific questions that come up, preferably in a new topic. The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies depending on the client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI; you can increase the client heap if needed. Otherwise use HS2 connections like Beeline or Hue.
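For reference, statistics can be gathered with standard Hive statements such as these (database and table names are illustrative):

```sql
-- Table- and column-level statistics for the optimizer
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS;
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS FOR COLUMNS;

-- Let the CBO and the metastore stats be used at query time
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
```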
01-11-2017
02:19 AM
Hi cpluplus1,

To log into HiveServer2 from the command line you need this:

```shell
beeline -u "jdbc:hive2://hive_node:10000/;principal=hive/_HOST@ad_domain"
```

To reach the HiveServer2 web UI: http://hive_node:10002/

To run queries from Hue against Hive: https://hue_node:8888/notebook/editor?type=hive

Which user are you logging into Hue with? Maybe you don't have enough privileges to access the Hive query editor; can you log in with an administrator user and validate that? Marc.
12-21-2016
06:54 AM
1 Kudo
I was able to resolve the issue. It occurs because of public-only network access from the client (edge node) to a multi-homed cluster environment (an Oracle Big Data Appliance in my case) and is also related to the bug MAPREDUCE-6484. A patch is available for it and in my case was already included in CDH 5.7.1 (see the CDH 5.7.1 Release Notes). However, an additional YARN setting was needed to make it work:

1. Change the token service naming behavior via core-site.xml. Under CM > YARN > Configuration > Scope: YARN (Service-Wide) > Category: Advanced > "YARN Service Advanced Configuration Snippet (Safety Valve) for core-site.xml", add the property below:

```xml
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
</property>
```

2. Save the configuration change.
3. Deploy Client Configuration for YARN and restart the YARN services as needed.

The details of this setting and the discussion can be found in HADOOP-7510.
10-05-2016
05:46 AM
Hi, thanks for the reply. I solved the issue. I used the following driver: `public static String driverName = "com.cloudera.hive.jdbc41.HS1Driver";` and added the other required JARs too.
09-08-2016
03:03 AM
After some more testing I found that the following command works:

```
split '<namespace>:<table_name>', 'NEW_SPLIT_VALUE'
```

I just need to call it once per "pre-split" value I need.
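Calling the shell command once per value can be scripted; here is a small sketch that picks evenly spaced boundary keys from an already-sorted key sample (function name and the region count are illustrative):

```python
def pick_split_points(sorted_keys, num_regions):
    """Pick num_regions - 1 boundary keys from a sorted key sample.

    Each returned key would be passed to one
    `split '<namespace>:<table_name>', key` call in the HBase shell.
    """
    if num_regions < 2:
        return []
    step = len(sorted_keys) / num_regions
    return [sorted_keys[int(step * i)] for i in range(1, num_regions)]

keys = [f"row{i:04d}" for i in range(100)]
print(pick_split_points(keys, 4))  # three boundaries for four regions
```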
09-06-2016
03:02 AM
OK, I managed to do an HBase bulk load using Hive. There is a wiki article on that: https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad The procedure described there does not work; I guess it was made for older versions of Hive and HBase. With some work to adapt the procedure, I managed to load an HBase table using completebulkload. Here is a working sample:

```shell
# 1. Put the HBase/Hive handler JARs on HDFS; these JARs also need to be
#    added to HiveServer2 with the property hive.aux.jars.path
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-client.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-common.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-protocol.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar /user/hive/
sudo -u hdfs hdfs dfs -chmod 554 /user/hive/*.jar
sudo -u hdfs hdfs dfs -chown hive:hive /user/hive/*.jar

# 2. Count the rows of the source table
total=`beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" --outputformat=csv2 --silent=true -e "SELECT count(*) FROM default.operation_client_001;"`
total=`echo $total | cut -d ' ' -f 2- `

# 3. Build the range-key file that defines the region boundaries
hdfs dfs -rm -r /tmp/hb_range_keys
hdfs dfs -mkdir /tmp/hb_range_keys
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "CREATE EXTERNAL TABLE IF NOT EXISTS default.hb_range_keys(transaction_id_range_start string) row format serde 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe' stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat' location '/tmp/hb_range_keys';"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "add jar /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar; create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'; INSERT OVERWRITE TABLE default.hb_range_keys SELECT a.id FROM ( SELECT row_sequence() as num, t.id FROM default.operation_client_001 t order by t.id) a WHERE ( a.num % ( round( ${total} / 12) ) ) = 0;"
hdfs dfs -rm -r /tmp/hb_range_key_list;
hdfs dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;

# 4. Create the HFile-backed staging table and fill it, clustered by key
hdfs dfs -rm -r /tmp/hbsort;
hdfs dfs -mkdir /tmp/hbsort;
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "set mapred.reduce.tasks=12; set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner; set total.order.partitioner.path=/tmp/hb_range_key_list; set hfile.compression=gz; CREATE TABLE IF NOT EXISTS default.hbsort (id string, id_courtier string, cle_recherche string, cle_recherche_contrat string, nom_sous string, nom_d_usage string, prenom_sous string, date_naissance_sous string, id_contrat string, num_contrat string, produit string, fiscalite string, dt_maj string, souscription timestamp, epargne double, dt_ope_ct timestamp, type_ope_ct string, montant string, frais string, dt_ope_ct_export string, souscription_export string, montant_export string, frais_export string, montant_encours_gbl_ct_export string ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat' TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/ti');"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "INSERT OVERWRITE TABLE hbsort select t.* from default.operation_client_001 t cluster by t.id;"

# 5. Hand the generated HFiles over to HBase
sudo -u hdfs hdfs dfs -chgrp -R hbase /tmp/hbsort
sudo -u hdfs hdfs dfs -chmod -R 775 /tmp/hbsort
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar completebulkload /tmp/hbsort default_operation_client_001 c
```
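The range-key query keeps every round(total/12)-th row of the ordered data as a region boundary (the `a.num % round(${total} / 12) = 0` clause). The same selection logic in a small Python sketch (12 regions, as in the script; names are illustrative):

```python
def boundary_rows(ordered_ids, num_regions=12):
    """Return the ids whose 1-based row number is a multiple of
    round(total / num_regions) - mirroring the modulo filter in the
    Hive INSERT that fills hb_range_keys."""
    total = len(ordered_ids)
    step = max(1, round(total / num_regions))
    return [rid for num, rid in enumerate(ordered_ids, start=1)
            if num % step == 0]
```

With 120 ordered ids and 12 regions, every 10th id is kept as a boundary.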