Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1968 | 07-09-2019 12:53 AM |
|  | 11846 | 06-23-2019 08:37 PM |
|  | 9132 | 06-18-2019 11:28 PM |
|  | 10109 | 05-23-2019 08:46 PM |
|  | 4566 | 05-20-2019 01:14 AM |
09-09-2016
09:34 AM
Hi, I installed a new cluster from scratch using the m4 instance type and could not reproduce the error. Thanks.
09-08-2016
04:07 AM
I was using this repo: http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/cloudera-kafka.repo, which contains an outdated path (baseurl=http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/1/). I changed it to point to 2.0.2 and got Kafka 0.9. Thanks!
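For reference, a minimal sketch of the fix, assuming the repo file sits at the default /etc/yum.repos.d/cloudera-kafka.repo location:
# Point the baseurl at the 2.0.2 repo instead of the stale /1/ path
sudo sed -i 's#/kafka/1/#/kafka/2.0.2/#' /etc/yum.repos.d/cloudera-kafka.repo
sudo yum clean all    # drop cached metadata so yum re-reads the new baseurl
sudo yum install kafka    # now resolves to the Kafka 0.9 packages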
09-06-2016
03:02 AM
Ok, I managed to do an HBase bulk load using Hive. There is a wiki article on that: https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad The procedure described there does not work as-is; I guess it was written for older versions of Hive and HBase. With some work to adapt the procedure, I managed to load an HBase table using completebulkload. Here is a working sample:
# Publish the HBase and Hive HBase-handler JARs to HDFS so HiveServer2 can use them
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-client.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-common.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-protocol.jar /user/hive/
sudo -u hdfs hdfs dfs -put -f /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar /user/hive/
# These JARs need to be added to HiveServer2 with the property hive.aux.jars.path
sudo -u hdfs hdfs dfs -chmod 554 /user/hive/*.jar
sudo -u hdfs hdfs dfs -chown hive:hive /user/hive/*.jar
# Row count of the source table; it drives the split-point computation below
total=`beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" --outputformat=csv2 --silent=true -e "SELECT count(*) FROM default.operation_client_001;"`
# Strip the csv2 header so only the numeric count remains
total=`echo $total | cut -d ' ' -f 2- `
# Recreate the staging directory for the sampled range keys
hdfs dfs -rm -r /tmp/hb_range_keys
hdfs dfs -mkdir /tmp/hb_range_keys
# Table holding one sampled row id per region boundary (every total/12th id, for 12 reducers)
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "CREATE EXTERNAL TABLE IF NOT EXISTS default.hb_range_keys(transaction_id_range_start string) row format serde 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe' stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat' location '/tmp/hb_range_keys';"
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "add jar /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar; create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'; INSERT OVERWRITE TABLE default.hb_range_keys SELECT a.id FROM ( SELECT row_sequence() as num, t.id FROM default.operation_client_001 t order by t.id) a WHERE ( a.num % ( round( ${total} / 12) ) ) = 0;"
# TotalOrderPartitioner expects a single file, so copy the sample out of the directory
hdfs dfs -rm -r /tmp/hb_range_key_list;
hdfs dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;
# Staging directory that will receive the generated HFiles
hdfs dfs -rm -r /tmp/hbsort;
hdfs dfs -mkdir /tmp/hbsort;
# Staging table whose output format writes HFiles for column family 'ti'
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "set mapred.reduce.tasks=12; set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner; set total.order.partitioner.path=/tmp/hb_range_key_list; set hfile.compression=gz; CREATE TABLE IF NOT EXISTS default.hbsort (id string, id_courtier string, cle_recherche string, cle_recherche_contrat string, nom_sous string, nom_d_usage string, prenom_sous string, date_naissance_sous string, id_contrat string, num_contrat string, produit string, fiscalite string, dt_maj string, souscription timestamp, epargne double, dt_ope_ct timestamp, type_ope_ct string, montant string, frais string, dt_ope_ct_export string, souscription_export string, montant_export string, frais_export string, montant_encours_gbl_ct_export string ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat' TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/ti');"
# Populate the staging table, clustering by row key so each reducer writes sorted HFiles
beeline -n sp35517 -p "" -u "jdbc:hive2://dn060001:10000/default" -e "INSERT OVERWRITE TABLE hbsort select t.* from default.operation_client_001 t cluster by t.id;"
# Hand the generated HFiles over to the hbase user before loading them
sudo -u hdfs hdfs dfs -chgrp -R hbase /tmp/hbsort
sudo -u hdfs hdfs dfs -chmod -R 775 /tmp/hbsort
# Bulk-load the HFiles into the target HBase table
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /opt/cloudera/parcels/CDH/lib/hive/lib/hbase-server.jar completebulkload /tmp/hbsort default_operation_client_001 c
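Not part of the wiki procedure, but a quick sanity check after the load is to count the rows from the hbase shell (table name as loaded above):
# Row count should match the $total computed earlier
echo "count 'default_operation_client_001'" | hbase shell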
08-29-2016
08:56 PM
I'd recommend looking for WARN-or-higher log messages containing the string "Checkpoint" to find out why it frequently aborts mid-way. There were some timeout-related issues in the very early CDH4 period, but I've not seen this issue recur with CDH5, even for very large fsimages.
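As a concrete starting point, something like the following grep narrows it down quickly (a sketch; the log path and file-name pattern assume a default CM-managed CDH layout):
# Log location and naming are assumptions for a CM-managed cluster; adjust to yours
grep -iE "(WARN|ERROR|FATAL).*checkpoint" /var/log/hadoop-hdfs/*SECONDARYNAMENODE*.log*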
08-29-2016
02:44 AM
Thanks for this clarification. I got my answer.
08-24-2016
09:04 PM
1 Kudo
Adding onto @dice's post, this WARN does not impair any functionality your HDFS is currently performing. It can be ignored until you are able to grab the bug fix via an update to 5.7.2 or higher. See also this past community topic on the same question: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/quot-Report-from-the-DataNode-datanodeUuid-is-unsorted-quot/m-p/41943#M2188
08-22-2016
11:24 PM
1 Kudo
Hi Harsh, the issue was gone once I packaged the JRE's sunjce_provider.jar into the lib folder. Thanks. BR, Paul
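For anyone hitting the same thing: the fix amounts to shipping the JCE provider JAR inside the job JAR's lib/ directory. A sketch, assuming a JDK whose JRE keeps the provider under jre/lib/ext, and a hypothetical job JAR named myjob.jar:
# Both the jre/lib/ext location and myjob.jar are assumptions, not from the thread
cp "$JAVA_HOME/jre/lib/ext/sunjce_provider.jar" lib/
jar uf myjob.jar lib/sunjce_provider.jar    # add it under lib/ inside the job JAR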
08-19-2016
09:58 PM
Thanks, good to know.
08-16-2016
01:53 AM
Thanks, you are right. I just discovered that there were two kadmin packages installed for some unknown reason. Maybe it is because I once changed the PATH variable and installed kadmin in a different location from the default path set in CM. I solved the problem by correcting the PATH variable and reinstalling the package. Once again, thank you for your help.
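For reference, a quick way to spot this kind of duplicate (a sketch for RHEL/CentOS; substitute the actual paths that which reports):
which -a kadmin    # list every kadmin reachable on the PATH
rpm -qf /usr/bin/kadmin    # ask which package owns each hit; path here is an example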