Member since: 10-03-2020
Posts: 235
Kudos Received: 15
Solutions: 17
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1222 | 08-28-2023 02:13 AM |
 | 1753 | 12-15-2021 05:26 PM |
 | 1636 | 10-22-2021 10:09 AM |
 | 4605 | 10-20-2021 08:44 AM |
 | 4620 | 10-20-2021 01:01 AM |
04-11-2023
05:33 AM
Please refer to this doc for the split policy: https://blog.cloudera.com/apache-hbase-region-splitting-and-merging/
So far, based on your statement, I cannot conclude whether this is due to the salt buckets or the split policy; we need more evidence from the logs, so we would suggest raising a Cloudera support case. We need to collect some information and logs to investigate. Please make sure the above questions are answered. In addition, we also need to collect:

hbase (the output of these shell commands):
- echo "scan 'namespace:tablename'" | hbase shell > /tmp/scan_meta.txt
- echo "describe 'namespace:tablename'" | hbase shell > /tmp/desc_table.txt
- echo "list_regions 'namespace:tablename'" | hbase shell > /tmp/list_regions.txt

phoenix-sqlline (a sketch for capturing these non-interactively follows below):
- select * from system.catalog;
- !tables
- select * from namespace.table;

Also share the client code where you use the Phoenix driver and the output that reflects the issue: "when I am querying data through Phoenix driver rowkey value is getting truncated (only the first letter) and other columns are good."
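For reference, a rough sketch of capturing the phoenix-sqlline output to a file; this assumes sqlline accepts a SQL file as its second argument, and the ZooKeeper host, namespace, and table names are placeholders you would substitute:

# hypothetical helper: write the queries to a file, then run them through phoenix-sqlline
cat > /tmp/phoenix_queries.sql <<'EOF'
select * from system.catalog;
!tables
select * from namespace.table;
EOF
# zk-host:2181 is a placeholder for your ZooKeeper quorum
phoenix-sqlline zk-host:2181 /tmp/phoenix_queries.sql > /tmp/phoenix_out.txt 2>&1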
04-10-2023
11:07 PM
Hi @bavisetti ,

Please kindly provide your CDH/CDP version, HBase version, Phoenix version, and Phoenix driver version. Are you able to use the Phoenix driver to create a Phoenix table, upsert into it, and select data from it? Are you able to do the above in phoenix-sqlline?

If you already have an HBase table, you may need to create a view in Phoenix so that the Phoenix client is able to read it (a sketch follows below). Refer to https://phoenix.apache.org/language/index.html#create_view

Thanks,
Will
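As an illustration, a minimal sketch of mapping an existing HBase table to a Phoenix view; the ZooKeeper host, namespace, table, column family, and column names are placeholders, and the declared types must match how the data was actually encoded in HBase:

# hypothetical example: create a Phoenix view over an existing HBase table
cat > /tmp/create_view.sql <<'EOF'
CREATE VIEW "namespace"."tablename" (
  rowkey VARCHAR PRIMARY KEY,
  "cf"."col1" VARCHAR
);
EOF
phoenix-sqlline zk-host:2181 /tmp/create_view.sql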
08-02-2022
03:32 AM
Hello @syedshakir ,

Please let us know your CDH version.

Case A: If I'm understanding correctly, you have a kerberized cluster and the file is local, not on HDFS, so you don't need Kerberos authentication. Just refer to the Google docs below; there are a few ways to do it: https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli

Case B: To be honest I have never done it, so I would try the following (see the sketch after this list):
1. Follow this document to configure Google Cloud Storage with Hadoop: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html
2. If distcp doesn't work, follow this document to configure some properties: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html
3. Save the whole output of distcp and upload it here; I can help you check it. Remember to remove sensitive information (such as hostnames and IPs) from the logs before uploading. If the distcp output doesn't contain Kerberos-related errors, enable debug logging, re-run the distcp job, and save the new output with debug logs:
export HADOOP_ROOT_LOGGER=DEBUG,console; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"

Thanks,
Will
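A rough sketch of what the debug-enabled distcp run to Google Cloud Storage could look like, assuming the GCS connector is already configured per the doc above; the principal, bucket name, and paths are placeholders:

# authenticate first on a kerberized cluster (principal is a placeholder)
kinit user@EXAMPLE.REALM
# enable debug output for Hadoop and Kerberos
export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
# copy from HDFS to a GCS bucket and keep the full output for review
hadoop distcp hdfs:///user/example/src gs://example-bucket/dst 2>&1 | tee /tmp/distcp_debug.log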
04-22-2022
02:44 AM
Hello @arunr307 ,

What is the CDH version? Could you attach the full output of this command? From the command help menu there are no properties about split size:

# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

Note: -D properties will be applied to the conf used.
For example:
  -D mapreduce.output.fileoutputformat.compress=true
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
  -D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified to control/limit what is exported.
  -D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
  -D hbase.mapreduce.include.deleted.rows=true
  -D hbase.mapreduce.scan.row.start=<ROWSTART>
  -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
  -D hbase.client.scanner.caching=100
  -D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
  -D hbase.export.scanner.batch=10
  -D hbase.export.scanner.caching=100
  -D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
  -D mapreduce.map.speculative=false
  -D mapreduce.reduce.speculative=false

Thanks,
Will
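For illustration, a hypothetical Export invocation built from the usage above; the table name, output directory, version count, and property values are placeholders:

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D hbase.client.scanner.caching=100 \
  'namespace:tablename' /tmp/export/tablename 1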
01-18-2022
06:30 AM
Hi @rahul_gaikwad,

The issue occurs due to a known limitation: as the code indicates, a single write operation cannot fit into the configured maximum buffer size. Please refer to this KB: https://my.cloudera.com/knowledge/quot-ERROR-Error-applying-Kudu-Op-Incomplete-buffer-size?id=302775

Regards,
Will
01-18-2022
06:26 AM
Hi @naveenks,

Please refer to the doc below (an illustrative command follows):
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_admin_distcp_data_cluster_migrate.html

Thanks,
Will
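As a minimal sketch of copying data between clusters with distcp, per the doc above; the NameNode hosts and paths are placeholders:

# copy /user/example/src from the source cluster to the destination cluster
hadoop distcp hdfs://source-nn:8020/user/example/src hdfs://dest-nn:8020/user/example/dst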
12-15-2021
05:26 PM
Hi @ryu

Volume: As described in the HDFS architecture, the NameNode stores metadata while the DataNodes store the actual data content. Each DataNode is a computer that usually has multiple disks (in HDFS terminology, volumes). A file in HDFS consists of one or more blocks, and a block has one or more copies (called replicas), based on the configured replication factor. A replica is stored on a volume of a DataNode, and different replicas of the same block are stored on different DataNodes. See https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/

Directory (in HDFS we usually say "directory" rather than "folder"): like other file systems, HDFS organizes files in a hierarchical directory structure. See https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

Regards,
Will
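To see how this maps onto a real file, a quick sketch using hdfs fsck (the path is a placeholder); the output lists each block of the file and the DataNodes holding its replicas:

hdfs fsck /user/example/file.txt -files -blocks -locations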
12-15-2021
04:59 PM
I agree with @Nandinin's suggestion. Adding some thoughts on the HDFS side for your reference:
1. You now know which 3 DataNodes may be slow in the pipeline, and the timestamp. Go to each DataNode's log and check whether there are "JvmPauseMonitor" or "Lock held" messages, or other WARN/ERROR entries, around that time (a grep sketch follows below).
2. Refer to this KB: https://my.cloudera.com/knowledge/Diagnosing-Errors-Error-Slow-ReadProcessor-Error-Slow?id=73443, and check the "Slow" messages in the DataNode logs around the above timestamp to determine the main cause.

Regards,
Will
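A rough grep sketch for step 1; the log path and timestamp prefix are placeholders and will differ depending on how logging is configured on your nodes:

# look for GC pauses, lock contention, and other warnings/errors around the incident time
grep -E "JvmPauseMonitor|Lock held|WARN|ERROR" /var/log/hadoop-hdfs/*DATANODE*.log* | grep "2021-12-15 16:"
# count the "Slow" messages mentioned in the KB
grep -c "Slow" /var/log/hadoop-hdfs/*DATANODE*.log*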
11-09-2021
04:02 AM
Hi @loridigia,

Based on the error you provided, "org.apache.hadoop.hbase.NotServingRegionException: table XXX is not online on worker04", some regions may not be deployed on any RegionServer yet. Please check the following to see whether there are any inconsistencies on this table:
1. sudo -u hbase hbase hbck -details > /tmp/hbck.txt
2. If you see inconsistencies, grep ERROR from hbck.txt to see which region has the problem.
3. Then check whether this region's directory is complete in the output of: hdfs dfs -ls -R /hbase
4. Then check in hbase shell: scan 'hbase:meta', whether this region's info is up to date in the hbase:meta table.
5. Based on the type of issue, we may need to use the HBCK2 jar to fix the inconsistencies (see the sketch below): https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

These are general steps to deal with this kind of problem; there could be more complex issues behind it. We suggest you file a case with Cloudera support.

Thanks,
Will
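If hbck does report a region stuck offline, an illustrative HBCK2 invocation could look like the following; the jar path and the encoded region name are placeholders, and the right HBCK2 command depends on the actual inconsistency found:

# re-assign a region that is not online on any RegionServer
hbase hbck -j /path/to/hbase-hbck2.jar assigns <encoded_region_name>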
10-28-2021
02:57 AM
Hi @uygg,

Please check whether third-party jars, such as the Bouncy Castle jars, have been added to the classpath. If that is the cause, please remove them and then restart the ResourceManager.

Thanks,
Will
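A rough way to locate stray Bouncy Castle jars on the node; the search paths are placeholders and may need adjusting for your installation layout:

# look for Bouncy Castle provider/PKIX jars that may have been dropped into the classpath
find /opt/cloudera /usr/lib /var/lib -name "bcprov*.jar" -o -name "bcpkix*.jar" 2>/dev/null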