Member since 10-03-2020
235 Posts
15 Kudos Received
18 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 482 | 11-11-2024 09:31 AM |
| | 1338 | 08-28-2023 02:13 AM |
| | 1874 | 12-15-2021 05:26 PM |
| | 1714 | 10-22-2021 10:09 AM |
| | 4851 | 10-20-2021 08:44 AM |
04-11-2023 05:33 AM
Please refer to this doc for HBase split policies: https://blog.cloudera.com/apache-hbase-region-splitting-and-merging/
Based on your description so far, I cannot conclude whether the issue is caused by the salt buckets or by the split policy; we need more evidence from the logs. We suggest you raise a Cloudera support case so that we can collect the necessary information and logs to investigate. Please make sure the questions above are answered. In addition, we need to collect the following:
HBase (pipe each command into the HBase shell and save its output):
- echo "scan 'namespace:tablename'" | hbase shell > /tmp/scan_meta.txt
- echo "describe 'namespace:tablename'" | hbase shell > /tmp/desc_table.txt
- echo "list_regions 'namespace:tablename'" | hbase shell > /tmp/list_regions.txt
phoenix-sqlline:
- select * from system.catalog;
- !tables
- select * from namespace.table;
- Your client code that uses the Phoenix driver, and the output that reflects the issue "when I am querying data through Phoenix driver rowkey value is getting truncated (only the first letter) and other columns are good."
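A minimal Python sketch of collecting the HBase outputs listed above, assuming it runs on a gateway host where the `hbase` CLI is on the PATH; the function names are hypothetical. `hbase shell` reads commands from standard input when they are piped in, so each command is fed to it and its output is saved to the file named in the list.

```python
import subprocess
from pathlib import Path


def build_hbase_diag_commands(table: str) -> dict[str, str]:
    """Map each output file to the HBase shell command whose output it should contain."""
    return {
        "/tmp/scan_meta.txt": f"scan '{table}'",
        "/tmp/desc_table.txt": f"describe '{table}'",
        "/tmp/list_regions.txt": f"list_regions '{table}'",
    }


def run_hbase_diag(table: str) -> None:
    """Pipe each command into `hbase shell` and save stdout and stderr (needs the hbase CLI)."""
    for out_file, command in build_hbase_diag_commands(table).items():
        result = subprocess.run(
            ["hbase", "shell"],
            input=command + "\n",  # the shell exits when stdin reaches EOF
            capture_output=True,
            text=True,
            check=False,
        )
        Path(out_file).write_text(result.stdout + result.stderr)
```

Usage: `run_hbase_diag("namespace:tablename")`, then attach the three files to the support case.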
04-10-2023 11:07 PM
Hi @bavisetti,
Please provide your CDH/CDP version, HBase version, Phoenix version, and Phoenix driver version.
Are you able to use the Phoenix driver to create a Phoenix table, upsert into it, and select data from it? Are you able to do the same in phoenix-sqlline?
If you already have an HBase table, you may need to create a view in Phoenix so that the Phoenix client can read it. Refer to https://phoenix.apache.org/language/index.html#create_view
Thanks,
Will
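The CREATE VIEW reference linked above maps an existing HBase table into Phoenix. As a rough illustration, the hypothetical Python helper below assembles such a statement. It assumes every column, including the row key, was written as a UTF-8 string, so it maps them all to VARCHAR, and it assumes namespace mapping (`phoenix.schema.isNamespaceMappingEnabled=true`) when a `namespace:table` name is given. If the bytes in HBase were written with another encoding, the declared types must match it, or Phoenix will decode the values incorrectly. Names are double-quoted because Phoenix upper-cases unquoted identifiers.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier so Phoenix keeps the exact, case-sensitive HBase name."""
    if '"' in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def build_create_view(hbase_table: str, columns: dict[str, list[str]], pk: str = "pk") -> str:
    """Build a CREATE VIEW statement over an existing HBase table.

    hbase_table: 'table' or 'namespace:table' (the latter assumes namespace mapping is enabled).
    columns: column family -> qualifiers; every column, like the row key, is mapped as VARCHAR.
    """
    if ":" in hbase_table:
        namespace, table = hbase_table.split(":", 1)
        view_name = f"{quote_ident(namespace)}.{quote_ident(table)}"
    else:
        view_name = quote_ident(hbase_table)
    cols = [f"{pk} VARCHAR PRIMARY KEY"]  # the HBase row key
    for family, qualifiers in columns.items():
        for qualifier in qualifiers:
            cols.append(f"{quote_ident(family)}.{quote_ident(qualifier)} VARCHAR")
    return f"CREATE VIEW {view_name} ({', '.join(cols)})"
```

For example, `build_create_view("t1", {"f1": ["val"]})` yields `CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "f1"."val" VARCHAR)`, which can be run in phoenix-sqlline.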
08-02-2022 03:32 AM
Hello @syedshakir,
Please let us know your CDH version.
Case A: If I understand correctly, you have a kerberized cluster and the file is on the local file system, not on HDFS, so you don't need Kerberos authentication. Refer to the Google Cloud documentation below; there are a few ways to do it: https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli
Case B: To be honest, I have never done this myself, so I would try the following:
1. Follow the document below to configure Google Cloud Storage with Hadoop: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html
2. If distcp does not work, follow this document to configure the relevant properties: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html
3. Save the whole output of distcp and upload it here so I can help you check it. Remember to remove sensitive information (such as hostnames and IP addresses) from the logs before uploading. If the distcp output doesn't contain Kerberos-related errors, enable debug logging, re-run the distcp job, and save the new output with the debug logs:
   export HADOOP_ROOT_LOGGER=DEBUG,console
   export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
Thanks,
Will
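Before uploading logs, hostnames and IP addresses can be masked with a small script. A minimal Python sketch, assuming IPv4 addresses and hostnames that end in a DNS domain you pass in (`example.com` below is a placeholder). It will not catch every form of sensitive data (for example, short hostnames without a domain, or Kerberos realms), so review the result before posting.

```python
import re

# Dotted-quad IPv4 addresses such as 10.20.30.40
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str, domains: list[str]) -> str:
    """Mask IPv4 addresses and fully qualified hostnames under the given DNS domains."""
    line = IPV4_RE.sub("<ip>", line)
    for domain in domains:
        host_re = re.compile(
            r"\b[\w-]+(?:\.[\w-]+)*\." + re.escape(domain) + r"\b", re.IGNORECASE
        )
        line = host_re.sub("<host>", line)
    return line


def redact_file(src: str, dst: str, domains: list[str]) -> None:
    """Write a redacted copy of the log file `src` to `dst`."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(redact(line, domains))
```

Usage: `redact_file("distcp.log", "distcp_redacted.log", ["example.com"])`, with your own domain in place of the placeholder.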
04-22-2022 02:44 AM
Hello @arunr307,
What is the CDH version? Could you attach the full output of the command? According to its help menu, there are no properties for split size:
# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
  Note: -D properties will be applied to the conf used.
  For example:
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
   -D hbase.client.scanner.caching=100
   -D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
   -D hbase.export.scanner.batch=10
   -D hbase.export.scanner.caching=100
   -D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
   -D mapreduce.map.speculative=false
   -D mapreduce.reduce.speculative=false
Thanks,
Will
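As an illustration of how the `-D` properties and the nested optional positional arguments from the Export usage text fit together, here is a hypothetical Python helper that assembles the command line. It only builds the argument list; running it still requires the `hbase` CLI, and it leaves out the optional regex/prefix filter argument.

```python
import shlex


def build_export_cmd(table, outdir, props=None, versions=None, starttime=None, endtime=None):
    """Assemble an Export command line following its usage text:
    Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
    """
    cmd = ["hbase", "org.apache.hadoop.hbase.mapreduce.Export"]
    for key, value in (props or {}).items():
        cmd += ["-D", f"{key}={value}"]  # -D options go before the positional arguments
    cmd += [table, outdir]
    optional = [versions, starttime, endtime]
    for i, arg in enumerate(optional):
        if arg is None:
            # The positional arguments are nested: endtime needs starttime, starttime needs versions.
            if any(a is not None for a in optional[i + 1:]):
                raise ValueError("starttime requires versions, and endtime requires starttime")
            break
        cmd.append(str(arg))
    return cmd


if __name__ == "__main__":
    example = build_export_cmd(
        "namespace:tablename",
        "/tmp/export_out",
        props={"hbase.client.scanner.caching": 100, "mapreduce.map.speculative": "false"},
    )
    print(shlex.join(example))
```

The printed line can be pasted into a shell; the table name and output directory above are placeholders.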
01-18-2022 06:30 AM
Hi @rahul_gaikwad,
This issue occurs due to a known limitation. As the error message indicates, a single write operation cannot fit into the configured maximum buffer size. Please refer to this KB: https://my.cloudera.com/knowledge/quot-ERROR-Error-applying-Kudu-Op-Incomplete-buffer-size?id=302775
Regards,
Will
01-18-2022 06:26 AM
Hi @naveenks,
Please refer to the doc below: https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_admin_distcp_data_cluster_migrate.html
Thanks,
Will
12-15-2021 05:26 PM
Hi @ryu,
Volume: As described in the HDFS architecture, the NameNode stores metadata while the DataNodes store the actual data. Each DataNode is a machine that usually has multiple disks (in HDFS terminology, volumes). A non-empty file in HDFS consists of one or more blocks. Each block has one or more copies (called replicas), based on the configured replication factor. A replica is stored on a volume of a DataNode, and different replicas of the same block are stored on different DataNodes. https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
Directory (HDFS usually does not call them "folders"): like other file systems, HDFS organizes directories in a hierarchical file structure. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Regards,
Will
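To make the file, block, replica, and volume relationship concrete, here is a toy Python model. It splits a file into blocks by size and gives each block `replication` replicas on distinct DataNodes using simple round-robin placement. That placement is an assumption for illustration only: real HDFS uses a rack-aware block placement policy and its own policy for choosing a volume on each DataNode.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    datanode: str  # the DataNode holding this copy
    volume: str    # one disk (volume) of that DataNode


def place_blocks(file_size: int, block_size: int, replication: int,
                 datanodes: dict[str, list[str]]) -> list[list[Replica]]:
    """Split a file into blocks and place `replication` replicas of each block on distinct DataNodes.

    Toy round-robin placement for illustration; real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replicas of one block must be on different DataNodes")
    num_blocks = math.ceil(file_size / block_size)  # an empty file has no blocks
    nodes = sorted(datanodes)
    blocks = []
    for b in range(num_blocks):
        replicas = []
        for r in range(replication):
            node = nodes[(b + r) % len(nodes)]  # consecutive nodes, so all distinct
            volumes = datanodes[node]
            replicas.append(Replica(node, volumes[b % len(volumes)]))
        blocks.append(replicas)
    return blocks
```

For example, a 300 MB file with a 128 MB block size is split into 3 blocks (128 + 128 + 44 MB); with a replication factor of 3 that is 9 replicas spread across the DataNodes' volumes.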
12-15-2021 04:59 PM
I agree with @Nandinin's suggestion. Adding some thoughts on the HDFS side for your reference:
1. Now that you know which 3 DataNodes may be slow in the pipeline and the timestamp, go to each DataNode's log and check whether there are "JvmPauseMonitor" or "Lock held" messages, or other WARN/ERROR entries.
2. Refer to this KB https://my.cloudera.com/knowledge/Diagnosing-Errors-Error-Slow-ReadProcessor-Error-Slow?id=73443 and check the "Slow" messages in the DataNode logs around that timestamp to determine the main cause.
Regards,
Will
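A minimal Python sketch of these two checks: it filters DataNode log lines to a time window around the slow-pipeline timestamp and keeps WARN/ERROR entries plus lines mentioning `JvmPauseMonitor`, `Lock held`, or `Slow`. It assumes the default log4j layout (`2021-12-15 16:59:01,123 WARN class: message`) and a five-minute window; adjust `LINE_RE` and `window` if yours differ.

```python
import re
from datetime import datetime, timedelta

# Assumed log4j layout: "2021-12-15 16:59:01,123 WARN org.apache.hadoop...: message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} (\w+) (.*)$")
PATTERNS = ("JvmPauseMonitor", "Lock held", "Slow")


def suspicious_lines(lines, incident: datetime, window: timedelta = timedelta(minutes=5)):
    """Yield log lines within `window` of `incident` that are WARN/ERROR or match PATTERNS."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # stack-trace continuation lines or another layout
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(ts - incident) > window:
            continue
        level, rest = m.group(2), m.group(3)
        if level in ("WARN", "ERROR") or any(p in rest for p in PATTERNS):
            yield line.rstrip("\n")
```

Run it against the DataNode log on each of the three hosts, for example `list(suspicious_lines(open(path), datetime(2021, 12, 15, 16, 59)))`, where `path` is that host's DataNode log (log locations vary by deployment).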
11-09-2021 04:02 AM
Hi @loridigia,
Based on the error you provided, "org.apache.hadoop.hbase.NotServingRegionException: table XXX is not online on worker04", some regions may not be deployed on any RegionServer yet. Please check the following to see whether there are any inconsistencies in this table:
1. Run: sudo -u hbase hbase hbck -details > /tmp/hbck.txt
2. If you see inconsistencies, grep for ERROR in hbck.txt to find which regions have problems.
3. Check whether each such region's directory is complete in the output of: hdfs dfs -ls -R /hbase
4. In the HBase shell, run scan 'hbase:meta' and check whether the region's information is up to date in the hbase:meta table.
5. Depending on the type of issue, use the HBCK2 jar to fix the inconsistencies: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
These are general steps for this kind of problem; there could be more complex issues behind it. We suggest you file a case with Cloudera support.
Thanks,
Will
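Steps 2 and 3 can be partly automated. Below is a hypothetical Python sketch that collects encoded region names (32 lowercase hex characters) from ERROR lines in `/tmp/hbck.txt` and reports those with no matching directory in a saved `hdfs dfs -ls -R /hbase` listing. The line formats it expects are assumptions based on typical hbck and `ls` output. It only reports; any fix still goes through HBCK2.

```python
import re

# Encoded region names are 32 lowercase hex characters (an MD5 of the full region name).
ENCODED_RE = re.compile(r"\b[0-9a-f]{32}\b")


def regions_in_errors(hbck_text: str) -> set[str]:
    """Encoded region names mentioned on ERROR lines of `hbase hbck -details` output."""
    regions = set()
    for line in hbck_text.splitlines():
        if "ERROR" in line:
            regions.update(ENCODED_RE.findall(line))
    return regions


def region_dirs(ls_text: str) -> set[str]:
    """Encoded names of directories in `hdfs dfs -ls -R /hbase` output."""
    dirs = set()
    for line in ls_text.splitlines():
        if line.startswith("d"):  # directory entries start with the permission 'd'
            last = line.rsplit("/", 1)[-1]
            if ENCODED_RE.fullmatch(last):
                dirs.add(last)
    return dirs


def missing_on_hdfs(hbck_text: str, ls_text: str) -> list[str]:
    """Regions reported by hbck that have no region directory on HDFS."""
    return sorted(regions_in_errors(hbck_text) - region_dirs(ls_text))
```

Usage, assuming the listing was saved with `hdfs dfs -ls -R /hbase > /tmp/hbase_ls.txt` (a hypothetical file name): `missing_on_hdfs(open("/tmp/hbck.txt").read(), open("/tmp/hbase_ls.txt").read())`. Regions that do have a directory still need the hbase:meta check in step 4.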
10-28-2021 02:57 AM
Hi @uygg,
Please check whether third-party jars such as Bouncy Castle jars have been added. If that is the cause, remove them and restart the ResourceManager (RM).
Thanks,
Will
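To look for such jars, here is a hypothetical Python sketch that walks directories and flags file names matching common Bouncy Castle artifact names (`bcprov-*`, `bcpkix-*`, and similar). Which directories to scan is an assumption; one option is the directories listed by `yarn classpath` on the ResourceManager host.

```python
import os
import re

# Common Bouncy Castle artifact names, e.g. bcprov-jdk15on-1.68.jar, bcpkix-jdk15on-1.68.jar
BC_JAR_RE = re.compile(r"^bc(?:prov|pkix|pg|util|mail)(?:-ext)?-.*\.jar$")


def is_bouncy_castle_jar(path: str) -> bool:
    """True if the file name looks like a Bouncy Castle jar."""
    return BC_JAR_RE.match(os.path.basename(path)) is not None


def find_bouncy_castle_jars(*roots: str) -> list[str]:
    """Walk the given directories and return the Bouncy Castle jars found, sorted by path."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            hits.extend(os.path.join(dirpath, n) for n in filenames if is_bouncy_castle_jar(n))
    return sorted(hits)
```

A name match only identifies candidates; confirm a jar was added by a third party before removing it.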