Member since 10-03-2020
235 Posts
15 Kudos Received
17 Solutions
My Accepted Solutions
Views | Posted |
---|---|
1222 | 08-28-2023 02:13 AM |
1754 | 12-15-2021 05:26 PM |
1636 | 10-22-2021 10:09 AM |
4605 | 10-20-2021 08:44 AM |
4620 | 10-20-2021 01:01 AM |
10-22-2021
10:09 AM
Hi @Rjkoop,
Visibility labels are not officially supported by Cloudera. Please refer to this link:
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_620_unsupported_features.html#hbase_c6_unsupported_features
Regards, Will
10-20-2021
08:44 AM
Hi @DA-Ka,
SUM and JOIN won't change the timestamp of the underlying files. Example:

create table mytable (i int,j int,k int);
insert into mytable values (1,2,3),(4,5,6),(7,8,9);
create table mytable2 (i int,j int,k int);
insert into mytable2 values (1,2,6),(3,5,7),(4,8,9);

select * from mytable;
+------------+------------+------------+
| mytable.i  | mytable.j  | mytable.k  |
+------------+------------+------------+
| 1          | 2          | 3          |
| 4          | 5          | 6          |
| 7          | 8          | 9          |
+------------+------------+------------+

select * from mytable2;
+-------------+-------------+-------------+
| mytable2.i  | mytable2.j  | mytable2.k  |
+-------------+-------------+-------------+
| 1           | 2           | 6           |
| 3           | 5           | 7           |
| 4           | 8           | 9           |
+-------------+-------------+-------------+

# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2
drwxrwx---+ - hive hive 0 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 742 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0

1. Sum, timestamp is unchanged

select pos+1 as col,sum (val) as sum_col from mytable t lateral view posexplode(array(*)) pe group by pos;
+------+----------+
| col  | sum_col  |
+------+----------+
| 2    | 15       |
| 1    | 12       |
| 3    | 18       |
+------+----------+

# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0

2. Inner Join, timestamp is unchanged

select * from (select * from mytable)T1 join (select * from mytable2)T2 on T1.i=T2.i;
+-------+-------+-------+-------+-------+-------+
| t1.i  | t1.j  | t1.k  | t2.i  | t2.j  | t2.k  |
+-------+-------+-------+-------+-------+-------+
| 1     | 2     | 3     | 1     | 2     | 6     |
| 4     | 5     | 6     | 4     | 8     | 9     |
+-------+-------+-------+-------+-------+-------+

# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2
drwxrwx---+ - hive hive 0 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 742 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0

Regards, Will
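The before/after comparison in this post is done by eye. As a rough sketch of how it could be automated, the Python below compares two captured `hdfs dfs -ls -R` listings and reports every path whose timestamp differs. The function names `parse_listing` and `changed_paths` are made up for this illustration, and the code assumes the eight-column layout shown in the listings above.

```python
def parse_listing(text):
    """Map each path in `hdfs dfs -ls -R` output to its "date time" string."""
    entries = {}
    for line in text.splitlines():
        # Assumed columns: permissions, replication, owner, group,
        # size, date, time, path. Anything shorter (for example a
        # "Found N items" header) is skipped.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        entries[fields[7].rstrip()] = fields[5] + " " + fields[6]
    return entries


def changed_paths(before, after):
    """Return the sorted paths that were added, removed, or re-timestamped."""
    old, new = parse_listing(before), parse_listing(after)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Listing of mytable taken from the post.
BEFORE = """\
drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0
"""
# The post shows an identical listing after the SUM query.
AFTER = BEFORE

print(changed_paths(BEFORE, AFTER))  # prints [] because nothing changed
```

In practice the two listings would be captured before and after running the query, for example by redirecting the command output to files.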
10-20-2021
01:01 AM
Hi @DA-Ka,
The example below is inspired by this link.

1) Use -t -R to list files recursively, sorted by modification time (most recent first):

# sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07
drwxrwx---+ - hive hive 0 2021-10-20 06:14 /warehouse/tablespace/managed/hive/sample_07/.hive-staging_hive_2021-10-20_06-13-50_654_7549698524549477159-1
drwxrwx---+ - hive hive 0 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 48464 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0

2) Filter the files last modified at or before a given timestamp (passed to awk as dateA):

# sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07 | awk -v dateA="2021-10-20 06:13" '{if (($6" "$7) <= dateA) {print ($6" "$7" "$8)}}'
2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000
2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0

Regarding your last question, whether SUM or JOIN could change the timestamp: I'm not sure. Please try it and then use the commands above to check the timestamps.

Regards, Will
If the answer helps, please accept as solution and click thumbs up.
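For anyone who prefers to post-process the listing outside awk, here is a minimal Python sketch of the same filter. It is only an illustration: the function name `entries_not_newer_than` is invented here, and the code assumes the eight-column `hdfs dfs -ls` layout shown in the post, with the date in column 6, the time in column 7, and the path in column 8.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def entries_not_newer_than(listing, cutoff):
    """Return (timestamp, path) pairs modified at or before `cutoff`."""
    cutoff_dt = datetime.strptime(cutoff, TIMESTAMP_FORMAT)
    result = []
    for line in listing.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # header or otherwise unexpected line
        stamp = fields[5] + " " + fields[6]
        try:
            modified = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # columns 6 and 7 are not a date and time
        if modified <= cutoff_dt:
            result.append((stamp, fields[7].rstrip()))
    return result


# Listing of sample_07 taken from the post.
LISTING = """\
drwxrwx---+ - hive hive 0 2021-10-20 06:14 /warehouse/tablespace/managed/hive/sample_07/.hive-staging_hive_2021-10-20_06-13-50_654_7549698524549477159-1
drwxrwx---+ - hive hive 0 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hive 48464 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0
"""

for stamp, path in entries_not_newer_than(LISTING, "2021-10-20 06:13"):
    print(stamp, path)
```

Parsing the timestamps with `datetime` instead of comparing strings is a design choice for this sketch; the string comparison in the awk one-liner works as well because the format sorts lexicographically.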
10-19-2021
04:57 AM
1 Kudo
Hi @kras,
From the evidence you provided, the most frequent warning is:

WARN [RpcServer.default.FPBQ.Fifo.handler=10,queue=10,port=16020] regionserver.RSRpcServices: Large batch operation detected (greater than 5000) (HBASE-18023). Requested Number of Rows: 12596 Client: svc-stats//ip first region in multi=table_name,\x09,1541077881948.9bcc8cee00ab92b2402730813923c2f6.

This warning is logged when a RegionServer receives an RPC from a client that contains more than 5000 "actions" (where an "action" is a collection of mutations for a specific row) in a single RPC. Misbehaving clients who send large RPCs to RegionServers can be malicious, causing temporary pauses via garbage collection or denial of service via crashes. The threshold of 5000 actions per RPC is defined by the property "hbase.rpc.rows.warning.threshold" in hbase-site.xml.

Please refer to this jira for a detailed explanation: https://issues.apache.org/jira/browse/HBASE-18023

From the warning we can identify the table name as "table_name". Please check which application is writing to or reading from this table. The simplest way is to halt that application and see whether performance improves. If you confirm that the latency spike is due to this table, please improve your application logic and control your batch size.

If you have already improved the "harmful" applications but still see performance issues, I would recommend reading through this article, which covers the most common performance issues and tuning suggestions:
https://community.cloudera.com/t5/Community-Articles/Tuning-Hbase-for-optimized-performance-Part-1/ta-p/248137
The article has 5 parts; reading through them should give you ideas for tuning HBase.

This issue looks somewhat complex, and multiple factors can affect HBase performance. We encourage you to raise a support case with Cloudera.

Regards, Will
If the answer helps, please accept as solution and click thumbs up.
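As a rough illustration of "control your batch size", the sketch below splits a large collection of row operations into chunks of at most 5000 before they are sent. It is client-agnostic Python and deliberately does not call any HBase API; the helper name `split_into_batches` is invented for this example, and the default limit simply mirrors the threshold quoted in the warning.

```python
from itertools import islice


def split_into_batches(rows, max_rows=5000):
    """Yield lists of at most `max_rows` items taken from `rows` in order."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, max_rows))
        if not batch:
            return
        yield batch


# 12596 is the "Requested Number of Rows" from the warning above.
# The integers stand in for whatever per-row operations the client builds.
batch_sizes = [len(batch) for batch in split_into_batches(range(12596))]
print(batch_sizes)  # prints [5000, 5000, 2596]
```

Each yielded batch would then be submitted as its own request by whichever client library the application uses. A smaller `max_rows` may be more appropriate depending on row size.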
10-17-2021
06:45 AM
Hi @dzbeda,
The definition of "dfs.balancer.getBlocks.min-block-size" is "Smallest block to consider for moving".
Which Hadoop distribution are you running, CDH or HDP, and which version?
For CDH please refer to:
https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_hdfs_balancer.html#cmug_topic_5_14__section_lqb_rzp_x2b
https://docs.cloudera.com/documentation/enterprise/6/properties/6.1/topics/cm_props_cdh5160_hdfs.html#concept_6.1.x_balancer_props
HDFS Balancer and DataNode Space Usage Considerations:
https://my.cloudera.com/knowledge/HDFS-Balancer-and-DataNode-Space-Usage-Considerations?id=73869
Regards, Will
10-13-2021
08:00 PM
Hi @kras,
1. Is it CDH or HDP, and which version?
2. Do the RegionServer logs contain “responseTooSlow”, “operationTooSlow”, or any other WARN/ERROR messages? Please provide log snippets.
3. How is the locality of the regions? (Check locality in the HBase web UI: click on the table, and a column on the right side shows each region's locality.)
4. How many regions are deployed on each RegionServer?
5. Are there any warnings or errors in the RegionServer log around the spike?
6. Is any job scanning every 10 minutes? Which table contributes the most I/O? Is there any hotspot?
7. Is HDFS healthy? Check the DataNode logs for slow messages around the spike. Refer to https://my.cloudera.com/knowledge/Diagnosing-Errors-Error-Slow-ReadProcessor-Error-Slow?id=73443
Regards, Will
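To gather the evidence asked for in question 2, one option is a small script that counts how often each marker appears in a RegionServer log. The Python below is a minimal sketch: the function name `count_slow_markers` is invented, and the sample lines are made up purely to exercise the code, so they do not reproduce the real log format.

```python
from collections import Counter

# Marker strings named in the post.
MARKERS = ("responseTooSlow", "operationTooSlow")


def count_slow_markers(lines):
    """Count how many lines mention each marker."""
    counts = Counter()
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Hypothetical sample lines, not real RegionServer output.
SAMPLE_LINES = [
    "WARN example (responseTooSlow): first made-up entry",
    "INFO example: unrelated made-up entry",
    "WARN example (operationTooSlow): second made-up entry",
    "WARN example (responseTooSlow): third made-up entry",
]

for marker, count in sorted(count_slow_markers(SAMPLE_LINES).items()):
    print(marker, count)
```

With a real log, the lines could come from iterating over an open file object, which avoids loading a large log into memory at once.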
10-02-2021
04:19 AM
1 Kudo
@Tamiri,
Please click on your avatar and check My settings > SUBSCRIPTIONS&NOTIFICATIONS. Another place is when you reply to a post: at the top right, select "Email me when someone replies".
Regards, Will
10-01-2021
07:01 AM
Hello @rahuledavalath, What HDP version and what CDP version are you using? Regards, Will
09-29-2021
09:50 AM
1 Kudo
Then the solutions above meet your needs.
09-29-2021
09:14 AM
Hi @Visvanath_JP,
The question could be more specific: which Hadoop versions are the two clusters running, are both clusters secured, and are they CDH/CDP or HDP? Are you migrating only data in the HDFS layer, or other layers as well, for example Hive / HBase / Kudu?

The most common way is to use distcp to migrate data between HDFS clusters:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/scaling-namespaces/topics/hdfs-distcp-to-copy-files.html

If you are using CDH/CDP, a BDR job is another choice (distcp integrated):
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/replication-manager/topics/rm-dc-hdfs-replication.html

Distcp guide: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

Regards, Will
If the answer helps, please accept as solution and click thumbs up.
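For a scripted migration, the distcp invocation can be assembled programmatically before it is run. The Python below is only a sketch: the helper name `build_distcp_command` is invented, the NameNode host names and paths are placeholders, and nothing is executed. It relies on the basic `hadoop distcp <source> <target>` form and the `-update` option described in the Distcp guide.

```python
import shlex


def build_distcp_command(source_uri, target_uri, update=False):
    """Return the argument list for a basic `hadoop distcp` invocation."""
    command = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        command.append("-update")
    command += [source_uri, target_uri]
    return command


# Placeholder URIs; replace with the real NameNode addresses and paths.
command = build_distcp_command(
    "hdfs://nn1.example.com:8020/data/source",
    "hdfs://nn2.example.com:8020/data/target",
    update=True,
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a host with the Hadoop client installed. Secured clusters and cross-version copies usually need additional settings that this sketch does not cover.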