Member since: 07-30-2020
Posts: 219
Kudos Received: 45
Solutions: 60
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 506 | 11-20-2024 11:11 PM |
 | 521 | 09-26-2024 05:30 AM |
 | 1097 | 10-26-2023 08:08 AM |
 | 1919 | 09-13-2023 06:56 AM |
 | 2160 | 08-25-2023 06:04 AM |
08-19-2022
10:40 AM
Something similar is discussed in this post, though we have already covered that. Maybe tez.grouping.split-count can help. There is some more info here, as well as in https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java
08-19-2022
10:15 AM
1 Kudo
Hi @mike_bronson7 , maybe you can try using the Ambari API as described here and check whether that works. More info in this article.
08-19-2022
09:46 AM
Hi @BORDIN , The parameter "dfs.client.block.write.locateFollowingBlock.retries" is meant for exactly this situation, where the file fails to close: increasing it gives the client more retries. More info on this is here. As it is still failing, I would suggest looking in the Namenode log for the reason behind the failure. You can grep for the block "blk_6867946754_5796409008", or check the Namenode and Datanode logs for any WARN/ERROR messages from around the time of the put operation.
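If grep is not convenient, the same filtering can be sketched in Python. This is only a rough illustration: the function name find_relevant_lines is made up for this example, and it simply keeps lines that mention the block ID or contain a WARN/ERROR level token. It assumes nothing about the exact Namenode log layout.

```python
import re

def find_relevant_lines(lines, block_id, levels=("WARN", "ERROR")):
    """Return log lines that mention block_id or carry one of the given levels.

    Hypothetical helper for illustration; it does plain substring and
    whole-word matching and does not parse timestamps or logger names.
    """
    level_re = re.compile(
        r"\b(?:" + "|".join(re.escape(level) for level in levels) + r")\b"
    )
    return [
        line.rstrip("\n")
        for line in lines
        if block_id in line or level_re.search(line)
    ]

# Usage (the log path depends on your installation, so it is left as a variable):
#   with open(namenode_log_path) as fh:
#       for hit in find_relevant_lines(fh, "blk_6867946754_5796409008"):
#           print(hit)
```

Narrowing the input to the minutes around the failed put first keeps the WARN/ERROR noise manageable.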
08-19-2022
12:18 AM
Hi @fsm17 , As you are using Tez as the execution engine, I would suggest setting the properties below in Hive; they control the number of mappers.
- tez.grouping.max-size (default 1073741824, which is 1 GB): the most data Tez will assign to a task. Decreasing this means more parallelism.
- tez.grouping.min-size (default 52428800, which is 50 MB): the least data Tez will assign to a task. Increasing this means less parallelism.
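To get a feel for how the two limits bound the task count, here is a simplified Python sketch of the clamping step as I read it in TezSplitGrouper. The function name and the desired_splits argument are illustrative; in Tez the desired count is derived from available cluster resources, and the real grouper also accounts for locality and the original split count, so treat the numbers as rough estimates only.

```python
def estimate_grouped_splits(total_bytes, desired_splits,
                            min_size=52428800, max_size=1073741824):
    """Clamp a desired split count so each group stays within [min_size, max_size].

    Simplified illustration, not the actual Tez implementation.
    Defaults mirror tez.grouping.min-size and tez.grouping.max-size.
    """
    length_per_group = total_bytes // desired_splits
    if length_per_group > max_size:
        # Groups would be too large: use more, smaller groups.
        return total_bytes // max_size + 1
    if length_per_group < min_size:
        # Groups would be too small: use fewer, larger groups.
        return total_bytes // min_size + 1
    return desired_splits

GIB = 1024 ** 3
# 100 GiB of input with only 10 desired splits hits the max-size limit.
print(estimate_grouped_splits(100 * GIB, 10))
# Halving max-size roughly doubles the number of tasks.
print(estimate_grouped_splits(100 * GIB, 10, max_size=512 * 1024 ** 2))
```

In a Hive session the properties themselves can be changed with set statements, for example set tez.grouping.max-size=536870912; before running the query.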
08-16-2022
10:00 AM
@Ben1978 You can refer to the docs below for Spark 3, 3.1 and 3.2 respectively:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.4/cds-3/topics/spark-spark-3-requirements.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/cds-3/topics/spark-spark-3-requirements.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/cds-3/topics/spark-3-requirements.html
As CDP 7.1.8 is not released yet, we don't have an official doc on that.
--
Was your question answered? Please take some time to click on “Accept as Solution” below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
08-16-2022
06:34 AM
Hi @noekmc , Are you referring to the highlighted "Restart service" option below that you see under ldap_url? If yes, it is expected, and you can refer to my earlier comment.
08-16-2022
05:44 AM
Hi @Ben1978 Spark 3.3 will be compatible with CDP 7.1.8, which is yet to be released.
08-12-2022
01:07 PM
Hi @BORDIN , Are you able to copy any other files to HDFS using hdfs dfs -put? Does this happen all the time? This problem can be caused by a range of issues that delay the Datanode block reports from reaching the Namenode, or delay the Namenode in processing them. Can you check the Namenode log for any WARN/ERROR messages from the time the put was done? If the Namenode is busy, we can configure more retries, which gives the Namenode more time to complete the block write.
- Go to Cloudera Manager -> HDFS -> Configuration -> HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and add an entry like the following:
<property>
<name>dfs.client.block.write.locateFollowingBlock.retries</name>
<value>10</value>
</property>
- Save changes
- Restart the stale services and deploy the client configuration.
08-10-2022
07:15 AM
Hi @yagoaparecidoti , Yes, as you are using Ambari to manage the cluster, you can add the property as follows:
Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site -> Add Property
dfs.client.block.write.replace-datanode-on-failure.policy=ALWAYS
08-10-2022
06:18 AM
Hi @noekmc The "Requires Server Restart" alert on these config properties is expected. It is there to remind administrators to run "systemctl restart cloudera-scm-server" on the CM host whenever they change these configs, and it stays visible even after the config has been changed and the CM server restarted. So, how are you validating that the config has not taken effect? To confirm this, you can simply log in to the CM database and run a select query on the "configs" table to check whether the properties are there. For example, on my Postgres DB:
# sudo -u postgres psql
postgres=# \c scm
scm=# select attr, value, host_id, service_id from configs where attr like 'ldap%';
     attr     | value  | host_id | service_id
--------------+--------+---------+------------
 ldap_type    | LDAP   |         |
 ldap_bind_pw | abcdef |         |
 ldap_bind_dn | robin  |         |
(3 rows)
So even after I have added some random values for LDAP in the CM UI, the UI will still show "Requires Server Restart" for the already changed properties.