Member since: 07-30-2020
Posts: 219
Kudos Received: 45
Solutions: 60
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 433 | 11-20-2024 11:11 PM
 | 487 | 09-26-2024 05:30 AM
 | 1083 | 10-26-2023 08:08 AM
 | 1852 | 09-13-2023 06:56 AM
 | 2128 | 08-25-2023 06:04 AM
08-19-2022
10:40 AM
Something similar is discussed in this post, though we have already covered that ground. Maybe tez.grouping.split-count can help. Some more info here, as well as in https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java
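For reference, a session-level override could look like the sketch below; the value 100 is purely an illustrative assumption, not a recommendation:

```sql
-- Hypothetical Hive session setting; 100 is only an example split count
SET tez.grouping.split-count=100;
-- then run the affected query in the same session
```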
08-19-2022
10:15 AM
1 Kudo
Hi @mike_bronson7 , maybe you can try using the Ambari API as described here and check whether that works. More info in this article.
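As a rough sketch of an Ambari REST call (the host, port, cluster name, service, and credentials below are all placeholders; the exact endpoint depends on what you are trying to do):

```shell
# Placeholder host/cluster/credentials -- adjust for your environment.
# Reads a service's current state via the Ambari REST API.
curl -u admin:admin -H 'X-Requested-By: ambari' \
  "http://ambari-host:8080/api/v1/clusters/MyCluster/services/HDFS"
```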
08-19-2022
09:46 AM
Hi @BORDIN , the parameter "dfs.client.block.write.locateFollowingBlock.retries" is used to handle the situation where the file doesn't close, by increasing the number of retries. More info on this is here. Since it is still failing, I would suggest looking in the Namenode log for the reason behind the failure. You can grep for block "blk_6867946754_5796409008", or check the Namenode and Datanode logs for any WARN/ERROR messages from the time the put operation was run.
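To illustrate the kind of log check meant above, here is a small sketch; the log lines and file path are fabricated samples, and on a real cluster you would grep the actual Namenode log file instead:

```shell
# Write a tiny fabricated Namenode log sample, then filter it for the
# failing block ID plus any WARN/ERROR lines. On a real cluster, point
# grep at the actual Namenode log file (path varies by distribution).
cat > /tmp/namenode-sample.log <<'EOF'
2022-08-12 13:05:01 INFO  BlockManager: allocated blk_6867946754_5796409008
2022-08-12 13:05:12 WARN  BlockManager: not enough replicas for blk_6867946754_5796409008
2022-08-12 13:05:13 INFO  FSNamesystem: unrelated entry
EOF
grep -E "blk_6867946754_5796409008|WARN|ERROR" /tmp/namenode-sample.log
```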
08-19-2022
12:18 AM
Hi @fsm17 , since you are using Tez as the execution engine, I would suggest setting the below properties in Hive, which control the number of mappers:
tez.grouping.max-size (default 1073741824, i.e. 1 GB): the most data Tez will assign to a single task. Decreasing this means more parallelism.
tez.grouping.min-size (default 52428800, i.e. 50 MB): the least data Tez will assign to a single task. Increasing this means less parallelism.
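Both knobs can be overridden per session; a minimal sketch, with purely illustrative values (~256 MB and ~512 MB here), would be:

```sql
-- Example values only; tune these for your data volume and cluster size
SET tez.grouping.min-size=268435456;
SET tez.grouping.max-size=536870912;
```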
08-16-2022
10:00 AM
@Ben1978 You can refer to the docs below for Spark 3.0, 3.1, and 3.2 respectively: https://docs.cloudera.com/cdp-private-cloud-base/7.1.4/cds-3/topics/spark-spark-3-requirements.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/cds-3/topics/spark-spark-3-requirements.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/cds-3/topics/spark-3-requirements.html As CDP 7.1.8 is not released yet, we don't have an official doc for it. -- Was your question answered? Please take some time to click on "Accept as Solution" below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
08-16-2022
06:34 AM
Hi @noekmc , are you referring to the highlighted "Restart service" option below, the one you see under ldap_url? If so, it is expected, and you can refer to my earlier comment.
08-16-2022
05:44 AM
Hi @Ben1978 , Spark 3.3 will be compatible with CDP 7.1.8, which is yet to be released.
08-12-2022
01:07 PM
Hi @BORDIN , are you able to copy any other files to HDFS using hdfs dfs -put? Does this happen all the time? This problem can be caused by a range of issues that either delay the datanode block reports from reaching the namenode, or delay the namenode in processing them. Can you check the Namenode log for any WARN/ERROR messages from the time the put was run? If the Namenode is busy, we can allow more retries, giving the Namenode more time for the block write to complete:
- Go to Cloudera Manager -> HDFS -> Configuration -> HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and add an entry like the following:
<property>
<name>dfs.client.block.write.locateFollowingBlock.retries</name>
<value>10</value>
</property>
- Save changes.
- Restart the stale services and deploy the client configuration.
08-10-2022
07:15 AM
Hi @yagoaparecidoti , yes, since you are using Ambari to manage the cluster, you can add the property as follows: Ambari -> HDFS -> Configs -> Advanced -> Custom hdfs-site -> Add Property: dfs.client.block.write.replace-datanode-on-failure.policy=ALWAYS
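For comparison, on a cluster managed without Ambari the equivalent entry would go directly into hdfs-site.xml; a sketch of that fragment:

```xml
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>ALWAYS</value>
</property>
```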
08-10-2022
06:18 AM
Hi @noekmc , the "Requires Server Restart" alert on those config properties is expected: it reminds administrators to run "systemctl restart cloudera-scm-server" on the CM host whenever they change the configs, and it remains visible even after the config has been changed and the CM server restarted. So, how are you validating that the config has not taken effect? To confirm it, you can simply log in to the CM database and run a select query on the "configs" table to check whether the properties are there. For example, on my Postgres DB:
# sudo -u postgres psql
postgres=# \c scm
scm=# select attr, value, host_id, service_id from configs where attr like 'ldap%';
     attr     | value  | host_id | service_id
--------------+--------+---------+------------
 ldap_type    | LDAP   |         |
 ldap_bind_pw | abcdef |         |
 ldap_bind_dn | robin  |         |
(3 rows)
So even after I have added some random values for LDAP in the CM UI, the UI will still show "Requires Server Restart" for the already-changed properties.