Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 493 | 06-04-2025 11:36 PM |
| | 1037 | 03-23-2025 05:23 AM |
| | 539 | 03-17-2025 10:18 AM |
| | 2029 | 03-05-2025 01:34 PM |
| | 1265 | 03-03-2025 01:09 PM |
07-28-2020
02:47 PM
1 Kudo
@Stephbat This thread should help you achieve that. The Ambari version could be an issue, but there is a workaround documented there; note the DISCLAIMER!! Please let me know.
07-27-2020
06:08 AM
@Krpyto84 Your permission issue is linked to ZooKeeper ACLs; my best guess is that your Kafka cluster is Kerberized. ZooKeeper requires you to set up a superuser using the zookeeper.DigestAuthenticationProvider.superDigest property. I don't know how you will integrate that procedure into your Ansible playbook. You will then need to append the following to your KAFKA_OPTS environment variable to set the JVM parameter:
export KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf
Please let me know whether that is your situation; if that's the case, I will try to help you out.
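As a rough sketch (the "super:mysecret" credential, the classpath, and the JAAS path below are placeholders, not values from your cluster), the superuser digest is usually generated with the ZooKeeper DigestAuthenticationProvider class, passed to the ZooKeeper server JVM, and the broker is then pointed at its JAAS file via KAFKA_OPTS:

```
# Hypothetical sketch: generate a digest for a ZooKeeper super user
# (the "super:mysecret" credential and the classpath are placeholders)
java -cp "/usr/hdp/current/zookeeper-server/*:/usr/hdp/current/zookeeper-server/lib/*" \
  org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:mysecret
# The printed "super:<digest>" value goes into the ZooKeeper server JVM flags:
#   -Dzookeeper.DigestAuthenticationProvider.superDigest=super:<digest>

# Point the Kafka broker at its JAAS file before (re)starting it
export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf"
```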
07-26-2020
11:18 AM
1 Kudo
@mike_bronson7 log.retention.bytes is a size-based retention policy for logs, i.e. the allowed size of the topic. Segments are pruned from the log as long as the remaining segments don't drop below log.retention.bytes. You can also specify retention parameters at the topic level.
To specify a retention time period per topic, use the following command:
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.ms=[DesiredRetentionTimePeriod]
To specify a retention log size per topic, use the following command:
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.bytes=[DesiredRetentionLogSize]
That should resolve your problem. Happy hadooping!
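For illustration, with hypothetical values (a ZooKeeper quorum at zk1:2181 and a topic named "events"; substitute your own connection string, topic, and sizes), the calls might look like this:

```
# Hypothetical values for illustration only: zk1:2181 and a topic named "events"
# Keep messages for 7 days (retention.ms is in milliseconds)
kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name events --add-config retention.ms=604800000

# Cap the topic at roughly 1 GiB per partition
kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name events --add-config retention.bytes=1073741824
```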
07-26-2020
07:09 AM
@isanas I remember writing a long document on the steps, either in the HWX or Cloudera community. Ranger will automatically sync local users whose USERID is above 500. Please let me know if you want me to do a walkthrough; in that case, I will ask you to give me an accurate scenario and the version of HDP. Happy hadooping!
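As a quick, hedged check (the property name and path below reflect typical HDP layouts and may differ in your version), you can inspect the UID threshold that usersync honours on the usersync host:

```
# Hypothetical check: the minimum UID that Ranger usersync will pull in
# (property name and path may vary by HDP version; 500 is the usual default)
grep -i minUserId /etc/ranger/usersync/conf/ranger-ugsync-site.xml
```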
07-23-2020
02:16 AM
@focal_fossa Great to hear, happy hadooping! To help others, please mark the answer that resolved your problem as the best answer, so that others searching for a similar solution can use it to resolve similar issues.
07-22-2020
09:37 AM
1 Kudo
@focal_fossa To increase HDFS capacity, give dfs.datanode.data.dir more mount points or directories; any new disk needs to be formatted and mounted before the mount point is added in Ambari. In HDP, using Ambari, you add the new mount point to the comma-separated list of directories in the dfs.datanode.data.dir property (depending on the Ambari version it sits in the HDFS configs or the Advanced section; the property lives in hdfs-site.xml). The more new disks you provide through the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
You will then need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from overutilized to underutilized nodes.
Running the balancer without parameters:
sudo -u hdfs hdfs balancer
This runs with a default threshold of 10%, meaning that the balancer will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%. You can use a different threshold:
sudo -u hdfs hdfs balancer -threshold 5
This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage. This process can take a long time depending on the amount of data in your cluster. Hope that helps.
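A minimal sketch of the disk-add step, assuming a hypothetical device /dev/sdd and mount point /grid/3/hadoop/hdfs/data (substitute your own device, filesystem, and paths):

```
# Hypothetical sketch: format and mount a new data disk, then append it to
# dfs.datanode.data.dir in Ambari (device name and paths are placeholders)
sudo mkfs.xfs /dev/sdd
sudo mkdir -p /grid/3/hadoop/hdfs/data
echo '/dev/sdd /grid/3/hadoop/hdfs/data xfs defaults,noatime 0 0' | sudo tee -a /etc/fstab
sudo mount /grid/3/hadoop/hdfs/data
sudo chown -R hdfs:hadoop /grid/3/hadoop/hdfs/data

# Then in Ambari: HDFS > Configs > dfs.datanode.data.dir ->
#   /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/3/hadoop/hdfs/data
# and restart the affected DataNodes before running the balancer.
```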
07-21-2020
07:55 AM
@Stephbat Those are internals to Cloudera, and that confirms the myth that migrations/upgrades are never smooth; we still need humans 🙂 Please make those changes and let me know if your DataNodes fire up correctly.
07-21-2020
06:50 AM
@Stephbat Please can you check these two values: dfs.datanode.max.locked.memory and ulimit. The dfs.datanode.max.locked.memory property determines the maximum amount of memory a DataNode will use for caching. The "locked-in-memory size" corresponds to the ulimit (ulimit -l) of the DataNode user, which needs to be increased to match this parameter. Your current dfs.datanode.max.locked.memory is 2 GB, while the RLIMIT_MEMLOCK is 16 MB.
If you get the error "Cannot start datanode because the configured max locked memory size… is more than the datanode's available RLIMIT_MEMLOCK ulimit," it means the operating system is imposing a lower limit on the amount of memory you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. Usually, this value is configured in /etc/security/limits.conf; however, it will vary depending on which operating system and distribution you are using, so please adjust the values accordingly. Remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache. Once adjusted, the DataNode should start like a charm 🙂 Hope that helps.
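A minimal, hedged sketch of the check and the limits.conf change (the hdfs user name and the 2 GB figure are assumptions based on the values above; adapt them to your environment):

```
# Check the current locked-memory limit of the DataNode (hdfs) user
sudo -u hdfs bash -c 'ulimit -l'

# Hypothetical /etc/security/limits.conf entries raising memlock to 2 GB
# (values are in KB: 2 GB = 2097152 KB; restart the DataNode afterwards)
# hdfs  soft  memlock  2097152
# hdfs  hard  memlock  2097152
```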
07-21-2020
04:56 AM
1 Kudo
@focal_fossa My guess is you are running out of memory; I would like to know how much memory you have. Copying local files to HDFS with the put or copyFromLocal commands actually streams data through the Hadoop client binary, client libraries, and queues, and my guess is that the Ambari views copy might also be using MapReduce behind the scenes. Another alternative is to use DistCp (distributed copy), a tool for large inter-/intra-cluster copying. It uses MapReduce for its distribution, error handling and recovery, and reporting: it expands a list of files and directories into input to map tasks, each of which copies a partition of the files specified in the source list.
Does DistCp at times run out of memory for big datasets? If the number of individual files/directories being copied from the source path(s) is extremely large, DistCp might run out of memory while determining the list of paths to copy; this is not unique to the new DistCp implementation. To get around this, consider changing the -Xmx JVM heap-size parameter, as follows:
$ export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"
$ hadoop distcp /source /target
Hope that helps.
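A hedged variant for very large source trees (the heap size and mapper count below are illustrative assumptions, not tuned values): giving the client a larger heap and capping the number of simultaneous map tasks with -m can help.

```
# Illustrative values only: a 2 GB client heap and at most 20 concurrent maps
export HADOOP_CLIENT_OPTS="-Xms64m -Xmx2048m"
hadoop distcp -m 20 /source /target
```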
07-21-2020
03:52 AM
@saur Can you explain the latest development? I compiled a document for you; did you go through it step by step?