Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 493 | 06-04-2025 11:36 PM |
| | 1037 | 03-23-2025 05:23 AM |
| | 539 | 03-17-2025 10:18 AM |
| | 2029 | 03-05-2025 01:34 PM |
| | 1265 | 03-03-2025 01:09 PM |
07-28-2020
02:47 PM
1 Kudo
@Stephbat This thread should help you achieve that. The Ambari version could be an issue, but there is a workaround documented there; note the DISCLAIMER!! Please let me know.
07-27-2020
06:08 AM
@Krpyto84 Your permission issue is linked to ZooKeeper ACLs; my best guess is that your Kafka cluster is Kerberized. ZooKeeper requires you to set up a superuser using the zookeeper.DigestAuthenticationProvider.superDigest property. I don't know how you will integrate that procedure into your Ansible playbook. You will then need to append the following to your KAFKA_OPTS environment variable to set the JVM parameter:
export KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf
Please let me know whether that is your situation; if that's the case, I will try to help you out.
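As a rough sketch (the "super:mysecret" credential, the classpath, and the JAAS path below are placeholders, not values from your cluster), the superuser digest is usually generated with the ZooKeeper DigestAuthenticationProvider class, passed to the ZooKeeper server JVM, and the broker is then pointed at its JAAS file via KAFKA_OPTS:

```
# Hypothetical sketch: generate a digest for a ZooKeeper super user
# (the "super:mysecret" credential and the classpath are placeholders)
java -cp "/usr/hdp/current/zookeeper-server/*:/usr/hdp/current/zookeeper-server/lib/*" \
  org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:mysecret
# The printed "super:<digest>" value goes into the ZooKeeper server JVM flags:
#   -Dzookeeper.DigestAuthenticationProvider.superDigest=super:<digest>

# Point the Kafka broker at its JAAS file before (re)starting it
export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf"
```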
07-26-2020
11:18 AM
1 Kudo
@mike_bronson7 log.retention.bytes is a size-based retention policy for logs, i.e. the allowed size of the topic. Segments are pruned from the log as long as the remaining segments don't drop below log.retention.bytes. You can also specify retention parameters at the topic level.
To specify a retention time period per topic, use the following command:
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.ms=[DesiredRetentionTimePeriod]
To specify a retention log size per topic, use the following command:
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.bytes=[DesiredRetentionLogSize]
That should resolve your problem. Happy hadooping!
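For illustration, with hypothetical values (a ZooKeeper quorum at zk1:2181 and a topic named "events"; substitute your own connection string, topic, and sizes), the calls might look like this:

```
# Hypothetical values for illustration only: zk1:2181 and a topic named "events"
# Keep messages for 7 days (retention.ms is in milliseconds)
kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name events --add-config retention.ms=604800000

# Cap the topic at roughly 1 GiB per partition
kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics \
  --entity-name events --add-config retention.bytes=1073741824
```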
07-26-2020
07:09 AM
@isanas I remember writing a long document on the steps, either in the HWX or Cloudera community. Ranger will automatically sync local users whose USERID is above 500. Please let me know if you want me to do a walkthrough; in that case, I will ask you to give me an accurate scenario and the version of HDP. Happy hadooping!
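As a quick, hedged check (the property name and path below reflect typical HDP layouts and may differ in your version), you can inspect the UID threshold that usersync honours on the usersync host:

```
# Hypothetical check: the minimum UID that Ranger usersync will pull in
# (property name and path may vary by HDP version; 500 is the usual default)
grep -i minUserId /etc/ranger/usersync/conf/ranger-ugsync-site.xml
```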
07-23-2020
02:16 AM
@focal_fossa Great to hear, happy hadooping! To help others, please mark the answer that resolved your problem as the best answer, so that others searching for a similar solution can use it to resolve similar issues.
07-22-2020
09:37 AM
1 Kudo
@focal_fossa To increase HDFS capacity, give dfs.datanode.data.dir more mount points or directories; any new disk needs to be formatted and mounted before the mount point is added in Ambari. In HDP, using Ambari, you add the new mount point to the comma-separated list of directories in the dfs.datanode.data.dir property (depending on the Ambari version it sits in the HDFS configs or the Advanced section; the property lives in hdfs-site.xml). The more new disks you provide through the comma-separated list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
You will then need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from overutilized to underutilized nodes.
Running the balancer without parameters:
sudo -u hdfs hdfs balancer
This runs with a default threshold of 10%, meaning that the balancer will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%. You can use a different threshold:
sudo -u hdfs hdfs balancer -threshold 5
This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage. This process can take a long time depending on the amount of data in your cluster. Hope that helps.
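A minimal sketch of the disk-add step, assuming a hypothetical device /dev/sdd and mount point /grid/3/hadoop/hdfs/data (substitute your own device, filesystem, and paths):

```
# Hypothetical sketch: format and mount a new data disk, then append it to
# dfs.datanode.data.dir in Ambari (device name and paths are placeholders)
sudo mkfs.xfs /dev/sdd
sudo mkdir -p /grid/3/hadoop/hdfs/data
echo '/dev/sdd /grid/3/hadoop/hdfs/data xfs defaults,noatime 0 0' | sudo tee -a /etc/fstab
sudo mount /grid/3/hadoop/hdfs/data
sudo chown -R hdfs:hadoop /grid/3/hadoop/hdfs/data

# Then in Ambari: HDFS > Configs > dfs.datanode.data.dir ->
#   /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/3/hadoop/hdfs/data
# and restart the affected DataNodes before running the balancer.
```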
07-21-2020
07:55 AM
@Stephbat Those are internals to Cloudera, and that confirms the myth that migrations/upgrades are never smooth; we still need humans 🙂 Please make those changes and let me know if your DataNodes fire up correctly.
07-21-2020
06:50 AM
@Stephbat Please can you check these two values: dfs.datanode.max.locked.memory and ulimit. The dfs.datanode.max.locked.memory property determines the maximum amount of memory a DataNode will use for caching. The "locked-in-memory size" corresponds to the ulimit (ulimit -l) of the DataNode user, which needs to be increased to match this parameter. Your current dfs.datanode.max.locked.memory is 2 GB, while the RLIMIT_MEMLOCK is 16 MB.
If you get the error "Cannot start datanode because the configured max locked memory size… is more than the datanode's available RLIMIT_MEMLOCK ulimit," it means the operating system is imposing a lower limit on the amount of memory you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. Usually, this value is configured in /etc/security/limits.conf; however, it will vary depending on which operating system and distribution you are using, so please adjust the values accordingly. Remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache. Once adjusted, the DataNode should start like a charm 🙂 Hope that helps.
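A minimal, hedged sketch of the check and the limits.conf change (the hdfs user name and the 2 GB figure are assumptions based on the values above; adapt them to your environment):

```
# Check the current locked-memory limit of the DataNode (hdfs) user
sudo -u hdfs bash -c 'ulimit -l'

# Hypothetical /etc/security/limits.conf entries raising memlock to 2 GB
# (values are in KB: 2 GB = 2097152 KB; restart the DataNode afterwards)
# hdfs  soft  memlock  2097152
# hdfs  hard  memlock  2097152
```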
07-21-2020
04:56 AM
1 Kudo
@focal_fossa My guess is you are running out of memory; I would like to know how much memory you have. Copying local files to HDFS with the put or copyFromLocal commands actually streams data through the Hadoop client binary, client libraries, and queues, and my guess is that the Ambari views copy might also be using MapReduce behind the scenes. Another alternative is to use DistCp (distributed copy), a tool for large inter-/intra-cluster copying. It uses MapReduce for its distribution, error handling and recovery, and reporting: it expands a list of files and directories into input to map tasks, each of which copies a partition of the files specified in the source list.
Does DistCp at times run out of memory for big datasets? If the number of individual files/directories being copied from the source path(s) is extremely large, DistCp might run out of memory while determining the list of paths to copy; this is not unique to the new DistCp implementation. To get around this, consider changing the -Xmx JVM heap-size parameter, as follows:
$ export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"
$ hadoop distcp /source /target
Hope that helps.
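A hedged variant for very large source trees (the heap size and mapper count below are illustrative assumptions, not tuned values): giving the client a larger heap and capping the number of simultaneous map tasks with -m can help.

```
# Illustrative values only: a 2 GB client heap and at most 20 concurrent maps
export HADOOP_CLIENT_OPTS="-Xms64m -Xmx2048m"
hadoop distcp -m 20 /source /target
```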
07-21-2020
03:52 AM
@saur Can you explain the latest development? I compiled a document for you; did you go through it step by step?