Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 983 | 06-04-2025 11:36 PM |
| | 1562 | 03-23-2025 05:23 AM |
| | 777 | 03-17-2025 10:18 AM |
| | 2797 | 03-05-2025 01:34 PM |
| | 1846 | 03-03-2025 01:09 PM |
07-30-2020
01:41 PM
@Seaport It shouldn't be a surprise that Hadoop doesn't perform well with small files. With that in mind, the best solution is to zip all your small files locally and then copy the zipped file to HDFS using copyFromLocal; the one restriction is that the source files must be on a local file system. I assume the local Linux box is the edge node and has the HDFS client installed. If not, you will have to copy myzipped.gz to a node that does (usually the edge node) and perform the steps below.
$ hdfs dfs -copyFromLocal myzipped.gz /hadoop_path
Then decompress the gzipped file in HDFS using
$ hdfs dfs -cat /hadoop_path/myzipped.gz | gzip -d | hdfs dfs -put - /hadoop_path2
Hope that helps
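A minimal end-to-end sketch of the same idea, assuming the small files sit under /local/small_files on the edge node and can be concatenated into one file (all paths below are illustrative):
$ cat /local/small_files/*.log | gzip > myzipped.gz        # combine and compress locally
$ hdfs dfs -copyFromLocal myzipped.gz /hadoop_path         # one large file instead of many small ones
$ hdfs dfs -cat /hadoop_path/myzipped.gz | gzip -d | hdfs dfs -put - /hadoop_path2/combined.log   # decompress inside HDFS without staging back to local disk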
07-29-2020
03:25 PM
1 Kudo
@Stephbat Bizarre, all the symlinks should point to the newer version 3.1.4.0-315. You should recreate the symlinks so they point to the new version, then re-run the steps.
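A minimal sketch of checking and repointing the links, assuming an HDP layout under /usr/hdp (the exact components on your node may differ):
$ ls -l /usr/hdp/current/            # see which version each *-client/*-server link targets
$ hdp-select versions                # list the stack versions installed on the node
$ hdp-select set all 3.1.4.0-315     # repoint every component symlink to the new version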
07-29-2020
06:45 AM
1 Kudo
@Stephbat Did you follow Workaround steps 1 through 3? I see you have a similar error in File "/var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py".
1) Go to each ambari-agent node and edit the file remove_previous_stacks.py
# vi /var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py
2) Go to line 77 and change the line from
all_installed_packages = self.pkg_provider.all_installed_packages()
to
all_installed_packages = self.pkg_provider.installed_packages()
3) Retry the operation via curl again; please substitute only the Ambari_host and <cluster_name> below, e.g.:
curl 'http://Ambari_host:8080/api/v1/clusters/<cluster_name>/requests' -u admin:admin -H "X-Requested-By: ambari" -X POST -d '{"RequestInfo":{"context":"remove_previous_stacks", "action" : "remove_previous_stacks", "parameters" : {"version":"3.1.0.0-78"}}, "Requests/resource_filters": [{"hosts":"Ambari_host"}]}'
And please revert
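If you have many agent nodes, a sed one-liner can apply the same one-line change without opening vi (a sketch assuming the stock file path above):
$ sed -i 's/self\.pkg_provider\.all_installed_packages()/self.pkg_provider.installed_packages()/' /var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py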
07-28-2020
02:47 PM
1 Kudo
@Stephbat This thread should help you achieve that. The Ambari version could be an issue; there is a workaround documented, but note the DISCLAIMER !! Please let me know
07-27-2020
06:08 AM
@Krpyto84 Your permission issue is linked to ZooKeeper ACLs; my best guess is that your Kafka is kerberized. ZooKeeper requires you to set up a superuser using the zookeeper.DigestAuthenticationProvider.superDigest property. I don't know how you will integrate that procedure into your Ansible playbook. You will then need to set the JVM parameter in your KAFKA_OPTS env variable:
export KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf
Please let me know whether that is your situation; if so, I will try to help you out
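A minimal sketch of the superuser setup, assuming an HDP-style ZooKeeper install; the jar paths and password are illustrative and should be adapted to your environment:
$ java -cp "/usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/*" org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:MySecretPassword   # prints the digest for the super user
$ export SERVER_JVMFLAGS="-Dzookeeper.DigestAuthenticationProvider.superDigest=super:<digest-from-previous-step>"   # add to zookeeper-env on the ZK servers, then restart them
$ export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf"   # point the broker JVM at its JAAS file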
07-26-2020
11:18 AM
1 Kudo
@mike_bronson7 log.retention.bytes is a size-based retention policy for logs, i.e. the allowed size of the topic. Segments are pruned from the log as long as the remaining segments don't drop below log.retention.bytes. You can also specify retention parameters at the topic level.
To specify a retention time period per topic, use the following command.
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.ms=[DesiredRetentionTimePeriod]
To specify a retention log size per topic, use the following command.
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.bytes=[DesiredRetentionLogSize]
That should resolve your problem. Happy hadooping
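As a concrete sketch with illustrative values (zk1:2181 and my-topic are placeholders; note that retention.bytes applies per partition):
$ kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics --entity-name my-topic --add-config retention.ms=604800000      # keep messages for 7 days
$ kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics --entity-name my-topic --add-config retention.bytes=1073741824  # cap each partition at ~1 GB
$ kafka-configs.sh --zookeeper zk1:2181 --describe --entity-type topics --entity-name my-topic                                       # verify the overrides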
07-26-2020
07:09 AM
@isanas I remember writing a long document on the steps in either the HWX or Cloudera community. Ranger will automatically sync local users whose UID is above 500. Please let me know if you want me to do a walkthrough; in that case, I will ask you to give me an accurate scenario and the version of HDP. Happy hadooping
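A quick way to check this on a usersync node, assuming a Unix-based sync and the default HDP config path (the property name reflects my recollection of ranger-ugsync-site and should be verified for your release):
$ grep -A1 ranger.usersync.unix.minUserId /etc/ranger/usersync/conf/ranger-ugsync-site.xml   # minimum UID that usersync picks up (typically 500)
$ awk -F: '$3 >= 500 {print $1, $3}' /etc/passwd                                             # local accounts that qualify on this node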
07-23-2020
02:16 AM
@focal_fossa Great to hear, happy hadooping! To help others, please mark the answer that resolved your problem as the best answer, so that anyone searching for a similar solution can use it to resolve similar issues.
07-22-2020
09:37 AM
1 Kudo
@focal_fossa To increase HDFS capacity, add more mount points or directories to dfs.datanode.data.dir; the new disks need to be formatted and mounted before adding the mount points in Ambari. In HDP with Ambari, add the new mount point to the comma-separated list in the dfs.datanode.data.dir property (in hdfs-site.xml, or under the HDFS advanced section depending on the Ambari version). The more new disks you provide in that list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
You will then need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from overutilized to underutilized nodes.
Running the balancer without parameters:
sudo -u hdfs hdfs balancer
This uses the default threshold of 10%, meaning the balancer will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%. You can use a different threshold:
sudo -u hdfs hdfs balancer -threshold 5
This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage. This process can take a long time depending on the amount of data in your cluster.
Hope that helps
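A minimal sketch of preparing one new disk before adding it to dfs.datanode.data.dir in Ambari (device and mount point names are illustrative):
$ mkfs.xfs /dev/sdc                                  # format the new disk
$ mkdir -p /grid/2/hadoop/hdfs/data
$ mount /dev/sdc /grid/2/hadoop/hdfs/data            # also add the mount to /etc/fstab
$ chown -R hdfs:hadoop /grid/2/hadoop/hdfs/data      # the DataNode user must own the directory
# dfs.datanode.data.dir then becomes e.g. /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data
$ sudo -u hdfs hdfs balancer -threshold 5            # after restarting the DataNodes, spread blocks across nodes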
07-21-2020
07:55 AM
@Stephbat Those are internals to Cloudera, and that confirms the myth that migrations/upgrades are never smooth; we still need humans 🙂 Please make those changes and let me know if your DataNodes fire up correctly.