Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 983 | 06-04-2025 11:36 PM |
| | 1562 | 03-23-2025 05:23 AM |
| | 777 | 03-17-2025 10:18 AM |
| | 2797 | 03-05-2025 01:34 PM |
| | 1846 | 03-03-2025 01:09 PM |
07-30-2020
01:41 PM
@Seaport It shouldn't be a surprise that Hadoop doesn't perform well with small files. With that in mind, the best solution is to zip all your small files locally and then copy the zipped file to HDFS using copyFromLocal; the one restriction is that the source files must be on a local file system. I assume the local Linux box is the edge node and has the HDFS client installed. If not, you will have to copy myzipped.gz to a node that does (usually the edge node) and perform the steps below.
$ hdfs dfs -copyFromLocal myzipped.gz /hadoop_path
Then decompress the gzipped file in HDFS using
$ hdfs dfs -cat /hadoop_path/myzipped.gz | gzip -d | hdfs dfs -put - /hadoop_path2
Hope that helps
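A minimal end-to-end sketch of the same idea, assuming the small files sit under /local/small_files on the edge node and can be concatenated into one file (all paths below are illustrative):
$ cat /local/small_files/*.log | gzip > myzipped.gz        # combine and compress locally
$ hdfs dfs -copyFromLocal myzipped.gz /hadoop_path         # one large file instead of many small ones
$ hdfs dfs -cat /hadoop_path/myzipped.gz | gzip -d | hdfs dfs -put - /hadoop_path2/combined.log   # decompress inside HDFS without staging back to local disk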
07-29-2020
03:25 PM
1 Kudo
@Stephbat Bizarre, all the symlinks should point to the newer version 3.1.4.0-315. You should recreate the symlinks so they point to the new version, then re-run the steps.
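A minimal sketch of checking and repointing the links, assuming an HDP layout under /usr/hdp (the exact components on your node may differ):
$ ls -l /usr/hdp/current/            # see which version each *-client/*-server link targets
$ hdp-select versions                # list the stack versions installed on the node
$ hdp-select set all 3.1.4.0-315     # repoint every component symlink to the new version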
07-29-2020
06:45 AM
1 Kudo
@Stephbat Did you follow Workaround steps 1 through 3? I see you have a similar error in File "/var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py".
1) Go to each ambari-agent node and edit the file remove_previous_stacks.py
# vi /var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py
2) Go to line 77 and change the line from
all_installed_packages = self.pkg_provider.all_installed_packages()
to
all_installed_packages = self.pkg_provider.installed_packages()
3) Retry the operation via curl again; please substitute only the Ambari_host and <cluster_name> below, e.g.:
curl 'http://Ambari_host:8080/api/v1/clusters/<cluster_name>/requests' -u admin:admin -H "X-Requested-By: ambari" -X POST -d '{"RequestInfo":{"context":"remove_previous_stacks", "action" : "remove_previous_stacks", "parameters" : {"version":"3.1.0.0-78"}}, "Requests/resource_filters": [{"hosts":"Ambari_host"}]}'
And please revert
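If you have many agent nodes, a sed one-liner can apply the same one-line change without opening vi (a sketch assuming the stock file path above):
$ sed -i 's/self\.pkg_provider\.all_installed_packages()/self.pkg_provider.installed_packages()/' /var/lib/ambari-agent/cache/custom_actions/scripts/remove_previous_stacks.py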
07-28-2020
02:47 PM
1 Kudo
@Stephbat This thread should help you achieve that. The Ambari version could be an issue; there is a workaround documented, but note the DISCLAIMER !! Please let me know
07-27-2020
06:08 AM
@Krpyto84 Your permission issue is linked to ZooKeeper ACLs; my best guess is that your Kafka is kerberized. ZooKeeper requires you to set up a superuser using the zookeeper.DigestAuthenticationProvider.superDigest property. I don't know how you will integrate that procedure into your Ansible playbook. You will then need to set the JVM parameter in your KAFKA_OPTS env variable:
export KAFKA_OPTS=-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf
Please let me know whether that is your situation; if so, I will try to help you out
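A minimal sketch of the superuser setup, assuming an HDP-style ZooKeeper install; the jar paths and password are illustrative and should be adapted to your environment:
$ java -cp "/usr/hdp/current/zookeeper-server/zookeeper.jar:/usr/hdp/current/zookeeper-server/lib/*" org.apache.zookeeper.server.auth.DigestAuthenticationProvider super:MySecretPassword   # prints the digest for the super user
$ export SERVER_JVMFLAGS="-Dzookeeper.DigestAuthenticationProvider.superDigest=super:<digest-from-previous-step>"   # add to zookeeper-env on the ZK servers, then restart them
$ export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_server_jaas.conf"   # point the broker JVM at its JAAS file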
07-26-2020
11:18 AM
1 Kudo
@mike_bronson7 log.retention.bytes is a size-based retention policy for logs, i.e. the allowed size of the topic. Segments are pruned from the log as long as the remaining segments don't drop below log.retention.bytes. You can also specify retention parameters at the topic level.
To specify a retention time period per topic, use the following command.
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.ms=[DesiredRetentionTimePeriod]
To specify a retention log size per topic, use the following command.
kafka-configs.sh --zookeeper [ZooKeeperConnectionString] --alter --entity-type topics --entity-name [TopicName] --add-config retention.bytes=[DesiredRetentionLogSize]
That should resolve your problem. Happy hadooping
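As a concrete sketch with illustrative values (zk1:2181 and my-topic are placeholders; note that retention.bytes applies per partition):
$ kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics --entity-name my-topic --add-config retention.ms=604800000      # keep messages for 7 days
$ kafka-configs.sh --zookeeper zk1:2181 --alter --entity-type topics --entity-name my-topic --add-config retention.bytes=1073741824  # cap each partition at ~1 GB
$ kafka-configs.sh --zookeeper zk1:2181 --describe --entity-type topics --entity-name my-topic                                       # verify the overrides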
07-26-2020
07:09 AM
@isanas I remember writing a long document on the steps in either the HWX or Cloudera community. Ranger will automatically sync local users whose UID is above 500. Please let me know if you want me to do a walkthrough; in that case, I will ask you to give me an accurate scenario and the version of HDP. Happy hadooping
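A quick way to check this on a usersync node, assuming a Unix-based sync and the default HDP config path (the property name reflects my recollection of ranger-ugsync-site and should be verified for your release):
$ grep -A1 ranger.usersync.unix.minUserId /etc/ranger/usersync/conf/ranger-ugsync-site.xml   # minimum UID that usersync picks up (typically 500)
$ awk -F: '$3 >= 500 {print $1, $3}' /etc/passwd                                             # local accounts that qualify on this node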
07-23-2020
02:16 AM
@focal_fossa Great to hear, happy hadooping! To help others, please mark the answer that resolved your problem as the best answer, so that anyone searching for a similar solution can use it to resolve similar issues.
07-22-2020
09:37 AM
1 Kudo
@focal_fossa To increase HDFS capacity, add more mount points or directories to dfs.datanode.data.dir; the new disks need to be formatted and mounted before adding the mount points in Ambari. In HDP with Ambari, add the new mount point to the comma-separated list in the dfs.datanode.data.dir property (in hdfs-site.xml, or under the HDFS advanced section depending on the Ambari version). The more new disks you provide in that list, the more capacity you will have. Preferably, every machine should have the same disk and mount point structure.
You will then need to run the HDFS balancer, which re-balances data across the DataNodes, moving blocks from overutilized to underutilized nodes.
Running the balancer without parameters:
sudo -u hdfs hdfs balancer
This uses the default threshold of 10%, meaning the balancer will ensure that disk usage on each DataNode differs from the overall usage in the cluster by no more than 10%. You can use a different threshold:
sudo -u hdfs hdfs balancer -threshold 5
This specifies that each DataNode's disk usage must be (or will be adjusted to be) within 5% of the cluster's overall usage. This process can take a long time depending on the amount of data in your cluster.
Hope that helps
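A minimal sketch of preparing one new disk before adding it to dfs.datanode.data.dir in Ambari (device and mount point names are illustrative):
$ mkfs.xfs /dev/sdc                                  # format the new disk
$ mkdir -p /grid/2/hadoop/hdfs/data
$ mount /dev/sdc /grid/2/hadoop/hdfs/data            # also add the mount to /etc/fstab
$ chown -R hdfs:hadoop /grid/2/hadoop/hdfs/data      # the DataNode user must own the directory
# dfs.datanode.data.dir then becomes e.g. /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data
$ sudo -u hdfs hdfs balancer -threshold 5            # after restarting the DataNodes, spread blocks across nodes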
07-21-2020
07:55 AM
@Stephbat Those are internals to Cloudera, and that confirms the myth that migrations/upgrades are never smooth; we still need humans 🙂 Please make those changes and let me know if your DataNodes fire up correctly.