Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 866 | 06-04-2025 11:36 PM |
| | 1438 | 03-23-2025 05:23 AM |
| | 720 | 03-17-2025 10:18 AM |
| | 2588 | 03-05-2025 01:34 PM |
| | 1715 | 03-03-2025 01:09 PM |
04-06-2018
10:54 PM
2 Kudos
@Saravana V To change the block size, set the dfs.blocksize parameter in hdfs-site.xml to the required value (the default in Hadoop 2.x is 128 MB). Change it through the Ambari UI, which is the ONLY recommended way, then restart all components with stale configurations for the change to take effect (see 256.JPG). The new value applies only to files written after the change; existing files keep their old block size.

To validate this, I created a test directory and copied a file while explicitly setting the block size to 256 MB (see new_file256.JPG):

$ hdfs dfs -D dfs.blocksize=268435456 -put /tmp/ambari.properties.4 /user/sheltong/test

Then I copied the same file into a directory that already holds files with 128 MB blocks (see new2_file256.JPG):

$ hdfs dfs -D dfs.blocksize=268435456 -put /tmp/ambari.properties.4 /user/sheltong

DistCp (distributed copy) is a tool for large inter-/intra-cluster copying. It uses MapReduce for its distribution, error handling, recovery and reporting. To rewrite the old 128 MB files with the new block size, copy them to a new location with DistCp, adding the -overwrite option if the destination already exists; the old files with the old block size then have to be deleted manually:

$ hadoop distcp -Ddfs.block.size=268435456 /path/to/data /path/to/data-with-largeblocks

(source and destination, respectively)

Now the question becomes: should the block size be 128 MB, 256 MB, or even more? It all depends on your cluster capacity and the size of your datasets. Say you have a dataset that is 2 petabytes in size: a 64 MB block size would produce 31 million+ blocks, which puts stress on the NameNode that has to manage all those blocks. A lot of blocks also means a lot of mappers during MapReduce execution. In such a case you may decide to increase the block size just for that dataset.

Hope that helps
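To confirm that a re-copied file actually picked up the 256 MB block size, standard HDFS tooling can be used. This is a minimal sketch; the path below assumes the file from the example above landed in /user/sheltong/test.

```sh
# Print the block size (%o) and name (%n) of the newly written file;
# expect 268435456 for a 256 MB block size.
hdfs dfs -stat "%o %n" /user/sheltong/test/ambari.properties.4

# fsck lists every block of the file, so a file written with a larger
# block size reports fewer, bigger blocks.
hdfs fsck /user/sheltong/test/ambari.properties.4 -files -blocks
```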
04-06-2018
07:52 PM
1 Kudo
@Saravana V You need to understand the benefit of a larger block size. An HDFS block of 128 MB is written to disk sequentially, so there is a fair chance the data ends up in contiguous space on disk, meaning the data is written next to each other in a continuous fashion. When data is laid out contiguously, the number of disk seeks during reads is reduced, which makes reads more efficient. That is why the block size in HDFS is huge compared to other file systems. There is no effective way to change the block size "in place": the block size is tightly tied to the on-disk layout of block files at the DataNodes, so changing it is non-trivial. When you change the block size, only files ingested or created in HDFS afterwards are written with the new block size; the old files keep their previous block size and are not rewritten. If you need them changed, manual intervention is required. Hope that helps
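As a rough illustration of that manual intervention, an existing file can be rewritten through the HDFS client with the new block size and then swapped into place. This is only a sketch; /user/sheltong/old_file is a hypothetical path, and for large directory trees the DistCp approach from the previous answer is the more practical route.

```sh
# Re-copy the file through the client so it is re-chunked with 256 MB blocks,
# then swap the new copy into place of the original.
hdfs dfs -D dfs.blocksize=268435456 -cp /user/sheltong/old_file /user/sheltong/old_file.256m
hdfs dfs -rm /user/sheltong/old_file
hdfs dfs -mv /user/sheltong/old_file.256m /user/sheltong/old_file
```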
04-05-2018
12:10 PM
@Vinit Mahiwal Here is the way to do it: set hive.execution.engine=spark; plus a couple of other settings. Check this link, it's exactly what you need: hive on spark
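For a quick test, the engine can also be switched per session from the command line instead of in hive-site.xml. A minimal sketch, assuming a HiveServer2 instance at hs2-host:10000 and a hypothetical table my_table:

```sh
# Run one query with the Spark execution engine enabled for this session only.
beeline -u "jdbc:hive2://hs2-host:10000/default" \
        --hiveconf hive.execution.engine=spark \
        -e "SELECT COUNT(*) FROM my_table;"
```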
04-05-2018
11:22 AM
@Juan Gonzalez Thank you, we are all here to help each other. So it's better you close this thread, as it's now irrelevant and very long, until you upgrade your memory. You can accept any of the responses that helped you. Be assured HCC is a nice place to get help, with a lot of enthusiastic folks in here.
04-05-2018
11:18 AM
@Dinesh Jadhav There were a couple of errors in your kdc.conf, krb5.conf and kadm5.acl. Please see the attached files. I would first ask you to back up your current kdc.conf, krb5.conf and kadm5.acl files. I have tried to separate the config files with --------. Please let me know if everything is clear; if you need clarifications, don't hesitate to ask.
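A minimal sketch of that backup step, assuming the default MIT Kerberos file locations on a RHEL/CentOS-style KDC host (adjust the paths if your layout differs):

```sh
# Keep timestamped copies of the current Kerberos configuration before editing.
ts=$(date +%Y%m%d)
cp /etc/krb5.conf /etc/krb5.conf.bak.$ts
cp /var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/kdc.conf.bak.$ts
cp /var/kerberos/krb5kdc/kadm5.acl /var/kerberos/krb5kdc/kadm5.acl.bak.$ts
```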
04-05-2018
10:43 AM
@Juan Gonzalez It depends on your project implementation, but it's always advisable to use the latest software, in this case sandbox 2.5 or 2.6.x. Unfortunately, you won't get much HCC support on very old sandbox releases. And please allocate 8 GB or more RAM to your sandbox; personally I prototype with 14 GB RAM. Yes, to run Ambari you will need at least 8 GB of RAM, otherwise you go the API way 🙂 but visual access is also good for getting to understand things.
04-05-2018
10:16 AM
@Juan Gonzalez Your memory is quite a bit below the recommended size of 8 GB, and why are you struggling with an OLD sandbox 2.2.4.2? If you want to turn on maintenance mode for some components, here is the link to the snippets: https://community.hortonworks.com/articles/71199/how-to-manage-maintenance-mode-in-ambari-using-api.html
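The gist of that article, for a single service, looks roughly like the sketch below; <AMBARI_HOST> and <CLUSTER_NAME> are placeholders for your own values, and HDFS is just an example service.

```sh
# Turn maintenance mode ON for the HDFS service via the Ambari REST API.
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Turn ON Maintenance Mode for HDFS"},"Body":{"ServiceInfo":{"maintenance_state":"ON"}}}' \
  "http://<AMBARI_HOST>:8080/api/v1/clusters/<CLUSTER_NAME>/services/HDFS"
```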
04-05-2018
07:26 AM
@Anurag Mishra If your question got answered or resolved by that link, please "Accept" the answer and close this thread. Thank you
04-04-2018
10:35 PM
@Juan Gonzalez Don't ignore it... Yes, memory is very important: you MUST have a minimum of 8 GB to run a sandbox, the more you have the better, and that is the Hortonworks recommendation. Can you try these 2 curl commands to validate?

Check Ambari status:

$ curl -u admin:admin -G http://<AMBARI_HOST>:8080/api/v1/check

Desired output:

<?xml version="1.0"?><status> RUNNING</status>

Check HDFS status:

$ curl -u admin:admin -H "X-Requested-By: ambari" -i -k -X GET http://<AMBARI_HOST>:8080/api/v1/clusters/<CLUSTER_NAME>/services/HDFS/

Remember to replace <AMBARI_HOST> and <CLUSTER_NAME> with your actual values; your cluster name should be Sandbox, I think.
04-04-2018
08:49 PM
@Juan Gonzalez Here is a link to solutions for issues pertaining to sandboxes. Keep me posted