Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 866 | 06-04-2025 11:36 PM |
| | 1438 | 03-23-2025 05:23 AM |
| | 720 | 03-17-2025 10:18 AM |
| | 2588 | 03-05-2025 01:34 PM |
| | 1715 | 03-03-2025 01:09 PM |
04-06-2018
10:54 PM
2 Kudos
@Saravana V To change the block size, set the dfs.blocksize parameter in hdfs-site.xml to the required value (the default in Hadoop 2.x is 128 MB). Change it through the Ambari UI, which is the ONLY recommended way, then restart all components with stale configurations for the change to take effect (see 256.JPG). The new value applies only to files written after the change; existing files keep their old block size.

To validate this, I created a test directory and copied a file while explicitly setting the block size to 256 MB (see new_file256.JPG):

$ hdfs dfs -D dfs.blocksize=268435456 -put /tmp/ambari.properties.4 /user/sheltong/test

Then I copied the same file into a directory that already holds files with 128 MB blocks (see new2_file256.JPG):

$ hdfs dfs -D dfs.blocksize=268435456 -put /tmp/ambari.properties.4 /user/sheltong

DistCp (distributed copy) is a tool for large inter-/intra-cluster copying. It uses MapReduce for its distribution, error handling, recovery and reporting. To rewrite the old 128 MB files with the new block size, copy them to a new location with DistCp, adding the -overwrite option if the destination already exists; the old files with the old block size then have to be deleted manually:

$ hadoop distcp -Ddfs.block.size=268435456 /path/to/data /path/to/data-with-largeblocks

(source and destination, respectively)

Now the question becomes: should the block size be 128 MB, 256 MB, or even more? It all depends on your cluster capacity and the size of your datasets. Say you have a dataset that is 2 petabytes in size: a 64 MB block size would produce 31 million+ blocks, which puts stress on the NameNode that has to manage all those blocks. A lot of blocks also means a lot of mappers during MapReduce execution. In such a case you may decide to increase the block size just for that dataset.

Hope that helps
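To confirm that a re-copied file actually picked up the 256 MB block size, standard HDFS tooling can be used. This is a minimal sketch; the path below assumes the file from the example above landed in /user/sheltong/test.

```sh
# Print the block size (%o) and name (%n) of the newly written file;
# expect 268435456 for a 256 MB block size.
hdfs dfs -stat "%o %n" /user/sheltong/test/ambari.properties.4

# fsck lists every block of the file, so a file written with a larger
# block size reports fewer, bigger blocks.
hdfs fsck /user/sheltong/test/ambari.properties.4 -files -blocks
```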
04-06-2018
07:52 PM
1 Kudo
@Saravana V You need to understand the benefit of a larger block size. An HDFS block of 128 MB is written to disk sequentially, so there is a fair chance the data ends up in contiguous space on disk, meaning the data is written next to each other in a continuous fashion. When data is laid out contiguously, the number of disk seeks during reads is reduced, which makes reads more efficient. That is why the block size in HDFS is huge compared to other file systems. There is no effective way to change the block size "in place": the block size is tightly tied to the on-disk layout of block files at the DataNodes, so changing it is non-trivial. When you change the block size, only files ingested or created in HDFS afterwards are written with the new block size; the old files keep their previous block size and are not rewritten. If you need them changed, manual intervention is required. Hope that helps
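As a rough illustration of that manual intervention, an existing file can be rewritten through the HDFS client with the new block size and then swapped into place. This is only a sketch; /user/sheltong/old_file is a hypothetical path, and for large directory trees the DistCp approach from the previous answer is the more practical route.

```sh
# Re-copy the file through the client so it is re-chunked with 256 MB blocks,
# then swap the new copy into place of the original.
hdfs dfs -D dfs.blocksize=268435456 -cp /user/sheltong/old_file /user/sheltong/old_file.256m
hdfs dfs -rm /user/sheltong/old_file
hdfs dfs -mv /user/sheltong/old_file.256m /user/sheltong/old_file
```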
04-05-2018
12:10 PM
@Vinit Mahiwal Here is the way to do it: set hive.execution.engine=spark; plus a couple of other settings. Check this link, it's exactly what you need: hive on spark
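For a quick test, the engine can also be switched per session from the command line instead of in hive-site.xml. A minimal sketch, assuming a HiveServer2 instance at hs2-host:10000 and a hypothetical table my_table:

```sh
# Run one query with the Spark execution engine enabled for this session only.
beeline -u "jdbc:hive2://hs2-host:10000/default" \
        --hiveconf hive.execution.engine=spark \
        -e "SELECT COUNT(*) FROM my_table;"
```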
04-05-2018
11:22 AM
@Juan Gonzalez Thank you, we are all here to help each other. So it's better you close this thread, as it's now irrelevant and very long, until you upgrade your memory. You can accept any of the responses that helped you. Be assured HCC is a nice place to get help, with a lot of enthusiastic folks in here.
04-05-2018
11:18 AM
@Dinesh Jadhav There were a couple of errors in your kdc.conf, krb5.conf and kadm5.acl. Please see the attached files. I would first ask you to back up your current kdc.conf, krb5.conf and kadm5.acl files. I have tried to separate the config files with --------. Please let me know if everything is clear; if you need clarifications, don't hesitate to ask.
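A minimal sketch of that backup step, assuming the default MIT Kerberos file locations on a RHEL/CentOS-style KDC host (adjust the paths if your layout differs):

```sh
# Keep timestamped copies of the current Kerberos configuration before editing.
ts=$(date +%Y%m%d)
cp /etc/krb5.conf /etc/krb5.conf.bak.$ts
cp /var/kerberos/krb5kdc/kdc.conf /var/kerberos/krb5kdc/kdc.conf.bak.$ts
cp /var/kerberos/krb5kdc/kadm5.acl /var/kerberos/krb5kdc/kadm5.acl.bak.$ts
```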
04-05-2018
10:43 AM
@Juan Gonzalez It depends on your project implementation, but it's always advisable to use the latest software, in this case sandbox 2.5 or 2.6.x. Unfortunately, you won't get much HCC support on very old sandbox releases. And please allocate 8 GB or more RAM to your sandbox; personally I prototype with 14 GB RAM. Yes, to run Ambari you will need at least 8 GB of RAM, otherwise you go the API way 🙂 but visual access is also good for getting to understand things.
04-05-2018
10:16 AM
@Juan Gonzalez Your memory is quite a bit below the recommended size of 8 GB, and why are you struggling with an OLD sandbox 2.2.4.2? If you want to turn on maintenance mode for some components, here is the link to the snippets: https://community.hortonworks.com/articles/71199/how-to-manage-maintenance-mode-in-ambari-using-api.html
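The gist of that article, for a single service, looks roughly like the sketch below; <AMBARI_HOST> and <CLUSTER_NAME> are placeholders for your own values, and HDFS is just an example service.

```sh
# Turn maintenance mode ON for the HDFS service via the Ambari REST API.
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Turn ON Maintenance Mode for HDFS"},"Body":{"ServiceInfo":{"maintenance_state":"ON"}}}' \
  "http://<AMBARI_HOST>:8080/api/v1/clusters/<CLUSTER_NAME>/services/HDFS"
```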
04-05-2018
07:26 AM
@Anurag Mishra If your question got answered or resolved by that link, please "Accept" the answer and close this thread. Thank you
04-04-2018
10:35 PM
@Juan Gonzalez Don't ignore it... Yes, memory is very important: you MUST have a minimum of 8 GB to run a sandbox, the more you have the better, and that is the Hortonworks recommendation. Can you try these 2 curl commands to validate?

Check Ambari status:

$ curl -u admin:admin -G http://<AMBARI_HOST>:8080/api/v1/check

Desired output:

<?xml version="1.0"?><status> RUNNING</status>

Check HDFS status:

$ curl -u admin:admin -H "X-Requested-By: ambari" -i -k -X GET http://<AMBARI_HOST>:8080/api/v1/clusters/<CLUSTER_NAME>/services/HDFS/

Remember to replace <AMBARI_HOST> and <CLUSTER_NAME> with your actual values; your cluster name should be Sandbox, I think.
04-04-2018
08:49 PM
@Juan Gonzalez Here is a link to solutions for issues pertaining to sandboxes. Keep me posted