Member since 01-20-2014

| Posts | Kudos Received | Solutions |
|---|---|---|
| 578 | 102 | 94 |
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 6677 | 10-28-2015 10:28 PM |
| | 3543 | 10-10-2015 08:30 PM |
| | 5638 | 10-10-2015 08:02 PM |
| | 4095 | 10-07-2015 02:38 PM |
| | 2875 | 10-06-2015 01:24 AM |
09-20-2014 08:06 AM

The error message here might hold the key. Can you verify why it might not be executable? Did you change permissions at some point?

    /opt/cloudera-manager/cm-5.1.2/lib64/cmf/service/common/cloudera-config.sh: line 172: /pkg/moip/mo10755/work/mzpl/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/meta/cdh_env.sh: Permission denied
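A rough sketch of how to check that theory (the parcel path is taken from the error above; the exact ownership and mode your install expects may differ, so treat the chmod as an assumption to verify first):

```bash
# Inspect the current mode and ownership of the script the agent fails to source.
ls -l /pkg/moip/mo10755/work/mzpl/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/meta/cdh_env.sh

# Check that every directory on the path is traversable by the user running the agent.
namei -l /pkg/moip/mo10755/work/mzpl/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/meta/cdh_env.sh

# If the mode was changed at some point, restoring read/execute access should let
# cloudera-config.sh source it again (adjust 755 to your site standard).
sudo chmod 755 /pkg/moip/mo10755/work/mzpl/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/meta/cdh_env.sh
```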
09-20-2014 04:34 AM

Are you able to provide us the logs from the ZooKeeper instance (/var/log/zookeeper)? It should tell us why it's not starting. Please paste the logs into pastebin and provide the URL here. You just need to provide the section covering the last startup attempt and the failure.
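Something like the following is usually enough to capture that section (the exact log file name varies by install, so check the directory listing first):

```bash
# Find the most recently written ZooKeeper log file.
ls -lt /var/log/zookeeper/

# Capture the tail covering the last startup attempt; adjust the file name as needed.
tail -n 300 /var/log/zookeeper/*.log
```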
09-18-2014 07:35 PM

Thank you for the update, glad you were able to resolve the problem.
09-18-2014 02:31 AM

Since you're using VMware, you could very well do what QuickStart basically is:

- Create one VM and configure all services as you wish.
- Take a snapshot or export the appliance.
- Clone it ten times for ten virtual machines.

When you want to update CDH, just update the master image and repeat the process.
09-18-2014 02:11 AM

I am not aware of how you can get the QuickStart VM to work with ESXi.

Is there anything specific you need from the QuickStart VM? Why not create a blank VM with CentOS 6.4 and install CDH+CM from scratch? The whole install process is pretty easy.

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/Cloudera-Manager-Installation-Guide.html

If you do happen to try this and run into any issues, please start a new thread and we'll be happy to assist.
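For what it's worth, the quickest route at the time was the Cloudera Manager installer binary; a minimal sketch, with the download URL quoted from memory (verify it against the installation guide linked above):

```bash
# On the fresh CentOS 6.4 VM, download and run the Cloudera Manager 5 installer.
# URL and file name are assumptions; confirm them in the installation guide.
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin

# Then open http://<vm-address>:7180 in a browser and let the wizard deploy CDH.
```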
09-17-2014 07:35 PM

Are you able to try the VMware image with the Player product and let us know if it works for you?

http://www.vmware.com/products/player
http://www.cloudera.com/content/support/en/downloads/quickstart_vms/cdh-5-1-x1.html
09-15-2014 02:41 AM

The advantage of using HAR files is not in saving disk space but in needing less metadata. Please read the blog link I pasted earlier.

Quote:

===
A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you're storing small files, then you probably have lots of them (otherwise you wouldn't turn to Hadoop), and the problem is that HDFS can't handle lots of files.

Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.

Furthermore, HDFS is not geared up to efficiently accessing small files: it is primarily designed for streaming access of large files. Reading through small files normally causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file, all of which is an inefficient data access pattern.
===
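To make the quoted rule of thumb concrete, the "about 3 gigabytes" figure comes from counting one file object plus one block object per small file; a quick back-of-the-envelope check:

```bash
# Rough namenode heap estimate for 10 million single-block files.
files=10000000
objects=$((files * 2))      # one file object + one block object per file
bytes=$((objects * 150))    # ~150 bytes of namenode memory per object (rule of thumb)
echo "$bytes bytes"         # 3000000000 bytes, i.e. roughly 3 GB of heap
```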
09-15-2014 01:48 AM

If you use HAR to combine 8 smaller files (each less than 1M), it would occupy just one block. More than the disk space saved, you save on metadata storage (on the namenode and datanodes), and this is far more significant in the long term for performance.
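Creating the archive is a one-liner; a minimal sketch, with /user/foo used as a hypothetical placeholder for your own directories:

```bash
# Pack everything under /user/foo/small into a single archive named small.har,
# written to /user/foo/archived. Both paths are placeholders.
hadoop archive -archiveName small.har -p /user/foo/small /user/foo/archived

# The packed files stay readable through the har:// filesystem.
hadoop fs -ls har:///user/foo/archived/small.har
```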
09-15-2014 01:44 AM

The block on the file system isn't a fixed-size file with padding; rather, it is just a unit of storage. The block size can be a maximum of 128MB (or as configured), so if a file is smaller, it will just occupy the minimum needed space.

In my previous response, I said 8 small files would take up 3GB of space. This is incorrect. The space taken up on the cluster is still just the file size times 3 for each block. Regardless of file size, you can divide the size by the block size (default 128M) and round up to the next whole number; this gives you the number of blocks. So in this case, the 3922-byte file uses one block to store its contents.
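You can confirm this on your own cluster; a small sketch, with /user/foo/data.txt as a hypothetical path:

```bash
# Report the file length (bytes), its block size, and the replication factor.
hadoop fs -stat "size=%b blocksize=%o replication=%r" /user/foo/data.txt

# fsck shows exactly how many blocks the file occupies and where the replicas are.
sudo -u hdfs hdfs fsck /user/foo/data.txt -files -blocks -locations
```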
09-15-2014 12:12 AM

> The HDFS block size in my system is set to be 128m. Does it mean that if I put 8 files less than 128m to HDFS, they would occupy 3G disk space (replication factor = 3)?

Yes, this is right. HDFS blocks are not shared among files.

> How could I know the actual occupied space of an HDFS file?

The -ls command tells you this. In the example below, the jar file is 3922 bytes long.

    # sudo -u hdfs hadoop fs -ls /user/oozie/share/lib/sqoop/hive-builtins-0.10.0-cdh4.7.0.jar
    -rw-r--r--   3 oozie oozie       3922 2014-09-14 06:17 /user/oozie/share/lib/sqoop/hive-builtins-0.10.0-cdh4.7.0.jar

> And how about if I use HAR to archive these 8 files? Can it save some space?

Using HAR is a good idea. More ideas about dealing with the small files problem are in this link:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
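A couple of other commands that help answer the "actual occupied space" question (reusing the jar path from the example above; these report the logical size before replication, so multiply by your replication factor for raw disk usage):

```bash
# Total logical size in bytes of a file or directory tree (before replication).
hadoop fs -du -s /user/oozie/share/lib/sqoop/hive-builtins-0.10.0-cdh4.7.0.jar

# Directory count, file count, and content size; handy for spotting lots of small files.
hadoop fs -count /user/oozie/share/lib/sqoop
```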