Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
  | 3442 | 10-13-2017 09:42 PM
  | 6207 | 09-14-2017 11:15 AM
  | 3181 | 09-13-2017 10:35 PM
  | 5110 | 09-13-2017 10:25 PM
  | 5746 | 09-13-2017 10:05 PM
02-17-2017
12:23 AM
1 Kudo
Kerberos and LDAP authentication for Impala are different things. MS AD provides both, but when Hadoop and Impala are configured for Kerberos, it is Kerberos that is used for auth, not LDAP. Based on what you said and the message you got from the base command, I'd say you are using Kerberos. Below is the command you need to use: impala-shell -k -i FQDN --ssl
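A minimal sketch of the two connection modes, assuming a Kerberos ticket has already been obtained with kinit; the host name and username below are placeholders:

```bash
# Kerberos-secured connection over TLS (what the post recommends);
# assumes `kinit` has already been run for a valid ticket.
impala-shell -k --ssl -i impalad-host.example.com:21000

# For contrast, an LDAP-authenticated connection uses -l/-u instead of -k.
impala-shell -l -u myuser --ssl -i impalad-host.example.com:21000
```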
02-16-2017
12:38 PM
I don't know specifically, but yes, it is most likely because the libraries used were not built for distributed systems. For instance, if you had three executors running the library's code, all three would be reading from the same SFTP server and directory, vying for the same files and copying them to the destination. It would be a mess.
02-15-2017
08:07 AM
Okay, I have it. I was using the parcel_provisioner.sh script to preload the parcels into Docker images. However, when doing the pre-extraction, the permissions on the container-executor weren't being set properly. For now, turning off the pre-extraction works; I'll also test setting the perms manually, but I'm wondering how many other permissions aren't being set properly. FYI, the root:hadoop 400 permissions work because of the setuid flag on the container-executor binary. Now everything makes sense. Thanks for the help!
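A hedged sketch of the manual permission check and fix mentioned above; the parcel path and the yarn group are assumptions based on a typical CDH parcel layout, and 6050 is the commonly documented mode for container-executor (setuid/setgid, no world access):

```bash
# Assumed CDH parcel location; adjust to your parcel directory.
CE=/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor

# Inspect the owner/group/mode the pre-extraction produced.
ls -l "$CE"

# Apply the commonly documented settings (owner root, NodeManager group, mode 6050).
sudo chown root:yarn "$CE"
sudo chmod 6050 "$CE"
ls -l "$CE"   # expect something like: ---Sr-s--- root yarn
```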
02-14-2017
01:23 PM
As for a rule for AD groups: if you set up LDAP for Hadoop, you should have set a base DN along with user and group filters. These determine what is available from AD to Hadoop. Use the 'hdfs groups' command; it returns the groups identified for the current user, and you can append a username to check a specific user. Warning: neither I nor Cloudera recommend using Hadoop's own LDAP integration. It is better to integrate LDAP at the OS level using sssd, VAS/QAS, Centrify, etc.
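For example (the username below is a placeholder), checking group resolution as described above:

```bash
# Groups resolved for the current user.
hdfs groups

# Groups resolved for a specific user.
hdfs groups alice
```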
02-14-2017
07:08 AM
Thanks Tina, I will check with our support contact about licensing. The installation is progressing fine.
02-14-2017
03:11 AM
Hi, I have uninstalled the package-based CDH components and reinstalled using parcels ... I am through with that. I would like to ask a question about HBase. I started the HBase shell with the command ./hbase shell. When I run the 'list' command at the hbase prompt, I get the following error: ERROR: Can't get master address from ZooKeeper; znode data == null. Could you please help me resolve this error? Thanks,
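For reference, the reproduction steps as described above (the hbase binary is assumed to be run from the HBase bin directory):

```bash
# Start the HBase shell.
./hbase shell

# At the hbase> prompt, list the tables:
#   hbase> list
# which fails with:
#   ERROR: Can't get master address from ZooKeeper; znode data == null
```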
02-13-2017
08:34 PM
2 Kudos
In HDFS, you tell it which disks to use and it will fill up those disks. There is the ability to set how much space on those disks is reserved for non-DFS data, but it doesn't actually prevent the disks from being filled up. The issue at hand is that the smaller disks will fill up faster, so at some point they will not allow any more write operations and the cluster will have no way to balance itself out. This causes issues with HDFS replication and placement, along with hotspotting in MR, Spark, and any other jobs. Say, for instance, you primarily operate on the last day's worth of data for 80% of your jobs. At some point you will hit critical mass where those jobs are running mostly on the same set of nodes.

You could set the reserved non-DFS space to different values using Host Templates in CM. That would at least give you a warning when you are approaching filling up the smaller disks, but at that point the larger disks would have free space that isn't getting used. This is why it is strongly encouraged not to mix different hardware. If possible, upgrade the smaller set.

A possible option would be to use heterogeneous storage (see the sketch below). With it you can designate pools, so the larger nodes would be in one pool and the smaller in the other. Each ingestion point would need to set which pool it uses, and you can set how many replicas go to each. This is a big architectural change, though, and should be carefully reviewed to see whether it benefits your use case(s) in any way.

So, simply: use the same hardware or you will more than likely run into issues.
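A hedged sketch of the heterogeneous-storage idea above, using HDFS storage policies; the mapping of larger and smaller nodes to storage types and the ingest path are assumptions for illustration only:

```bash
# List the storage policies HDFS knows about (Hot, Cold, Warm, One_SSD, ...).
hdfs storagepolicies -listPolicies

# Pin an ingest directory to a policy so its replicas land on the intended
# storage type; /data/ingest_cold is a placeholder path.
hdfs storagepolicies -setStoragePolicy -path /data/ingest_cold -policy Cold

# Verify which policy is in effect for the path.
hdfs storagepolicies -getStoragePolicy -path /data/ingest_cold
```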
02-09-2017
06:59 AM
You can put the S3 credentials in the S3 URI, or you can just pass them as parameters on the command line, which is what I prefer, e.g.: hadoop fs -Dfs.s3a.access.key="" -Dfs.s3a.secret.key="" -ls s3a://bucket-name/ It's also worth knowing that if you run the command as given above, it will override any other settings defined in the cluster config, such as core-site.xml, etc.
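The same command laid out as a block for readability; the key values are left blank as in the post, and bucket-name is a placeholder:

```bash
# Per-command S3A credentials; these -D overrides take precedence over
# core-site.xml and other cluster configuration for this invocation only.
hadoop fs \
  -Dfs.s3a.access.key="" \
  -Dfs.s3a.secret.key="" \
  -ls s3a://bucket-name/
```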
02-06-2017
05:59 AM
My instructor said that this is the jar file to use, but I will also try the other file you mentioned. Thank you for taking the time to answer my question.
02-06-2017
05:53 AM
1 Kudo
Hi @AnisurRehman,
Did you check out the follow-up posts from the series, including:
Ralph Kimball and Kaiser Permanente: Q&A Part I – Hadoop and the Data Warehouse
Ralph Kimball and Kaiser Permanente: Q&A Part II – Building the Landing Zone