Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7477 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6041 | 09-13-2017 10:25 PM |
| | 6604 | 09-13-2017 10:05 PM |
02-17-2017
12:00 AM
1 Kudo
You have HTTP authentication turned on for HDFS: CM > HDFS > Configuration > Enable Kerberos Authentication for HTTP Web-Consoles. You can turn it off if you don't require it, or you can follow the link below to configure your browser to authenticate to the site. https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_sg_browser_access_kerberos_protected_url.html
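If you keep it enabled, a quick hedged check from a kinit'd shell (the principal, NameNode host, and port here are placeholders) can confirm SPNEGO access before you fight with the browser:

kinit your_principal
curl --negotiate -u : "http://namenode.example.com:50070/jmx"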
02-16-2017
11:47 PM
'No rules applied to 25761081@PROD.GLOBAL.COM' means you do not have any auth-to-local rules set up for this principal format. You should also have the realm in the Trusted Realms list, as mentioned by @csguna. https://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_auth_to_local_isolate.html
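As a hedged example (the realm is taken from your error message; adjust the mapping to your own policy), a hadoop.security.auth_to_local rule that simply strips the realm from single-component principals would look like:

RULE:[1:$1@$0](.*@PROD\.GLOBAL\.COM)s/@.*//
DEFAULT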
02-16-2017
12:38 PM
I don't know specifically, but yes, it is most likely because the libraries used were not built for a distributed system. For instance, if you had three executors running the library's code, all three would be reading from the same SFTP directory, vying for the same files and copying them to the destination. It would be a mess.
02-14-2017
05:05 PM
Can you link the guide you are following? Admission Control and dynamic resource pools are separate but can function together. Unless I am mistaken, MEM_LIMIT is the memory limit for each Impala daemon. Ok, I just did a quick read: it can also be used at run time per query, in which case it sets the limit for that specific query. Is this where you are setting it? That has nothing to do with Admission Control or DRP. For DRP, you can set default_pool_mem_limit to cap how much memory can be used in a pool by specific users/groups/queries.
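For reference, a hedged example of the per-query form from impala-shell (the value is a placeholder):

SET MEM_LIMIT=2gb;
-- then run the query you want capped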
02-14-2017
02:44 PM
1 Kudo
I'll give credit where it is due: I found this over on SO. This is handy and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell
SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This will output the full launch command to stdout, including the classpath. Search the classpath for hive-exec*.jar; that jar contains the method for loading dynamic partitions. http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spark-shell-and-spark
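Once the shell is up you can also check from inside it; a hedged one-liner (this only inspects the driver JVM's classpath):

System.getProperty("java.class.path").split(":").filter(_.contains("hive-exec")).foreach(println)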
02-14-2017
02:29 PM
On the surface it just seems to be a classpath issue, which is why there is a difference between the shell and running on the cluster. In which mode did you launch the job? Are you using the SQLContext or the HiveContext? If you used the HiveContext, did you set these settings in it?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
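If you would rather set them programmatically than through SQL, a hedged sketch for a Spark 1.x HiveContext (sc is your existing SparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Equivalent to the SET statements above
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.max.dynamic.partitions", "2048")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")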
02-14-2017
01:29 PM
The permissions on container-executor.cfg are correct; it should be 400 and root:hadoop. Find and check the actual binary, container-executor, as well. Also, review all of the configs, as a secured cluster switches from the default executor to the LinuxContainerExecutor. https://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/SecureMode.html#LinuxContainerExecutor
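A hedged check of the binary (the path below is a placeholder and varies by distribution/parcel layout); per the Hadoop docs it should be owned by root, group-owned by the NodeManager group, and carry the 6050 setuid/setgid permissions:

ls -l /usr/lib/hadoop-yarn/bin/container-executor
# expect something like: ---Sr-s--- 1 root hadoop ... container-executor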
02-14-2017
01:23 PM
As for a rule for AD groups: if you set up LDAP for Hadoop, you should have set a base DN and user and group filters; these determine what is available from AD to Hadoop. The 'hdfs groups' command will return the groups identified for the current user. You can specify a username at the end to check a specific user. Warning: neither I nor Cloudera recommend using Hadoop LDAP. It is better to integrate LDAP at the OS level using sssd, VAS/QAS, Centrify, etc.
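For example (the username is a placeholder):

hdfs groups          # groups for the user running the command
hdfs groups jdoe     # groups for a specific user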
02-14-2017
01:15 PM
1 Kudo
Disclosure: I have never done this; I read the README of that project. If that is what you want to do, it would be a way to do it. The note at the bottom spells out the restriction, though, and it follows what I was thinking. It says that it doesn't run in a Spark job, but a SparkContext is created and used, so it must. This means that while it runs in a driver or executor, it only works in local mode and will only run on the node you launch it from. For me this removes any benefit of using Spark for this piece of the workflow; it would be better to use Flume or some other ingestion tool. But yes, you could use this project or write your own Java/Scala app to read from SFTP and write to HDFS. The note in question: "SFTP files are fetched and written using jsch. It is not executed as spark job. It might have issues in cluster"
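If you do roll your own, a minimal hedged sketch (host, credentials, and paths are placeholders; error handling and retries omitted) using JSch and the HDFS FileSystem API:

import com.jcraft.jsch.{ChannelSftp, JSch}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

object SftpToHdfs {
  def main(args: Array[String]): Unit = {
    // Open an SFTP session (placeholder host/user/password)
    val jsch = new JSch()
    val session = jsch.getSession("sftpuser", "sftp.example.com", 22)
    session.setPassword("secret")
    session.setConfig("StrictHostKeyChecking", "no")
    session.connect()

    val channel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
    channel.connect()

    // Stream the remote file straight into HDFS
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    val in = channel.get("/remote/path/file.csv")
    val out = fs.create(new Path("/user/ingest/file.csv"))
    IOUtils.copyBytes(in, out, conf, true) // closes both streams

    channel.disconnect()
    session.disconnect()
  }
}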
02-13-2017
11:13 PM
Last one for the night, I swear. I see that there is an 'insert code' option, but I don't see it used often enough. Is there a way to remind the user prior to posting that they could use the feature, or to show a tooltip on a user's first post? Along the lines of helping posts be more readable and constructive, the guidelines state to provide as much context as possible, but I think there should be some way to encourage providing context (or discourage leaving it out). Too often not enough information is provided, and asking for it ends up being the first response. Sometimes the OP does not know where to find it, and maybe we need more KBs on how to do that. I think items like CDH version and service could be made mandatory depending on the topic/tag assigned.