Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7477 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6041 | 09-13-2017 10:25 PM |
| | 6604 | 09-13-2017 10:05 PM |
02-17-2017
12:00 AM
1 Kudo
You have HTTP authentication turned on for HDFS: CM > HDFS > Configuration > Enable Kerberos Authentication for HTTP Web-Consoles. You can turn it off if you don't require it, or you can follow the link below to configure your browser to authenticate to the site. https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_sg_browser_access_kerberos_protected_url.html
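If you keep it enabled, a quick hedged check from a kinit'd shell (the principal, NameNode host, and port here are placeholders) can confirm SPNEGO access before you fight with the browser:

kinit your_principal
curl --negotiate -u : "http://namenode.example.com:50070/jmx"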
02-16-2017
11:47 PM
'No rules applied to 25761081@PROD.GLOBAL.COM' means you do not have any auth-to-local rules set up for this principal format. You should also have the realm in the Trusted Realms list, as mentioned by @csguna. https://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_auth_to_local_isolate.html
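As a hedged example (the realm is taken from your error message; adjust the mapping to your own policy), a hadoop.security.auth_to_local rule that simply strips the realm from single-component principals would look like:

RULE:[1:$1@$0](.*@PROD\.GLOBAL\.COM)s/@.*//
DEFAULT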
02-16-2017
12:38 PM
I don't know specifically, but yes, it is most likely because the libraries used were not built for a distributed system. For instance, if you had three executors running the library's code, all three would be reading from the same SFTP directory, vying for the same files and copying them to the destination. It would be a mess.
02-14-2017
05:05 PM
Can you link the guide you are following? Admission Control and dynamic resource pools are separate but can function together. Unless I am mistaken, MEM_LIMIT is the memory limit for each Impala daemon. Ok, I just did a quick read: it can also be used at run time per query, in which case it sets the limit for that specific query. Is this where you are setting it? That has nothing to do with Admission Control or DRP. For DRP, you can set default_pool_mem_limit to cap how much memory can be used in a pool by specific users/groups/queries.
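For reference, a hedged example of the per-query form from impala-shell (the value is a placeholder):

SET MEM_LIMIT=2gb;
-- then run the query you want capped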
02-14-2017
02:44 PM
1 Kudo
I'll give credit where it is due: I found this over on SO. This is handy and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell
SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This will output the full launch command to stdout, including the classpath. Search the classpath for hive-exec*.jar; that jar contains the method for loading dynamic partitions. http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spark-shell-and-spark
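Once the shell is up you can also check from inside it; a hedged one-liner (this only inspects the driver JVM's classpath):

System.getProperty("java.class.path").split(":").filter(_.contains("hive-exec")).foreach(println)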
02-14-2017
02:29 PM
On the surface it just seems to be a classpath issue, which is why there is a difference between the shell and running on the cluster. In which mode did you launch the job? Are you using the SQLContext or the HiveContext? If you used the HiveContext, did you set these settings in it?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
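If you would rather set them programmatically than through SQL, a hedged sketch for a Spark 1.x HiveContext (sc is your existing SparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Equivalent to the SET statements above
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.max.dynamic.partitions", "2048")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")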
02-14-2017
01:29 PM
The permissions on container-executor.cfg are correct; it should be 400 and root:hadoop. Find and check the actual binary, container-executor, as well. Also, review all of the configs, as a secured cluster switches from the default executor to the LinuxContainerExecutor. https://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/SecureMode.html#LinuxContainerExecutor
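A hedged check of the binary (the path below is a placeholder and varies by distribution/parcel layout); per the Hadoop docs it should be owned by root, group-owned by the NodeManager group, and carry the 6050 setuid/setgid permissions:

ls -l /usr/lib/hadoop-yarn/bin/container-executor
# expect something like: ---Sr-s--- 1 root hadoop ... container-executor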
02-14-2017
01:23 PM
As for a rule for AD groups: if you set up LDAP for Hadoop, you should have set a base DN and user and group filters; these determine what is available from AD to Hadoop. The 'hdfs groups' command will return the groups identified for the current user. You can specify a username at the end to check a specific user. Warning: neither I nor Cloudera recommend using Hadoop LDAP. It is better to integrate LDAP at the OS level using sssd, VAS/QAS, Centrify, etc.
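For example (the username is a placeholder):

hdfs groups          # groups for the user running the command
hdfs groups jdoe     # groups for a specific user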
02-14-2017
01:15 PM
1 Kudo
Disclosure: I have never done this; I read the README of that project. If that is what you want to do, it would be a way to do it. The note at the bottom spells out the restriction, though, and it follows what I was thinking. It says that it doesn't run in a Spark job, but a SparkContext is created and used, so it must. This means that while it runs in a driver or executor, it only works in local mode and will only run on the node you launch it from. For me this removes any benefit of using Spark for this piece of the workflow; it would be better to use Flume or some other ingestion tool. But yes, you could use this project or write your own Java/Scala app to read from SFTP and write to HDFS. The note in question: "SFTP files are fetched and written using jsch. It is not executed as spark job. It might have issues in cluster"
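If you do roll your own, a minimal hedged sketch (host, credentials, and paths are placeholders; error handling and retries omitted) using JSch and the HDFS FileSystem API:

import com.jcraft.jsch.{ChannelSftp, JSch}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

object SftpToHdfs {
  def main(args: Array[String]): Unit = {
    // Open an SFTP session (placeholder host/user/password)
    val jsch = new JSch()
    val session = jsch.getSession("sftpuser", "sftp.example.com", 22)
    session.setPassword("secret")
    session.setConfig("StrictHostKeyChecking", "no")
    session.connect()

    val channel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
    channel.connect()

    // Stream the remote file straight into HDFS
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    val in = channel.get("/remote/path/file.csv")
    val out = fs.create(new Path("/user/ingest/file.csv"))
    IOUtils.copyBytes(in, out, conf, true) // closes both streams

    channel.disconnect()
    session.disconnect()
  }
}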
02-13-2017
11:13 PM
Last one for the night, I swear. I see that there is an 'insert code' option, but I don't see it used often enough. Is there a way to remind the user prior to posting that they could use the feature, or to show a tooltip on a user's first post? Along the lines of helping posts be more readable and constructive, the guidelines state to provide as much context as possible, but I think there should be some way to encourage providing context (or discourage leaving it out). Too often not enough information is provided, and asking for it ends up being the first response. Sometimes the OP does not know where to find it, and maybe we need more KBs on how to do that. I think items like CDH version and service could be made mandatory depending on the topic/tag assigned.