Member since
01-18-2016
164
Posts
32
Kudos Received
20
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 137 | 01-14-2025 06:30 PM |
| | 1420 | 04-06-2018 09:24 PM |
| | 1451 | 05-02-2017 10:43 PM |
| | 3958 | 01-24-2017 08:21 PM |
| | 24177 | 12-05-2016 10:35 PM |
07-20-2016
03:53 PM
How can we get pyspark to submit YARN jobs as the end user? We have data in a private directory (mode 700) that a user owns. He can select the data through HiveServer2 with beeline, but with pyspark he gets "permission denied" because the job is submitted as the "spark" user instead of as the end user. This is a kerberized cluster with the Ranger Hive and HDFS plugins. He has access to the directory in question, just not with pyspark. He mostly uses Jupyter via JupyterHub, which uses PAM authentication, but I think he has also run this with bin/pyspark with the same results.

Here is the code:

from pyspark import SparkContext, SparkConf
SparkContext.setSystemProperty('spark.executor.memory', '2g')
conf = SparkConf()
conf.set('spark.executor.instances', 4)
sc = SparkContext('yarn-client', 'myapp', conf=conf)
rdd = sc.textFile('/user/johndoe/.staging/test/student.txt')
rdd.cache()
rdd.count()

And the error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=EXECUTE, inode="/user/johndoe/.staging/test/student.txt":johndoe:hdfs:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:305)
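To confirm which identity HDFS actually saw, the fields of the "Permission denied" line can be pulled apart. This is only a minimal sketch for reading the error quoted above, not a fix for the impersonation problem; the names DENIED and parse_denied are made up for illustration.

```python
import re

# Pattern for the HDFS "Permission denied" line quoted in the post above.
# DENIED and parse_denied are illustrative names, not part of any Hadoop API.
DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>[^,]+), '
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>\S+)'
)

def parse_denied(line):
    """Return the fields of an AccessControlException line as a dict, or None."""
    match = DENIED.search(line)
    return match.groupdict() if match else None

line = ('org.apache.hadoop.security.AccessControlException: Permission denied: '
        'user=spark, access=EXECUTE, '
        'inode="/user/johndoe/.staging/test/student.txt":johndoe:hdfs:drwx------')
info = parse_denied(line)
print(info["user"], "was denied", info["access"], "on a path owned by", info["owner"])
```

Here the requesting user is "spark" while the owner is "johndoe" with mode drwx------, which matches the description that the job is not running as the end user.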
Labels: Apache Spark
07-20-2016
01:06 PM
1 Kudo
I may have been wrong about adding "/solr" to your zookeepers. I know I had to do that somewhere, but I guess it was when starting Solr from the command line without the "bin/solr start" command.

So, you can re-upload your config directory to a configset named "lab". It will create or overwrite the current configset (which is just a ZK directory of your conf directory). The default configset name is the same as your collection.

./zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -confname lab

If your configset is called lab, then do this to show the contents of the solrconfig.xml in zookeeper:

./zkcli.sh -zkhost localhost:2181 -cmd get /configs/lab/solrconfig.xml

I recommend running the list command, which dumps everything in zookeeper: it does not just list the files, it also prints their contents. That's a bit too much, so pipe it to "less" and search for your collection name as you would in vi (with / and ? to search). Then you'll see the path to your configs.

./zkcli.sh -zkhost localhost:2181 -cmd list | less

You will see something like this (my collection is called testcoll in this example):

/configs/testcoll/solrconfig.xml (0)
DATA: ...supressed...
/configs/testcoll/lang (38)
/configs/testcoll/lang/contractions_ga.txt (0)
DATA: ...supressed...
/configs/testcoll/lang/stopwords_hi.txt (0)
DATA: ...supressed...
/configs/testcoll/lang/stopwords_eu.txt (0)
DATA: ...supressed...
/configs/testcoll/lang/stopwords_sv.txt (0)
DATA: ...supressed...
/configs/testcoll/lang/contractions_it.txt (0)

I hope that helps.
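As an alternative to paging through the dump with "less", the listing can be filtered for one collection. The sketch below assumes the output of "-cmd list" has already been captured as text and that znode lines look like the sample above (a path followed by a number in parentheses); ZNODE and config_paths are hypothetical names.

```python
import re

# A znode line in the sample listing looks like "/configs/testcoll/lang (38)".
# The exact layout of the real output is an assumption based on that sample.
ZNODE = re.compile(r'^\s*(/\S+) \((\d+)\)\s*$')

def config_paths(listing, collection):
    """Return the znode paths under /configs/<collection>/ found in the listing."""
    prefix = "/configs/" + collection + "/"
    paths = []
    for line in listing.splitlines():
        match = ZNODE.match(line)
        if match and match.group(1).startswith(prefix):
            paths.append(match.group(1))
    return paths

sample = """/configs/testcoll/solrconfig.xml (0)
DATA: ...supressed...
/configs/testcoll/lang (38)
/configs/testcoll/lang/contractions_ga.txt (0)
DATA: ...supressed...
"""
for path in config_paths(sample, "testcoll"):
    print(path)
```

An empty result for your collection name would suggest the configset was uploaded under a different name than you expected.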
07-20-2016
12:59 PM
This will upload your config directory to a configset named "testcoll". The default configset name is the same as your collection.

./zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir ../../solr/configsets/data_driven_schema_configs/conf -confname testcoll

If your configset is called testcoll, then do this to show the contents of the solrconfig.xml in zookeeper:

./zkcli.sh -zkhost localhost:2181 -cmd get /configs/testcoll/solrconfig.xml

I recommend running the list command, which dumps everything in zookeeper: it does not just list the files, it also prints their contents. That's a bit too much, so pipe it to "less" and search for your collection name as you would in vi (with / and ? to search). Then you'll see the path to your configs.
07-20-2016
11:24 AM
@Saurabh Kumar You need to add "/solr" to the end of your zookeeper host:port like this (you probably only need to list one of the zookeepers for the command):

./zkcli.sh -cmd upconfig -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -confname labs -z m1.hdp22:2181/solr

That command will upload the conf directory. I'd also suggest trying the list command ("-cmd list") to see what's in zookeeper. It has been a while since I have used it and I can't try it at the moment.
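For reference, a ZooKeeper connection string takes the chroot suffix once, after the last host:port pair, not after each host. A minimal sketch of building such a string follows; zk_connect_string is a hypothetical helper, and the second host m2.hdp22 is invented for the example.

```python
def zk_connect_string(hosts, chroot=""):
    """Join host:port pairs and append the chroot once, after the last host."""
    if chroot and not chroot.startswith("/"):
        chroot = "/" + chroot
    return ",".join(hosts) + chroot.rstrip("/")

# Single zookeeper, as in the command above.
print(zk_connect_string(["m1.hdp22:2181"], "/solr"))

# Several zookeepers (m2.hdp22 is a made-up host): the chroot still appears once.
print(zk_connect_string(["m1.hdp22:2181", "m2.hdp22:2181"], "solr"))
```

Whether "/solr" is the right chroot depends on how Solr was started on your cluster, so check what the list command shows before relying on it.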
07-20-2016
03:46 AM
You have HDFS defined in two places: on the command line and also in the solrconfig.xml. I don't understand the one on the command line, since it does not include a port and HDPTSTHA does not look like a hostname, though it could be one. You might try temporarily changing the one in your solrconfig.xml to something bogus to see if it affects your reported error.

Also, the create command says, "Re-using existing configuration directory labs", which makes me wonder if it is reusing what's already in zookeeper, and perhaps that file does not match the one on your OS file system. The error reported has only one slash after "hdfs:/". Use Solr's zkcli.sh tool (which is different from the one that comes with Zookeeper) to get the contents of what's there; you could do a getfile, or an upconfig to replace/update what is in zookeeper. Remember that Solr adds "/solr" to the root except in embedded ZK mode.
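The single-slash problem can be spotted mechanically. The sketch below uses Python's standard urllib.parse to check that an HDFS URI has a host or nameservice part; check_hdfs_uri is a hypothetical helper and the /apps/solr path is invented, only HDPTSTHA comes from the thread.

```python
from urllib.parse import urlparse

def check_hdfs_uri(uri):
    """Return a list of problems found in an HDFS URI (empty if none)."""
    parts = urlparse(uri)
    problems = []
    if parts.scheme != "hdfs":
        problems.append("scheme is %r, expected 'hdfs'" % parts.scheme)
    if not parts.netloc:
        problems.append("no host or nameservice after 'hdfs://'")
    return problems

# The path /apps/solr is made up for illustration.
for uri in ("hdfs://HDPTSTHA/apps/solr", "hdfs:/HDPTSTHA/apps/solr"):
    print(uri, "->", check_hdfs_uri(uri) or "looks fine")
```

With one slash, HDPTSTHA is parsed as the first path component rather than as the nameservice, which is consistent with the error you are seeing.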
07-13-2016
12:11 PM
@Saurabh Kumar - Since this example is using the Sandbox zookeeper rather than embedded ZK, try adding /solr to the end of your zookeeper entry in the create command, like this: sandbox.hortonworks.com:2181/solr. I'm not sure if that will solve the issue, but when Solr runs in cloud mode with an external zookeeper it makes /solr the zookeeper root to keep its data separate from other services.
07-12-2016
06:39 PM
It's telling you that it cannot find the file in your confdir (the directory after -d in your command). Do an ls on the directory: either the directory is missing or the solrconfig.xml is missing from /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf. You may have overlooked a step in the setup (Step 2). It includes creating the conf directory and modifying the solrconfig.xml file...

cp -R /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs
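The same check as the ls can be scripted so it tells you which of the two cases you are in. This is just a sketch; check_confdir is a hypothetical helper, and the path is the one from this thread.

```python
from pathlib import Path

def check_confdir(confdir):
    """Report whether the confdir exists and contains solrconfig.xml."""
    path = Path(confdir)
    if not path.is_dir():
        return "missing directory: %s" % path
    if not (path / "solrconfig.xml").is_file():
        return "missing file: %s" % (path / "solrconfig.xml")
    return "ok"

print(check_confdir(
    "/opt/lucidworks-hdpsearch/solr/server/solr/configsets/"
    "data_driven_schema_configs_hdfs/conf"
))
```

A "missing directory" result points at the skipped cp -R step, while a "missing file" result points at something having gone wrong inside the copied configset.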
06-30-2016
04:14 PM
Thanks for the info @vpoornalingam. We resolved the issue. It turns out that when we changed the path, Ambari suggested making other changes, which we mindlessly accepted. Obviously we should have paid more attention. One of the changes was to drop the embedded HBase master heap size to a much lower value. I realized this after looking at the log hbase-ams-master-<FQDN>.out rather than the ambari-metrics-collector.log that I had been looking at.
06-29-2016
10:53 PM
Changing it broke Metrics Collector... I need to move the AMS hbase.rootdir to another partition, so I created a directory, did a chown -R ams:hadoop MYDIR, changed the configuration value and restarted AMS. The Metrics Collector will not start. It's throwing a connection refused exception when trying to connect to zookeeper on localhost:61181. (Unfortunately, I don't have access to the exact exception at the moment.) This is on HDP 2.4.

Nothing is listening on port 61181, which I believe should be the embedded ZK port: hbase.zookeeper.property.clientPort={{zookeeper_clientPort}}. I killed all ams processes to be sure something was not in a bad state. I also tried copying the old hbase.rootdir to my new directory with the same permissions, but it still fails. When I switch back to the old location it works fine. This seems very similar to changing to distributed mode, so I don't understand what's going wrong.
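For anyone wanting to repeat the "is anything listening" check without netstat, here is a minimal sketch using Python's standard socket module. The port 61181 is the one from the exception above; is_listening is a hypothetical helper name.

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False

print("embedded ZK reachable:", is_listening("localhost", 61181))
```

A False result only confirms the symptom (nothing accepting connections on that port); the cause still has to come from the AMS logs.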
Labels: Apache Ambari
06-29-2016
05:12 PM
Awesome. Thanks. I wasn't sure if under the covers Ranger was just doing sql grants.