Member since: 03-17-2017
Posts: 11
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9581 | 04-19-2017 10:41 AM
04-25-2017 08:36 AM
It is an internal (managed) table. The creation process was using the HUE GUI to 'Create a new table manually' in the Metastore Manager for the Hive default database; I didn't choose the 'Create a new table from a file' option, which allows a user to specify whether it should be an external table. I updated my reply to saranvisa's use cases: the underlying HDFS files were deleted only when the HUE user who dropped the table was its creator. Fortunately, I do have HDFS superuser access via the command line and was able to delete the files left over from my prior incident. Thanks for providing an alternative for when that is not the case, especially since most users in a deployment won't have command-line access, let alone HDFS superuser rights. Sounds like the trade-off is ease of use vs. level of security.
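For anyone who wants DROP to leave the data alone, a minimal sketch of creating the table as EXTERNAL from the command line instead (the table name, columns, and path here are hypothetical):

    # Hypothetical external table: DROP TABLE removes only the metastore entry.
    hive -e "CREATE EXTERNAL TABLE hivetest_ext (key INT, value STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             LOCATION '/user/cloudera/hivetest_ext';"

    hive -e "DROP TABLE hivetest_ext;"

    # The directory and its files survive the drop:
    hdfs dfs -ls /user/cloudera/hivetest_ext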
04-25-2017 08:01 AM
Thanks for the quick reply. I have another case.

Use Case 3:
1. Log in as User A, create a table tab1, and load data into it.
2. Log out from User A and log in as User B.
3. As User B, load data into table tab1.

Now if User A drops the table, will it also delete the file User B loaded?

UPDATE: Just tested this (sketch below) and can confirm that User B's loaded files are deleted as well if User A drops the table.
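Roughly what the test looked like, assuming an unsecured cluster where beeline accepts a username via -n (the JDBC URL and file paths are placeholders):

    # As User A: create the table and load a first file
    beeline -u jdbc:hive2://localhost:10000 -n userA -e "
      CREATE TABLE tab1 (key INT, value STRING);
      LOAD DATA INPATH '/user/userA/kv1.txt' INTO TABLE tab1;"

    # As User B: load a second file into the same table
    beeline -u jdbc:hive2://localhost:10000 -n userB -e "
      LOAD DATA INPATH '/user/userB/kv2.txt' INTO TABLE tab1;"

    # As User A: drop the table, then check whether either file survived
    beeline -u jdbc:hive2://localhost:10000 -n userA -e "DROP TABLE tab1;"
    hdfs dfs -ls /user/hive/warehouse/tab1    # both loaded files are gone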
04-24-2017 07:22 AM
When dropping a table from the Metastore Manager in HUE, the underlying HDFS files are not removed, which means users can still query the table (tested with Impala). The table was created using the Metastore Manager, and the data was added by running a Spark Action in Oozie (LOAD DATA INPATH ... kv1.txt ... INTO TABLE ...).

While logged in as a HUE superuser, I tried deleting the Hive folder corresponding to the table I wanted to remove, but I received a permission error:

    Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
    AccessControlException: Permission denied by sticky bit: user=cloudera, path="/user/hive/warehouse/hivetest2":cloudera2:hive:drwxrwxrwt, parent="/user/hive/warehouse":hive:hive:drwxrwxrwt (error 500)

What do I need to configure so that a HUE superuser can delete from Hive via the File Browser? What do I need to set so that dropping a table from the Metastore Manager deletes the HDFS files?
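For now, my workaround is via the command line; a sketch, assuming shell access to a node where commands can be run as the hdfs superuser (the warehouse path comes from the error above):

    # The sticky bit (the trailing "t" in drwxrwxrwt) restricts deletion to the
    # entry's owner, the directory's owner, or the HDFS superuser.
    hdfs dfs -ls /user/hive/warehouse

    # Remove the orphaned table directory as the hdfs superuser
    sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/hive/warehouse/hivetest2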
Labels:
- Apache Hive
- Cloudera Hue
04-21-2017 06:13 AM
That is a fair question as to what counts as 'appropriate'. I was hoping there would be an option to select a default behavior for this. For example, upon 'usr1' creating an index, the following permission would be generated: collection=the_new_idx->user=usr1->action=*. I imagine other global default behaviors could exist, such that the auto-generated permission sets access for new collections at the role level instead of the user level.
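For comparison, this is roughly what the equivalent hand-written grant looks like in a Sentry policy file today; a sketch only, with hypothetical group and role names, and note that Sentry grants privileges to roles/groups rather than directly to users:

    # sentry-provider.ini (sketch)
    [groups]
    # map the group that usr1 belongs to onto a role
    usr1_grp = usr1_role

    [roles]
    # grant that role full access to the new collection
    usr1_role = collection=the_new_idx->action=*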
04-20-2017 01:04 PM
Can this be done at the collections/parent level in HUE/Sentry, so that any time a user creates an index in Solr, only the user who created it has access? In other words, what I'm trying to avoid is having to set permissions manually each time a user creates an index. So if a user creates an index, Sentry would automatically add/update the appropriate permissions. I don't see any explicit reference to this capability in the docs.
04-19-2017 10:41 AM
1 Kudo
In case anyone else has this issue: the documentation for CDH 5.10 is incorrect. https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25

It says to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh. I imagine this would be correct if you run Spark in stand-alone mode. However, if you run in yarn-client or yarn-cluster mode, the PYSPARK_PYTHON variable has to be set in YARN instead. The driver variable appears to matter only if you want to run through a notebook. I also didn't have to do any of the steps the docs describe for yarn-cluster.

What worked for me was setting this in the YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):

    PYSPARK_PYTHON="/usr/bin/python"

http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488
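A quick way to confirm the setting is being picked up on YARN is to ask an executor which interpreter it runs under; a sketch (the script path and app name are arbitrary):

    # Write a tiny PySpark job that reports the executor's Python binary
    cat > /tmp/check_python.py <<'EOF'
    from pyspark import SparkContext
    sc = SparkContext(appName="python-path-check")
    # run a single task and return the path of the interpreter the executor used
    print(sc.parallelize([0], 1).map(lambda _: __import__("sys").executable).collect())
    sc.stop()
    EOF

    spark-submit --master yarn --deploy-mode client /tmp/check_python.py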
04-19-2017 06:37 AM
I'm interested in being able to prohibit users from interacting with, or even being aware of the existence of, specific indexes in Solr. For example, when a user looks at the available indexes in HUE, they should only see the indexes they have permission to interact with. Is this possible with the Cloudera distribution? I'm running CDH 5.10. Thanks!
Labels:
- Apache Solr
- Cloudera Hue
03-30-2017 11:14 AM
Unfortunately, Anaconda isn't an option for me. I also added "export" to my safety valve changes for the two Python variables, but numpy still cannot be found.
03-28-2017 03:22 PM
I have an intermittent issue. I've read the other threads about numpy not being found, on this site and elsewhere on the web, but the problem keeps coming back after I re-deploy client configurations.

I am running a Spark job through HUE -> Oozie and using PySpark's MLlib, which requires numpy. Initially, I read the Cloudera docs and blog indicating to install numpy on each node (Anaconda isn't an option for me). I installed numpy on each node using yum as root (I didn't create a virtual environment for this). This worked. However, I later re-deployed the client configurations through CM for reasons unrelated to this issue, and I received the "numpy not found" error again.

At this point I went to the Spark configuration page in CM and set these variables in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh:

    PYSPARK_PYTHON=/usr/lib64/python2.7
    PYSPARK_DRIVER_PYTHON=/usr/lib64/python2.7

Source: https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25

Next, I re-deployed client configurations, and it started working again. However, after yet another re-deploy for reasons unrelated to this issue, I got "numpy not found" again. So it keeps coming back and only lasts for one deployment when it does work. I also looked into checking permissions on the Python paths, and I don't see any issues there, but I may be missing something.

Could this be related to running it through HUE or Oozie? Are the environment variables I set pointing to the correct paths? Any help is appreciated. Thanks!
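One detail worth double-checking (an observation, not a confirmed fix): /usr/lib64/python2.7 is a library directory, whereas PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are normally expected to point at an interpreter executable. A sketch of the safety-valve entries using a binary path instead (verify the actual location on your nodes first):

    # find where the interpreter binary actually lives on each node
    which python2.7            # commonly /usr/bin/python2.7

    # spark-env.sh safety valve entries pointing at the binary, not the lib dir
    export PYSPARK_PYTHON=/usr/bin/python2.7
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7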
Labels:
- Apache Oozie
- Apache Spark
- Cloudera Hue
03-17-2017 01:11 PM
CDH 5.10. Is there a way to customize which Action Nodes are displayed in the Oozie Workflow Editor in HUE? For example, say I don't want a user to be able to select the ssh action, or see it there at all. I couldn't find any reference to this in the docs. The closest thing that appears relevant is in Oozie Dashboard -> Oozie -> gauges, which has a value "configuration.action.types" that appears to list action nodes, except it lists more than what actually shows up in the editor (I am logged in as a superuser when viewing this). If that is the correct property, how do I modify it (preferably through CDH or HUE)? Thanks!
Labels:
- Apache Oozie
- Cloudera Hue