Member since: 03-17-2017
Posts: 11
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9581 | 04-19-2017 10:41 AM
04-25-2017 08:36 AM
It is an internal (managed) table. The creation process was using the HUE GUI to 'Create a new table manually' in the Metastore Manager for the Hive default database; I didn't choose the 'Create a new table from a file' option, which allows a user to specify whether it should be an external table. I updated my reply to saranvisa's use cases: the underlying HDFS files were deleted only when the HUE user who dropped the table was its creator. Fortunately, I do have HDFS superuser access via the command line and was able to delete the files left over from my prior incident. Thanks for providing an alternative for when that is not the case, especially since most users in a deployment won't have command-line access, let alone HDFS superuser rights. Sounds like the trade-off is ease of use vs. level of security.
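For anyone who wants DROP to leave the data alone, a minimal sketch of creating the table as EXTERNAL from the command line instead (the table name, columns, and path here are hypothetical):

    # Hypothetical external table: DROP TABLE removes only the metastore entry.
    hive -e "CREATE EXTERNAL TABLE hivetest_ext (key INT, value STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             LOCATION '/user/cloudera/hivetest_ext';"

    hive -e "DROP TABLE hivetest_ext;"

    # The directory and its files survive the drop:
    hdfs dfs -ls /user/cloudera/hivetest_ext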
04-25-2017 08:01 AM
Thanks for the quick reply. I have another case.

Use Case 3:
1. Log in as User A, create a table tab1, and load data into it.
2. Log out from User A and log in as User B.
3. As User B, load data into table tab1.

Now if User A drops the table, will it also delete the file User B loaded?

UPDATE: Just tested this (sketch below) and can confirm that User B's loaded files are deleted as well if User A drops the table.
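Roughly what the test looked like, assuming an unsecured cluster where beeline accepts a username via -n (the JDBC URL and file paths are placeholders):

    # As User A: create the table and load a first file
    beeline -u jdbc:hive2://localhost:10000 -n userA -e "
      CREATE TABLE tab1 (key INT, value STRING);
      LOAD DATA INPATH '/user/userA/kv1.txt' INTO TABLE tab1;"

    # As User B: load a second file into the same table
    beeline -u jdbc:hive2://localhost:10000 -n userB -e "
      LOAD DATA INPATH '/user/userB/kv2.txt' INTO TABLE tab1;"

    # As User A: drop the table, then check whether either file survived
    beeline -u jdbc:hive2://localhost:10000 -n userA -e "DROP TABLE tab1;"
    hdfs dfs -ls /user/hive/warehouse/tab1    # both loaded files are gone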
04-24-2017 07:22 AM
When dropping a table from the Metastore Manager in HUE, the underlying HDFS files are not removed, which means users can still query the table (tested with Impala). The table was created using the Metastore Manager, and the data was added by running a Spark Action in Oozie (LOAD DATA INPATH ... kv1.txt ... INTO TABLE ...).

While logged in as a HUE superuser, I tried deleting the Hive folder corresponding to the table I wanted to remove, but I received a permission error:

    Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
    AccessControlException: Permission denied by sticky bit: user=cloudera, path="/user/hive/warehouse/hivetest2":cloudera2:hive:drwxrwxrwt, parent="/user/hive/warehouse":hive:hive:drwxrwxrwt (error 500)

What do I need to configure so that a HUE superuser can delete from Hive via the File Browser? What do I need to set so that dropping a table from the Metastore Manager deletes the HDFS files?
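For now, my workaround is via the command line; a sketch, assuming shell access to a node where commands can be run as the hdfs superuser (the warehouse path comes from the error above):

    # The sticky bit (the trailing "t" in drwxrwxrwt) restricts deletion to the
    # entry's owner, the directory's owner, or the HDFS superuser.
    hdfs dfs -ls /user/hive/warehouse

    # Remove the orphaned table directory as the hdfs superuser
    sudo -u hdfs hdfs dfs -rm -r -skipTrash /user/hive/warehouse/hivetest2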
Labels:
- Apache Hive
- Cloudera Hue
04-21-2017 06:13 AM
That is a fair question as to what counts as 'appropriate'. I was hoping there would be an option to select a default behavior for this. For example, upon 'usr1' creating an index, the following permission would be generated: collection=the_new_idx->user=usr1->action=*. I imagine other global default behaviors could exist, such that the auto-generated permission sets access for new collections at the role level instead of the user level.
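For comparison, this is roughly what the equivalent hand-written grant looks like in a Sentry policy file today; a sketch only, with hypothetical group and role names, and note that Sentry grants privileges to roles/groups rather than directly to users:

    # sentry-provider.ini (sketch)
    [groups]
    # map the group that usr1 belongs to onto a role
    usr1_grp = usr1_role

    [roles]
    # grant that role full access to the new collection
    usr1_role = collection=the_new_idx->action=*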
04-20-2017 01:04 PM
Can this be done at the collections/parent level in HUE/Sentry, so that any time a user creates an index in Solr, only the user who created it has access? In other words, what I'm trying to avoid is having to set permissions manually each time a user creates an index. So if a user creates an index, Sentry would automatically add/update the appropriate permissions. I don't see any explicit reference to this capability in the docs.
04-19-2017 10:41 AM
1 Kudo
In case anyone else has this issue: the documentation for CDH 5.10 is incorrect. https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25

It says to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh. I imagine this would be correct if you run Spark in stand-alone mode. However, if you run in yarn-client or yarn-cluster mode, the PYSPARK_PYTHON variable has to be set in YARN instead. The driver variable appears to matter only if you want to run through a notebook. I also didn't have to do any of the steps the docs describe for yarn-cluster.

What worked for me was setting this in the YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):

    PYSPARK_PYTHON="/usr/bin/python"

http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488
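A quick way to confirm the setting is being picked up on YARN is to ask an executor which interpreter it runs under; a sketch (the script path and app name are arbitrary):

    # Write a tiny PySpark job that reports the executor's Python binary
    cat > /tmp/check_python.py <<'EOF'
    from pyspark import SparkContext
    sc = SparkContext(appName="python-path-check")
    # run a single task and return the path of the interpreter the executor used
    print(sc.parallelize([0], 1).map(lambda _: __import__("sys").executable).collect())
    sc.stop()
    EOF

    spark-submit --master yarn --deploy-mode client /tmp/check_python.py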
04-19-2017 06:37 AM
I'm interested in being able to prohibit users from interacting with, or even being aware of the existence of, specific indexes in Solr. For example, when a user looks at the available indexes in HUE, they should only see the indexes they have permission to interact with. Is this possible with the Cloudera distribution? I'm running CDH 5.10. Thanks!
Labels:
- Apache Solr
- Cloudera Hue
03-30-2017 11:14 AM
Unfortunately, Anaconda isn't an option for me. I also added "export" to my safety valve changes for the two Python variables, but numpy still cannot be found.
03-28-2017 03:22 PM
I have an intermittent issue. I've read the other threads about numpy not being found, on this site and elsewhere on the web, but the problem keeps coming back after I re-deploy client configurations.

I am running a Spark job through HUE -> Oozie and using PySpark's MLlib, which requires numpy. Initially, I read the Cloudera docs and blog indicating to install numpy on each node (Anaconda isn't an option for me). I installed numpy on each node using yum as root (I didn't create a virtual environment for this). This worked. However, I later re-deployed the client configurations through CM for reasons unrelated to this issue, and I received the "numpy not found" error again.

At this point I went to the Spark configuration page in CM and set these variables in the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh:

    PYSPARK_PYTHON=/usr/lib64/python2.7
    PYSPARK_DRIVER_PYTHON=/usr/lib64/python2.7

Source: https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25

Next, I re-deployed client configurations, and it started working again. However, after yet another re-deploy for reasons unrelated to this issue, I got "numpy not found" again. So it keeps coming back and only lasts for one deployment when it does work. I also looked into checking permissions on the Python paths, and I don't see any issues there, but I may be missing something.

Could this be related to running it through HUE or Oozie? Are the environment variables I set pointing to the correct paths? Any help is appreciated. Thanks!
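One detail worth double-checking (an observation, not a confirmed fix): /usr/lib64/python2.7 is a library directory, whereas PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are normally expected to point at an interpreter executable. A sketch of the safety-valve entries using a binary path instead (verify the actual location on your nodes first):

    # find where the interpreter binary actually lives on each node
    which python2.7            # commonly /usr/bin/python2.7

    # spark-env.sh safety valve entries pointing at the binary, not the lib dir
    export PYSPARK_PYTHON=/usr/bin/python2.7
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7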
Labels:
- Apache Oozie
- Apache Spark
- Cloudera Hue
03-17-2017 01:11 PM
CDH 5.10. Is there a way to customize which Action Nodes are displayed in the Oozie Workflow Editor in HUE? For example, say I don't want a user to be able to select the ssh action, or see it there at all. I couldn't find any reference to this in the docs. The closest thing that appears relevant is in Oozie Dashboard -> Oozie -> gauges, which has a value "configuration.action.types" that appears to list action nodes, except it lists more than what actually shows up in the editor (I am logged in as a superuser when viewing this). If that is the correct property, how do I modify it (preferably through CDH or HUE)? Thanks!
Labels:
- Apache Oozie
- Cloudera Hue