Member since: 09-14-2017
Posts: 120
Kudos Received: 11
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3138 | 06-17-2021 06:55 AM |
| | 1923 | 01-13-2021 01:56 PM |
| | 17196 | 11-02-2017 06:35 AM |
| | 19002 | 10-04-2017 02:43 PM |
| | 34411 | 09-14-2017 06:40 PM |
02-28-2019
08:01 AM
Thanks! Manually creating the .Trash directory in the user's home directory works to make "Move to Trash" appear in Hue.
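A minimal sketch of that workaround, assuming a hypothetical HDFS user named userxyz and HDFS superuser rights:

```bash
# Create the .Trash directory in the user's HDFS home directory
# so Hue can offer "Move to Trash" instead of permanent delete.
hdfs dfs -mkdir -p /user/userxyz/.Trash

# The user must own it, or files cannot be moved into it.
hdfs dfs -chown userxyz:userxyz /user/userxyz/.Trash
```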
12-19-2018
02:23 PM
@bgooley Thanks a bunch! This is good info. I now see the note below, which means /usr/lib/jvm is a valid path for OpenJDK. Note: Cloudera strongly recommends installing the Oracle JDK at /usr/java/<jdk-version> and OpenJDK at /usr/lib/jvm (or /usr/lib64/jvm on SLES 12), which allows Cloudera Manager to auto-detect and use the correct JDK version. Unfortunately, the CDH 5.16 install guide doesn't clarify that /usr/lib/jvm is a valid path for OpenJDK; it just makes the blanket statement that "The JDK must be installed at /usr/java/jdk-version." Hopefully they will update the doc in the future. https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_jdk_installation.html
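A quick way to confirm the layout, assuming a RHEL/CentOS 7 host (package names differ on other distros):

```bash
# Install OpenJDK 8 from the distro repo; it lands under /usr/lib/jvm,
# which Cloudera Manager can auto-detect.
sudo yum install -y java-1.8.0-openjdk-devel

# Verify the install location Cloudera Manager will scan.
ls -d /usr/lib/jvm/java-1.8.0-openjdk*
```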
08-13-2018
02:19 PM
This issue was resolved by following the instructions on this site: http://vijayjt.blogspot.com/2016/02/how-to-connect-to-kerberised-chd-hadoop.html We need to copy the Java JCE unlimited-strength policy files and the krb5.conf file into the jdk/jre/lib/security folder of the JDK that SQL Developer uses. After this, the Hive connection via Kerberos was successful.
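A sketch of those copy steps, assuming the JCE policy files were unpacked to ~/UnlimitedJCEPolicyJDK8 and SQL Developer's JDK lives at /opt/sqldeveloper/jdk (both paths are hypothetical; adjust to your install):

```bash
# Path to the JDK that SQL Developer uses (hypothetical location).
SQLDEV_JDK=/opt/sqldeveloper/jdk

# Copy the JCE unlimited-strength policy jars into the JDK's security folder.
cp ~/UnlimitedJCEPolicyJDK8/local_policy.jar \
   ~/UnlimitedJCEPolicyJDK8/US_export_policy.jar \
   "$SQLDEV_JDK/jre/lib/security/"

# Copy the cluster's Kerberos config alongside them so the JVM can find the KDC.
cp /etc/krb5.conf "$SQLDEV_JDK/jre/lib/security/"
```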
07-09-2018
12:39 PM
Thanks a lot for the info! Will review these docs.
07-06-2018
02:25 PM
1 Kudo
During install, if SELinux is enabled, the Hadoop directories created in /var/lib (hbase, hive, impala, sqoop, zookeeper, etc.) apparently get their permissions set to 000 instead of 755 and end up owned by root instead of the service accounts. This prevents those roles from starting up. I ended up having to chmod 755 and chown all 15 or so of these directories (see the sketch below), after which the install completed successfully.
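A sketch of that cleanup, assuming each service account matches its directory name (verify owners against a healthy cluster before running, and extend the list to cover everything the installer created):

```bash
# Fix ownership and permissions on the CDH service directories under /var/lib.
# Assumes each service user/group matches the directory name (hbase, hive, ...).
for svc in hbase hive impala sqoop zookeeper; do
  chown -R "$svc:$svc" "/var/lib/$svc"
  chmod 755 "/var/lib/$svc"
done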
07-05-2018
07:37 AM
One other thing: it looks like there were some issues with the Ubuntu OS, and after switching over to CentOS 7.5 the CDH 5.15 install ran without much issue.

I have a question, though. The install screens show a DataNode configuration value: DataNode Data Directory (dfs.data.dir, dfs.datanode.data.dir) — comma-delimited list of directories on the local file system where the DataNode stores HDFS block data. Typical values are /data/N/dfs/dn for N = 1, 2, 3.... These directories should be mounted using the noatime option, and the disks should be configured using JBOD. RAID is not recommended.

In JBOD mode, if a server has 20 hard disks, each of the 20 disks gets its own mount point, so I think we need to set this value to the comma-delimited list /data/1/dfs/dn, /data/2/dfs/dn, ..., /data/20/dfs/dn (see the sketch below). Now, what happens if the data nodes have different numbers of JBOD disks, say 20 disks in some and 10 in others? Since dfs.data.dir is a global variable with no hostname in it to indicate a different number of disks per host, how does it allocate 20 data directories on the nodes that only have 10 JBOD disks? Also, if new data nodes with a different number of disks are added in the future, how is this specified while adding them? Thanks!
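For reference, a sketch of how that comma-delimited value could be generated for a 20-disk JBOD host, assuming the hypothetical /data/1 ... /data/20 mount layout from the question:

```bash
# Build the dfs.datanode.data.dir value for a host with 20 JBOD mounts.
dirs=""
for n in $(seq 1 20); do
  dirs="${dirs:+$dirs,}/data/$n/dfs/dn"
done
echo "$dirs"
# -> /data/1/dfs/dn,/data/2/dfs/dn,...,/data/20/dfs/dn
```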
03-09-2018
06:26 AM
2 Kudos
What is the meaning of having kafka-sentry when you don't have Kerberos enabled? At the moment, Kerberos is the only authentication engine supported by Kafka. When you don't have Kerberos enabled, all connections are treated the same. As you can see from the log, it thinks that the username is ANONYMOUS, which is why it tries to find the group that this user belongs to. Since the local system is not aware of any user (either local or synced from LDAP/AD) with the name "ANONYMOUS", no group is retrieved, so it cannot be matched to any kafka-sentry rule. It is normal that it will fail. Of course, you could create a user account "ANONYMOUS", assign it to a group, and define a kafka-sentry rule with this group (see the sketch below), but what would be the point? All connections would have the same permissions.
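For completeness, this is roughly what mapping such a group to a rule looks like with the kafka-sentry CLI (syntax as documented for Sentry's Kafka integration; the role, group, and topic names here are hypothetical):

```bash
# Create a role and tie it to the group that the ANONYMOUS user would belong to.
kafka-sentry -cr -r anonymous_role
kafka-sentry -arg -r anonymous_role -g anonymous_group

# Grant the role a privilege, e.g. read on one topic from any host.
kafka-sentry -gpr -r anonymous_role -p "Host=*->Topic=test-topic->action=read"

# Without Kerberos every client maps to ANONYMOUS, so all connections
# would end up with these same permissions.
```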
11-22-2017
06:46 AM
This issue should be fixed now. I have verified that the parcels show up on my cluster. Tina
11-02-2017
06:35 AM
1 Kudo
The solution was to reference the python script in Hue -> Query -> Editor -> Spark, putting its full path in the Libs field, for example Libs: /user/userxyz/myscript.py, and then run the query (the script must already be in HDFS at that path; see the sketch below). Clicking the job_xxxxx link shows whether the script ran successfully or not.
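A minimal sketch of getting the script into place first, using the same hypothetical path:

```bash
# Upload the script to the HDFS path referenced in the Libs field.
hdfs dfs -put myscript.py /user/userxyz/myscript.py

# Confirm it is there before running the query in Hue.
hdfs dfs -ls /user/userxyz/myscript.py
```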
10-29-2017
04:07 PM
The good news is that even though the shell script didn't work, I was able to run the same python script using the Spark HiveContext via the Spark action in Hue -> Workflow instead of the Shell action.

The shell script is shexample7.sh:

```bash
#!/usr/bin/env bash
export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/usr/bin/python
echo "starting..."
/usr/bin/spark-submit --master yarn-cluster pyexample.py
```

The python script is pyexample.py:

```python
#!/usr/bin/env python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Note: hard-coding "local" here conflicts with the --master yarn-cluster
# passed to spark-submit above, which may explain the KILLED final status.
sc = SparkContext("local", "pySpark Hive App")

# Create a Hive Context
hive_context = HiveContext(sc)

print "Reading Hive table..."
mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")

print "Registering DataFrame as a table..."
mytbl.show()  # Show first rows of dataframe
mytbl.printSchema()
```

The python job successfully displays the data, but somehow the final status comes back as KILLED even though the python script ran and got data back from Hive in stdout.