Member since: 09-14-2017
Posts: 120
Kudos Received: 11
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3138 | 06-17-2021 06:55 AM |
| | 1923 | 01-13-2021 01:56 PM |
| | 17196 | 11-02-2017 06:35 AM |
| | 19002 | 10-04-2017 02:43 PM |
| | 34411 | 09-14-2017 06:40 PM |
02-28-2019
08:01 AM
Thanks! Manually creating the .Trash directory in the user's home directory works to make "Move to Trash" appear in Hue.
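A minimal sketch of that workaround, assuming a hypothetical HDFS user named userxyz and HDFS superuser rights:

```bash
# Create the .Trash directory in the user's HDFS home directory
# so Hue can offer "Move to Trash" instead of permanent delete.
hdfs dfs -mkdir -p /user/userxyz/.Trash

# The user must own it, or files cannot be moved into it.
hdfs dfs -chown userxyz:userxyz /user/userxyz/.Trash
```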
12-19-2018
02:23 PM
@bgooley Thanks a bunch! This is good info. I now see the note below, which means /usr/lib/jvm is a valid path for OpenJDK. Note: Cloudera strongly recommends installing the Oracle JDK at /usr/java/<jdk-version> and OpenJDK at /usr/lib/jvm (or /usr/lib64/jvm on SLES 12), which allows Cloudera Manager to auto-detect and use the correct JDK version. Unfortunately, the CDH 5.16 install guide doesn't clarify that /usr/lib/jvm is a valid path for OpenJDK; it just makes the blanket statement that "The JDK must be installed at /usr/java/jdk-version." Hopefully they will update the doc in the future. https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_jdk_installation.html
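A quick way to confirm the layout, assuming a RHEL/CentOS 7 host (package names differ on other distros):

```bash
# Install OpenJDK 8 from the distro repo; it lands under /usr/lib/jvm,
# which Cloudera Manager can auto-detect.
sudo yum install -y java-1.8.0-openjdk-devel

# Verify the install location Cloudera Manager will scan.
ls -d /usr/lib/jvm/java-1.8.0-openjdk*
```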
08-13-2018
02:19 PM
This issue was resolved by following the instructions on this site: http://vijayjt.blogspot.com/2016/02/how-to-connect-to-kerberised-chd-hadoop.html We need to copy the Java JCE unlimited-strength policy files and the krb5.conf file into the jdk/jre/lib/security folder of the JDK that SQL Developer uses. After this, the Hive connection via Kerberos was successful.
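A sketch of those copy steps, assuming the JCE policy files were unpacked to ~/UnlimitedJCEPolicyJDK8 and SQL Developer's JDK lives at /opt/sqldeveloper/jdk (both paths are hypothetical; adjust to your install):

```bash
# Path to the JDK that SQL Developer uses (hypothetical location).
SQLDEV_JDK=/opt/sqldeveloper/jdk

# Copy the JCE unlimited-strength policy jars into the JDK's security folder.
cp ~/UnlimitedJCEPolicyJDK8/local_policy.jar \
   ~/UnlimitedJCEPolicyJDK8/US_export_policy.jar \
   "$SQLDEV_JDK/jre/lib/security/"

# Copy the cluster's Kerberos config alongside them so the JVM can find the KDC.
cp /etc/krb5.conf "$SQLDEV_JDK/jre/lib/security/"
```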
07-09-2018
12:39 PM
Thanks a lot for the info! Will review these docs.
07-06-2018
02:25 PM
1 Kudo
During install, if SELinux is enabled, the Hadoop directories created in /var/lib (hbase, hive, impala, sqoop, zookeeper, etc.) apparently get their permissions set to 000 instead of 755 and end up owned by root instead of the service accounts. This prevents those roles from starting up. I ended up having to chmod 755 and chown all 15 or so of these directories (see the sketch below), after which the install completed successfully.
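A sketch of that cleanup, assuming each service account matches its directory name (verify owners against a healthy cluster before running, and extend the list to cover everything the installer created):

```bash
# Fix ownership and permissions on the CDH service directories under /var/lib.
# Assumes each service user/group matches the directory name (hbase, hive, ...).
for svc in hbase hive impala sqoop zookeeper; do
  chown -R "$svc:$svc" "/var/lib/$svc"
  chmod 755 "/var/lib/$svc"
done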
07-05-2018
07:37 AM
One other thing: it looks like there were some issues with the Ubuntu OS, and after switching over to CentOS 7.5 the CDH 5.15 install ran without much issue.

I have a question, though. The install screens show a DataNode configuration value: DataNode Data Directory (dfs.data.dir, dfs.datanode.data.dir) — comma-delimited list of directories on the local file system where the DataNode stores HDFS block data. Typical values are /data/N/dfs/dn for N = 1, 2, 3.... These directories should be mounted using the noatime option, and the disks should be configured using JBOD. RAID is not recommended.

In JBOD mode, if a server has 20 hard disks, each of the 20 disks gets its own mount point, so I think we need to set this value to the comma-delimited list /data/1/dfs/dn, /data/2/dfs/dn, ..., /data/20/dfs/dn (see the sketch below). Now, what happens if the data nodes have different numbers of JBOD disks, say 20 disks in some and 10 in others? Since dfs.data.dir is a global variable with no hostname in it to indicate a different number of disks per host, how does it allocate 20 data directories on the nodes that only have 10 JBOD disks? Also, if new data nodes with a different number of disks are added in the future, how is this specified while adding them? Thanks!
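For reference, a sketch of how that comma-delimited value could be generated for a 20-disk JBOD host, assuming the hypothetical /data/1 ... /data/20 mount layout from the question:

```bash
# Build the dfs.datanode.data.dir value for a host with 20 JBOD mounts.
dirs=""
for n in $(seq 1 20); do
  dirs="${dirs:+$dirs,}/data/$n/dfs/dn"
done
echo "$dirs"
# -> /data/1/dfs/dn,/data/2/dfs/dn,...,/data/20/dfs/dn
```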
03-09-2018
06:26 AM
2 Kudos
What is the meaning of having kafka-sentry when you don't have Kerberos enabled? At the moment, Kerberos is the only authentication engine supported by Kafka. When you don't have Kerberos enabled, all connections are treated the same. As you can see from the log, it thinks that the username is ANONYMOUS, which is why it tries to find the group that this user belongs to. Since the local system is not aware of any user (either local or synced from LDAP/AD) with the name "ANONYMOUS", no group is retrieved, so it cannot be matched to any kafka-sentry rule. It is normal that it will fail. Of course, you could create a user account "ANONYMOUS", assign it to a group, and define a kafka-sentry rule with this group (see the sketch below), but what would be the point? All connections would have the same permissions.
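For completeness, this is roughly what mapping such a group to a rule looks like with the kafka-sentry CLI (syntax as documented for Sentry's Kafka integration; the role, group, and topic names here are hypothetical):

```bash
# Create a role and tie it to the group that the ANONYMOUS user would belong to.
kafka-sentry -cr -r anonymous_role
kafka-sentry -arg -r anonymous_role -g anonymous_group

# Grant the role a privilege, e.g. read on one topic from any host.
kafka-sentry -gpr -r anonymous_role -p "Host=*->Topic=test-topic->action=read"

# Without Kerberos every client maps to ANONYMOUS, so all connections
# would end up with these same permissions.
```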
11-22-2017
06:46 AM
This issue should be fixed now. I have verified that the parcels show up on my cluster. Tina
11-02-2017
06:35 AM
1 Kudo
The solution was to reference the python script in Hue -> Query -> Editor -> Spark, putting its full path in the Libs field, for example Libs: /user/userxyz/myscript.py, and then run the query (the script must already be in HDFS at that path; see the sketch below). Clicking the job_xxxxx link shows whether the script ran successfully or not.
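A minimal sketch of getting the script into place first, using the same hypothetical path:

```bash
# Upload the script to the HDFS path referenced in the Libs field.
hdfs dfs -put myscript.py /user/userxyz/myscript.py

# Confirm it is there before running the query in Hue.
hdfs dfs -ls /user/userxyz/myscript.py
```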
10-29-2017
04:07 PM
The good news is that even though the shell script didn't work, I was able to run the same python script using the Spark HiveContext via the Spark action in Hue -> Workflow instead of the Shell action.

The shell script is shexample7.sh:

```bash
#!/usr/bin/env bash
export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/usr/bin/python
echo "starting..."
/usr/bin/spark-submit --master yarn-cluster pyexample.py
```

The python script is pyexample.py:

```python
#!/usr/bin/env python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Note: hard-coding "local" here conflicts with the --master yarn-cluster
# passed to spark-submit above, which may explain the KILLED final status.
sc = SparkContext("local", "pySpark Hive App")

# Create a Hive Context
hive_context = HiveContext(sc)

print "Reading Hive table..."
mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")

print "Registering DataFrame as a table..."
mytbl.show()  # Show first rows of dataframe
mytbl.printSchema()
```

The python job successfully displays the data, but somehow the final status comes back as KILLED even though the python script ran and got data back from Hive in stdout.