Member since: 06-23-2016
Posts: 136
Kudos Received: 8
Solutions: 8

My Accepted Solutions
Views | Posted
---|---
2705 | 11-24-2017 08:17 PM
3201 | 07-28-2017 06:40 AM
1235 | 07-05-2017 04:32 PM
1383 | 05-11-2017 03:07 PM
5525 | 02-08-2017 02:49 PM
11-24-2017
08:17 PM
The answer is that I am an idiot: only s3 had a DataNode and NodeManager installed, so it was the only node YARN could schedule containers on. Hopefully this helps someone.
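A quick way to confirm this kind of thing, using the standard HDFS and YARN CLIs (run from any node with the clients installed; the HADOOP_USER_NAME trick assumes a non-secured cluster):

yarn node -list                                 # should list s1, s2 and s3 as RUNNING NodeManagers
HADOOP_USER_NAME=hdfs hdfs dfsadmin -report     # should report all three DataNodes as live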
11-24-2017
11:59 AM
Hi. I am running Spark2 from Zeppelin (0.7 in HDP 2.6) and doing an IDF transformation which crashes after many hours. It runs on a cluster with a master and 3 datanodes (s1, s2 and s3). All nodes have a Spark2 client, and each has 8 cores and 16 GB RAM. I just noticed the job is only running on one node, s3, with 5 executors.

In zeppelin-env.sh I have set zeppelin.executor.instances to 32 and zeppelin.executor.mem to 12g, and it has the line:

export MASTER=yarn-client

I have set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler, and I also set spark.executor.instances to 32 in the Spark2 interpreter. Does anyone have any ideas what else I can try to get the other nodes doing their share?
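For reference, one way to see where the executors actually landed while the job is running, using the standard YARN CLI (the ids below are placeholders, not values from this cluster):

yarn application -list -appStates RUNNING       # note the application id of the Zeppelin/Spark job
yarn applicationattempt -list <application-id>
yarn container -list <app-attempt-id>           # shows which host each container is on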
07-28-2017
06:40 AM
It was a setting in tez.lib.uris. I changed it to: /hdp/apps/${hdp.version}/tez/tez.tar.gz,hdfs://master.royble.co.uk:8020/jars/json-serde-1.3.7-jar-with-dependencies.jar (Note: no space after the comma, and the second entry is a full hdfs:// path.)
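To sanity-check a value like this, both entries should resolve in HDFS; the paths below are taken from the setting above, with ${hdp.version} expanded to the 2.6.0.3-8 build used elsewhere in this thread:

hdfs dfs -ls /hdp/apps/2.6.0.3-8/tez/tez.tar.gz
hdfs dfs -ls hdfs://master.royble.co.uk:8020/jars/json-serde-1.3.7-jar-with-dependencies.jar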
07-28-2017
05:38 AM
Thanks Deepesh. It is:

HIVE_AUX_JARS_PATH=/usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar

# keep the custom path only if it points to an existing file;
# otherwise fall back to the stock hive-hcatalog-core.jar
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  if [ -f "${HIVE_AUX_JARS_PATH}" ]; then
    export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
  elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
    export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
  fi
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
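Note that the custom path only survives the fallback logic above if the jar actually exists on the host running Hive, so it is worth checking there (path taken from the setting above):

ls -l /usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar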
07-26-2017
04:13 PM
I am trying to run Hive from the CLI:

HADOOP_USER_NAME=hdfs hive -hiveconf hive.cli.print.header=true -hiveconf hive.support.sql11.reserved.keywords=false -hiveconf hive.aux.jars.path=/usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar -hiveconf hive.root.logger=DEBUG,console

but I get this error:

java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://master.royble.co.uk:8020/user/hdfs/ /home/ed/Downloads/serde/json-serde-1.3.7-jar-with-dependencies.jar

I have had so many problems with that jar, which I originally used to create a Hive table. Normally I would do an 'add jar', but I cannot start Hive to do that. I have tried adding the jar to hive-env, to /usr/hdp/<version>/hive/auxlib (on the Hive machine) and to hive.aux.jars.path, but nothing works. Any idea why Hive is looking for that odd path, or in fact why it is looking for it at all?

FYI: master is not the machine with Hive on it, but it is where I run Ambari. The path /home/ed/Downloads/serde is one I have used in the past, but I cannot remember when. Using HDP-2.6.0.3. Any help is much appreciated as this is driving me mad!
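One thing worth trying, purely as a guess rather than a confirmed fix: pass hive.aux.jars.path with an explicit file:// scheme so Hive does not resolve the value against the default HDFS filesystem:

HADOOP_USER_NAME=hdfs hive \
  -hiveconf hive.aux.jars.path=file:///usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar \
  -hiveconf hive.cli.print.header=true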
07-25-2017
10:33 AM
1 Kudo
In RStudio I do:

library(sparklyr)
library(dplyr)
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark2-client") # got from the Ambari Spark2 configs
config <- spark_config()
sc <- spark_connect(master = "yarn-client", config = config, version = '2.1.0')

which gives:

Failed during initialize_connection: org.apache.hadoop.security.AccessControlException: Permission denied: user=ed, access=WRITE, inode="/user/ed/.sparkStaging/application_1500959138473_0003":admin:hadoop:drwxr-xr-x
Normally I fix this sort of problem with:

HADOOP_USER_NAME=hdfs hadoop fs -put

but I do not know how to do this in R. I thought maybe change ed's user and group to hdfs:

ed@master:~$ hdfs dfs -ls /user
Found 11 items
drwx------ - accumulo hdfs 0 2017-05-14 15:38 /user/accumulo
drwxr-xr-x - admin hadoop 0 2017-06-27 06:52 /user/admin
drwxrwx--- - ambari-qa hdfs 0 2017-06-02 10:46 /user/ambari-qa
drwxr-xr-x - admin hadoop 0 2017-06-02 11:00 /user/ed
drwxr-xr-x - hbase hdfs 0 2017-05-14 15:35 /user/hbase
drwxr-xr-x - hcat hdfs 0 2017-05-14 15:44 /user/hcat
drwxr-xr-x - hdfs hdfs 0 2017-06-20 12:43 /user/hdfs
drwxr-xr-x - hive hdfs 0 2017-05-14 15:44 /user/hive
drwxrwxr-x - oozie hdfs 0 2017-05-14 15:46 /user/oozie
drwxrwxr-x - spark hdfs 0 2017-05-14 15:40 /user/spark
drwxr-xr-x - zeppelin hdfs 0 2017-07-24 09:29 /user/zeppelin

but I am worried, as /user/ed is currently owned by admin:hadoop and admin is how I log into Ambari, so I do not want to mess up other stuff. Any help is much appreciated!
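In R, the equivalent of the HADOOP_USER_NAME=hdfs trick would be calling Sys.setenv(HADOOP_USER_NAME = "hdfs") before spark_connect(), but a less disruptive fix, assuming /user/ed really should belong to ed, is to change the ownership of just that one directory as the HDFS superuser:

HADOOP_USER_NAME=hdfs hdfs dfs -chown -R ed:hadoop /user/ed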
07-05-2017
04:32 PM
Here is how you do it. Got its 'name' from here. Spark 2.1 needs the Scala 2.11 version, so the name is databricks:spark-corenlp:0.2.0-s_2.11. Edit the Spark2 interpreter, add the name, save it and allow it to restart. Then, in Zeppelin:

%spark.dep
z.reset()
z.load("databricks:spark-corenlp:0.2.0-s_2.11")