About mayank1984

mayank1984 · ‎07-29-2017

The Spark request is now being submitted, but I am getting the following error:

hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170729070303_887365d6-ce92-4ec3-bc8a-2adf3cfec117
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 614015ef-31f9-4e14-9b71-c161f64916db
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
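For anyone triaging similar output, the Hive on Spark failures quoted in these posts differ mainly in their return code and message. The Python sketch below is only an illustration: summarize_hive_failure is a hypothetical helper, not part of Hive, and its patterns rely solely on the message wording quoted here.

```python
import re


def summarize_hive_failure(console_output):
    """Extract the return code, failing task class and submit wait from Hive CLI output.

    Hypothetical helper: the patterns only match the message wording
    quoted in this thread and may not hold for other Hive versions.
    """
    summary = {"return_code": None, "task": None, "waited_seconds": None}

    # e.g. "FAILED: Execution Error, return code 2 from org.apache...SparkTask"
    failed = re.search(
        r"FAILED: Execution Error, return code (\d+) from ([\w.]+)",
        console_output,
    )
    if failed:
        summary["return_code"] = int(failed.group(1))
        summary["task"] = failed.group(2)

    # e.g. "Job hasn't been submitted after 61s. Aborting it."
    waited = re.search(r"Job hasn't been submitted after (\d+)s", console_output)
    if waited:
        summary["waited_seconds"] = int(waited.group(1))

    return summary


if __name__ == "__main__":
    sample = (
        "Starting Spark Job = 614015ef-31f9-4e14-9b71-c161f64916db\n"
        "Job hasn't been submitted after 61s. Aborting it.\n"
        "Status: SENT\n"
        "FAILED: Execution Error, return code 2 from "
        "org.apache.hadoop.hive.ql.exec.spark.SparkTask\n"
    )
    print(summarize_hive_failure(sample))
```

A populated waited_seconds means the job was handed off but never started within the wait window, which is where the message itself suggests checking the YARN and Spark driver logs.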

mayank1984 · ‎07-29-2017

Thank you for the reply. I did not have the spark folder in that location; I had SPARK2. After I ran the command, I got the error below.

[ec2-user@ip-172-31-37-124 jars]$ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_2.11-2.2.0.cloudera1.jar
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark) overrides detected (/usr/lib/spark).
WARNING: Running spark-class from user-defined location.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 11 more
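One possible reading of this trace: SparkSession was introduced in Spark 2.0 and the examples jar is built for Spark 2.2.0, yet the warning shows SPARK_HOME pointing at the CDH parcel's lib/spark rather than the SPARK2 parcel, so the job may be launching on an older Spark runtime that lacks the class. The Python sketch below encodes that check as a heuristic. The function names are hypothetical, and the path test assumes the parcel layout shown in the post.

```python
import re


def spark_major_from_examples_jar(jar_path):
    """Return the Spark major version encoded in a spark-examples jar name, or None.

    Assumes names shaped like spark-examples_<scala>-<spark>.jar,
    as in the jar path quoted in the post.
    """
    match = re.search(r"spark-examples_\d+\.\d+-(\d+)\.\d+\.\d+", jar_path)
    return int(match.group(1)) if match else None


def is_spark2_parcel_home(spark_home):
    """Heuristic: treat SPARK_HOME as Spark 2 if a path component is 'SPARK2'."""
    return "SPARK2" in spark_home.split("/")


def likely_version_mismatch(jar_path, spark_home):
    """True when a Spark 2 jar would be launched from a non-SPARK2 SPARK_HOME."""
    return (
        spark_major_from_examples_jar(jar_path) == 2
        and not is_spark2_parcel_home(spark_home)
    )


if __name__ == "__main__":
    jar = ("/opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/"
           "spark-examples_2.11-2.2.0.cloudera1.jar")
    home = "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark"
    print(likely_version_mismatch(jar, home))
```

If the check reports a mismatch, correcting or unsetting SPARK_HOME, or using the launcher that ships with the SPARK2 parcel (usually spark2-submit on Cloudera installs), would be the first thing to try.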

mayank1984 · ‎07-28-2017

I have installed Spark and configured Hive to use it as the execution engine. Select * from table name works fine, but select count(*) from table name fails with the following error:

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

At times I also got an error stating "failed to create spark client". I have tried modifying the memory parameters, but to no avail. Can you please tell me what the ideal memory settings should be?

Below is the directory structure from HDFS:

drwxr-xr-x   - admin    admin      0 2017-07-28 16:36 /user/admin
drwx------   - ec2-user supergroup 0 2017-07-28 17:50 /user/ec2-user
drwxr-xr-x   - hdfs     hdfs       0 2017-07-28 11:37 /user/hdfs
drwxrwxrwx   - mapred   hadoop     0 2017-07-16 06:03 /user/history
drwxrwxr-t   - hive     hive       0 2017-07-16 06:04 /user/hive
drwxrwxr-x   - hue      hue        0 2017-07-28 10:16 /user/hue
drwxrwxr-x   - impala   impala     0 2017-07-16 07:13 /user/impala
drwxrwxr-x   - oozie    oozie      0 2017-07-16 06:05 /user/oozie
drwxr-x--x   - spark    spark      0 2017-07-28 17:17 /user/spark
drwxrwxr-x   - sqoop2   sqoop      0 2017-07-16 06:37 /user/sqoop2

The /user directory has ec2-user as owner and supergroup as group.

I tried running the query from the CLI:

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> select count(*) from kaggle.test_house;
Query ID = ec2-user_20170728174949_aa9d7be9-038c-44a0-a42b-1b210a37f4ec
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

mayank1984 · ‎07-25-2017

Did it !! Yohuuuu !!! I can see the page and am logged in!!! Thank you so so much !! You saved my day !!

mayank1984 · ‎07-25-2017

Thanks for the quick reply. This is my personal setup and I am a beginner in this area. Would you be able to help me with baby steps? Apologies for being too demanding 🙂 What I have done so far is create a DNS hosted zone in Route 53 as a public zone, and I have updated the name servers for my domain at GoDaddy. What should I do next in AWS? Create a record set? And what should I do on the EC2 instance where the Workbench has been installed?
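As a rough illustration of the record-set step asked about here, a wildcard A record in Route 53 can be described by a change batch like the one built below. This is a sketch under assumptions: the helper name is hypothetical, 203.0.113.10 is a documentation placeholder standing in for the EC2 instance's public IP, and whether the wildcard belongs directly under datacloudera.com or under a subdomain depends on how the Workbench domain was configured.

```python
import json


def wildcard_a_record_change(domain, ip_address, ttl=300):
    """Build a Route 53 change batch that upserts '*.<domain>' as an A record.

    Hypothetical helper for illustration; it only builds the JSON
    document and does not call AWS.
    """
    return {
        "Comment": "Wildcard A record for " + domain,
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "*." + domain,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder, not the real instance address.
    batch = wildcard_a_record_change("datacloudera.com", "203.0.113.10")
    print(json.dumps(batch, indent=2))
```

The printed JSON has the shape that aws route53 change-resource-record-sets accepts through its --change-batch option; the same record can also be created by hand in the Route 53 console.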

mayank1984 · ‎07-25-2017

Hi, thank you so much for the information. I am currently testing it. I have hosted my single-node Cloudera environment on an AWS EC2 instance, and I also own a domain at GoDaddy, www.datacloudera.com. I have changed its name servers to the ones in the AWS DNS zone:

ns-0.awsdns-00.com
ns-1024.awsdns-00.org
ns-512.awsdns-00.net
ns-1536.awsdns-00.co.uk

Should the hosted zone be public, or private in a VPC? I am not sure how to proceed from here. What steps should I take after this one?

mayank1984 · ‎07-25-2017

Hi, do you have step-by-step details for wildcarding the DNS? I am stuck at that point. I have an AWS EC2 instance on which I have installed the Workbench. However, I am unable to open the URL.
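One way to tell whether a wildcard DNS record is in effect is to resolve a random, never-created subdomain and compare the answer with the instance's IP. The Python sketch below does that; wildcard_resolves is a hypothetical helper, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
import uuid


def wildcard_resolves(domain, expected_ip, resolve=socket.gethostbyname):
    """Return True if a random subdomain of `domain` resolves to `expected_ip`.

    A random label is used so that only a wildcard record (and not an
    explicitly created host record) can satisfy the lookup.
    """
    probe = "{}.{}".format(uuid.uuid4().hex[:12], domain)
    try:
        return resolve(probe) == expected_ip
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return False
```

Calling wildcard_resolves with the Workbench domain and the EC2 instance's public IP returns False until the wildcard record exists and has propagated, which would also explain a URL that will not open.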

mayank1984 · ‎07-25-2017

Quick question: are you able to open the URL from the open internet?

mayank1984 · ‎07-25-2017

I give up. Unable to resolve this.

mayank1984 · ‎07-25-2017

OK. I had tried Route 53, but it did not work. I guess my networking concepts need a bit of a refresh.

Member Since	‎07-24-2017 02:15 PM
Last Visited	‎08-08-2017 01:07 PM
Posts	14
Kudos received	3

Cloudera Community

Re: Hive on Spark Queries are not working

Re: Hive on Spark Queries are not working

Hive on Spark Queries are not working

Re: How to setup wildcard DNS subdomain

Re: How to setup wildcard DNS subdomain

Re: How to setup wildcard DNS subdomain

Re: Data Science Workbench installation instructio...

Re: How to setup wildcard DNS subdomain

Re: How to setup wildcard DNS subdomain

Re: How to setup wildcard DNS subdomain