Member since 06-18-2015
Posts: 55
Kudos Received: 34
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1314 | 03-04-2016 02:39 AM
 | 1858 | 12-29-2015 09:42 AM
12-17-2015
07:28 AM
1 Kudo
Hi, I am confused about Hue. 1. This link says I have to use the Hue that ships with HDP 2.3.2. 2. When I Google it, I get this link. Which option is correct? I would really appreciate it if somebody could help me with the steps. My cluster is on EC2 running RHEL.
Labels:
- Cloudera Hue
12-17-2015
05:01 AM
Hi @Neeraj Sabharwal @Jeremy Dyer
Processing and inserting data in hive without schema
//Processing and inserting data in hive without schema
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
val selectedData = df.select("year", "model")
selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")
//permission issues as user hive
// org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/newcars_orc_cust17":hdfs:hdfs:drwxr-xr-x
//Updated /tmp/newcars_orc_cust17 directory permissions
hiveContext.sql("create external table newcars_orc_ext_cust17(year string,model string) stored as orc location '/tmp/newcars_orc_cust17'")
hiveContext.sql("show tables").collect().foreach(println)
[cars_orc_ext,false]
[cars_orc_ext1,false]
[cars_orc_exte,false]
[newcars_orc_ext_cust17,false]
[sample_07,false]
[sample_08,false]
hiveContext.sql("select * from newcars_orc_ext_cust17").collect().foreach(println)
Took 1.459321 s
[2012,S]
[1997,E350]
[2015,Volt]
Hive console
hive> show tables ;
OK
cars_orc_ext
cars_orc_ext1
cars_orc_exte
newcars_orc_ext_cust17
sample_07
sample_08
Time taken: 12.185 seconds, Fetched: 6 row(s)
hive> select * from newcars_orc_ext_cust17 ;
OK
2012 S
1997 E350
2015 Volt
Time taken: 48.922 seconds, Fetched: 3 row(s)
Now when I try the same code but define a custom schema, I get the errors below:
Processing and inserting data in hive with custom schema
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val customSchema = StructType( StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true))
scala> val customSchema = StructType( StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true))
<console>:24: error: overloaded method value apply with alternatives:
(fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType <and>
(fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
val customSchema = StructType( StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true))
Any help/pointers appreciated
Thanks
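For reference, the overload error above suggests that StructType expects the fields wrapped in a Seq or Array rather than passed individually. A minimal sketch of that form (assuming Spark 1.4.x and the same column names):
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
// Wrap the fields in an Array (or Seq) so a matching StructType.apply overload is found
val customSchema = StructType(Array(
  StructField("year", IntegerType, true),
  StructField("make", StringType, true),
  StructField("model", StringType, true),
  StructField("comment", StringType, true),
  StructField("blank", StringType, true)))
// The schema can then be passed to the reader instead of using inferSchema
val dfWithSchema = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(customSchema).load("/tmp/cars.csv")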
12-17-2015
01:56 AM
@vshukla
I am also facing the same issue. I saved the data in ORC format from a DataFrame and created an external Hive table over it. When I do show tables in the HiveContext in Spark, it shows me the table, but I could not see any table in my Hive warehouse, so querying the Hive external table does not work as expected. When I just create the Hive table (no DataFrame, no data processing) using HiveContext, the table gets created and I am able to query it as well. I am unable to understand this strange behaviour. Am I missing something? For example: hiveContext.sql("CREATE TABLE IF NOT EXISTS TestTable (name STRING, age STRING)") shows me the table in Hive too.
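As a side note, one way to compare the two cases is to look at where each table's data actually lives; a minimal sketch, assuming the TestTable name from the example above (describe formatted reports the table type, MANAGED_TABLE vs EXTERNAL_TABLE, and its HDFS location):
// Print the table type and data location for the table created via HiveContext
hiveContext.sql("describe formatted TestTable").collect().foreach(println)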
12-14-2015
09:31 AM
Hi, I am a newbie to Spark and am using Spark 1.4.1. How can I save the output to Hive as an external table? For instance, I have a CSV file which I am parsing with the spark-csv package, which gives me a DataFrame. Now how do I save this DataFrame as a Hive external table using HiveContext? Would really appreciate your pointers/guidance. Thanks, Divya
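A minimal sketch of one way this is commonly done (read the CSV, write the selected columns out as ORC, then point an external table at that location; the paths and column names are placeholders, and it mirrors the approach in the 12-17 post above):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
// Persist the data as ORC files in HDFS
df.select("year", "model").write.format("orc").save("/tmp/cars_orc")
// Create an external Hive table over the ORC directory
hiveContext.sql("create external table cars_ext(year string, model string) stored as orc location '/tmp/cars_orc'")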
Labels:
- Apache Hive
- Apache Spark
12-14-2015
03:13 AM
@Neeraj Sabharwal
Thanks a lot for the prompt response.
I am using the HDP 2.3.2 VMware version (Link). Is there any workaround to make it work?
12-14-2015
01:55 AM
Is the spark-csv package not supported on HDP 2.3.2?
I am getting the error below when I try to run spark-shell with the spark-csv package.
[hdfs@sandbox root]$ spark-shell --packages com.databricks:spark-csv_2.10:1.1.0 --master yarn-client --driver-memory 512m --executor-memory 512m
Ivy Default Cache set to: /home/hdfs/.ivy2/cache
The jars for the packages stored in: /home/hdfs/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 332ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: com.databricks#spark-csv_2.10;1.1.0
==== local-m2-cache: tried
file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
-- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
==== local-ivy-cache: tried
/home/hdfs/.ivy2/local/com.databricks/spark-csv_2.10/1.1.0/ivys/ivy.xml
==== central: tried
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
-- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
-- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.databricks#spark-csv_2.10;1.1.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom (java.net.ConnectException: Connection refused)
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar (java.net.ConnectException: Connection refused)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.1.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:995)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:263)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:145)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/12/14 01:49:39 INFO Utils: Shutdown hook called
[hdfs@sandbox root]$
Would really appreciate your help.
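One workaround if the sandbox has no outbound internet access (the Connection refused errors above point to that rather than an HDP limitation): download spark-csv_2.10-1.1.0.jar and its commons-csv dependency on a machine that does have access, copy them to the sandbox, and pass them with --jars instead of --packages. A sketch with placeholder paths:
spark-shell --jars /tmp/spark-csv_2.10-1.1.0.jar,/tmp/commons-csv-1.1.jar --master yarn-client --driver-memory 512m --executor-memory 512m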
Labels:
- Apache Spark
12-11-2015
05:46 AM
Hi,
I am using HDP 2.3.2 with Spark 1.4.1 and trying to insert data into a Hive table using HiveContext. Below is the sample code:
spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
//Sample code
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val people = sc.textFile("/user/spark/people.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema =
StructType(
schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
//Create hive context
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
//Apply the schema to the RDD of Rows
val df = hiveContext.createDataFrame(rowRDD, schema);
val options = Map("path" -> "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/personhivetable")
df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").options(options).saveAsTable("personhivetable")
Getting the error below:
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:191)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$anonfun$insert$1.apply(commands.scala:160)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
at $line30.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC$anonfun$2.apply(<console>:29)
at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$writeRows$1(commands.scala:182)
... 8 more
Is it a configuration issue? When I googled it, I found that an environment variable named HIVE_CONF_DIR should be set in spark-env.sh. I then checked spark-env.sh in HDP 2.3.2 and could not find the HIVE_CONF_DIR environment variable. Do I need to add the above-mentioned variable to insert Spark output data into Hive tables? Would really appreciate pointers.
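One thing worth ruling out, independent of any HIVE_CONF_DIR setting: the ArrayIndexOutOfBoundsException: 1 in the stack trace is typically thrown when a line in people.txt has no comma, so p(1) does not exist. A defensive sketch of the row mapping, assuming each valid line has two comma-separated fields:
// Skip malformed lines before building Rows
val rowRDD = people.map(_.split(",")).filter(_.length >= 2).map(p => Row(p(0), p(1).trim))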
Thanks,
Divya
Labels:
- Apache Hive
- Apache Spark
12-10-2015
04:32 AM
1 Kudo
Hi @AliBajwa, I tried the same steps mentioned in BUG-46851 with the VMware sandbox HDP 2.3.2. Voila, I am able to view the Zeppelin page. Thanks a lot for all your help. Still trying to figure out what is wrong with the VirtualBox HDP 2.3.2 sandbox.
12-10-2015
03:35 AM
1 Kudo
Hi @Ali Bajwa, I am using the HDP 2.3.2 sandbox for VirtualBox. I tried the option you mentioned and edited my Windows hosts file with the entry 127.0.0.1 sandbox.hortonworks.com. When I try accessing the Zeppelin page now, I get "Unable to resolve the server's DNS address." Screenshot for your reference. @Neeraj Sabharwal: I am less familiar with port forwarding. Can you please elaborate on the exact steps I need to follow for Zeppelin to work? Thanks in advance, Divya
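For what it is worth, a NAT port-forwarding rule can be added from the host command line roughly like this; the VM name and the Zeppelin port 9995 are assumptions, so adjust them to match your sandbox, and the VM should be powered off when running modifyvm:
VBoxManage modifyvm "Hortonworks Sandbox with HDP 2.3.2" --natpf1 "zeppelin,tcp,,9995,,9995"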
12-10-2015
01:45 AM
Hi, I have installed the HDP 2.3.2 sandbox for VirtualBox, and when I try to access Zeppelin through Ambari I get the screen below. Am I missing any configuration? Would really appreciate your help. Thanks, Divya
Labels:
- Apache Spark
- Apache Zeppelin