Member since: 06-18-2015
Posts: 55
Kudos Received: 34
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1360 | 03-04-2016 02:39 AM |
| | 1907 | 12-29-2015 09:42 AM |
12-29-2015 09:41 AM
Finally resolved the issue: the input data was not in the correct format, so when I used TimestampType/DateType in the schema the query returned an empty result set.
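A format problem like this can be confirmed outside Spark by checking the raw column values directly. The sketch below is only an illustration in plain Python. It assumes the timestamp columns are expected to look like `yyyy-MM-dd HH:mm:ss`, which the post does not state; the helper name `find_bad_timestamps` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed target layout; the post does not say which format the data needed.
EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_bad_timestamps(csv_text, columns, fmt=EXPECTED_FORMAT):
    """Return (line_number, column, value) for every value that fails to parse."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for column in columns:
            value = row.get(column) or ""
            try:
                datetime.strptime(value, fmt)
            except ValueError:
                bad.append((line_number, column, value))
    return bad


# Hypothetical sample: the second data row uses a different date layout.
sample = (
    "COLUMN1,COLUMN3\n"
    "CTR0,2015-12-28 03:34:27\n"
    "CTR1,28/12/2015 03:34\n"
)

print(find_bad_timestamps(sample, ["COLUMN3"]))
```

Any row reported here would need to be cleaned or reformatted before the column can be read as a timestamp under the assumed layout.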
12-28-2015 08:57 AM
The code below returns an empty result set when I use TimestampType for some of the StructFields:
15/12/28 03:34:27 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.hive.orc._
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/12/28 03:34:57 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/28 03:34:57 INFO HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3413fbe
scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType,FloatType ,LongType ,TimestampType,NullType };
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, NullType}
scala> val loandepoSchema = StructType(Seq(
| StructField("COLUMN1", StringType, true),
| StructField("COLUMN2", StringType , true),
| StructField("COLUMN3", TimestampType , true),
| StructField("COLUMN4", TimestampType , true),
| StructField("COLUMN5", StringType , true),
| StructField("COLUMN6", StringType, true),
| StructField("COLUMN7", IntegerType, true),
| StructField("COLUMN8", IntegerType, true),
| StructField("COLUMN9", StringType, true),
| StructField("COLUMN10", IntegerType, true),
| StructField("COLUMN11", IntegerType, true),
| StructField("COLUMN12", IntegerType, true),
| StructField("COLUMN13", StringType, true),
| StructField("COLUMN14", StringType, true),
| StructField("COLUMN15", StringType, true),
| StructField("COLUMN16", StringType, true),
| StructField("COLUMN17", StringType, true),
| StructField("COLUMN18", StringType, true),
| StructField("COLUMN19", StringType, true),
| StructField("COLUMN20", StringType, true),
| StructField("COLUMN21", StringType, true),
| StructField("COLUMN22", StringType, true)))
loandepoSchema: org.apache.spark.sql.types.StructType = StructType(StructField(COLUMN1,StringType,true), StructField(COLUMN2,StringType,true), StructField(COLUMN3,TimestampType,true), StructField(COLUMN4,TimestampType,true), StructField(COLUMN5,StringType,true), StructField(COLUMN6,StringType,true), StructField(COLUMN7,IntegerType,true), StructField(COLUMN8,IntegerType,true), StructField(COLUMN9,StringType,true), StructField(COLUMN10,IntegerType,true), StructField(COLUMN11,IntegerType,true), StructField(COLUMN12,IntegerType,true), StructField(COLUMN13,StringType,true), StructField(COLUMN14,StringType,true), StructField(COLUMN15,StringType,true), StructField(COLUMN16,StringType,true), StructField(COLUMN17,StringType,true), StructField(COLUMN18,StringType,true), StructField(COLUMN19,Strin...
scala> val lonadepodf = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(loandepoSchema).load("/tmp/TestDivya/loandepo_10K.csv")
15/12/28 03:37:52 INFO HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
lonadepodf: org.apache.spark.sql.DataFrame = [COLUMN1: string, COLUMN2: string, COLUMN3: timestamp, COLUMN4: timestamp, COLUMN5: string, COLUMN6: string, COLUMN7: int, COLUMN8: int, COLUMN9: string, COLUMN10: int, COLUMN11: int, COLUMN12: int, COLUMN13: string, COLUMN14: string, COLUMN15: string, COLUMN16: string, COLUMN17: string, COLUMN18: string, COLUMN19: string, COLUMN20: string, COLUMN21: string, COLUMN22: string]
scala> lonadepodf.select("COLUMN1").show(10)
15/12/28 03:38:01 INFO MemoryStore: ensureFreeSpace(216384) called with curMem=0, maxMem=278302556
15/12/28 03:38:01 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.3 KB, free 265.2 MB)
...............................................................................
15/12/28 03:38:07 INFO DAGScheduler: ResultStage 2 (show at <console>:33) finished in 0.653 s
15/12/28 03:38:07 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/12/28 03:38:07 INFO DAGScheduler: Job 2 finished: show at <console>:33, took 0.669388 s
+-------+
|COLUMN1|
+-------+
+-------+
Once the TimestampType StructFields are changed to StringType, the result set is returned:
scala> val loandepoSchema = StructType(Seq(
| StructField("COLUMN1", StringType, true),
| StructField("COLUMN2", StringType , true),
| StructField("COLUMN3", StringType , true),
| StructField("COLUMN4", StringType , true),
| StructField("COLUMN5", StringType , true),
| StructField("COLUMN6", StringType, true),
| StructField("COLUMN7", IntegerType, true),
| StructField("COLUMN8", IntegerType, true),
| StructField("COLUMN9", StringType, true),
| StructField("COLUMN10", IntegerType, true),
| StructField("COLUMN11", IntegerType, true),
| StructField("COLUMN12", IntegerType, true),
| StructField("COLUMN13", StringType, true),
| StructField("COLUMN14", StringType, true),
| StructField("COLUMN15", StringType, true),
| StructField("COLUMN16", StringType, true),
| StructField("COLUMN17", StringType, true),
| StructField("COLUMN18", StringType, true),
| StructField("COLUMN19", StringType, true),
| StructField("COLUMN20", StringType, true),
| StructField("COLUMN21", StringType, true),
| StructField("COLUMN22", StringType, true)))
loandepoSchema: org.apache.spark.sql.types.StructType = StructType(StructField(COLUMN1,StringType,true), StructField(COLUMN2,StringType,true), StructField(COLUMN3,StringType,true), StructField(COLUMN4,StringType,true), StructField(COLUMN5,StringType,true), StructField(COLUMN6,StringType,true), StructField(COLUMN7,IntegerType,true), StructField(COLUMN8,IntegerType,true), StructField(COLUMN9,StringType,true), StructField(COLUMN10,IntegerType,true), StructField(COLUMN11,IntegerType,true), StructField(COLUMN12,IntegerType,true), StructField(COLUMN13,StringType,true), StructField(COLUMN14,StringType,true), StructField(COLUMN15,StringType,true), StructField(COLUMN16,StringType,true), StructField(COLUMN17,StringType,true), StructField(COLUMN18,StringType,true), StructField(COLUMN19,StringType,...
scala> val lonadepodf = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(loandepoSchema).load("/tmp/TestDivya/loandepo_10K.csv")
lonadepodf: org.apache.spark.sql.DataFrame = [COLUMN1: string, COLUMN2: string, COLUMN3: string, COLUMN4: string, COLUMN5: string, COLUMN6: string, COLUMN7: int, COLUMN8: int, COLUMN9: string, COLUMN10: int, COLUMN11: int, COLUMN12: int, COLUMN13: string, COLUMN14: string, COLUMN15: string, COLUMN16: string, COLUMN17: string, COLUMN18: string, COLUMN19: string, COLUMN20: string, COLUMN21: string, COLUMN22: string]
scala> lonadepodf.select("COLUMN1").show(10)
15/12/28 03:39:48 INFO BlockManagerInfo: Removed broadcast_8_piece0 on 172.31.20.85:40013 in memory (size: 4.2 KB, free: 265.3 MB)
15/12/28 03:39:49 INFO YarnScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
15/12/28 03:39:49 INFO DAGScheduler: Job 6 finished: show at <console>:33, took 0.223277 s
+-------+
|COLUMN1|
+-------+
| CTR0|
| CTR1|
| CTR2|
| CTR3|
| CTR4|
| CTR5|
| CTR6|
| CTR7|
| CTR8|
| CTR9|
+-------+
Labels: Apache Spark
12-24-2015 06:13 AM
Hi, I am getting an error while starting the Zeppelin service through Ambari:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/ZEPPELIN/package/scripts/master.py", line 295, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/ZEPPELIN/package/scripts/master.py", line 230, in start
Execute (params.zeppelin_dir+'/bin/zeppelin-daemon.sh start >> ' + params.zeppelin_log_file, user=params.zeppelin_user)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 260, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 290, in _call
err_msg = Logger.filter_text(("Execution of '%s' returned %d. %s") % (command_alias, code, all_output))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)
Any pointers/guidance would be really appreciated. Thanks
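The final line of the traceback says that a byte string containing the byte 0xe2 was decoded as ASCII. 0xe2 is the lead byte of many three-byte UTF-8 sequences, such as the curly apostrophe U+2019, so one plausible cause (an assumption, not something the post confirms) is non-ASCII text in the output of `zeppelin-daemon.sh`. The sketch below reproduces the same class of error and locates the offending byte. It is written for Python 3 rather than the Python 2.6 used by the Ambari agent, and the sample bytes and the helper `first_non_ascii` are hypothetical.

```python
def first_non_ascii(data):
    """Return (position, byte) of the first non-ASCII byte, or None if all ASCII."""
    for position, byte in enumerate(data):
        if byte > 0x7F:
            return position, byte
    return None


# Hypothetical command output containing a UTF-8 curly apostrophe (e2 80 99).
output = "Zeppelin can\u2019t start".encode("utf-8")

try:
    output.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    # Same error class as in the traceback above.
    error_message = str(exc)

print(error_message)
print(first_non_ascii(output))

# Decoding with the matching codec succeeds.
decoded = output.decode("utf-8")
```

Running a check like this against the Zeppelin log or script output may help identify which character triggers the failure.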
Labels: Apache Zeppelin
12-22-2015 02:45 AM
@Neeraj Sabharwal We are currently using the Amazon EC2 API tools to start and stop the instances, and we are able to start the Ambari server too. The issue now is that core-site.xml still shows the old public URL for the Hive services. How can we make this change happen automatically?
12-22-2015 02:20 AM
1 Kudo
Hi, I have an HDP 2.3.2 cluster set up on Amazon EC2 on RHEL 7.x. We stop the cluster when it is not in use, basically after work hours. Can somebody share their experience of how to start the Ambari services using the Amazon EC2 API or any other means? Are there any other best practices that need to be followed? I would really appreciate your help. Thanks
Labels: Apache Ambari
12-21-2015 02:22 AM
1 Kudo
@Neeraj Sabharwal With the resolution you mentioned above, I have to change the permission every time. Is there any setting I can change to give the hive user permission on newly created HDFS files? For instance:
selectedData.write.format("orc").option("header","true").save("/tmp/newcars_orc_cust17")
Thanks
12-18-2015 06:34 AM
@Neeraj Sabharwal I found the issue: I had enabled a bridged network connection in VMware, because of which the spark-csv packages were not being installed and I was getting java.net.ConnectException: Connection refused.
12-18-2015 06:26 AM
@vshukla I am logging in as the hdfs user on the HDP 2.3.2 sandbox and using the same account to view tables in Hive. Yes, I am using the Hive CLI, and I even browsed the HDFS files through Ambari. I couldn't see any tables created.
12-18-2015 06:09 AM
1 Kudo
Hi, I am getting a permission denied error when creating an external table in the Hive context. FYI: logged in as the hdfs user.
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
val selectedData = df.select("year", "model")
selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")
hiveContext.sql("create external table newcars_orc_ext_cust17(year string,model string) stored as orc location '/tmp/newcars_orc_cust17'")
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/newcars_orc_cust17":hdfs:hdfs:drwxr-xr-x
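The inode string at the end of the error explains the denial: the directory is owned by `hdfs:hdfs` with mode `drwxr-xr-x`, so only the owner may write to it. The sketch below evaluates a POSIX-style mode string for a given user. It is a simplified illustration in plain Python, not the actual HDFS permission checker (it ignores superusers, ACLs and special bits), and it assumes the hive user is not a member of the hdfs group, which the post does not say. The function name `can_access` and the group name `hadoop` are hypothetical.

```python
def can_access(mode, owner, group, user, user_groups, access):
    """Check one of 'r', 'w', 'x' against a mode string like 'drwxr-xr-x'."""
    bits = mode[1:]  # drop the leading file-type character
    if user == owner:
        triplet = bits[0:3]   # owner class
    elif group in user_groups:
        triplet = bits[3:6]   # group class
    else:
        triplet = bits[6:9]   # other class
    return triplet["rwx".index(access)] == access


# Values taken from the error message: inode owner, group and mode.
mode, owner, group = "drwxr-xr-x", "hdfs", "hdfs"

# Assumption: hive belongs to some other group, so it falls under "other".
hive_can_write = can_access(mode, owner, group, "hive", {"hadoop"}, "w")
hdfs_can_write = can_access(mode, owner, group, "hdfs", {"hdfs"}, "w")

print(hive_can_write, hdfs_can_write)
```

Under these assumptions the hive user falls into the "other" class, which has only `r-x`, which matches the `access=WRITE` denial reported for `user=hive`.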
Labels: