Member since: 06-18-2015
Posts: 55
Kudos Received: 34
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1349 | 03-04-2016 02:39 AM
 | 1905 | 12-29-2015 09:42 AM
12-29-2015
09:41 AM
Finally resolved the issue: the input data was not in the correct format, so when I used TimestampType/DateType it returned an empty result set.
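For anyone hitting the same thing, here is a minimal sketch of one workaround pattern (not necessarily the poster's exact fix): read the date columns as plain strings first, inspect them, then parse explicitly. It assumes Spark 1.5+, where unix_timestamp(col, format) is available in org.apache.spark.sql.functions, and the "MM/dd/yyyy HH:mm:ss" pattern is a placeholder for whatever format the raw file really uses.

import org.apache.spark.sql.functions.{col, unix_timestamp}

// Without an explicit schema, spark-csv reads every column as a string,
// so nothing is silently nulled out while the format is being debugged.
val raw = hiveContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/tmp/TestDivya/loandepo_10K.csv")

// Inspect what the raw date strings actually look like.
raw.select("COLUMN3", "COLUMN4").show(5)

// Then parse them explicitly; the pattern below is a hypothetical placeholder.
val parsed = raw
  .withColumn("COLUMN3", unix_timestamp(col("COLUMN3"), "MM/dd/yyyy HH:mm:ss").cast("timestamp"))
  .withColumn("COLUMN4", unix_timestamp(col("COLUMN4"), "MM/dd/yyyy HH:mm:ss").cast("timestamp"))
parsed.printSchema()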
12-28-2015
08:57 AM
The code below returns an empty result set when I use TimestampType as one of the StructFields:
15/12/28 03:34:27 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.hive.orc._
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/12/28 03:34:57 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/28 03:34:57 INFO HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3413fbe
scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType,FloatType ,LongType ,TimestampType,NullType };
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, LongType, TimestampType, NullType}
scala> val loandepoSchema = StructType(Seq(
| StructField("COLUMN1", StringType, true),
| StructField("COLUMN2", StringType , true),
| StructField("COLUMN3", TimestampType , true),
| StructField("COLUMN4", TimestampType , true),
| StructField("COLUMN5", StringType , true),
| StructField("COLUMN6", StringType, true),
| StructField("COLUMN7", IntegerType, true),
| StructField("COLUMN8", IntegerType, true),
| StructField("COLUMN9", StringType, true),
| StructField("COLUMN10", IntegerType, true),
| StructField("COLUMN11", IntegerType, true),
| StructField("COLUMN12", IntegerType, true),
| StructField("COLUMN13", StringType, true),
| StructField("COLUMN14", StringType, true),
| StructField("COLUMN15", StringType, true),
| StructField("COLUMN16", StringType, true),
| StructField("COLUMN17", StringType, true),
| StructField("COLUMN18", StringType, true),
| StructField("COLUMN19", StringType, true),
| StructField("COLUMN20", StringType, true),
| StructField("COLUMN21", StringType, true),
| StructField("COLUMN22", StringType, true)))
loandepoSchema: org.apache.spark.sql.types.StructType = StructType(StructField(COLUMN1,StringType,true), StructField(COLUMN2,StringType,true), StructField(COLUMN3,TimestampType,true), StructField(COLUMN4,TimestampType,true), StructField(COLUMN5,StringType,true), StructField(COLUMN6,StringType,true), StructField(COLUMN7,IntegerType,true), StructField(COLUMN8,IntegerType,true), StructField(COLUMN9,StringType,true), StructField(COLUMN10,IntegerType,true), StructField(COLUMN11,IntegerType,true), StructField(COLUMN12,IntegerType,true), StructField(COLUMN13,StringType,true), StructField(COLUMN14,StringType,true), StructField(COLUMN15,StringType,true), StructField(COLUMN16,StringType,true), StructField(COLUMN17,StringType,true), StructField(COLUMN18,StringType,true), StructField(COLUMN19,Strin...
scala> val lonadepodf = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(loandepoSchema).load("/tmp/TestDivya/loandepo_10K.csv")
15/12/28 03:37:52 INFO HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
lonadepodf: org.apache.spark.sql.DataFrame = [COLUMN1: string, COLUMN2: string, COLUMN3: timestamp, COLUMN4: timestamp, COLUMN5: string, COLUMN6: string, COLUMN7: int, COLUMN8: int, COLUMN9: string, COLUMN10: int, COLUMN11: int, COLUMN12: int, COLUMN13: string, COLUMN14: string, COLUMN15: string, COLUMN16: string, COLUMN17: string, COLUMN18: string, COLUMN19: string, COLUMN20: string, COLUMN21: string, COLUMN22: string]
scala> lonadepodf.select("COLUMN1").show(10)
15/12/28 03:38:01 INFO MemoryStore: ensureFreeSpace(216384) called with curMem=0, maxMem=278302556
15/12/28 03:38:01 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 211.3 KB, free 265.2 MB)
...............................................................................
15/12/28 03:38:07 INFO DAGScheduler: ResultStage 2 (show at <console>:33) finished in 0.653 s
15/12/28 03:38:07 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/12/28 03:38:07 INFO DAGScheduler: Job 2 finished: show at <console>:33, took 0.669388 s
+-------+
|COLUMN1|
+-------+
+-------+
Once the TimestampType StructFields are replaced with StringType, the result set is returned:
scala> val loandepoSchema = StructType(Seq(
| StructField("COLUMN1", StringType, true),
| StructField("COLUMN2", StringType , true),
| StructField("COLUMN3", StringType , true),
| StructField("COLUMN4", StringType , true),
| StructField("COLUMN5", StringType , true),
| StructField("COLUMN6", StringType, true),
| StructField("COLUMN7", IntegerType, true),
| StructField("COLUMN8", IntegerType, true),
| StructField("COLUMN9", StringType, true),
| StructField("COLUMN10", IntegerType, true),
| StructField("COLUMN11", IntegerType, true),
| StructField("COLUMN12", IntegerType, true),
| StructField("COLUMN13", StringType, true),
| StructField("COLUMN14", StringType, true),
| StructField("COLUMN15", StringType, true),
| StructField("COLUMN16", StringType, true),
| StructField("COLUMN17", StringType, true),
| StructField("COLUMN18", StringType, true),
| StructField("COLUMN19", StringType, true),
| StructField("COLUMN20", StringType, true),
| StructField("COLUMN21", StringType, true),
| StructField("COLUMN22", StringType, true)))
loandepoSchema: org.apache.spark.sql.types.StructType = StructType(StructField(COLUMN1,StringType,true), StructField(COLUMN2,StringType,true), StructField(COLUMN3,StringType,true), StructField(COLUMN4,StringType,true), StructField(COLUMN5,StringType,true), StructField(COLUMN6,StringType,true), StructField(COLUMN7,IntegerType,true), StructField(COLUMN8,IntegerType,true), StructField(COLUMN9,StringType,true), StructField(COLUMN10,IntegerType,true), StructField(COLUMN11,IntegerType,true), StructField(COLUMN12,IntegerType,true), StructField(COLUMN13,StringType,true), StructField(COLUMN14,StringType,true), StructField(COLUMN15,StringType,true), StructField(COLUMN16,StringType,true), StructField(COLUMN17,StringType,true), StructField(COLUMN18,StringType,true), StructField(COLUMN19,StringType,...
scala> val lonadepodf = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").schema(loandepoSchema).load("/tmp/TestDivya/loandepo_10K.csv")
lonadepodf: org.apache.spark.sql.DataFrame = [COLUMN1: string, COLUMN2: string, COLUMN3: string, COLUMN4: string, COLUMN5: string, COLUMN6: string, COLUMN7: int, COLUMN8: int, COLUMN9: string, COLUMN10: int, COLUMN11: int, COLUMN12: int, COLUMN13: string, COLUMN14: string, COLUMN15: string, COLUMN16: string, COLUMN17: string, COLUMN18: string, COLUMN19: string, COLUMN20: string, COLUMN21: string, COLUMN22: string]
scala> lonadepodf.select("COLUMN1").show(10)
15/12/28 03:39:48 INFO BlockManagerInfo: Removed broadcast_8_piece0 on 172.31.20.85:40013 in memory (size: 4.2 KB, free: 265.3 MB)
15/12/28 03:39:49 INFO YarnScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
15/12/28 03:39:49 INFO DAGScheduler: Job 6 finished: show at <console>:33, took 0.223277 s
+-------+
|COLUMN1|
+-------+
| CTR0|
| CTR1|
| CTR2|
| CTR3|
| CTR4|
| CTR5|
| CTR6|
| CTR7|
| CTR8|
| CTR9|
+-------+
Labels:
- Apache Spark
12-24-2015
06:13 AM
Hi, I am getting an error while starting the Zeppelin service through Ambari:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/ZEPPELIN/package/scripts/master.py", line 295, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.3/services/ZEPPELIN/package/scripts/master.py", line 230, in start
Execute (params.zeppelin_dir+'/bin/zeppelin-daemon.sh start >> ' + params.zeppelin_log_file, user=params.zeppelin_user)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 260, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 290, in _call
err_msg = Logger.filter_text(("Execution of '%s' returned %d. %s") % (command_alias, code, all_output))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)

Any pointers/guidance would be really appreciated. Thanks
Labels:
- Apache Zeppelin
12-22-2015
02:45 AM
@Neeraj Sabharwal We are currently using the Amazon EC2 API tools to start and stop the instances, and we are able to start the Ambari server too. But the issue now is that core-site.xml is still showing the old public URL for the Hive services. How can we make this change happen automatically?
12-22-2015
02:20 AM
1 Kudo
Hi, I have an HDP 2.3.2 cluster set up on Amazon EC2 on RHEL 7.x. We stop the Amazon cluster when it is not in use, basically after work hours. Can somebody share their experience with starting the Ambari services using the Amazon EC2 API or any other means? Are there any other best practices which need to be followed? Would really appreciate your help. Thanks
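One avenue, sketched below: once the instances are back up and ambari-server/ambari-agent are running, all cluster services can be started through Ambari's standard REST API (the services endpoint and the X-Requested-By header are part of that API). The host, cluster name, and credentials here are placeholders, so treat this as a sketch rather than a tested script.

import java.net.{HttpURLConnection, URL}
import javax.xml.bind.DatatypeConverter

// Hypothetical values: replace with the real Ambari host, cluster name and credentials.
val ambari = "http://ambari-host:8080"
val cluster = "MyCluster"
val auth = DatatypeConverter.printBase64Binary("admin:admin".getBytes("UTF-8"))

// Ask Ambari to move every service to the STARTED state.
val url = new URL(s"$ambari/api/v1/clusters/$cluster/services")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("PUT")
conn.setRequestProperty("Authorization", s"Basic $auth")
conn.setRequestProperty("X-Requested-By", "ambari") // required by Ambari's CSRF check
conn.setDoOutput(true)
val body = """{"RequestInfo":{"context":"Start All via REST"},"Body":{"ServiceInfo":{"state":"STARTED"}}}"""
conn.getOutputStream.write(body.getBytes("UTF-8"))
println(s"Ambari responded: ${conn.getResponseCode}")
conn.disconnect()

The same request body with "state":"INSTALLED" stops all services, which is useful before shutting the instances down after work hours.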
Labels:
- Apache Ambari
12-21-2015
02:22 AM
1 Kudo
@Neeraj Sabharwal With the resolution you mentioned above, I have to change the permissions every time. Is there any setting I can change to give the hive user permission to newly created HDFS files? For instance:

selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")

Thanks
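One possible workaround, sketched below under the assumption that a wide-open directory is acceptable for throwaway /tmp data: set the permissions from the same Spark job via the Hadoop FileSystem API, so no separate chmod is needed. The path is the one from the example above; "777" is deliberately permissive.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Write the ORC output as before.
selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")

// Then open the directory up so the hive user passes the WRITE check
// when the external table is created over it.
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.setPermission(new Path("/tmp/newcars_orc_cust17"), new FsPermission("777"))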
12-18-2015
06:34 AM
@Neeraj Sabharwal I found the cause of the issue: I had enabled a bridged network connection in my VMware, because of which the spark-csv packages could not be downloaded and I was getting java.net.ConnectException: Connection refused.
12-18-2015
06:26 AM
@vshukla I am logging in as the hdfs user on the HDP 2.3.2 sandbox and using the same account to look for the tables in Hive. Yes, I am using the Hive CLI, and I even browsed the HDFS files through Ambari. I couldn't see any tables created.
12-18-2015
06:09 AM
1 Kudo
Hi, I am getting a permission denied error when creating an external table in HiveContext. FYI: logged in as the hdfs user.

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = hiveContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/tmp/cars.csv")
val selectedData = df.select("year", "model")
selectedData.write.format("orc").option("header", "true").save("/tmp/newcars_orc_cust17")
hiveContext.sql("create external table newcars_orc_ext_cust17(year string, model string) stored as orc location '/tmp/newcars_orc_cust17'")
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/tmp/newcars_orc_cust17":hdfs:hdfs:drwxr-xr-x
Labels: