Member since: 11-22-2016
Posts: 50
Kudos Received: 3
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1728 | 01-17-2017 02:54 PM |
10-05-2017
07:06 AM
Yes, the metastore database does store the column names. To view the column names of your table:
hive> SHOW COLUMNS IN table_name;
hive> set hive.cli.print.header=true;
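If you are checking from Spark rather than the Hive CLI, a hedged equivalent sketch (not from the original reply; it assumes an existing SparkContext `sc`, and `db_name.table_name` is a placeholder):

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes a running SparkContext `sc`; database and table names are placeholders.
val hiveContext = new HiveContext(sc)

// The schema comes from the Hive metastore; .columns reads it without running a job.
val columnNames = hiveContext.sql("SELECT * FROM db_name.table_name LIMIT 0").columns
columnNames.foreach(println)
```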
10-05-2017
06:48 AM
How did you run the script? It didn't return any result for me. Can you share yours?
08-21-2017
09:53 AM
But this is not a suitable solution for a production environment.
04-04-2017
01:50 AM
How did you solve it? Which things does one have to check?
04-04-2017
01:48 AM
How did you solve it, Max?
01-17-2017
02:54 PM
Fixed it, like below:
df.withColumn("Timestamp_val", lit(current_timestamp))
The second argument of .withColumn() expects a named column, and val newDF = dataframe.withColumn("Timestamp_val", current_timestamp()) did not give me a named column here; hence the exception.
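A self-contained sketch of that fix (minimal and illustrative only; the local SparkContext, the sample data, and the column name Timestamp_val are stand-ins, not the original job):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{current_timestamp, lit}

object TimestampColumnSketch {
  def main(args: Array[String]): Unit = {
    // Local Spark 1.x setup, just to make the snippet runnable on its own.
    val sc = new SparkContext(new SparkConf().setAppName("timestamp-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in for the JSON-derived DataFrame from the thread.
    val dataframe = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("key", "value")

    // The fix from the accepted answer: pass lit(current_timestamp) to withColumn
    // instead of current_timestamp() on its own.
    val newDF = dataframe.withColumn("Timestamp_val", lit(current_timestamp))
    newDF.printSchema()
    newDF.show()

    sc.stop()
  }
}
```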
01-17-2017
12:19 PM
Hi all, here I'm trying to add a timestamp to the data frame dynamically, like this:

messages.foreachRDD(rdd =>
74 {
75 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
76 import sqlContext.implicits._
77 val dataframe =sqlContext.read.json(rdd.map(_._2)).toDF()
78 import org.apache.spark.sql.functions._
79 val newDF=dataframe.withColumn("Timestamp_val",current_timestamp())
80 newDF.show()
81 newDF.printSchema()

But this code is giving me a headache: sometimes it prints the schema and sometimes it throws this:

java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:354)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$14.apply(Analyzer.scala:353)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:353)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10.applyOrElse(Analyzer.scala:347)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:347)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:328)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$withPlan(DataFrame.scala:2126)
    at org.apache.spark.sql.DataFrame.select(DataFrame.scala:707)
    at org.apache.spark.sql.DataFrame.withColumn(DataFrame.scala:1188)
    at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:79)
    at HiveGenerator$$anonfun$main$1.apply(HiveGenerator.scala:73)

Where am I going wrong? Please help.
Labels:
- Apache Spark
01-16-2017
09:02 PM
Can I know which versions of Hive and Spark you are using?
01-14-2017
09:40 PM
Which version of Spark are you using? Assuming you are on 1.4 or higher:

import org.apache.spark.sql.hive.HiveContext
val hiveObj = new HiveContext(sc)
import hiveObj.implicits._
hiveObj.refreshTable("db.table") // if you have upgraded your Hive, do this to refresh the tables
val sample = hiveObj.sql("select * from table").collect()
sample.foreach(println)

This has worked for me.
01-14-2017
09:33 PM
Did this work for you? If not, please post the code which worked for you
01-11-2017
05:12 AM
@Steven O'Neill I'm not touching the data in HDFS. It created a separate folder for the date column when I used the ALTER statement.
01-11-2017
02:58 AM
@DWinters can you elaborate on the way you solved it, because I just ran into this issue. I have 11 columns in the data frame (I added a timestamp column to a 10-column RDD), and I have a Hive table with all 11 columns, one of which is partitioned by timestamp.
01-10-2017
09:15 AM
@Neeraj Sabharwal
Hi all,
I'm trying to create an external Hive partitioned table whose location points to an HDFS location. This HDFS location gets appended every time I run my Spark streaming application, so my Hive table appends too: Kafka >> Spark Streaming >> HDFS >> Hive External Table. I could do the above flow smoothly with a non-partitioned table, but when I want to add partitions to my external table I'm not able to get data into my Hive external table whatsoever. I have tried the below after creating the external table, but still the problem persists.

CREATE EXTERNAL TABLE user (
  userId BIGINT,
  type INT,
  level TINYINT,
  date String
)
PARTITIONED BY (date String)
LOCATION '/external_table_path';

ALTER TABLE user ADD PARTITION(date='2010-02-22');

I tried the above fix but things haven't changed. What is the best workaround to get my appending HDFS data into an external partitioned table?
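For reference, a hedged Spark 1.x sketch of the flow described above (illustrative only; `batchDF` and its JSON input path stand in for one streaming micro-batch, while the table, location, and date value are the ones from the post):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("partition-append-sketch"))
val hiveContext = new HiveContext(sc)

// Stand-in for the DataFrame produced by one streaming micro-batch.
val batchDF = hiveContext.read.json("/tmp/one_batch.json") // placeholder input path

// Without an explicit LOCATION, the partition added above defaults to
// /external_table_path/date=2010-02-22, so the batch is written under that directory...
val day = "2010-02-22"
batchDF.write.mode("append").json(s"/external_table_path/date=$day")

// ...and the partition is (re-)registered so the external table can see the new files.
hiveContext.sql(s"ALTER TABLE user ADD IF NOT EXISTS PARTITION (date='$day')")
```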
Labels:
- Apache Hadoop
- Apache Hive
01-04-2017
11:51 AM
You can add the Spark assembly jar to a global location like hdfs:///, and set the spark.yarn.jar value in spark-defaults.conf to that Spark assembly jar's HDFS path.
01-04-2017
11:47 AM
Can you check your spark-env.sh file? Make sure you set HADOOP_CONF_DIR and JAVA_HOME in spark-env.sh too.
01-03-2017
11:52 PM
I was pointing to a wrong broker, so ZooKeeper was unable to find the topics.
01-03-2017
11:47 PM
1 Kudo
The issue still persists. What else can we do to make it work, other than hive-site.xml?
12-29-2016
02:26 AM
I'm using sbt. Should I use spark-submit every time we need to run a project? "sbt run" is catering to my needs for now, as I'm using it in local mode.
12-29-2016
01:14 AM
I'm running my Spark job using sbt, i.e., "sbt run". Do you think this can be a problem? Or should we stick to spark-submit all the time?
12-29-2016
01:10 AM
I'm running my Spark job using sbt, i.e., "sbt run". Do you think this can be a problem? Or should we stick to spark-submit all the time?
11-23-2016
05:25 AM
Hi, I found the mistake. Thanks for being there for a fellow programmer!
11-22-2016
02:34 AM
I came across this problem too. I have ZooKeeper and Kafka on separate hosts. I ran these commands:

bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper server1:2181 --replication-factor 1 --partitions 1 --topic mobile_events

This has created topics in the Kafka dir.

bin/kafka-console-producer.sh --broker-list server1:9092 --topic mobile_events

This opens a terminal to type in the messages, but the messages are not being saved in the topics, as I checked the Kafka data dir. Finally,

bin/kafka-console-consumer.sh --zookeeper server1:2181 --topic mobile_events --from-beginning

is throwing a "No brokers found in the ZK" log to the console. Can you help me with fixing this issue? Thanks in advance.
06-09-2016
04:32 PM
1 Kudo
Where can I change the IP to the internal IP?
06-07-2016
05:35 AM
Your suggestion helped me overcome a problem I had struggled with for a fortnight. Thanks a ton!
06-07-2016
05:21 AM
This comment fixed my problem. You are awesome, Vikas!
05-24-2016
07:41 PM
@All, thanks for your points.
05-24-2016
06:43 PM
Typo, corrected!!
05-24-2016
05:00 PM
Below are my OS and other details for my multi-node cluster. If I want to stay conflict-free between HDP, HDF, and NiFi in the future, can anyone suggest:
1) Which OS? Currently HDP is installed on RHEL 6.
2) Which versions? Currently HDP 2.2.x and Ambari 2.1.x are installed.
3) What version of HDF and NiFi needs to be installed?
Please point me in the right direction to continue my IoT idea.
Labels: