Member since 02-05-2020 · 6 Posts · 0 Kudos Received · 0 Solutions
05-05-2020 09:21 AM
Hi,
I am getting the error below when trying to run a Spark Streaming job that reads from Kafka and writes to HBase. The jar file builds without any errors, but the job fails when I run the spark-submit command.
Any help is much appreciated.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/Put
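For context, this error usually means the HBase client jars were on the compile classpath (so the jar built) but not on the runtime classpath used by spark-submit. A minimal sketch, not the actual command — the main class, jar paths, and master below are placeholders:

spark-submit \
  --class com.example.KafkaToHBase \
  --master yarn \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar \
  kafka-to-hbase-streaming.jar

Depending on the deploy mode, --driver-class-path may also be needed so the driver process itself can see the HBase classes.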
03-12-2020 08:36 AM
Hi,
When I run the statement below in spark-shell it executes fine, but when I run the same statement in a compiled Scala program it errors out with "overloaded method value".
Spark-Shell: spark.read.json(dataframe.select("col_name").as[String]).schema
After some research I found that in compiled Scala we need to pass an RDD as the argument instead of a DataFrame.
Is there a way to make the spark-shell statement run in compiled Scala without converting it to an RDD?
Note: my dataframe has both plain column values and JSON columns; the statement above takes only the JSON values as input.
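For reference, a sketch of both variants, assuming Spark 2.x with a SparkSession named spark and a DataFrame named dataframe as in the statement above:

import spark.implicits._  // supplies the Encoder[String] that .as[String] needs in compiled code

// Spark 2.2+ has a json(Dataset[String]) overload, so the shell statement compiles as-is:
val schema = spark.read.json(dataframe.select("col_name").as[String]).schema

// Older Spark versions only have json(RDD[String]), hence the conversion:
val schemaViaRdd = spark.read.json(dataframe.select("col_name").as[String].rdd).schema

If the "overloaded method value" error appears at compile time, that often points to the Spark version in the build being older than the one backing the spark-shell.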
Help is much appreciated.
Thanks,
Waseem
02-25-2020 07:36 PM
Hi, thanks for the reply, but that approach spends time writing the dataframe to disk. I am looking for a way to have dataframe counts written to the logs, e.g. via Spark's logging, with the intention of avoiding re-running the dataframe.
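One pattern that matches this (a minimal sketch, not the thread's actual pipeline): count rows with a LongAccumulator as they stream through whatever action the job already runs, then log the value. The SparkSession spark, DataFrame df, and the output path are assumptions:

import org.apache.log4j.Logger

val log = Logger.getLogger("row-counter")
val rowCount = spark.sparkContext.longAccumulator("rowCount")

// Piggyback the count on the existing job so there is no second pass over the data:
val countedRdd = df.rdd.map { row => rowCount.add(1); row }
val counted = spark.createDataFrame(countedRdd, df.schema)

counted.write.parquet("/tmp/out")  // stand-in for whatever action the job already performs
log.info(s"dataframe row count: ${rowCount.value}")  // reliable only after the action has run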
02-19-2020 11:23 AM
Hi,
How do I compute a dataframe's record count without re-running the dataframe? I mean, can we pull this information from any Spark stats table?
A few options I am aware of:
1. dataframe.cache() -- don't want to store the result in memory.
2. dataframe.describe("col").show -- again, it re-runs the dataframe to get the count.
3. dataframe.count() -- again, it re-runs the dataframe to get the count.
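There is no stats table holding a row count for an arbitrary dataframe, but Spark does record per-operator SQL metrics while an action runs. A sketch of reading them with a QueryExecutionListener, assuming Spark 2.x, a SparkSession named spark, and a final plan that exposes a numOutputRows metric (the metric name depends on the operator):

import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // Read the row-count metric that was collected while the action ran.
    qe.executedPlan.metrics.get("numOutputRows")
      .foreach(m => println(s"$funcName produced ${m.value} rows"))
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})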
Thanks, Waseem
02-05-2020 06:15 AM
sudo su - ${user} -c "beeline -u 'jdbc:hive2://server:port,connectionstring' -e 'insert statement'; "
The line above works fine if I take out sudo; with sudo it doesn't work. Any help or advice is much appreciated.
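Two common culprits here (a guess, since the actual error isn't shown): su - starts a login shell for ${user}, which resets PATH (so beeline may not be found) and drops the calling user's Kerberos ticket. A sketch of two variants to try, with /usr/bin/beeline as an assumed path and the placeholders kept as-is:

# Use the full path so the login shell's PATH doesn't matter:
sudo su - ${user} -c "/usr/bin/beeline -u 'jdbc:hive2://server:port,connectionstring' -e 'insert statement'"

# Or skip the nested login shell entirely:
sudo -u ${user} beeline -u 'jdbc:hive2://server:port,connectionstring' -e 'insert statement'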
Thanks!