Member since 02-05-2020
6 Posts, 0 Kudos Received, 0 Solutions
05-05-2020 09:21 AM
Hi,
I am getting the error below when running a Spark Streaming job that reads from Kafka and writes to HBase. The jar file builds without any error, but the error appears when I run the spark-submit command.
Any help is much appreciated.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/Put
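A NoClassDefFoundError that appears only at run time usually means the class was available when the jar was compiled but is not on the classpath that spark-submit gives the driver. One possible way around it, sketched here under assumptions, is to pass the HBase jars explicitly with --jars. The function name build_spark_submit, the hbase-*.jar naming pattern, and every path or class name passed to it are hypothetical and would need to match the actual cluster.

```python
from pathlib import Path


def build_spark_submit(app_jar, main_class, hbase_lib_dir):
    """Return a spark-submit argv that ships the HBase jars with the job.

    Hypothetical helper: the directory layout and the hbase-*.jar naming
    pattern are assumptions, not something taken from the post.
    """
    # --jars expects a comma-separated list, not a colon-separated classpath.
    jars = sorted(str(p) for p in Path(hbase_lib_dir).glob("hbase-*.jar"))
    if not jars:
        raise FileNotFoundError(f"no hbase-*.jar files under {hbase_lib_dir}")
    return [
        "spark-submit",
        "--class", main_class,
        "--jars", ",".join(jars),
        app_jar,
    ]
```

The returned list can be handed to subprocess.run(..., check=True) or joined and printed to compare against the command currently in use. Building a single assembly (fat) jar that bundles the HBase client is another option that avoids the extra flag.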
Labels: Apache HBase, Apache Kafka, Apache Spark
02-25-2020 07:36 PM
Hi, thanks for the reply, but that approach spends time writing the DataFrame to disk. Is there a way to get the DataFrame count into the logs, for example through a Spark logger option? The intention is to avoid re-running the DataFrame.
02-19-2020 11:23 AM
Hi,
How do I compute a DataFrame's record count without re-running the DataFrame? Can this information be pulled from any Spark stats table?
The few options I am aware of are:
1. dataframe.cache() -- I don't want to store the result in memory.
2. dataframe.describe("col").show -- this re-runs the DataFrame to get the count.
3. dataframe.count() -- this also re-runs the DataFrame to get the count.
Thanks!
Thanks, Waseem
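One idea that might fit, if the job already runs an action such as a write, is to count rows as a side effect of that action with a Spark accumulator instead of issuing a separate count(). The PySpark sketch below is only an illustration under assumptions: the names count_rows and write_and_count are made up, the Parquet write stands in for whatever action the job really performs, and the round trip through df.rdd has its own cost.

```python
def count_rows(rows, counter):
    """Yield rows unchanged while adding 1 to `counter` for each one."""
    for row in rows:
        counter.add(1)
        yield row


def write_and_count(spark, df, path):
    """Write `df` as Parquet and return the row count seen during that write.

    Assumes `spark` is an active SparkSession and `path` is writable.
    """
    counter = spark.sparkContext.accumulator(0)
    # Count inside the same pass that feeds the write, so nothing runs twice.
    counted = df.rdd.mapPartitions(lambda rows: count_rows(rows, counter))
    spark.createDataFrame(counted, df.schema).write.mode("overwrite").parquet(path)
    # Only meaningful after the action above has finished.
    return counter.value
```

The returned value can then be written to the log. Accumulators updated inside transformations can over-count when tasks are retried or speculatively executed, so the number is best treated as approximate unless the job ran without retries.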
Labels: Apache Spark
02-05-2020 06:15 AM
sudo su - ${user} -c "beeline -u 'jdbc:hive2://server:port,connectionstring' -e 'insert statement'; "
The line above works fine if I take out sudo; with sudo it doesn't work. Any help or advice is much appreciated.
Thanks!
Labels: Apache Hive