Support Questions

mark_hadoop · ‎02-18-2021

Hi everybody,

I am trying the following approach to write data in to hive table.

import logging
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split
from pyspark.streaming.kafka import KafkaUtils
import datetime
from pyspark.sql.functions import lit,unix_timestamp
from os.path import *
from pyspark import Row

warehouseLocation = abspath("spark-warehouse")
spark = SparkSession.builder.appName("spark_streaming").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()

kafka = "kafka"
offsets = "earliest"
servers = "server_1:port,server_2:port"
security_protocol = "SSL"
keystore_location = "keystore"
keystore_password = "keystore_password"
kafka_topic = "kafka_topic"
checkpoint_location ="/checkpoint/location"

def hiveInsert(df, batchId):
        df.createOrReplaceTempView("updates")
        spark.sql("insert into hive_db.hive_table select value, time_stamp from updates")

df = spark.readStream.format(kafka).option("startingoffsets", offsets).option("kafka.bootstrap.servers", servers).option("kafka.security.protocol", security_protocol).option("kafka.ssl.keystore.location", keystore_location).option("kafka.ssl.keystore.password", keystore_password).option("subscribe",kafka_topic).load().selectExpr("CAST(value AS STRING)").select('value').withColumn('time_stamp',lit(datetime.datetime.now().strftime('%Y%m%d%H%M')))

query = df.writeStream.foreachBatch(hiveInsert).start()


query.awaitTermination()

The above code is not working

Any pointers are of great help!

Chandy · ‎09-22-2021

Hi @mark_hadoop , Is this solved ? If yes ,what was the issue and how were you able to fix it ?

Thanks,

Albin

Cloudera Community

Support Questions

pyspark streaming writing data in to hive using foreachbatch method

Writing parquet on HDFS using Spark Streaming

Uploading Files for Cloudera Support - alternate m...

How to use NiFi to write API data to CDP CDW

Hive Streaming Compaction

Streaming/Query data to CDP Public Cloud Using Clo...

Hbase filter query using pyspark

Streaming data from oracle data base logflie and ...

Using Custom Data Connections in Cloudera Machine ...

Create Hive table using pyspark: Mkdirs failed to...

Spark Connect to CDP Warehouse using Hive JDBC Met...