
Spark streaming in Python - Save output in Hive


Hi Team,

I am trying to stream log files from a Kafka consumer into Hive using Spark in Python, and the job fails with the errors below:

18/02/12 20:41:18 WARN AbstractLifeCycle: FAILED SelectChannelConnector@x.x.x.x:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use

18/02/12 20:41:18 WARN AbstractLifeCycle: FAILED org.spark-project.jetty.server.Server@xxxxxx: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)

  File "/sourcefiles/streamingspkf.py", line 16, in <module>
    b3 = hc.createDataFrame(b2)
  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/context.py", line 425, in createDataFrame
  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/context.py", line 338, in _createFromLocal
TypeError: 'TransformedDStream' object is not iterable
18/02/12 20:41:20 INFO SparkContext: Invoking stop() from shutdown hook
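The two `BindException` warnings mean that port 4040, the default port of the Spark web UI, was already taken on that host, most often by another running driver (for example an earlier run of the same script). Spark normally retries on the next ports (4041, 4042, ...), so these warnings are usually not what stops the job. The stdlib-only sketch below checks which UI port is free. It only approximates Spark's own retry loop (which tries up to `spark.port.maxRetries` extra ports, 16 by default), and the helper names `port_in_use` and `first_free_port` are hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port.

    Note: a listener bound only to a specific non-loopback IP (like the
    x.x.x.x address in the log) is not visible via 127.0.0.1; pass that IP
    as `host` in that case.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0


def first_free_port(start=4040, max_retries=16, host="127.0.0.1"):
    """Try start, start+1, ..., start+max_retries; return the first free port or None."""
    last = min(start + max_retries, 65535)  # never probe past the last valid port
    for port in range(start, last + 1):
        if not port_in_use(port, host):
            return port
    return None
```

If the warnings are unwanted, a known-free port can be pinned with `SparkConf().set("spark.ui.port", str(port))`, or the UI can be switched off by setting `spark.ui.enabled` to `false`.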

Any idea what is causing this?

Please let me know if you need any further information.
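The `TypeError` comes from line 16 of the script: `hc.createDataFrame(b2)` receives `b2`, a DStream, but `createDataFrame` expects an RDD, a local list, or a pandas DataFrame. Because it is not an RDD, PySpark treats it as local data and tries to iterate over it, which fails. A DStream is a sequence of RDDs, one per batch interval, so the usual approach is to convert each batch inside `foreachRDD`. The sketch below assumes `b2` already yields tuples or `pyspark.sql.Row` objects and that `hc` is the HiveContext from the post; the helper names and the table name `logs` are hypothetical.

```python
# Sketch: write each micro-batch of a DStream to a Hive table.
# Assumptions (hypothetical, not from the original post):
#   - `stream` is the DStream built from the Kafka consumer (b2 in the post),
#     already mapped to tuples or pyspark.sql.Row objects;
#   - `hc` is the HiveContext created on the driver;
#   - `table` is a placeholder Hive table name such as "logs".


def save_batch(rdd, hc, table):
    """Convert one micro-batch RDD to a DataFrame and append it to `table`."""
    # Empty batches are common in streaming; without an explicit schema,
    # createDataFrame cannot infer one from an empty RDD, so skip them.
    if rdd.isEmpty():
        return False
    df = hc.createDataFrame(rdd)  # accepts an RDD, unlike a DStream
    df.write.mode("append").saveAsTable(table)
    return True


def attach_hive_sink(stream, hc, table):
    """Register save_batch to run once per batch interval."""
    # The function given to foreachRDD runs on the driver, so it can use hc.
    stream.foreachRDD(lambda rdd: save_batch(rdd, hc, table))
```

In the script, `attach_hive_sink(b2, hc, "logs")` would take the place of line 16 and must be called before the StreamingContext is started.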
