
How do I combine two DStreams using PySpark?



I have a problem in PySpark: I need to combine two DStreams into a single DStream, but unfortunately nothing is ever printed.

This is my code:

 

import json
import time

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="Sparkstreaming")
spark = SparkSession(sc)
sc.setLogLevel("WARN")

ssc = StreamingContext(sc, 3)

kafka_stream = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-be-borsa", {"Be_borsa": 1})
kafka_stream1 = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-be-teleborsa", {"Be_teleborsa": 1})

# Each Kafka record is a single (key, value) tuple, so map() must take one
# argument (a two-argument lambda like "lambda k, v" fails at runtime).
# Parse the JSON payload and key each record by its "id" field, because
# join() only works on (key, value) pair DStreams.
dstream = kafka_stream.map(lambda kv: (json.loads(kv[1])["id"], json.loads(kv[1])))
dstream1 = kafka_stream1.map(lambda kv: (json.loads(kv[1])["id"], json.loads(kv[1])))

# Join on the "id" key: yields (id, (record_from_stream, record_from_stream1))
streamJoined = dstream.join(dstream1)

streamJoined.pprint()

ssc.start()
time.sleep(100)

ssc.stop()

 

These are the two JSON records that I consume:

 

{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be", "Hour": "15.49.24", "Last Price": "0,862", "Var%": "-1,93", "Last Value": "1.020"}

{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be", "Volatility": "2,352", "Value_at_risk": "5,471"}
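Since `join()` operates on (key, value) pairs, each JSON payload must first be keyed by its "id" field before the two streams can be joined. In plain Python the per-record transform looks like this (a minimal sketch; the `key_by_id` helper name is my own):

```python
import json

raw = '{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be"}'

def key_by_id(value):
    # Parse the JSON payload and key it by its "id" field,
    # producing the (key, record) pair that join() expects.
    record = json.loads(value)
    return (record["id"], record)

print(key_by_id(raw))
```

In the streaming job, the same function would be applied to the value of each Kafka (key, value) tuple inside `map()`.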

 

The result I would like is this:

 

{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be","Hour": "15.49.24", "Last Price": "0,862", "Var%": "-1,93", "Last Value": "1.020", "Volatility": "2,352", "Value_at_risk": "5,471"}

 

How can I do this using PySpark?
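One way to get that single merged record: after `dstream.join(dstream1)`, each element is `(id, (left, right))`, where `left` and `right` are the two parsed dicts, so merging them is a plain dict update. A minimal sketch using the two sample records (the `merge_records` helper name is my own):

```python
import json

# The two sample records, as they would look after json.loads() on each stream.
left = json.loads('{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be", '
                  '"Hour": "15.49.24", "Last Price": "0,862", "Var%": "-1,93", "Last Value": "1.020"}')
right = json.loads('{"id": "Be_20200330", "Date": "2020-03-30", "Name": "Be", '
                   '"Volatility": "2,352", "Value_at_risk": "5,471"}')

def merge_records(pair):
    # pair is (id, (left, right)), the shape produced by DStream.join().
    # Merge the two dicts; fields from the second record win on conflict.
    _id, (a, b) = pair
    merged = dict(a)
    merged.update(b)
    return merged

print(merge_records(("Be_20200330", (left, right))))
```

In the streaming job this would be applied as `streamJoined.map(merge_records)` before calling `pprint()`.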
