Member since 03-27-2016
47 Posts · 1 Kudo Received · 0 Solutions
06-11-2016
02:36 PM
I have created a Kafka producer:
from kafka import KafkaProducer
import json, time

# Sample clickstream event to publish
userdata = {
    "ipaddress": "172.16.0.57",
    "logtype": "",
    "mid": "",
    "newsession": "4917279149950184029a78e4a-e694-438f-b994-39897e346953",
    "previousurl": "/",
    "searchtext": "",
    "sessionid": "29a78e4a-e694-438f-b994-39897e346953",
    "source": "desktop",
    "uid": "Chrome4929a78e4a-e694-438f-b994-39897e346953",
    "url": "http://172.16.0.57/",
    "useragent": "Mozilla/5.0%20(Windows%20NT%2010.0",
    "utmsocial": "null",
    "utmsource": "null",
    "createdtime": "2016-05-03 12:27:38",
    "latency": 13260.0,
    "serviceurl": "http://localhost:8080/Business-Web/services/product/getBestDealNew",
    "domainlayeripaddress": "localhost",
    "name": "TJ"
}

# Serialize each value to UTF-8 JSON before sending
producer = KafkaProducer(bootstrap_servers=['172.16.10.13:6667', '172.16.10.14:6667'],
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

for i in range(10):
    print("adding", i)
    producer.send('event', userdata)
    # if i < 10:
    #     producer.send('event', '\n')
    time.sleep(3)

# Make sure buffered messages are actually delivered before the script exits
producer.flush()
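(As a quick sanity check that events are actually landing on the topic, a minimal consumer along these lines should work with the same kafka-python package; this is a sketch of mine, with the deserializer simply mirroring the serializer above.)

from kafka import KafkaConsumer
import json

# Read the topic from the beginning and decode each JSON value
consumer = KafkaConsumer('event',
                         bootstrap_servers=['172.16.10.13:6667', '172.16.10.14:6667'],
                         auto_offset_reset='earliest',
                         value_deserializer=lambda v: json.loads(v.decode('utf-8')))
for message in consumer:
    print(message.value['uid'], message.value['url'])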
And the Python code to consume the JSON data from Kafka. I run it like this:

spark-submit --jars /usr/hdp/2.3.4.7-4/spark/lib/spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,/usr/hdp/2.3.4.7-4/spark/lib/spark-streaming-kafka-assembly_2.10-1.6.1.jar /home/hadoop/tajinder/clickstream_streaming.py

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
# 2-second micro-batches
stream = StreamingContext(sc, 2)
# Receiver-based stream: (ZooKeeper quorum, consumer group id, {topic: partitions})
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})
# Each Kafka record arrives as a (key, value) pair; parse the JSON value
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))
parsed.pprint()
stream.start()
stream.awaitTermination()
I am able to receive the JSON data in Spark from Kafka, but how do I convert it to an RDD or a table (SchemaRDD/DataFrame) in PySpark so that RDD operations can be applied to it?
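A minimal sketch of one way to do this, assuming the Spark 1.5/1.6 APIs used above (the helper names get_sql_context and process_batch are illustrative, not from any library): each micro-batch of a DStream is already an RDD, so foreachRDD can turn it into a DataFrame and register a temp table.

from pyspark.sql import SQLContext, Row

def get_sql_context(spark_context):
    # Reuse a single SQLContext across batches (a common streaming pattern)
    if not hasattr(get_sql_context, "instance"):
        get_sql_context.instance = SQLContext(spark_context)
    return get_sql_context.instance

def process_batch(time, rdd):
    # Each micro-batch of the DStream arrives here as a plain RDD of dicts
    if rdd.isEmpty():
        return
    sql_context = get_sql_context(rdd.context)
    # Convert dicts to Rows so Spark can infer a schema
    df = sql_context.createDataFrame(rdd.map(lambda d: Row(**d)))
    df.registerTempTable("events")
    sql_context.sql("SELECT uid, url, latency FROM events").show()

parsed.foreachRDD(process_batch)

Ordinary RDD operations (filter, map, and so on) also work directly on the rdd argument inside process_batch.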
06-10-2016
10:18 AM
I am trying to fetch JSON-format data from Kafka through Spark Streaming and want to create a temp table in Spark so I can query the JSON data like a normal table. I tried several tutorials available on the internet but didn't have success. I am able to read a text file from HDFS and process it through Spark, but I am stuck on consuming JSON data from Kafka. Can somebody guide me on this?
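A minimal sketch, assuming a kafka_stream DStream created with KafkaUtils.createStream as in my other post, and keeping the message values as raw JSON strings so sqlContext.read.json can infer the schema per batch (register_batch is an illustrative name):

from pyspark.sql import SQLContext

def register_batch(time, rdd):
    # rdd holds raw JSON strings (the unparsed Kafka message values)
    if rdd.isEmpty():
        return
    sql_context = SQLContext(rdd.context)
    # Infer the schema directly from the JSON strings
    df = sql_context.read.json(rdd)
    df.registerTempTable("clickstream")
    sql_context.sql("SELECT source, count(*) AS hits FROM clickstream GROUP BY source").show()

json_values = kafka_stream.map(lambda kv: kv[1])  # keep values as strings
json_values.foreachRDD(register_batch)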
Labels: Apache Spark
06-09-2016
04:23 PM
The Spark job runs fine now. I used: spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar <file.py>
06-09-2016
02:39 PM
I am only running it like this: spark-submit <file_name.py>
06-09-2016
11:33 AM
Getting an error while submitting a Spark job from the command line:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.5.2 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.5.2.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

The Python code I am running is:

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
stream = StreamingContext(sc, 2)
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))
# Note: a DStream has no collect(); print each micro-batch with pprint() instead
parsed.pprint()
stream.start()
stream.awaitTermination()
Labels: Apache Spark
06-01-2016
01:13 PM
I am running a query that launches 52 map tasks simultaneously. Because of this, my ResourceManager containers fill up completely and are 100% consumed. The query gets stuck at that point and gives no result. I want to reduce the number of map tasks that run in parallel.
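If it helps, two knobs commonly used for this on Hive-on-MapReduce, sketched with illustrative values (not recommendations for any particular data size):

set mapred.max.split.size=256000000;    -- larger input splits => fewer map tasks overall
set mapreduce.job.running.map.limit=10; -- cap concurrently running map tasks (Hadoop 2.7+)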
Labels: Apache Hadoop, Apache Hive
05-30-2016
08:55 AM
Thanks Kuldeep, I am able to run Hive queries by putting them in a file now.
05-28-2016
05:53 PM
1 Kudo
I am able to run a Hive query through its shell, but not by putting it into a file. It gives me "permission denied". I tried running it as the hdfs user but still get the same error.
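As a sanity check (the path below is illustrative): the script file must be readable by the user running the CLI, and it is run with hive -f rather than by executing the file itself:

chmod +r /tmp/myquery.hql   # make the script readable by the user running Hive
hive -f /tmp/myquery.hql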
Labels: Apache Hadoop, Apache Hive
05-28-2016
01:15 PM
Can you tell me the recommended settings for my cluster? I have 3 nodes, each dual-core: one node with 12 GB RAM and the other two with 6 GB RAM.
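Not an authoritative recommendation, but a rough starting point for yarn-site.xml / mapred-site.xml on nodes this small (leave 1-2 GB per node for the OS and Hadoop daemons; yarn.nodemanager.resource.memory-mb is set per node, e.g. via Ambari config groups):

yarn.nodemanager.resource.memory-mb = 8192   # on the 12 GB node
yarn.nodemanager.resource.memory-mb = 4096   # on each 6 GB node
yarn.scheduler.minimum-allocation-mb = 1024
mapreduce.map.memory.mb = 1024
mapreduce.reduce.memory.mb = 2048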
05-28-2016
11:16 AM
I have run into an issue. I get the Hive prompt and can run basic Hive queries that don't execute an MR job in the backend, but when I run a query that does execute an MR job, it hangs with no further progress (no mapper/reducer progress). I have checked the ResourceManager queue and it looks OK, as the container is allocated to the query. I have also checked that MapReduce2 is up and running. Can anybody suggest what needs to be done in this case?
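One quick check with the standard YARN CLI: see whether the application is actually RUNNING or sitting in ACCEPTED, which usually means YARN cannot allocate the requested containers (often because the map/reduce container sizes exceed what the NodeManagers can offer):

yarn application -list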
Labels: Apache Hadoop, Apache Hive