Member since: 03-27-2016
Posts: 47
Kudos Received: 1
Solutions: 0
10-08-2019
12:44 AM
I have a requirement to process and store huge volumes of streaming data. Streams of 3 TB will arrive every hour, and I have to store the data for 15-20 days for historical analysis (i.e., about 1 PB of data for analysis). I am looking for a suitable NoSQL database that can handle this. The database should also support multiple indexes.
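A quick back-of-the-envelope check of those numbers (a sketch with assumed values; the 3x replication factor is an assumption, not something stated in the question):

tb_per_hour = 3
days_retained = 15                       # the question says 15-20 days; using the low end
raw_tb = tb_per_hour * 24 * days_retained
print(raw_tb)                            # 1080 TB, i.e. roughly 1 PB before replication
print(raw_tb * 3)                        # ~3.2 PB of raw disk with HDFS-style 3x replication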
01-19-2019
04:21 PM
I am trying to figure out whether it is possible to execute Spark Scala code from IntelliJ against the HDP 3 sandbox.
Labels:
- Apache Spark
07-24-2017
06:16 PM
Zaratsian, I followed your tutorial but I am getting the error "Wrong FS". Can you help me solve this issue? I have posted the question at the link below: https://community.hortonworks.com/questions/114572/getting-error-while-reading-hbase-snapshot-through.html
07-23-2017
06:45 PM
I am getting an error while reading an HBase snapshot through Spark (Scala):

java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hbase/d9d1b0a7-439d-46a8-984e-886705b5f9b7/data/default/test/fa6fe29976fca3f9be223db554ab22b4, expected: file:///

What parameter should be passed in path()? I passed /user/hbase after creating it in HDFS:

val path = new Path("hdfs://localhost:9000/user/hbase")
Labels:
- Apache HBase
- Apache Spark
06-11-2016
02:36 PM
I have created a Kafka producer:

from kafka import KafkaProducer
import json, time

userdata = {
    "ipaddress": "172.16.0.57",
    "logtype": "",
    "mid": "",
    "newsession": "4917279149950184029a78e4a-e694-438f-b994-39897e346953",
    "previousurl": "/",
    "searchtext": "",
    "sessionid": "29a78e4a-e694-438f-b994-39897e346953",
    "source": "desktop",
    "uid": "Chrome4929a78e4a-e694-438f-b994-39897e346953",
    "url": "http://172.16.0.57/",
    "useragent": "Mozilla/5.0%20(Windows%20NT%2010.0",
    "utmsocial": "null",
    "utmsource": "null",
    "createdtime": "2016-05-03 12:27:38",
    "latency": 13260.0,
    "serviceurl": "http://localhost:8080/Business-Web/services/product/getBestDealNew",
    "domainlayeripaddress": "localhost",
    "name": "TJ"
}

# Serialize each message as UTF-8 encoded JSON
producer = KafkaProducer(bootstrap_servers=['172.16.10.13:6667', '172.16.10.14:6667'],
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Send the same event ten times, one every three seconds
for i in range(10):
    print("adding", i)
    producer.send('event', userdata)
    #if i < 10:
    #    producer.send('event', '\n')
    time.sleep(3)

producer.flush()  # make sure buffered messages are actually delivered before exit
And here is the Python code to consume the JSON data from Kafka. I run it like this:

spark-submit --jars /usr/hdp/2.3.4.7-4/spark/lib/spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,/usr/hdp/2.3.4.7-4/spark/lib/spark-streaming-kafka-assembly_2.10-1.6.1.jar /home/hadoop/tajinder/clickstream_streaming.py

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
stream = StreamingContext(sc, 2)  # 2-second micro-batches

# Receiver-based stream: (ZooKeeper quorum, consumer group, {topic: partitions})
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})

# Each element is a (key, value) tuple; the JSON payload is the value
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))
parsed.pprint()

stream.start()
stream.awaitTermination()
I am able to receive the JSON data in Spark from Kafka, but how do I convert it to an RDD or a table (SchemaRDD) in PySpark so that RDD operations can be applied to it?
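In case it helps anyone who lands here: a minimal sketch of one common Spark 1.x pattern, converting each micro-batch into a DataFrame and registering it as a temp table (the table name "events" is illustrative; the selected columns come from the producer payload above):

from pyspark.sql import SQLContext  # already imported in the script above

def process(time, rdd):
    # Each micro-batch arrives as an RDD of dicts (the parsed JSON)
    if rdd.isEmpty():
        return
    sql_context = SQLContext(rdd.context)   # reuse the batch's SparkContext
    df = sql_context.createDataFrame(rdd)   # infer the schema from the dicts
    df.registerTempTable("events")          # now queryable like a normal table
    sql_context.sql("SELECT sessionid, url FROM events").show()

parsed.foreachRDD(process)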
06-10-2016
10:18 AM
I am trying to fetch JSON-format data from Kafka through Spark Streaming, and I want to create a temp table in Spark so I can query the JSON data like a normal table. I tried several tutorials available on the internet but didn't succeed. I am able to read a text file from HDFS and process it through Spark, but I am stuck consuming the JSON data from Kafka. Can somebody guide me on this?
Labels:
- Apache Spark
06-09-2016
04:23 PM
The Spark job runs fine now. I used: spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar <file.py>
06-09-2016
02:39 PM
I am only running it like this: spark-submit <file_name.py>
06-09-2016
11:33 AM
Getting this error while submitting a Spark job from the command line:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
   spark-submit command as
   $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.5.2 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
   Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.5.2.
   Then, include the jar in the spark-submit command as
   $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

The Python code I am running is:

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
stream = StreamingContext(sc, 2)
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))  # (key, value) tuples; parse the JSON value
parsed.pprint()  # note: a DStream has no collect(); pprint() prints each micro-batch
stream.start()
stream.awaitTermination()
Labels:
- Apache Spark
06-01-2016
01:13 PM
I am running a query which launches 52 map tasks simultaneously. Because of this, my ResourceManager queue fills up completely and is 100% consumed. The query gets stuck at that point and gives no result. I want to reduce the number of map tasks that run in parallel.
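If the query runs on MapReduce, a hedged starting point (these are stock Hadoop/Hive properties, not something confirmed in this thread, and the values are illustrative) is to either cap concurrent map tasks or enlarge the input splits so fewer mappers are created:

-- Cap how many map tasks of the job run at the same time (Hadoop 2.7+)
set mapreduce.job.running.map.limit=10;

-- Or raise the minimum split size (here ~512 MB) so the job creates fewer mappers overall
set mapreduce.input.fileinputformat.split.minsize=536870912;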
Labels:
- Apache Hadoop
- Apache Hive
05-30-2016
08:55 AM
Thanks Kuldeep, I am able to run Hive queries by putting them in a file now.
05-28-2016
05:53 PM
1 Kudo
I am able to run a Hive query through the shell, but not by putting it into a file; that gives me "permission denied". I tried to run it as the hdfs user but still get the same error.
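For anyone hitting the same thing, a hedged checklist (the script path is hypothetical): "permission denied" here usually concerns either the script file itself or the Hive scratch directory in HDFS, so check both:

ls -l /home/hadoop/query.hql     # the script must be readable by the user running hive
hive -f /home/hadoop/query.hql   # run the query file once permissions allow it
hdfs dfs -ls /tmp/hive           # the scratch dir must be writable by that user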
Labels:
- Apache Hadoop
- Apache Hive
05-28-2016
01:15 PM
Can you tell me the recommended settings for my cluster? I have 3 nodes, each dual-core: 1 node with 12 GB RAM and the other two with 6 GB RAM.
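A hedged starting point for nodes that small (illustrative values only, normally set via Ambari; the right split depends on what else runs on each node):

yarn.nodemanager.resource.memory-mb=4096      # on the 6 GB nodes (e.g. 8192 on the 12 GB node)
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
mapreduce.map.memory.mb=1024
mapreduce.reduce.memory.mb=2048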
05-28-2016
11:16 AM
I have run into an issue. I get the Hive prompt and can run basic Hive queries that don't execute an MR job in the backend, but when I run a query that does execute an MR job, it hangs with no further progress (no mapper/reducer progress). I have checked the ResourceManager queue and it looks OK, as the container is allocated to the query only. I have also checked that MapReduce2 is up and running. Can anybody suggest what needs to be done in this case?
Labels:
- Apache Hadoop
- Apache Hive
05-28-2016
09:45 AM
Hey guys, I have run into another issue. I now get the Hive prompt and can run basic Hive queries that don't execute an MR job in the backend, but when I run a query that does execute an MR job, it hangs with no further progress (no mapper/reducer progress). I have checked the ResourceManager queue and it looks OK, as the container is allocated to the query only. I have also checked that MapReduce2 is up and running. Can anybody suggest what needs to be done in this case?
05-26-2016
05:22 PM
How can I check which jobs are running in my default ResourceManager queue, and how can I flush them? I need to do this through the command line.
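For anyone looking for the same thing, the standard YARN CLI covers this (the application id below is hypothetical):

yarn application -list -appStates RUNNING               # the queue of each app shows in the output
yarn application -kill application_1464172543210_0001   # kill one application by its id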
05-26-2016
11:26 AM
Getting this when I start hive:

WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.7-4/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.7-4/hive/lib/avro-tools-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

After that, a long wait and no further progress, errors, or warnings. No idea what needs to be done.
Labels:
- Apache Hive