Member since: 02-25-2016
Posts: 72
Kudos Received: 34
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2454 | 07-28-2017 10:51 AM
 | 1826 | 05-08-2017 03:11 PM
 | 717 | 04-03-2017 07:38 PM
 | 1680 | 03-21-2017 06:56 PM
 | 571 | 02-09-2017 08:28 PM
02-13-2018
02:13 AM
@Dinesh Chitlangia
... View more
02-13-2018
02:13 AM
Hi Team, I am trying to stream log files from a Kafka consumer to Hive using Spark in Python. It is throwing the error below:

18/02/12 20:41:18 WARN AbstractLifeCycle: FAILED SelectChannelConnector@x.x.x.x:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
18/02/12 20:41:18 WARN AbstractLifeCycle: FAILED org.spark-project.jetty.server.Server@xxxxxx: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
File "/sourcefiles/streamingspkf.py", line 16, in <module>
b3 = hc.createDataFrame(b2)
File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/context.py", line 425, in createDataFrame
File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/context.py", line 338, in _createFromLocal
TypeError: 'TransformedDStream' object is not iterable
18/02/12 20:41:20 INFO SparkContext: Invoking stop() from shutdown hook

Any idea what is leading to this? Please let me know if you need any further information.
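The BindException warnings usually just mean another Spark UI is already bound to port 4040; the job actually fails on the TypeError, because a DStream cannot be passed to createDataFrame directly. Below is a minimal sketch of the usual pattern, assuming b2 is the transformed DStream and hc is a HiveContext or SQLContext as in the traceback; the target table name is a placeholder, and each record is assumed to be something createDataFrame can infer a schema from.

def save_batch(rdd):
    # createDataFrame works on the RDD inside each micro-batch, not on the DStream itself
    if not rdd.isEmpty():
        df = hc.createDataFrame(rdd)
        df.write.mode('append').saveAsTable('default.kafka_logs')  # hypothetical target table

b2.foreachRDD(save_batch)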
... View more
Labels:
- Apache Hive
- Apache Kafka
- Apache Spark
01-30-2018
08:32 PM
Thank you @Ramya Jayathirtha. I downloaded the jar file from Maven and executed it, and it got past that error. Now it is blocked by a different error:

File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/sourcefiles/spkf.py", line 9, in <lambda>
parsed = kafkaStream.map(lambda v: json.loads(v[1]))
File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
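The ValueError means the Kafka message value handed to json.loads is not valid JSON (for example plain text or an empty string). A small defensive sketch, assuming kafkaStream yields (key, value) pairs as in the traceback: peek at a few raw values first, and skip records that do not parse instead of failing the whole batch.

import json

def parse_value(kv):
    # kv is a (key, value) pair from the Kafka stream; drop values that are not JSON
    try:
        return [json.loads(kv[1])]
    except ValueError:
        return []

kafkaStream.map(lambda kv: kv[1]).pprint()   # peek at the raw values the producer is actually sending
parsed = kafkaStream.flatMap(parse_value)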
... View more
01-30-2018
07:16 PM
Hi Team, I am trying to copy streaming data from a Kafka topic to an HDFS directory. It is throwing the error 'java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper'. Any kind of assistance to help me resolve this is much appreciated. Here are the steps I followed:

Step 1 - Created topic -> topicXYZ
Step 2 - Created producer and linked it to topicXYZ
Step 3 - Created consumer and linked it to topicXYZ

PySpark program to stream and copy data to the HDFS directory:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="kafksteststreaming")
sc.setLogLevel("WARN")
ssc = StreamingContext(sc, 60)
kafkaStream = KafkaUtils.createStream(ssc, 'xxxx:2181', 'raw-event-streaming-consumer', {'topicXYZ': 1})
parsed = kafkaStream.map(lambda (k, v): json.loads(v))
parsed.saveAsTextFiles('/tmp/folderxyz')
ssc.start()
ssc.awaitTermination()

spark-submit --jars /usr/hdp/current/spark-client/lib/spark-assembly-*.jar spkf.py

The above is throwing the error:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.
- Include the Kafka library and its dependencies with in the spark-submit command as $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.6
- Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then, include the jar in the spark-submit command as
... View more
Labels:
- Apache Kafka
- Apache Spark
11-09-2017
06:53 PM
1 Kudo
Hi Team, I have a requirement to read an existing Hive table, massage a few columns and overwrite the same Hive table. Below is the code:

lp = hc.sql('select * from logistics_prd')
adt = hc.sql('select * from senty_audit.maintable')
cmb_data = adt.unionAll(lp)
cdc_data = cmb_data.distinct()
cdc_data.write.mode('overwrite').saveAsTable('senty_audit.temptable')

In step 2 I am reading senty_audit.maintable from Hive. Then I am joining it with the other dataframe, and in the last step I am trying to load it back (overwrite) into the same Hive table. In this case Spark is throwing the error 'org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from'. Can you please help me understand how I should proceed in this scenario?
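A common workaround is to break the cycle so the table is not read and overwritten in the same step: materialize the combined data into a staging table first, then overwrite the original from that staging copy. A sketch under those assumptions (hc and the table names come from the post; the staging table name is hypothetical):

lp = hc.sql('select * from logistics_prd')
adt = hc.sql('select * from senty_audit.maintable')
cdc_data = adt.unionAll(lp).distinct()

# 1. Write to a staging table so senty_audit.maintable is no longer being read.
cdc_data.write.mode('overwrite').saveAsTable('senty_audit.maintable_stage')

# 2. Overwrite the original table from the staging copy.
hc.sql('insert overwrite table senty_audit.maintable select * from senty_audit.maintable_stage')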
... View more
Labels:
- Apache Hive
- Apache Spark
11-02-2017
07:19 PM
Hi Team, is there a way to write Kafka messages to HDFS? Any kind of suggestion would be of great help. Thank you.
... View more
Labels:
11-01-2017
06:28 PM
Hi team, I am looking to convert a Unix timestamp field to a human-readable format. Can someone help me with this? I am using unix_timestamp('Timestamp', "yyyy-MM-ddThh:mm:ss"), but it is not working. Any suggestions would be of great help.
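If the field holds epoch seconds, from_unixtime converts it to a readable string; unix_timestamp goes the other way (string to epoch), and a literal T in the pattern has to be single-quoted. A sketch assuming a HiveContext named hc, a column called Timestamp, and a placeholder table name:

# Epoch seconds -> readable string
hc.sql("select from_unixtime(Timestamp, 'yyyy-MM-dd HH:mm:ss') as readable_ts from my_table").show()

# String like 2017-11-01T06:28:00 -> epoch seconds (quote the literal T, use HH for 24-hour time)
hc.sql("select unix_timestamp(Timestamp, \"yyyy-MM-dd'T'HH:mm:ss\") as epoch_seconds from my_table").show()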
... View more
Labels:
- Apache Ambari
10-25-2017
03:28 PM
@Aditya Sirna thank you for your response. I have tried that too; it is still creating a folder named 'finename.txt' with a single 'part-00000' file inside it. Kindly let me know if there is a way to create a file directly at the specified path.
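saveAsTextFile always produces a directory of part files, so one common workaround (a sketch only, paths taken from this thread, temp directory name hypothetical) is to write a single partition to a temporary directory and then move the lone part file to the exact path wanted:

import subprocess

rdd.coalesce(1).saveAsTextFile('/path/finename_tmp')   # temporary output directory (hypothetical)
# Move the single part file to the desired file name, then remove the temporary directory.
subprocess.check_call(['hdfs', 'dfs', '-mv', '/path/finename_tmp/part-00000', '/path/finename.txt'])
subprocess.check_call(['hdfs', 'dfs', '-rm', '-r', '/path/finename_tmp'])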
... View more
10-25-2017
03:01 PM
Hi Team, I am using Python Spark 1.6 and trying to save an RDD to an HDFS path. When I run rdd.saveAsTextFile('/path/finename.txt') it creates a directory at the specified path with part files in it, instead of creating a single file. Can you suggest a way to create a file at the specified HDFS path instead of a directory with files in it?
... View more
Labels:
- Apache Spark
10-19-2017
03:15 AM
1 Kudo
rdd.sortByKey() sorts in ascending order. I want to sort in descending order. I tried rdd.sortByKey("desc") but it did not work
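For reference, sortByKey takes a boolean ascending flag rather than a string. A minimal sketch, assuming an existing SparkContext sc:

pairs = sc.parallelize([('b', 2), ('a', 1), ('c', 3)])
# Pass ascending=False to sort keys in descending order.
print(pairs.sortByKey(ascending=False).collect())   # [('c', 3), ('b', 2), ('a', 1)]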
... View more
Tags:
- Hadoop Core
- rdd
- Spark
Labels:
- Apache Spark
10-13-2017
07:22 PM
@Dinesh Chitlangia Thank you for the explanation. In that case I would rather use reduceByKey() to get the number of occurrences. Thanks for the info on countByValue().
... View more
10-13-2017
06:02 PM
1 Kudo
Hi Team, I am working on counting the number of occurrences of each value in a file.

RDD1 = RDD.flatMap(lambda x: x.split('|'))
RDD2 = RDD1.countByValue()

I want to save the output of RDD2 to a text file. I am able to see the output with for x, y in RDD2.items(): print(x, y), but when I tried to save it to a text file using RDD2.saveAsTextFile(\path) it did not work; it threw 'AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile''. Can you please help me understand if I am missing something here, or how to save the countByValue output to a text file?
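countByValue() is an action that returns a plain Python dict on the driver, which is why it has no saveAsTextFile. A sketch of two options, assuming RDD1 and sc from the post (output paths are placeholders): parallelize the dict back into an RDD, or keep the counting distributed with map plus reduceByKey.

# Option 1: turn the driver-side dict back into an RDD before saving.
counts = RDD1.countByValue()
sc.parallelize(counts.items()).saveAsTextFile('/tmp/counts_from_dict')

# Option 2: stay distributed end to end.
RDD1.map(lambda w: (w, 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .saveAsTextFile('/tmp/counts_distributed')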
... View more
Labels:
- Apache Spark
10-10-2017
01:13 PM
1 Kudo
Hi Team, I was trying to load a dataframe into a Hive table, bucketed by one of the columns, and I am facing an error:

File "<stdin>", line 1, in <module>
AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'

Here is the statement I am trying to run:

rs.write.bucketBy(4, "Column1").sortBy("column2").saveAsTable("database.table")

Can you please help me out with this?
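bucketBy on the DataFrameWriter only exists in newer Spark releases, which is why it raises AttributeError here (other posts in this thread are on Spark 1.6). One hedged workaround sketch: create the bucketed table through HiveQL and insert into it, assuming a HiveContext named hc, the names from the post, and placeholder column types. Note that older Spark versions may not fully enforce Hive bucketing on write, so the final insert may need to be run from Hive/Beeline instead.

# Create the bucketed target table up front (column types here are assumptions).
hc.sql("""
  CREATE TABLE IF NOT EXISTS database.table (Column1 STRING, column2 STRING)
  CLUSTERED BY (Column1) SORTED BY (column2) INTO 4 BUCKETS
  STORED AS ORC
""")
hc.sql("SET hive.enforce.bucketing=true")
rs.registerTempTable("rs_tmp")   # temporary view name is a placeholder
hc.sql("INSERT OVERWRITE TABLE database.table SELECT Column1, column2 FROM rs_tmp")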
... View more
Labels:
- Apache Spark
09-18-2017
06:16 PM
@rahul gulati - To address your scenario, you will have to consider partitioning the source table wisely. Based on the data size of each partition, you can then load the data in chunks into the target table. Hope this helps.
... View more
08-30-2017
07:01 PM
1 Kudo
Hello Team, can someone please help me understand the comparison/difference between the Hive CLI and Beeline?
... View more
Labels:
- Apache Hive
08-15-2017
02:05 PM
1 Kudo
@Joe Olson Here you are trying to join large datasets, one of which is in the low billions. Let's assume:

table1: record count 800 million, 20 columns, data size 4 GB
table2: record count 4 million, 10 columns, data size 500 MB

When you run an inner/outer join on these 2 tables:

select A.*, B.Col2 from table1 A, table2 B where A.Col1 = B.Col1;

YARN will join both datasets and generate a huge internal dataset, which could be greater than 10-15 GB. To avoid that, we split the query. Take the key columns from both tables (Table1 - Col1, Table2 - Col2); we are interested in getting Col2 from table2. This requires creating a few temporary tables:

Create Table Table1_temp as Select Col1 from Table1;
Create Table Table2_temp as Select Col2 from Table2;
Create Table Table_columnsfromTable2 as Select A.Col1, B.Col2 from table1 A, table2 B where A.Col1 = B.Col1;

This generates a much smaller dataset while performing the join. Now join the Table_columnsfromTable2 temporary table with Table1 to get your desired output:

select A.*, B.Col2 from table1 A, Table_columnsfromTable2 B where A.Col1 = B.Col1;

Tuning of HQL varies for each scenario. In some cases we can partition and bucket the tables. Other cases may need a look at whether the incoming data is sorted and at the number of files. If possible, create temporary tables.
... View more
07-28-2017
11:04 AM
1 Kudo
We generally encounter such errors when the delimiter specified in the command doesn't match the delimiter in the input file. Also make sure you are giving the complete and correct path of the file. Please try the syntax below:

A = load '/path_to_file' using PigStorage('|') as (aa,bb,cc,dd,ee);
... View more
07-28-2017
10:51 AM
1 Kudo
The time taken for query execution depends on multiple factors:
1. Mainly the Hive query design, the joins and the columns being pulled
2. The YARN/Tez container size allocated, which depends on where you are running
3. The queue you are running your job in; check whether the queue is free

To answer your question on why one of the reducers is taking 1000 tasks, please check the hive.exec.reducers.max value defined. If you want to experiment with the number of reducers, try changing the value of hive.exec.reducers.bytes.per.reducer (preferably assign a smaller value, as it is inversely proportional to the number of reducers).
... View more
07-26-2017
05:54 PM
1 Kudo
Hi, I was running a Hive query which initiated 2 reducers. It was running for around 15 minutes, and then the job was killed. We checked the Tez view for logs, where we found that the job was killed due to 'Interrupted while waiting for task to complete'. Here is the message in the logs: 'Got a shouldDie notification via heartbeats for container container_XXXX'. Can you please help me understand under which scenario I would get such a message?
... View more
Labels:
- Apache Hive
07-25-2017
12:07 AM
1 Kudo
Hi Experts, I am trying to create an Ambari Pig View. It is throwing an error. Can you please help me understand where I went wrong?
... View more
Labels:
- Apache Ambari
- Apache Hadoop
07-21-2017
09:19 PM
1 Kudo
@Varun Please see below for the properties that control the number of reducers:

mapred.reduce.tasks = -1; -- this property lets Tez determine the number of reducers to initiate
hive.tez.auto.reducer.parallelism = true; -- when this property is set to TRUE, Hive estimates data sizes and sets parallelism estimates; Tez samples the source vertices' output sizes and adjusts the estimates at run time. This is the first property that determines the initial number of reducers once Tez starts the query.
hive.tez.min.partition.factor = 0.25; -- when auto parallelism is enabled, this property puts a lower limit on the number of reducers that Tez specifies
hive.tez.max.partition.factor = 2.0; -- this property tells Tez to over-partition the data in shuffle edges
hive.exec.reducers.max = 1099 by default -- the maximum number of reducers
hive.exec.reducers.bytes.per.reducer = 256 MB, which is 268435456 bytes

Now, to calculate the number of reducers, we put all of this together with the formula below. From the explain plan we also need the size of the reducer stage output; let's assume 200,000 bytes.

Max(1, Min(hive.exec.reducers.max [1099], Reducer stage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
= Max(1, Min(1099, 200000 / 268435456)) x 2
= Max(1, Min(1099, 0.00074505805)) x 2
= Max(1, 0.0007) x 2
= 1 x 2
= 2

Tez will spawn 2 reducers. In this case we can legitimately make Tez initiate a higher number of reducers by lowering the value of hive.exec.reducers.bytes.per.reducer, for example to roughly 10 KB (10432 bytes):

= Max(1, Min(1099, 200000 / 10432)) x 2
= Max(1, 19) x 2
= 38

Please note that a higher number of reducers doesn't necessarily mean better performance.
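For clarity, the same arithmetic as a small Python snippet (all numbers are the ones assumed above):

reducers_max = 1099
bytes_per_reducer = 268435456          # 256 MB default
stage_estimate = 200000                # bytes, taken from the explain plan
max_partition_factor = 2.0

# Default setting -> 2 reducers
print(int(max(1, min(reducers_max, stage_estimate / float(bytes_per_reducer))) * max_partition_factor))
# Lowered to ~10 KB per reducer -> 38 reducers
print(int(max(1, min(reducers_max, stage_estimate / 10432.0)) * max_partition_factor))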
... View more
07-21-2017
08:44 PM
1 Kudo
@Varun R Optimization varies in every case; it depends on the incoming data and file sizes. In general, please use these settings for fine tuning.

Enable predicate pushdown (PPD) to filter at the storage layer:
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;

Vectorized query execution processes data in batches of 1024 rows instead of one by one:
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

Enable the cost-based optimizer (CBO) for efficient query execution based on cost, and fetch table statistics:
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
Partition and column statistics are fetched from the metastore. Use this with caution: if you have too many partitions and/or columns, it could degrade performance.

Control reducer output:
SET hive.tez.auto.reducer.parallelism=true;

Partition the table on the necessary column, and also bucket the tables (identify the column wisely). It also depends on how you want to tune your query based on the explain plan; please check the number of mappers and reducers spawned.
... View more
07-21-2017
06:33 PM
2 Kudos
@Simran Kaur I see in stdout that the Oozie launcher failed. Are you trying to run a Hive action in Oozie? If that is the case, please use the command yarn logs -applicationId application_1499692338187_45811 to get the logs, or follow the KB article below to trace the logs and debug further: https://community.hortonworks.com/articles/9148/troubleshooting-an-oozie-flow.html
... View more
07-21-2017
06:14 PM
2 Kudos
@Helmi Khalifa Please use the syntax below to load data from HDFS into Hive tables:

LOAD DATA INPATH '/hdfs/path' OVERWRITE INTO TABLE TABLE_NAME;

In case you are trying to load into a specific partition of the table:

LOAD DATA INPATH '/hdfs/path' OVERWRITE INTO TABLE TABLE_NAME PARTITION (ds='2008-08-15');
... View more
07-14-2017
06:19 PM
2 Kudos
For every reducer, a certain number of tasks are created. Can someone explain what factor decides the number of tasks to be created for each reducer?
... View more
Labels:
- Apache Hadoop
- Apache YARN
06-23-2017
02:05 AM
Get the HDFS path where the Hive table files are stored, then use hdfs dfs -du -s -h /hdfs_path to get the size in a readable format.
... View more
06-22-2017
07:32 PM
After adding this setting, I am getting the same explain plan. Nothing additional.
... View more
06-22-2017
06:33 PM
2 Kudos
I am trying to understand a Hive explain plan. A sample explain plan for my query is below:

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE)
Reducer 3 <- Map 4 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Map 6 (SIMPLE_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)

Can someone help me understand what SIMPLE_EDGE and BROADCAST_EDGE are? What should I interpret from BROADCAST_EDGE and SIMPLE_EDGE?
... View more
Labels:
- Apache Hive
06-14-2017
08:22 PM
1 Kudo
Couldn't agree more with @Dinesh Chitlangia. The same is the case with the WHERE clause: if you use an alias in the WHERE clause, you will face a similar error, the reason being that the WHERE clause is evaluated before the SELECT clause.

select col1 as c1, col2 as c2 from test_table where c1 = "ABC";

We make it work by re-tweaking the above query as:

select * from (
select col1 as c1, col2 as c2 from test_table
) t1 where c1 = "ABC";
... View more