I am trying to calculate the median of the LATITUDE column for each (DESTINATION_ID, LOCATION_ID) pair. The data looks like:

DESTINATION_ID,LOCATION_ID,LATITUDE
ENSG00000133895,NORTH_562,0.07056000000000001

Code I tried:

var ds = sqlContext.sql("""
SELECT DESTINATION_ID, LOCATION_ID, avg(LATITUDE) as median
FROM ( SELECT DESTINATION_ID, LOCATION_ID, LATITUDE, rN,
              (CASE WHEN cN % 2 = 0 THEN (cN DIV 2) ELSE (cN DIV 2) + 1 END) as m1,
              (cN DIV 2) + 1 as m2
       FROM ( SELECT DESTINATION_ID, LOCATION_ID, LATITUDE,
                     row_number() OVER (PARTITION BY DESTINATION_ID, LOCATION_ID ORDER BY LATITUDE) as rN,
                     count(LATITUDE) OVER (PARTITION BY DESTINATION_ID, LOCATION_ID) as cN
              FROM ... ) t ) s
WHERE rN BETWEEN m1 AND m2
GROUP BY DESTINATION_ID, LOCATION_ID
""") But getting Error as **Exception in thread "main" java.lang.RuntimeException: [3.98] failure: ``)''
expected but identifier DIV found** am i miss something or is there any better way to calculate Median in spark Thanks
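The error itself points at the problem: the parser behind plain sqlContext.sql does not accept the DIV operator, so writing the integer division as CAST(cN / 2 AS INT) avoids that failure. A simpler route altogether is the percentile_approx aggregate, which recent Spark versions ship natively and older ones expose through a HiveContext. A minimal sketch under those assumptions (the file path and view name are just placeholders, assuming Spark 2.x with a SparkSession named spark):

// Median of LATITUDE per (DESTINATION_ID, LOCATION_ID) via percentile_approx.
// The input path and view name below are illustrative, not from the original post.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/latitude_data.csv")

df.createOrReplaceTempView("lat_data")

val medians = spark.sql("""
  SELECT DESTINATION_ID, LOCATION_ID,
         percentile_approx(LATITUDE, 0.5) AS median
  FROM lat_data
  GROUP BY DESTINATION_ID, LOCATION_ID
""")

medians.show()

percentile_approx returns an approximate median, which is usually what you want on large partitions; the exact row_number-based query still works once the DIV issue is worked around.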
Hello, I am trying to parse complex JSON returned by a REST API in Spark using Scala, but it throws an error:

Expected collection but got JObject(List((success,JBool(true)), (message,JString()), ...

Is there any way to load JSON data into a Spark DataFrame directly from the REST API without knowing the JSON structure in advance? I am using Spark with Scala, and it is a Maven project. Could someone please help me with how to query the REST API's JSON and load the data into a DataFrame?
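One approach that sidesteps mapping the JSON onto case classes is to fetch the response body as a plain string and let Spark infer the schema with spark.read.json on a Dataset[String] (available from Spark 2.2 onwards). A minimal sketch under that assumption; the URL is a placeholder and error handling is omitted:

import org.apache.spark.sql.SparkSession

object RestJsonToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RestJsonToDataFrame")
      .getOrCreate()
    import spark.implicits._

    // Placeholder endpoint; replace with the real REST API URL.
    val url = "https://mylink"

    // Read the whole HTTP response body as one JSON string.
    val body = scala.io.Source.fromURL(url).mkString

    // The schema is inferred from the data, so the JSON structure
    // does not need to be known up front.
    val df = spark.read.json(Seq(body).toDS())

    df.printSchema()
    df.show(truncate = false)
  }
}

If the records you actually need sit under a nested field (for example an array inside the response object), you can select that column and explode it after the schema has been inferred.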
Hello, I am building a data pipeline that consumes data from a REST API in JSON format and pushes it into a Spark DataFrame.

Spark version: 2.4.4

But I am getting this error:

df = SQLContext.jsonRDD(rdd)
AttributeError: type object 'SQLContext' has no attribute 'jsonRDD'

Code:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from urllib import urlopen
from pyspark import SQLContext
import json

spark = SparkSession \
    .builder \
    .appName("DataCleansing") \
    .getOrCreate()

def convert_single_object_per_line(json_list):
    json_string = ""
    for line in json_list:
        json_string += json.dumps(line) + "\n"
    return json_string

def parse_dataframe(json_data):
    r = convert_single_object_per_line(json_data)
    mylist = []
    for line in r.splitlines():
        mylist.append(line)
    rdd = spark.sparkContext.parallelize(mylist)
    df = SQLContext.jsonRDD(rdd)
    return df

url = "https://mylink"
response = urlopen(url)
data = str(response.read())
json_data = json.loads(data)

df = parse_dataframe(json_data)

Please help me: is there a better way to query the REST API and bring the data into a Spark DataFrame using PySpark? If it is not possible in PySpark, can we do it in Scala? Please share your suggestions.
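For what it is worth, SQLContext.jsonRDD no longer exists in Spark 2.x (and it was an instance method rather than a class-level one), which is exactly what the AttributeError is saying. On 2.4.4 the usual PySpark replacement is to keep building the RDD of JSON strings as you already do and then call spark.read.json(rdd) on it. Since the question also asks about Scala, here is a minimal spark-shell-style sketch of the same idea; the URL is a placeholder and the endpoint is assumed to return one JSON document whose schema Spark can infer:

// spark-shell style sketch (a `spark` SparkSession is already in scope).
import spark.implicits._

// Placeholder endpoint; replace with the real REST API URL.
val body = scala.io.Source.fromURL("https://mylink").mkString

// Replacement for the removed jsonRDD: hand Spark the JSON text as a
// Dataset[String] and let spark.read.json infer the schema.
val df = spark.read.json(Seq(body).toDS())

df.printSchema()
df.show(false)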