Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to do SPARKSQL query on results of earlier query of registerTempTable?

Highlighted

How to do SPARKSQL query on results of earlier query of registerTempTable?

New Contributor

Hi Team,

I am trying following SQL:-

var work__store_level_vend_pack_loc_final_data =

  sparksession.read.format("csv") .option("header", "true") .option("delimiter", "|") .option("inferSchema", "true") .load("C:\\Users\\jabin\\Desktop\\project_files\\work__store_level_vend_pack_loc_final_data.txt");   work__store_level_vend_pack_loc_final_data.registerTempTable("work__store_level_vend_pack_loc_final_data_table");

var r1 = sparksession.sqlContext.sql( "SELECT shc_item_id ,'K' as source_owner_cd,item_purchase_status_cd, vendor_package_id,vendor_package_purchase_status_cd,flow_type_cd as vendor_package_flow_type_cd,vendor_carton_qty,vendor_stock_nbr,ksn_package_id,ksn_purchase_status_cd,import_ind,sears_divission_nbr,sears_item_nbr,sears_sku_nbr,scan_based_trading_ind,cross_merchandising_cd,retail_carton_vendor_package_id,vendor_package_owner_cd,can_carry_model_id,'' AS days_to_check_begin_day_qty,'' AS days_to_check_end_day_qty ,dotcom_allocation_ind ,retail_carton_internal_package_qty,allocation_replenishment_cd,shc_item_type_cd,idrp_order_method_cd,source_package_qty as store_source_package_qty,order_duns_nbr FROM work__store_level_vend_pack_loc_final_data_table WHERE flow_type_cd = 'JIT' OR servicing_dc_nbr > '0' ")
// .collect.foreach(println)

now i want to distinct all column of the r1 using sparksession.sqlContext.sql("")
how to do above thing?

1 REPLY 1

Re: How to do SPARKSQL query on results of earlier query of registerTempTable?

Super Guru

@Heena Shaikh

You can achieve different ways.

Using DataframeAPI:

var work__store_level_vend_pack_loc_final_data = sparksession.read.format("csv") .option("header", "true") .option("delimiter", "|") .option("inferSchema", "true") .load("C:\\Users\\jabin\\Desktop\\project_files\\work__store_level_vend_pack_loc_final_data.txt");

work__store_level_vend_pack_loc_final_data
.select("*")
.distinct() //distinct on all columns
.registerTempTable("work__store_level_vend_pack_loc_final_data_table") //register temp table

(or)

Using sparkSQL:

var work__store_level_vend_pack_loc_final_data = sparksession.read.format("csv") .option("header", "true") .option("delimiter", "|") .option("inferSchema", "true") .load("C:\\Users\\jabin\\Desktop\\project_files\\work__store_level_vend_pack_loc_final_data.txt");
work__store_level_vend_pack_loc_final_data.registerTempTable("work__store_level_vend_pack_loc_final_data_table");

sparksession.sqlContext.sql("select distinct * from work__store_level_vend_pack_loc_final_data_table").show(20,false) //distinct of all columns