Member since: 09-25-2016
Posts: 11
Kudos Received: 0
Solutions: 0
07-09-2018
09:26 AM
Hi, I have a remote server and a Kerberos-authenticated Hadoop environment. I want to copy files from the remote server to HDFS for processing with Spark. Please advise an efficient approach/HDFS command for copying files from the remote server to HDFS; any example will be helpful. We are not allowed to use Flume or NiFi. Please note Kerberos is installed on the remote server.
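One common pattern is sketched below, assuming the files are first staged onto an edge node (e.g. via scp/sftp from the remote server) and that a keytab is available there; the principal, keytab and paths are placeholders, not values from the original post.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

// Placeholder principal/keytab/paths -- substitute real values.
val conf = new Configuration()            // picks up core-site.xml / hdfs-site.xml from the classpath
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab("etl_user@EXAMPLE.COM",
  "/etc/security/keytabs/etl_user.keytab")   // Kerberos login before touching HDFS

val fs = FileSystem.get(conf)
// Copy the staged file into HDFS; Spark can then read it from this HDFS path.
fs.copyFromLocalFile(new Path("/data/staging/input.csv"),
  new Path("/user/etl_user/landing/input.csv"))
fs.close()

The shell equivalent would be a kinit with the keytab followed by hdfs dfs -put of the staged file.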
Labels:
- Apache Hadoop
- Apache Spark
06-01-2018
10:50 AM
In case the code is not readable, I have uploaded the same at https://stackoverflow.com/questions/50606346/iterating-through-nested-element-in-spark?noredirect=1#comment88222431_50606346
05-31-2018
02:56 PM
I have a DataFrame with the following schema:
scala> final_df.printSchema
root
 |-- mstr_prov_id: string (nullable = true)
 |-- prov_ctgry_cd: string (nullable = true)
 |-- prov_orgnl_efctv_dt: timestamp (nullable = true)
 |-- prov_trmntn_dt: timestamp (nullable = true)
 |-- prov_trmntn_rsn_cd: string (nullable = true)
 |-- npi_rqrd_ind: string (nullable = true)
 |-- prov_stts_aray_txt: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- PROV_STTS_KEY: string (nullable = true)
 |    |    |-- PROV_STTS_EFCTV_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_CD: string (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_RSN_CD: string (nullable = true)
I am running the following code to do basic cleansing, but it is not working inside "prov_stts_aray_txt": it does not go inside the array type and apply the desired transformation. I want to iterate through all fields in the DataFrame (flat and nested) and perform a basic transformation.
for (dt <- final_df.dtypes) {
  final_df = final_df.withColumn(dt._1,
    when(upper(trim(col(dt._1))) === "NULL", lit(" ")).otherwise(col(dt._1)))
}
Please help. Note that this is just a sample DataFrame; the actual DataFrame holds multiple array-of-struct types with different numbers of fields in them, so the solution needs to work dynamically. Thanks
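One possible direction, sketched below under the assumption of Spark 2.4+ (where the transform higher-order function is available): walk the top-level schema, clean string columns directly, and rebuild array-of-struct columns element by element. The helper name cleanNulls and the exact cleansing rule are illustrative, not taken from the original post.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Illustrative sketch: replace the literal string "NULL" with a blank in every
// string column, including string fields nested inside array-of-struct columns.
def cleanNulls(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, f) =>
    f.dataType match {
      case StringType =>
        acc.withColumn(f.name,
          when(upper(trim(col(f.name))) === "NULL", lit(" ")).otherwise(col(f.name)))
      case ArrayType(st: StructType, _) =>
        // Rebuild each struct element, applying the same rule to its string fields
        // (uses the transform higher-order function, i.e. Spark 2.4+).
        val rebuilt = st.fields.map { sf =>
          val value = sf.dataType match {
            case StringType =>
              s"CASE WHEN upper(trim(x.${sf.name})) = 'NULL' THEN ' ' ELSE x.${sf.name} END"
            case _ => s"x.${sf.name}"
          }
          s"'${sf.name}', $value"
        }.mkString(", ")
        acc.withColumn(f.name, expr(s"transform(${f.name}, x -> named_struct($rebuilt))"))
      case _ => acc
    }
  }

val cleaned = cleanNulls(final_df)

Deeper nesting (structs inside arrays inside structs) would need the same idea applied recursively.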
Labels:
- Apache Spark
04-13-2018
09:24 AM
Hi, we had logic in which a computed file from the HDFS path /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/_SUCCESS was moved to /bigdatahdfs/datalake/publish/prm2 (an external partitioned Parquet table is built on top of it). It was working fine, but after a recent migration to a new server where encryption is enabled, it throws a series of error messages:
[INFO] :2018-04-12 10:24:01:Wrapper:Job_name:step001_CDC: Moving Files from /bigdatahdfs/datalake/publish/prm2/merchant_location_extension to /bigdatahdfs/datalake/publish/prm2/archive/merchant_location_extension/20180405
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/_SUCCESS can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/part-00000-m-00000.snappy.parquet can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/part-00001-m-00001.snappy.parquet can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
What steps does the admin team need to take so that the user gets the privilege to move files to the target HDFS directories? As a developer, I am not able to figure out which configuration is missing.
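As a sketch of the behaviour (not a statement of the official fix): HDFS refuses to rename (mv) files across encryption zones, so the move has to become a copy plus delete, and the user's principal needs the relevant KMS key ACLs (for example DECRYPT_EEK on the source zone key and GENERATE_EEK/DECRYPT_EEK on the target zone key), which is the part the admin team would grant. The paths below are adapted from the error messages; everything else is illustrative.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val src = new Path("/bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension")
val dst = new Path("/bigdatahdfs/datalake/publish/prm2/merchant_location_extension")

// Copy across the encryption-zone boundary (data is decrypted with the source
// zone key and re-encrypted with the target zone key), then drop the source.
FileUtil.copy(fs, src, fs, dst, /* deleteSource = */ true, conf)

From the shell, hdfs dfs -cp followed by a delete of the source achieves the same thing where a plain mv fails.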
Labels:
03-26-2018
07:46 AM
Hi,
1- I am confused about the difference between --driver-class-path and --driver-library-path. Please help me understand the difference between these two.
2- I am a bit new to Scala. Can you please help me understand the difference between a class path and a library path? In the end, both require a jar path to be set.
3- If I add extra dependencies with the --jars option, do I need to separately provide the project jar path with --driver-class-path and spark.executor.extraClassPath?
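For reference, my understanding of how those flags map onto Spark configuration keys: --driver-class-path corresponds to spark.driver.extraClassPath (extra entries on the driver's JVM classpath), --driver-library-path to spark.driver.extraLibraryPath (the native library search path, e.g. for .so files), and --jars to spark.jars (jars shipped to both driver and executors and added to their classpaths). A small sketch that echoes whatever was passed on the command line (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("classpath-check").getOrCreate()

// Print the class-path / library-path related settings spark-submit resolved.
Seq("spark.driver.extraClassPath",
    "spark.driver.extraLibraryPath",
    "spark.executor.extraClassPath",
    "spark.jars").foreach { key =>
  println(s"$key = ${spark.sparkContext.getConf.getOption(key).getOrElse("<not set>")}")
}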
Labels:
- Apache Spark