Member since: 09-25-2016
Posts: 11
Kudos Received: 0
Solutions: 0
07-09-2018
09:26 AM
Hi, I have a remote server and a Kerberos-authenticated Hadoop environment. I want to copy files from the remote server to HDFS for processing with Spark. Please advise an efficient approach/HDFS command for copying files from the remote server to HDFS; any example will be helpful. We are not allowed to use Flume or NiFi. Please note Kerberos is installed on the remote server.
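One common pattern is sketched below, assuming the files are first staged onto an edge node (e.g. via scp/sftp from the remote server) and that a keytab is available there; the principal, keytab and paths are placeholders, not values from the original post.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

// Placeholder principal/keytab/paths -- substitute real values.
val conf = new Configuration()            // picks up core-site.xml / hdfs-site.xml from the classpath
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab("etl_user@EXAMPLE.COM",
  "/etc/security/keytabs/etl_user.keytab")   // Kerberos login before touching HDFS

val fs = FileSystem.get(conf)
// Copy the staged file into HDFS; Spark can then read it from this HDFS path.
fs.copyFromLocalFile(new Path("/data/staging/input.csv"),
  new Path("/user/etl_user/landing/input.csv"))
fs.close()

The shell equivalent would be a kinit with the keytab followed by hdfs dfs -put of the staged file.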
Labels:
- Apache Hadoop
- Apache Spark
06-01-2018
10:50 AM
In case the code is not readable, I have uploaded the same at https://stackoverflow.com/questions/50606346/iterating-through-nested-element-in-spark?noredirect=1#comment88222431_50606346
05-31-2018
02:56 PM
I have a DataFrame with the following schema:
scala> final_df.printSchema
root
 |-- mstr_prov_id: string (nullable = true)
 |-- prov_ctgry_cd: string (nullable = true)
 |-- prov_orgnl_efctv_dt: timestamp (nullable = true)
 |-- prov_trmntn_dt: timestamp (nullable = true)
 |-- prov_trmntn_rsn_cd: string (nullable = true)
 |-- npi_rqrd_ind: string (nullable = true)
 |-- prov_stts_aray_txt: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- PROV_STTS_KEY: string (nullable = true)
 |    |    |-- PROV_STTS_EFCTV_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_CD: string (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_RSN_CD: string (nullable = true)
I am running the following code to do basic cleansing, but it is not working inside "prov_stts_aray_txt": it does not go inside the array type and apply the desired transformation. I want to iterate through all fields in the DataFrame (flat and nested) and perform a basic transformation.
for (dt <- final_df.dtypes) {
  final_df = final_df.withColumn(dt._1,
    when(upper(trim(col(dt._1))) === "NULL", lit(" ")).otherwise(col(dt._1)))
}
Please help. Note that this is just a sample DataFrame; the actual DataFrame holds multiple array-of-struct types with different numbers of fields in them, so the solution needs to work dynamically. Thanks
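One possible direction, sketched below under the assumption of Spark 2.4+ (where the transform higher-order function is available): walk the top-level schema, clean string columns directly, and rebuild array-of-struct columns element by element. The helper name cleanNulls and the exact cleansing rule are illustrative, not taken from the original post.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Illustrative sketch: replace the literal string "NULL" with a blank in every
// string column, including string fields nested inside array-of-struct columns.
def cleanNulls(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, f) =>
    f.dataType match {
      case StringType =>
        acc.withColumn(f.name,
          when(upper(trim(col(f.name))) === "NULL", lit(" ")).otherwise(col(f.name)))
      case ArrayType(st: StructType, _) =>
        // Rebuild each struct element, applying the same rule to its string fields
        // (uses the transform higher-order function, i.e. Spark 2.4+).
        val rebuilt = st.fields.map { sf =>
          val value = sf.dataType match {
            case StringType =>
              s"CASE WHEN upper(trim(x.${sf.name})) = 'NULL' THEN ' ' ELSE x.${sf.name} END"
            case _ => s"x.${sf.name}"
          }
          s"'${sf.name}', $value"
        }.mkString(", ")
        acc.withColumn(f.name, expr(s"transform(${f.name}, x -> named_struct($rebuilt))"))
      case _ => acc
    }
  }

val cleaned = cleanNulls(final_df)

Deeper nesting (structs inside arrays inside structs) would need the same idea applied recursively.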
Labels:
- Apache Spark
04-13-2018
09:24 AM
Hi, we had logic in which a computed file from the HDFS path /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/_SUCCESS was moved to /bigdatahdfs/datalake/publish/prm2 (an external partitioned Parquet table is built on top of it). It was working fine, but after a recent migration to a new server where encryption is enabled, it throws a series of error messages:
[INFO] :2018-04-12 10:24:01:Wrapper:Job_name:step001_CDC: Moving Files from /bigdatahdfs/datalake/publish/prm2/merchant_location_extension to /bigdatahdfs/datalake/publish/prm2/archive/merchant_location_extension/20180405
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/_SUCCESS can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/part-00000-m-00000.snappy.parquet can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
mv: /bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension/part-00001-m-00001.snappy.parquet can't be moved from encryption zone /bigdatahdfs/datalake/raw/prm2 to encryption zone /bigdatahdfs/datalake/publish/prm2.
What steps does the admin team need to take so that the user gets the privilege to move files to the target HDFS directories? As a developer, I am not able to figure out which configuration is missing.
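As a sketch of the behaviour (not a statement of the official fix): HDFS refuses to rename (mv) files across encryption zones, so the move has to become a copy plus delete, and the user's principal needs the relevant KMS key ACLs (for example DECRYPT_EEK on the source zone key and GENERATE_EEK/DECRYPT_EEK on the target zone key), which is the part the admin team would grant. The paths below are adapted from the error messages; everything else is illustrative.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val src = new Path("/bigdatahdfs/datalake/raw/prm2/temp/merchant_location_extension")
val dst = new Path("/bigdatahdfs/datalake/publish/prm2/merchant_location_extension")

// Copy across the encryption-zone boundary (data is decrypted with the source
// zone key and re-encrypted with the target zone key), then drop the source.
FileUtil.copy(fs, src, fs, dst, /* deleteSource = */ true, conf)

From the shell, hdfs dfs -cp followed by a delete of the source achieves the same thing where a plain mv fails.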
Labels:
03-26-2018
07:46 AM
Hi,
1- I am confused about the difference between --driver-class-path and --driver-library-path. Please help me understand the difference between these two.
2- I am a bit new to Scala. Can you please help me understand the difference between a class path and a library path? In the end, both require a jar path to be set.
3- If I add extra dependencies with the --jars option, do I need to separately provide the project jar path with --driver-class-path and spark.executor.extraClassPath?
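For reference, my understanding of how those flags map onto Spark configuration keys: --driver-class-path corresponds to spark.driver.extraClassPath (extra entries on the driver's JVM classpath), --driver-library-path to spark.driver.extraLibraryPath (the native library search path, e.g. for .so files), and --jars to spark.jars (jars shipped to both driver and executors and added to their classpaths). A small sketch that echoes whatever was passed on the command line (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("classpath-check").getOrCreate()

// Print the class-path / library-path related settings spark-submit resolved.
Seq("spark.driver.extraClassPath",
    "spark.driver.extraLibraryPath",
    "spark.executor.extraClassPath",
    "spark.jars").foreach { key =>
  println(s"$key = ${spark.sparkContext.getConf.getOption(key).getOrElse("<not set>")}")
}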
Labels:
- Apache Spark