Spark code - Scala - While writing data to CSV, field's format is getting changed.

We have created a Spark application for client reporting. The client wants the report in CSV format, and we have coded it that way: it generates the desired output in the requested format. When we look at the result data in the log, both the format and the data are correct (the requested date format is 2018-07-26 11:19:04.0, and that is what the log shows). But when we look at the same data in the CSV file, the format has changed: it shows 6/7/2018 12:27. Why does this happen only in the CSV file, when the log shows the correct results and we write the same data to the CSV file with a file write command? How can we resolve this?
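
Since the log shows the right value, one thing worth checking first is what text is actually stored in the file, independent of whatever program is used to open it. The sketch below assumes the rows are written with plain Java file I/O (the original write command is not shown), and `CsvFormatCheck`, `formatTs`, `writeCsv` and `rawLines` are hypothetical helper names. It formats a timestamp the way the log shows it, writes it, and reads the file back as raw text.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical helpers: a sketch assuming the report rows are written with plain
// Java file I/O, since the original write command is not shown.
object CsvFormatCheck {
  // Same text that java.sql.Timestamp#toString prints for whole seconds, e.g. 2018-07-26 11:19:04.0
  def formatTs(ts: Timestamp): String =
    new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S").format(ts)

  // Join each row with commas, one row per line (no quoting, so values must not contain commas)
  def writeCsv(path: Path, header: Seq[String], rows: Seq[Seq[String]]): Unit = {
    val text = (header +: rows).map(_.mkString(",")).mkString("", "\n", "\n")
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))
  }

  // Read the file back as plain text, with no spreadsheet re-interpreting the values
  def rawLines(path: Path): List[String] =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8).split("\r?\n").toList
}
```

If `rawLines` returns the expected text but the file looks different when opened, the change is happening in the viewer rather than in the file. Spreadsheet applications such as Excel typically re-parse date-like text and display it in the locale's default date format, and 6/7/2018 12:27 looks like such a display format.
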
Sample Code:

import org.apache.spark.sql.functions._
val selectedData = dataFrame3.select(
  concat(col("ticket_number"), lit("-"), date_format(col("as_of_date"), "yyMMdd")).as("transref"),
  col("newmCanc").as("newmCanc"),
  when(col("trade_action") === "CXL", concat(col("master_ticket_num"), lit("-"), date_format(col("as_of_date"), "yyMMdd"))).otherwise("").as("relTransref"),
  col("trader_name").as("portfolioIdAm"),
  col("portfolioIdKvg"),
  col("name").as("portfolioName"),
  when(col("buy_sell_desc") === "Buy", "BUY").when(col("buy_sell_desc") === "Sell", "SELL").otherwise("OTHER").as("buyisell"),
  col("trade_feed_trade_amount").as("quantity"),
  col("secIdType"),
  col("id_isin").as("secId"),
  when(col("instrument_name").isNotNull, col("instrument_name")).otherwise(col("security_name")).as("secName"),
  // format_number returns text with thousands separators (e.g. "1,234.57"), so these values can contain commas
  format_number(col("trade_price").cast("Double"), 2).as("price"),
  col("currency").as("tradeCCY"),
  format_number(col("settlement_costs_in_settlement_currency").cast("Double"), 2).as("tradeComm"),
  format_number(col("Transaction_Cost_2_Amount").cast("Double"), 2).as("fees"),
  format_number(col("Transaction_Cost_3_Amount").cast("Double"), 2).as("tax"),
  format_number(col("Transaction_Cost_5_Amount").cast("Double"), 2).as("others"),
  format_number(col("Accrued_Interest").cast("Double"), 2).as("interest"),
  format_number(col("settlement_total_in_settlement_currency").cast("Double"), 2).as("settlAmount"),
  when(col("Number_of_days_accrued_interest").isNull, "0").otherwise(col("Number_of_days_accrued_interest")).as("interestDays"),
  // date_format already returns a string, so an extra .cast("String") is not needed
  date_format(col("as_of_date"), "yyyy-MM-dd").as("tradeDate"),
  date_format(col("receiveddate"), "yyyy-MM-dd HH:mm:ss").as("executionTimestamp"),
  date_format(col("settlement_date"), "yyyy-MM-dd").as("settlementDate")
)
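
The sample stops before the write step. A minimal sketch of writing `selectedData` with Spark's built-in CSV writer follows, assuming Spark 2.x or later; `ReportWriter`, `writeReport` and the output directory are hypothetical names, not the poster's code. Because the date columns above are already strings produced by `date_format`, Spark writes them verbatim; the `timestampFormat` option only matters for columns that are still of timestamp type.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper; a sketch, not the poster's actual write code.
object ReportWriter {
  def writeReport(df: DataFrame, outputDir: String): Unit =
    df.coalesce(1)                  // a single part file is easier to hand to a client
      .write
      .mode(SaveMode.Overwrite)     // replace the output directory if it already exists
      .option("header", "true")     // first line holds the column names
      // Only applies to TimestampType columns; string columns are written as-is
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
      .csv(outputDir)               // writes a directory containing part-*.csv files
}
```

Spark's CSV writer also wraps values that contain the delimiter in quotes, so `format_number` output such as "1,234.57" stays in one column; a hand-rolled comma join would split it.
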

Re: Spark code - Scala - While writing data to CSV, field's format is getting changed.

Can somebody please help? I am completely stuck.

Re: Spark code - Scala - While writing data to CSV, field's format is getting changed.

@HDave I see you are casting the dates to strings, so from this code alone it is hard to say why this is happening.
To help, could you post a simplified version of the code that reproduces the problem, and say which HDP version you are running? That way we will understand not only how the DataFrame was populated but also how you are saving it.
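
Along the lines of that request, a minimal reproduction could look like the sketch below, assuming Spark 2.x or later in local mode. `CsvFormatRepro` and `run` are hypothetical names; only the column names and the sample value come from the question. It prints the Spark version and returns both the value the application would log and the raw lines written to the CSV part file, so the two can be compared without opening the file in a spreadsheet.

```scala
import java.io.File
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Hypothetical minimal reproduction; names other than the columns are assumptions.
object CsvFormatRepro {
  def run(spark: SparkSession, outputDir: String): (String, List[String]) = {
    import spark.implicits._
    println(s"Spark version: ${spark.version}")

    // One row: parse the sample value as a timestamp, then format it like the report does
    val df = Seq("2018-07-26 11:19:04").toDF("raw")
      .select(col("raw").cast("timestamp").as("receiveddate"))
      .select(date_format(col("receiveddate"), "yyyy-MM-dd HH:mm:ss").as("executionTimestamp"))

    // What the application would print to its log
    val logged = df.collect().head.getString(0)
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(outputDir)

    // What actually landed on disk (part file only, skipping _SUCCESS and .crc files)
    val part = new File(outputDir).listFiles().filter(_.getName.startsWith("part-")).head
    val source = scala.io.Source.fromFile(part)
    val onDisk = try source.getLines().toList finally source.close()
    (logged, onDisk)
  }
}
```
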