Spark code - Scala - While writing data to CSV, field's format is getting changed.

We have created a Spark application for client reporting. The client wants the report in CSV format. We have coded it that way, and it generates the desired output in the requested format. When we look at the result data in the log, it shows the correct format and the correct data (i.e. the requested date format is 2018-07-26 11:19:04.0, and that is what the log shows). But when we look at the same data in the CSV file, the format has changed: it shows a format like 6/7/2018 12:27. Why does this happen with the CSV file, when the log shows correct results and we write the same data to the CSV file through the file write command? How can we resolve this?
Sample Code:

import org.apache.spark.sql.functions._

// Build the report columns; every date field is formatted to a string before writing.
val selectedData = dataFrame3.select(
  concat(col("ticket_number"), lit("-"), date_format(col("as_of_date"), "yyMMdd")).as("transref"),
  col("newmCanc").as("newmCanc"),
  when(col("trade_action") === "CXL", concat(col("master_ticket_num"), lit("-"), date_format(col("as_of_date"), "yyMMdd"))).otherwise("").as("relTransref"),
  col("trader_name").as("portfolioIdAm"), col("portfolioIdKvg"),
  col("name").as("portfolioName"),
  when(col("buy_sell_desc") === "Buy", "BUY").when(col("buy_sell_desc") === "Sell", "SELL").otherwise("OTHER").as("buyisell"),
  col("trade_feed_trade_amount").as("quantity"),
  col("secIdType"), col("id_isin").as("secId"),
  when(col("instrument_name").isNotNull, col("instrument_name")).otherwise(col("security_name")).as("secName"),
  format_number(col("trade_price").cast("Double"), 2).as("price"), col("currency").as("tradeCCY"),
  format_number(col("settlement_costs_in_settlement_currency").cast("Double"), 2).as("tradeComm"),
  format_number(col("Transaction_Cost_2_Amount").cast("Double"), 2).as("fees"),
  format_number(col("Transaction_Cost_3_Amount").cast("Double"), 2).as("tax"),
  format_number(col("Transaction_Cost_5_Amount").cast("Double"), 2).as("others"),
  format_number(col("Accrued_Interest"), 2).as("interest"),
  format_number(col("settlement_total_in_settlement_currency").cast("Double"), 2).as("settlAmount"),
  when(col("Number_of_days_accrued_interest").isNull, "0").otherwise(col("Number_of_days_accrued_interest")).as("interestDays"),
  date_format(col("as_of_date"), "yyyy-MM-dd").cast("String").as("tradeDate"),
  date_format(col("receiveddate"), "yyyy-MM-dd HH:mm:ss").cast("String").as("executionTimestamp"),
  date_format(col("settlement_date"), "yyyy-MM-dd").cast("String").as("settlementDate")
)

2 REPLIES

Can somebody please help? I am completely stuck.

@HDave I see you are casting the dates to strings, so from this code alone it is hard to say why this is happening.
To help, could you post a simplified version of the code that reproduces the problem, and tell us which HDP version you are running? That way we will understand not only how the DataFrame was populated but also how you are saving it.
