How to read a CSV that has commas inside a cell and newline characters as well
- Labels: Apache Spark
Created on 01-24-2020 04:19 AM - last edited on 01-25-2020 02:54 PM by ask_bill_brooks
How do I read a CSV that has commas inside a cell and newline characters as well? For example, I have some columns with a description like (hi, how are you).
I am trying to read it with Spark in Scala.
Please guide me.
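For reference, Spark's CSV reader can often cope with commas and line breaks inside cells when those fields are quoted in the source file. A minimal Scala sketch, where the path and the header assumption are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-with-embedded-newlines").getOrCreate()

// Assumes fields that contain commas or line breaks are wrapped in double quotes
// and embedded quotes are doubled ("") as in RFC 4180.
val df = spark.read
  .option("header", "true")     // first line holds column names
  .option("multiLine", "true")  // let quoted fields span multiple lines
  .option("quote", "\"")        // quote character used in the file
  .option("escape", "\"")       // treat doubled quotes as escaped quotes
  .csv("/path/to/input.csv")    // hypothetical path

df.show(5, truncate = false)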
Created ‎01-24-2020 06:56 AM
I would recommend not using CSV in your case. If the fields contain commas, you can't reliably delimit them with commas because, as you have noticed, you end up with field breaks in the middle of a field.
Can you get the source data exported some other way?
Created ‎01-25-2020 12:05 PM
Right, and thank you. What format do you suggest?
Created ‎01-25-2020 12:38 PM
What do you think? Would it work if I copy the file directly into MySQL, then replace the commas, and afterwards read it with Spark?
Created ‎01-26-2020 07:35 AM
Where is the data coming from? You could use a binary format like Avro or Parquet if your source system can export that way. If you MUST have a text file with a delimiter, you need a delimiter that is not anywhere in the data.
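For the delimiter route, a minimal Scala sketch; the pipe character, the paths, and the assumption that '|' never appears in the data are all hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("delimited-read").getOrCreate()

// Read a file exported with a delimiter that never occurs in the data (e.g. '|' or '\u0001').
val df = spark.read
  .option("header", "true")
  .option("sep", "|")
  .csv("/data/input/export_pipe.txt")

// Or avoid delimiters entirely by landing the data as Parquet once it has been read.
df.write.mode("overwrite").parquet("/data/output/export_parquet")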
Created ‎01-26-2020 10:35 AM
It is coming from an Oracle database.
Created ‎01-27-2020 06:51 AM
Can you use sqoop to retrieve the data directly from the database and dump it into Hive? That will solve your delimiter problem.
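Something along these lines would do it; the connection string, credentials, and table names below are placeholders:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1 \
  --username etl_user -P \
  --table SALES.CUSTOMER_NOTES \
  --hive-import \
  --hive-table staging.customer_notes \
  --num-mappers 4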
Created ‎01-27-2020 10:03 AM
Great option. Thank you very much 🙂
Created ‎01-24-2020 01:36 PM
Have you tried using OpenCSVSerde in the DDL?
CREATE TABLE tablename (columns)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
Thanks,
Manoj
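A slightly fuller version of that DDL, with the quoting behaviour made explicit in SERDEPROPERTIES; the table and column names are placeholders. Note that OpenCSVSerde reads every column as STRING, and because the underlying input format is line-based it may still split records on newlines inside quoted fields:

CREATE TABLE descriptions_raw (
  id STRING,
  description STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;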
Created ‎01-25-2020 12:29 PM
Actually, I am getting the file from FTP and reading it using Spark. I didn't try to push it directly into Hive because I have to do some enrichment first. This is a big problem for me.
Thanks
