About mridley

mridley · ‎05-14-2021

Hello,

You can solve this by using the Maven Shade plugin, which lets you package your own versions of libraries with your application. Take a look at the Cloudera doc: https://docs.cloudera.com/runtime/7.2.9/developing-spark-applications/topics/spark-packaging-different-versions-of-libraries-with-an-application.html

Michael

mridley · ‎07-23-2020

The DataNode question has been answered, but one tangential comment: you say you are using the Secondary NameNode service. You almost certainly do not want to be using that. You do not get any HA with the Secondary NameNode (SNN). What you probably want is the Standby NameNode. In Cloudera Manager you can enable HA from the HDFS service actions, and that will replace your Secondary NameNode with a Standby NameNode.

mridley · ‎01-27-2020

Can you use Sqoop to retrieve the data directly from the database and load it into Hive? That would solve your delimiter problem.

mridley · ‎01-26-2020

Where is the data coming from? You could use a binary format like Avro or Parquet if your source system can export that way. If you MUST have a text file with a delimiter, you need a delimiter that does not appear anywhere in the data.
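
As a rough sketch of that last point, assuming the rows are small enough to scan in memory: the helper below checks a list of candidate delimiters against every field and returns the first one that never occurs. The name `pick_delimiter` and the candidate list are hypothetical, made up for illustration; they are not part of Hive or any Cloudera tool.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical candidate list. "\x01" (Ctrl-A) is Hive's default field
# delimiter for text tables; the others are common choices.
CANDIDATES = ["\x01", "\t", "|", ";", ","]


def pick_delimiter(
    rows: Iterable[Sequence[str]],
    candidates: Sequence[str] = CANDIDATES,
) -> Optional[str]:
    """Return the first candidate that appears in no field, or None."""
    materialized = [list(row) for row in rows]  # allow several passes
    for cand in candidates:
        if not any(cand in field for row in materialized for field in row):
            return cand
    return None  # every candidate collides with the data
```

If this returns `None`, no candidate is safe, which is a hint that a binary format is the better route.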

mridley · ‎01-24-2020

I would recommend not using CSV in your case. If you have commas in the fields then you can't really delimit them with commas because, as you have noticed, you will have field breaks in the middle of a field. Can you get the source data exported some other way?
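
To make the field break concrete, here is a small sketch with made-up data. The record and the `naive_split` helper are hypothetical and only stand in for any reader that splits each line on every comma.

```python
from typing import List


def naive_split(line: str, delimiter: str = ",") -> List[str]:
    """Split on every delimiter, with no notion of field boundaries."""
    return line.split(delimiter)


# Made-up record with three fields; the middle one contains a comma.
record = ["1", "Smith, John", "NY"]
line = ",".join(record)      # "1,Smith, John,NY"
parsed = naive_split(line)   # the comma inside the name is treated as a break

print(len(record), len(parsed))  # prints "3 4"
```

The written record has three fields, but the reader sees four, so every column after the name is shifted by one.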

Last Visited	‎11-30-2021 12:45 PM

Member Since	‎07-30-2013 01:10 PM
Posts	15
Kudos received	3

Cloudera Community

Re: Spark - separating dependecies of spark and ap...

Re: how to read csv have comma in cell and new lin...

Re: Install DataNode Role on NameNode/SNN machine