<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Conversion of a file(with pipe(|), comma(,) and inverted commas(&amp;quot;)) to avro format in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81330#M44704</link>
    <description>&lt;P&gt;I have a flat file with column names separated by comma(,) and column values separated by pipe(|) and comma(,).&lt;/P&gt;&lt;P&gt;Can someone help me convert this file to Avro format?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"EMP-CO","EMP-ID" - column names&lt;/P&gt;&lt;P&gt;|ABC|,|123456| - values&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 13:49:18 GMT</pubDate>
    <dc:creator>madankumarpuril</dc:creator>
    <dc:date>2022-09-16T13:49:18Z</dc:date>
    <item>
      <title>Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81330#M44704</link>
      <description>&lt;P&gt;I have a flat file with column names separated by comma(,) and column values separated by pipe(|) and comma(,).&lt;/P&gt;&lt;P&gt;Can someone help me convert this file to Avro format?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"EMP-CO","EMP-ID" - column names&lt;/P&gt;&lt;P&gt;|ABC|,|123456| - values&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:49:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81330#M44704</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2022-09-16T13:49:18Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81339#M44705</link>
      <description>&lt;P&gt;This should be doable in Spark using the CSV and Avro reader/writer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your header is unusual in that quote characters surround the column names, so it cannot be parsed directly ('"' is an illegal character in an Avro field name). We can have the Spark CSV reader skip that line as a comment, since no other line should start with a '"' character.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your data rows are quoted values, with the quote character being '|'.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Something like the below achieves the conversion on CDH5:&lt;/P&gt;&lt;PRE&gt;~&amp;gt; spark-shell --packages com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:4.0.0

&amp;gt; import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

&amp;gt; // Manual schema declaration of the 'co' and 'id' column names and types
&amp;gt; val customSchema = StructType(Array(
StructField("co", StringType, true),
StructField("id", IntegerType, true)))

&amp;gt; val df = sqlContext.read.format("com.databricks.spark.csv").option("comment", "\"").option("quote", "|").schema(customSchema).load("/tmp/file.txt")

&amp;gt; df.write.format("com.databricks.spark.avro").save("/tmp/avroout")

&amp;gt; // Note: /tmp/file.txt is input file/dir, and /tmp/avroout is the output dir&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Oct 2018 07:15:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81339#M44705</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2018-10-22T07:15:20Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81385#M44706</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please provide the steps in more detail?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 03:10:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81385#M44706</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-10-23T03:10:50Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81386#M44707</link>
      <description>&lt;P&gt;I am getting below error after trying...&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; df.write.format("com.databricks.spark.avro").save("C:/Users/madan/Downloads/Avro/out/")&lt;BR /&gt;java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat&lt;BR /&gt;at java.lang.ClassLoader.defineClass1(Native Method)&lt;BR /&gt;at java.lang.ClassLoader.defineClass(ClassLoader.java:763)&lt;BR /&gt;at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)&lt;BR /&gt;at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)&lt;BR /&gt;at java.net.URLClassLoader.access$100(URLClassLoader.java:73)&lt;BR /&gt;at java.net.URLClassLoader$1.run(URLClassLoader.java:368)&lt;BR /&gt;at java.net.URLClassLoader$1.run(URLClassLoader.java:362)&lt;BR /&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;BR /&gt;at java.net.URLClassLoader.findClass(URLClassLoader.java:361)&lt;BR /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:424)&lt;BR /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:411)&lt;BR /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:357)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)&lt;BR /&gt;at scala.util.Try.orElse(Try.scala:82)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)&lt;BR /&gt;at 
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:219)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:31)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:36)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:38)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:40)&lt;BR /&gt;at $iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:42)&lt;BR /&gt;at $iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:44)&lt;BR /&gt;at $iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:46)&lt;BR /&gt;at $iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:48)&lt;BR /&gt;at &amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:50)&lt;BR /&gt;at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:54)&lt;BR /&gt;at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)&lt;BR /&gt;at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:7)&lt;BR /&gt;at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)&lt;BR /&gt;at $print(&amp;lt;console&amp;gt;)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)&lt;BR /&gt;at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)&lt;BR /&gt;at 
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)&lt;BR /&gt;at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)&lt;BR /&gt;at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)&lt;BR /&gt;at org.apache.spark.repl.Main$.main(Main.scala:31)&lt;BR /&gt;at org.apache.spark.repl.Main.main(Main.scala)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:498)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)&lt;BR /&gt;at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)&lt;BR /&gt;at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)&lt;BR /&gt;Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat&lt;BR /&gt;at java.net.URLClassLoader.findClass(URLClassLoader.java:381)&lt;BR /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:424)&lt;BR /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:357)&lt;BR /&gt;... 68 more&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;scala&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 03:24:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81386#M44707</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-10-23T03:24:08Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81491#M44708</link>
      <description>&lt;P&gt;Hi Harsh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please tell me which /tmp location you mean (are you referring to the tmp folder under root, or a different one)?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have given the path the same way, but I am getting the error below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 02:42:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81491#M44708</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-10-25T02:42:19Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81492#M44709</link>
      <description>The /tmp/file.txt is an HDFS path, not a local path. It could be any HDFS&lt;BR /&gt;path if your Spark is configured to use HDFS as the default FS - I used /tmp&lt;BR /&gt;just for illustration.&lt;BR /&gt;&lt;BR /&gt;The same should work in local FS mode too, but I've not tried it.&lt;BR /&gt;</description>
      <pubDate>Thu, 25 Oct 2018 03:10:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81492#M44709</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2018-10-25T03:10:52Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81493#M44710</link>
      <description>&lt;P&gt;Yes, Spark is configured to use HDFS as the default FS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please let me know where exactly I should keep the input file?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried many ways but did not succeed. I am new to Hadoop.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 03:26:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81493#M44710</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-10-25T03:26:09Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81903#M44711</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/30060"&gt;@Harsh&lt;/a&gt;&amp;nbsp;thanks for providing the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am able to generate Avro files, but for a 10 KB flat file I am getting two Avro files (part1 and part2).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All my flat files are more than 50 MB, so I will end up with a large number of .avro files, which is difficult to maintain. Is there a way to generate a single Avro file even when the flat file is large?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Nov 2018 03:54:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81903#M44711</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-11-05T03:54:55Z</dc:date>
    </item>
    <item>
      <title>Re: Conversion of a file(with pipe(|), comma(,) and inverted commas(")) to avro format</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81973#M44712</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/213"&gt;@Harsh J&lt;/a&gt;&amp;nbsp;is there a way to avoid giving the column names manually?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Because I have 150 columns per table and more than 200 tables, which is a huge number.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Nov 2018 01:41:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Conversion-of-a-file-with-pipe-comma-and-inverted-commas/m-p/81973#M44712</guid>
      <dc:creator>madankumarpuril</dc:creator>
      <dc:date>2018-11-06T01:41:09Z</dc:date>
    </item>
  </channel>
</rss>