<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Need help creating a custom SerDe. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96918#M10455</link>
    <description>&lt;P&gt;I have a file with the following pattern: K,K,K,K,K,K,KV,KV,KV,KV.....&lt;/P&gt;&lt;P&gt;The initial standalone key (K) values are static and never change. The KV (key/value) pairs after the keys are dynamic (pairs can be added or removed at any time) and need to be exposed as a map in Hive. The leading K values would be exposed as regular columns.&lt;/P&gt;&lt;P&gt;Does anyone have code for a custom SerDe I can reference in the Hive table definition for files with this structure? Currently we are using a custom UDF in Python, but we would like to store the files directly in HDFS and apply the schema only at runtime.&lt;/P&gt;</description>
    <pubDate>Wed, 11 Nov 2015 22:59:24 GMT</pubDate>
    <dc:creator>SQLShaw</dc:creator>
    <dc:date>2015-11-11T22:59:24Z</dc:date>
    <item>
      <title>Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96918#M10455</link>
      <description>&lt;P&gt;I have a file with the following pattern: K,K,K,K,K,K,KV,KV,KV,KV.....&lt;/P&gt;&lt;P&gt;The initial standalone key (K) values are static and never change. The KV (key/value) pairs after the keys are dynamic (pairs can be added or removed at any time) and need to be exposed as a map in Hive. The leading K values would be exposed as regular columns.&lt;/P&gt;&lt;P&gt;Does anyone have code for a custom SerDe I can reference in the Hive table definition for files with this structure? Currently we are using a custom UDF in Python, but we would like to store the files directly in HDFS and apply the schema only at runtime.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 22:59:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96918#M10455</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2015-11-11T22:59:24Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96919#M10456</link>
      <description>&lt;P&gt;Sorry, the actual format is entirely comma-separated. After a fixed number of keys (let's assume eight), the pattern switches to a dynamic number of key/value pairs: k,k,k,k,k,k,k,k,k,v,k,v,k,v,k,v.....&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 23:22:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96919#M10456</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2015-11-11T23:22:30Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96920#M10457</link>
      <description>&lt;P&gt;Instead of spending time writing a new SerDe, wouldn't it be possible to use the following approach?&lt;/P&gt;&lt;P&gt;1) Use a RegexSerDe (https://hive.apache.org/javadocs/r1.2.1/api/org/apache/hadoop/hive/serde2/RegexSerDe.html ) to load the 8 "key" columns and the remaining dynamic part, as a single string column, into a first, temporary table.&lt;/P&gt;&lt;P&gt;2) With a CTAS, insert the data into an ORC table, using the str_to_map() UDF to transform the dynamic string column into a map. This step would also store your data in a more efficient format.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 23:46:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96920#M10457</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2015-11-11T23:46:45Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96921#M10458</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/186/sshaw.html" nodeid="186"&gt;@Scott Shaw&lt;/A&gt;, &lt;A rel="user" href="https://community.cloudera.com/users/342/sluangsay.html" nodeid="342"&gt;@Sourygna Luangsay&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I created a "minimum-viable-serde" implementing what you described. See if it is what you need. &lt;/P&gt;&lt;P&gt;PS: I'm assuming your last column will be a map&amp;lt;string,string&amp;gt;, I haven't done data type handling for last column yet. For the key columns, it will respect the data type you declare when creating table.&lt;/P&gt;&lt;P&gt;from shell:&lt;/P&gt;&lt;PRE&gt;wget &lt;A href="https://github.com/gbraccialli/HiveUtils/raw/master/target/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar" target="_blank"&gt;https://github.com/gbraccialli/HiveUtils/raw/master/target/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar&lt;/A&gt; -O /tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar

echo "a,b,c,adsfa,adfa" &amp;gt; /tmp/testserde.txt
echo "1,2,3,asdfasdf,sdfasd" &amp;gt;&amp;gt; /tmp/testserde.txt
echo "4,5,6,adfas,adf,d" &amp;gt;&amp;gt; /tmp/testserde.txt
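# Optional local preview (not part of the original recipe): with the 3 key
# columns declared in the Hive table, every field after the 3rd should pair up
# as key,value for the map column. An odd count (3 for the last sample line)
# leaves the final key without a value.
printf '%s\n' "a,b,c,adsfa,adfa" "1,2,3,asdfasdf,sdfasd" "4,5,6,adfas,adf,d" |
  awk -F',' '{ n = NF - 3; print n " map fields: " $0 }'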
hadoop fs -mkdir /tmp/testserde/
hadoop fs -put -f /tmp/testserde.txt /tmp/testserde/
hive&lt;/PRE&gt;&lt;P&gt;From Hive:&lt;/P&gt;&lt;PRE&gt;add jar /tmp/HiveUtils-1.0-SNAPSHOT-jar-with-dependencies.jar;
drop table testserde;
create external table testserde (
 field1 string,
 field2 int,
 field3 double,
 maps map&amp;lt;string,string&amp;gt;
)
ROW FORMAT SERDE 'com.github.gbraccialli.hive.serde.NKeys_MapKeyValue'
WITH SERDEPROPERTIES (
 "delimiter" = ","
)
LOCATION '/tmp/testserde/';

select * from testserde;
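
-- Hypothetical follow-up queries (not from the original post): each dynamic
-- key becomes an entry in the map column, so a single key can be read with
-- the [] operator (NULL when a row lacks that key), while map_keys() and
-- size() list and count the keys present in each row.
select field1, maps['adsfa'] from testserde;
select field1, map_keys(maps), size(maps) from testserde;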
&lt;/PRE&gt;&lt;P&gt;Source code is here:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/gbraccialli/HiveUtils"&gt;https://github.com/gbraccialli/HiveUtils&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://github.com/gbraccialli/HiveUtils/blob/master/src/main/java/com/github/gbraccialli/hive/serde/NKeys_MapKeyValue.java"&gt;https://github.com/gbraccialli/HiveUtils/blob/master/src/main/java/com/github/gbraccialli/hive/serde/NKeys_MapKeyValue.java&lt;/A&gt;&lt;/P&gt;&lt;P&gt;PS2: there are still lots of TODOs.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2015 11:06:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96921#M10458</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-13T11:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96922#M10459</link>
      <description>&lt;P&gt;Thanks!! This looks great. We'll give it a try.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2015 21:47:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96922#M10459</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2015-11-13T21:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96923#M10460</link>
      <description>&lt;P&gt;It also works with spark-sql.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2015 06:48:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96923#M10460</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-02T06:48:48Z</dc:date>
    </item>
    <item>
      <title>Re: Need help creating a custom SerDe.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96924#M10461</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I have a log file in which i have last field as key value pair. &lt;/P&gt;&lt;P&gt;for example:&lt;/P&gt;&lt;P&gt;2017-11-29 16:19:39,217 DEBUG [pool-4-thread-4] OutBound Msg From Engine -&amp;gt; |9=76|35=p|a=b|c=hg|&lt;/P&gt;&lt;P&gt;2017-11-29 16:20:29,217 DEBUG [pool-4-thread-4] OutBound Msg From Engine -&amp;gt; |3=6|35=w|a=b|&lt;/P&gt;&lt;P&gt;how to analyse this? Can we use your custom serde for this? &lt;/P&gt;&lt;P&gt;Because Regex serde is not supporting complex data types.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Mar 2018 15:54:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Need-help-creating-a-custom-SerDe/m-p/96924#M10461</guid>
      <dc:creator>rajkiranu</dc:creator>
      <dc:date>2018-03-13T15:54:31Z</dc:date>
    </item>
  </channel>
</rss>

