<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: KITE SDK 'Provided partitioners do not reference a source field and instead require that a value in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42294#M32838</link>
    <description>&lt;P&gt;Hi Khalef,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have you had a look at this blog post covering Kite?&amp;nbsp;&lt;A href="https://blog.cloudera.com/blog/2014/06/how-to-use-kite-sdk-to-easily-store-and-configure-data-in-apache-hadoop/" target="_blank"&gt;https://blog.cloudera.com/blog/2014/06/how-to-use-kite-sdk-to-easily-store-and-configure-data-in-apache-hadoop/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers, Lars&lt;/P&gt;</description>
    <pubDate>Fri, 24 Jun 2016 18:59:10 GMT</pubDate>
    <dc:creator>Lars Volker</dc:creator>
    <dc:date>2016-06-24T18:59:10Z</dc:date>
    <item>
      <title>KITE SDK 'Provided partitioners do not reference a source field and instead require that a value is'</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42260#M32837</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Kite SDK on the QuickStart VM to create some datasets, but I cannot see how to pass a provided partition value when I run csv-import or json-import.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can we achieve that?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jun 2016 00:39:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42260#M32837</guid>
      <dc:creator>Khalef</dc:creator>
      <dc:date>2016-06-24T00:39:45Z</dc:date>
    </item>
    <item>
      <title>Re: KITE SDK 'Provided partitioners do not reference a source field and instead require that a value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42294#M32838</link>
      <description>&lt;P&gt;Hi Khalef,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have you had a look at this blog post covering Kite?&amp;nbsp;&lt;A href="https://blog.cloudera.com/blog/2014/06/how-to-use-kite-sdk-to-easily-store-and-configure-data-in-apache-hadoop/" target="_blank"&gt;https://blog.cloudera.com/blog/2014/06/how-to-use-kite-sdk-to-easily-store-and-configure-data-in-apache-hadoop/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers, Lars&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jun 2016 18:59:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42294#M32838</guid>
      <dc:creator>Lars Volker</dc:creator>
      <dc:date>2016-06-24T18:59:10Z</dc:date>
    </item>
    <item>
      <title>Re: KITE SDK 'Provided partitioners do not reference a source field and instead require that a value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42306#M32839</link>
      <description>&lt;P&gt;Hi Lars,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, I believe I have read most of the articles and documentation written on Kite SDK.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, my partition fields (year, month, day) are not part of my data files, and there is no date or timestamp field that tells me whether the data is from today or from a month ago.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My partition config (if I can use one) would be:&lt;/P&gt;&lt;PRE&gt;[{
  "type" : "provided",
  "name" : "year"
},
{
  "type" : "provided",
  "name" : "month"
},
{
  "type" : "provided",
  "name" : "day"
}]&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I csv-import or json-import my files, I don't see how to tell kite-dataset explicitly that I want to store the imported file in the partition year=2016, month=05, day=30.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Right now this is what I am doing (I create a dataset, import the file, create a partition directory, and then move the Parquet files into it):&lt;/P&gt;&lt;PRE&gt;&amp;gt; kite-dataset csv-schema ods_ml_au.Introducer_Group_30_05_2016.psv --class IntroducerGroup --delimiter '|' -o introducerGroup.avsc
&amp;gt; hdfs dfs -put introducerGroup.avsc /user/caf/macleasing/format
&amp;gt; kite-dataset create dataset:hdfs:/user/caf/macleasing/stage/ml/introducerGroups -s hdfs:/user/caf/macleasing/format/introducerGroup.avsc -f parquet
&amp;gt; hdfs dfs -put ods_ml_au.Introducer_Group_30_05_2016.psv /user/caf/macleasing/source
&amp;gt; kite-dataset csv-import hdfs:/user/caf/macleasing/source/ods_ml_au.Introducer_Group_30_05_2016.psv dataset:hdfs:/user/caf/macleasing/stage/ml/introducerGroups --delimiter '|'
&amp;gt; hdfs dfs -mkdir -p /user/caf/macleasing/stage/ml/introducerGroups/year=2016/month=05/day=30
&amp;gt; hdfs dfs -mv /user/caf/macleasing/stage/ml/introducerGroups/*.parquet /user/caf/macleasing/stage/ml/introducerGroups/year=2016/month=05/day=30/&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I avoid explicitly creating the directory and moving the files?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to use my partition config instead.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers, Khalef&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jun 2016 04:30:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42306#M32839</guid>
      <dc:creator>Khalef</dc:creator>
      <dc:date>2016-06-25T04:30:26Z</dc:date>
    </item>
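    <!--
Khalef's manual workaround above derives the year=2016/month=05/day=30 directory from the date already embedded in the source file name (ods_ml_au.Introducer_Group_30_05_2016.psv) and then runs hdfs dfs -mkdir -p and hdfs dfs -mv by hand. The Python below is a minimal sketch that only scripts that workaround; it does not use Kite's provided partitioners and does not make Kite aware of the partitions. It assumes every source file name ends in _DD_MM_YYYY.psv, and the helper names partition_values, partition_dir and move_commands are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Assumption: every source file name ends in _DD_MM_YYYY.psv,
# like ods_ml_au.Introducer_Group_30_05_2016.psv in the post.
DATE_SUFFIX = re.compile(r"_(\d{2})_(\d{2})_(\d{4})\.psv$")


def partition_values(filename: str) -> Tuple[str, str, str]:
    """Return (year, month, day) as two/four-digit strings taken from the file name."""
    match = DATE_SUFFIX.search(filename)
    if match is None:
        raise ValueError("no _DD_MM_YYYY.psv suffix in " + filename)
    day, month, year = match.groups()
    # Constructing a datetime rejects impossible dates such as 31_02_2016.
    datetime(int(year), int(month), int(day))
    return year, month, day


def partition_dir(dataset_root: str, filename: str) -> str:
    """Build the year=/month=/day= directory used in the manual workflow."""
    year, month, day = partition_values(filename)
    root = dataset_root.rstrip("/")
    return "{}/year={}/month={}/day={}".format(root, year, month, day)


def move_commands(dataset_root: str, filename: str) -> List[List[str]]:
    """Return the two hdfs dfs commands that the post runs by hand."""
    root = dataset_root.rstrip("/")
    target = partition_dir(root, filename)
    return [
        # Same as: hdfs dfs -mkdir -p TARGET
        ["hdfs", "dfs", "-mkdir", "-p", target],
        # Same as: hdfs dfs -mv ROOT/*.parquet TARGET/
        ["hdfs", "dfs", "-mv", root + "/*.parquet", target + "/"],
    ]
```

For example, partition_dir("/user/caf/macleasing/stage/ml/introducerGroups", "ods_ml_au.Introducer_Group_30_05_2016.psv") returns "/user/caf/macleasing/stage/ml/introducerGroups/year=2016/month=05/day=30". Each list from move_commands could be passed to subprocess.run(cmd, check=True) on a host with the HDFS client; the *.parquet pattern is handed to hdfs dfs unexpanded, which should resolve the glob itself.
    -->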
    <item>
      <title>Re: KITE SDK 'Provided partitioners do not reference a source field and instead require that a value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42409#M32840</link>
      <description>&lt;P&gt;Hi Khalef,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't know the answer to your question myself, but I asked around&amp;nbsp;for an expert on Kite and learned that the best source for help would be the Kite project itself. You probably found their website already:&amp;nbsp;&lt;A href="http://kitesdk.org/docs/current/" target="_blank"&gt;http://kitesdk.org/docs/current/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It also links to their mailing list and I would like to ask you to&amp;nbsp;post your question there:&amp;nbsp;&lt;A href="https://groups.google.com/a/cloudera.org/forum/#!forum/cdk-dev" target="_blank"&gt;https://groups.google.com/a/cloudera.org/forum/#!forum/cdk-dev&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Apologies for the inconvenience. Best wishes, Lars&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jun 2016 21:50:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42409#M32840</guid>
      <dc:creator>Lars Volker</dc:creator>
      <dc:date>2016-06-28T21:50:45Z</dc:date>
    </item>
    <item>
      <title>Re: KITE SDK 'Provided partitioners do not reference a source field and instead require that a value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42412#M32841</link>
      <description>Thanks, Lars.</description>
      <pubDate>Wed, 29 Jun 2016 05:33:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/KITE-SDK-Provided-partitioners-do-not-reference-a-source/m-p/42412#M32841</guid>
      <dc:creator>Khalef</dc:creator>
      <dc:date>2016-06-29T05:33:02Z</dc:date>
    </item>
  </channel>
</rss>

