<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HBase: Composite key for ImportTsv in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40328#M25556</link>
    <description>ImportTsv cannot do this at present. You would have to preprocess the columns first to create the composite key in a new CSV file, and then use ImportTsv to import that file, which contains the composite key column and its data.</description>
    <pubDate>Fri, 29 Apr 2016 23:30:05 GMT</pubDate>
    <dc:creator>EvanH</dc:creator>
    <dc:date>2016-04-29T23:30:05Z</dc:date>
    <item>
      <title>HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/39908#M25553</link>
      <description>&lt;P&gt;Hi dear experts!&lt;/P&gt;&lt;P&gt;I'm trying to load CSV data from HDFS into HBase with ImportTsv (&lt;A href="http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv" target="_self"&gt;importtsv&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;It works perfectly well when HBASE_ROW_KEY is a single CSV column,&lt;/P&gt;&lt;P&gt;but I don't know how to create a composite HBASE_ROW_KEY (from two columns).&lt;/P&gt;&lt;P&gt;For example, I have a CSV with 3 columns:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;row1, 1, abc
row1, 2, dd
row2, 1, iop
row3, 1, kk&lt;/PRE&gt;&lt;P&gt;and each row is uniquely identified by its first two columns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any input would be highly appreciated!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:14:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/39908#M25553</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2022-09-16T10:14:39Z</dc:date>
    </item>
    <item>
      <title>Re: HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40112#M25554</link>
      <description>ImportTsv is a simple utility and does not currently support this.&lt;BR /&gt;&lt;BR /&gt;Perhaps you can take a look at the HBase and CSV dataset handling in Kite SDK, which can do this (although it uses the more efficient Avro encoding instead of plaintext during serialisation). Read more at &lt;A href="http://kitesdk.org/docs/1.1.0/" target="_blank"&gt;http://kitesdk.org/docs/1.1.0/&lt;/A&gt;</description>
      <pubDate>Sun, 24 Apr 2016 19:06:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40112#M25554</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-04-24T19:06:37Z</dc:date>
    </item>
    <item>
      <title>Re: HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40113#M25555</link>
      <description>Of course, another, easier way that still uses ImportTsv is to transform your CSV input in a custom mapper (passed via the configuration key "importtsv.mapper.class") and "merge" the two columns together before the CSV parser maps them into the designated fields.&lt;BR /&gt;&lt;BR /&gt;This is the default Map class for ImportTsv, for reference: &lt;A href="https://github.com/cloudera/hbase/blob/cdh5.7.0-release/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TsvImporterMapper.java" target="_blank"&gt;https://github.com/cloudera/hbase/blob/cdh5.7.0-release/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TsvImporterMapper.java&lt;/A&gt;</description>
      <pubDate>Sun, 24 Apr 2016 19:08:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40113#M25555</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-04-24T19:08:50Z</dc:date>
    </item>
    <item>
      <title>Re: HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40328#M25556</link>
      <description>ImportTsv cannot do this at present. You would have to preprocess the columns first to create the composite key in a new CSV file, and then use ImportTsv to import that file, which contains the composite key column and its data.</description>
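      <content:encoded><![CDATA[
The preprocessing step described in this reply can be sketched as follows. This is only a minimal illustration, not a tested production job: the function name add_composite_key, the "_" separator, and the choice of the first two columns as the key are all assumptions, and the sample rows are the ones from the original question. For large HDFS inputs the same per-line logic would more likely run inside a MapReduce or similar job.

```python
import csv
import io


def add_composite_key(src, dst, key_columns=(0, 1), separator="_"):
    """Copy CSV rows from src to dst, replacing the key columns with one joined key.

    key_columns and separator are illustrative defaults (assumptions).
    """
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Join the chosen columns into a single row key.
        key = separator.join(row[i].strip() for i in key_columns)
        # Keep every other column, in its original order.
        rest = [v.strip() for i, v in enumerate(row) if i not in key_columns]
        writer.writerow([key] + rest)


# Sample rows taken from the question above.
sample = io.StringIO("row1, 1, abc\nrow1, 2, dd\nrow2, 1, iop\nrow3, 1, kk\n")
out = io.StringIO()
add_composite_key(sample, out)
print(out.getvalue(), end="")
```

The resulting file has the composite key as its first column, so it could then be loaded by setting importtsv.separator to "," and importtsv.columns to something like HBASE_ROW_KEY,cf:value, where cf:value is a placeholder column name.
]]></content:encoded>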
      <pubDate>Fri, 29 Apr 2016 23:30:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/40328#M25556</guid>
      <dc:creator>EvanH</dc:creator>
      <dc:date>2016-04-29T23:30:05Z</dc:date>
    </item>
    <item>
      <title>Re: HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/61444#M25557</link>
      <description>&lt;P&gt;Hi! Sorry for digging out this thread, but I am currently facing the same problem and decided to run an MR job to transform my data before importing it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I am unsure what the output data should look like for HBase to understand it. As far as I know, HBase stores everything as bytes anyway, but treats timestamps differently. So, say I want to combine Factory_ID:YYYYMMDD:Order_ID:UID into my composite key. Should I output them with ":" as a separator, or just one after another? Will HBase be able to use this information to shard the table into different regions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 10:24:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/61444#M25557</guid>
      <dc:creator>ebau</dc:creator>
      <dc:date>2017-11-02T10:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: HBase: Composite key for ImportTsv</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/61450#M25558</link>
      <description>You are right that it's all just byte sequences to HBase, and that it sorts everything lexicographically. You do not need a separator character when composing your key for HBase to recognise the boundaries (it would not serve as one anyway), unless you want the extra bytes for better readability, or for recovering the individual data elements from (variable-length) keys if that's a use case.&lt;BR /&gt;&lt;BR /&gt;HBase 'sharding' (splitting) can be specified manually at table creation time if you know your key pattern and ranges; this is strongly recommended so that the table scales from the beginning. Otherwise, HBase computes key midpoints by analysing the keys in byte form and splits a region at that point whenever its split size threshold is reached.</description>
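      <content:encoded><![CDATA[
To illustrate the point about byte-wise sorting and separators, here is a small sketch of a fixed-width composite key. The field widths, the zero padding, and the helper names fixed_width_key and split_key are assumptions made for this example, not anything HBase prescribes; it also assumes non-negative numeric field values. With fixed widths, no separator is needed to recover the parts, and the lexicographic byte order matches the numeric order of each field.

```python
def fixed_width_key(factory_id, date_yyyymmdd, order_id, uid, widths=(4, 8, 8, 6)):
    """Build a row key by zero-padding each field to a fixed width (assumed widths)."""
    fields = []
    for value, width in zip((factory_id, date_yyyymmdd, order_id, uid), widths):
        text = str(value)
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} characters")
        fields.append(text.zfill(width))
    return "".join(fields).encode("ascii")


def split_key(key, widths=(4, 8, 8, 6)):
    """Recover the padded fields from a key built by fixed_width_key."""
    text = key.decode("ascii")
    parts, pos = [], 0
    for width in widths:
        parts.append(text[pos:pos + width])
        pos += width
    return parts


key_a = fixed_width_key(9, 20171102, 15, 3)
key_b = fixed_width_key(10, 20171102, 2, 1)

# Padded keys sort in numeric order of the leading field.
print(sorted([key_b, key_a]) == [key_a, key_b])

# Unpadded, separator-joined keys do not: "10..." sorts before "9...".
print(b"10:20171102:2:1" < b"9:20171102:15:3")

print(split_key(key_a))
```

If split points are given at table creation, they could then simply be prefixes of such keys, for example one per range of factory IDs.
]]></content:encoded>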
      <pubDate>Thu, 02 Nov 2017 11:39:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-Composite-key-for-ImportTsv/m-p/61450#M25558</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2017-11-02T11:39:37Z</dc:date>
    </item>
  </channel>
</rss>

