<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to efficiently map data columns from external files (around 5000) to a hbase column set? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120212#M22424</link>
    <description>&lt;P&gt;Hi Roy, please have a look at Apache Phoenix and its &lt;A href="http://phoenix.apache.org/views.html"&gt;views&lt;/A&gt; feature. This lets you define a base set of columns (producer_id, timestamp, event_type, etc.) and, within the same table, create additional logical views per record type.&lt;/P&gt;&lt;P&gt;Your use case sounds similar to the product_metrics table and the specific mobile_product_metrics example given in the link above. Once your views are defined, you can query them to get metadata to apply to the records in your ingest queue.&lt;/P&gt;&lt;P&gt;Phoenix views support upsert statements for writing new data.&lt;/P&gt;&lt;P&gt;Re: changing schemas: Phoenix views can be altered at will as your schemas change.&lt;/P&gt;</description>
    <pubDate>Thu, 10 Mar 2016 04:54:08 GMT</pubDate>
    <dc:creator>rgelhausen</dc:creator>
    <dc:date>2016-03-10T04:54:08Z</dc:date>
    <item>
      <title>How to efficiently map data columns from external files (around 5000) to a hbase column set?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120211#M22423</link>
      <description>&lt;P&gt;I have data coming in as small files from a set of 5000 data providers. Since the data comes from external providers, the files have no id field, so I need to read each line and search HBase to find the id. At the moment I am ignoring the complexity of creating a new id if none is found by the search. Since the incoming data has different numbers of columns and different formats, I have to create a common format for storing all of it in an HBase table.&lt;/P&gt;&lt;P&gt;My question: is there a tool or trick that can help me efficiently map the fields from these 5000 data formats to a common format? Also, how should I manage changes when a data provider modifies its data format?&lt;/P&gt;&lt;P&gt;If anybody has implemented such a system or has recommendations, I would be glad to hear them.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2016 04:03:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120211#M22423</guid>
      <dc:creator>SRoy</dc:creator>
      <dc:date>2016-03-10T04:03:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently map data columns from external files (around 5000) to a hbase column set?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120212#M22424</link>
      <description>&lt;P&gt;Hi Roy, please have a look at Apache Phoenix and its &lt;A href="http://phoenix.apache.org/views.html"&gt;views&lt;/A&gt; feature. This lets you define a base set of columns (producer_id, timestamp, event_type, etc.) and, within the same table, create additional logical views per record type.&lt;/P&gt;&lt;P&gt;Your use case sounds similar to the product_metrics table and the specific mobile_product_metrics example given in the link above. Once your views are defined, you can query them to get metadata to apply to the records in your ingest queue.&lt;/P&gt;&lt;P&gt;Phoenix views support upsert statements for writing new data.&lt;/P&gt;&lt;P&gt;Re: changing schemas: Phoenix views can be altered at will as your schemas change.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2016 04:54:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120212#M22424</guid>
      <dc:creator>rgelhausen</dc:creator>
      <dc:date>2016-03-10T04:54:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently map data columns from external files (around 5000) to a hbase column set?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120213#M22425</link>
      <description>&lt;P&gt;In HBase, you do not have to pre-declare the set of columns as you would in an RDBMS. Each row can have a different set of columns, which is one of the powerful features of HBase.&lt;/P&gt;&lt;P&gt;Phoenix exposes this through a feature called "dynamic columns". You can declare a fixed set of columns in the Phoenix table schema, and then specify additional columns on the fly at query time or insertion time.&lt;/P&gt;&lt;P&gt;Check out &lt;A href="https://phoenix.apache.org/dynamic_columns.html" target="_blank"&gt;https://phoenix.apache.org/dynamic_columns.html&lt;/A&gt; for the syntax.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2016 05:29:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-efficiently-map-data-columns-from-external-files/m-p/120213#M22425</guid>
      <dc:creator>Enis</dc:creator>
      <dc:date>2016-03-10T05:29:05Z</dc:date>
    </item>
  </channel>
</rss>

