<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Best practice to import data into HBase/Phoenix? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221381#M82232</link>
    <description>&lt;P&gt;In this case, the best option I see is to use Sqoop to load the data from the RDBMS into HDFS (this is a parallel copy and should be fast), then use the &lt;A href="https://phoenix.apache.org/bulk_dataload.html"&gt;Phoenix bulk-load MR job&lt;/A&gt; to load that HDFS data into Phoenix.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Aug 2018 22:17:51 GMT</pubDate>
    <dc:creator>sandyy006</dc:creator>
    <dc:date>2018-08-16T22:17:51Z</dc:date>
    <item>
      <title>Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221377#M82228</link>
      <description>&lt;P&gt;Hello- I have HDP 3.0 with Sqoop v1.4.7.&lt;/P&gt;&lt;P&gt;What is the best way to migrate my data from an external RDBMS into something queryable from Phoenix? I want to make sure I import it in a way that gives very fast queries.&lt;/P&gt;&lt;P&gt;Do I need to Sqoop it into HDFS first, or can I go directly into HBase? It looks like the Sqoop-to-Phoenix connector is not yet complete, so I believe I will need to Sqoop the data into HDFS or HBase and then connect to Phoenix. Can someone show (or point) me to how to do that?&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.linkedin.com/pulse/sqooprdbms-hbasephoenix-amit-dass/"&gt;This post&lt;/A&gt; makes me think that I will need to go RDBMS&amp;gt;HDFS&amp;gt;CSV&amp;gt;Phoenix; please tell me that is not true...&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 06:38:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221377#M82228</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-16T06:38:53Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221378#M82229</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel
 Zafar
&lt;/A&gt; What is the data size on RDBMS? Also is this a one time migration or will it be a scheduled one? &lt;/P&gt;&lt;P&gt;- if it is one time migration and the size is reasonable. you can directly export the rdbms data in csv format and then directly use it to import in Phoenix. (this approach will be simple)&lt;/P&gt;&lt;P&gt;-  RBDMS&amp;gt;HDFS using sqoop and then create a hive table with phoenix storage handler (https://phoenix.apache.org/hive_storage_handler.html).&lt;/P&gt;&lt;P&gt;- Or RBDMS&amp;gt;HDFS&amp;gt;CSV&amp;gt;Phoenix (what you mentioned in description) &lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 14:00:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221378#M82229</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-16T14:00:56Z</dc:date>
    </item>
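A minimal sketch of the first option above (direct CSV import into an existing Phoenix table). The table name EXAMPLE, the ZooKeeper quorum zk1:2181, and the file path are all assumed placeholders:

```shell
# Direct CSV import into an existing Phoenix table via psql.py.
# This is single-client (not MapReduce), so it suits modest data sizes.
# EXAMPLE, zk1:2181, and /tmp/data.csv are placeholders.
/usr/hdp/current/phoenix-client/bin/psql.py -t EXAMPLE zk1:2181 /tmp/data.csv
```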
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221379#M82230</link>
      <description>&lt;P&gt;Just realised we have one more approach:&lt;/P&gt;&lt;P&gt;- RDBMS&amp;gt;HBASE using Sqoop, then create a table in Phoenix (map the HBase tables in Phoenix).&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 14:09:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221379#M82230</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-16T14:09:21Z</dc:date>
    </item>
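A hedged sketch of that extra approach: Sqoop straight into HBase, then a Phoenix view over the HBase table. The JDBC URL, credentials, table, column family, and key/column names are all placeholders, and (as noted later in the thread) non-string column types may not map cleanly because of Phoenix's own byte encoding:

```shell
# RDBMS > HBase via Sqoop (all connection details are placeholders)
sqoop import \
  --connect jdbc:mysql://db-host/mydb \
  --username myuser -P \
  --table EXAMPLE \
  --hbase-table EXAMPLE \
  --column-family CF \
  --hbase-row-key ID \
  --hbase-create-table

# Then map the HBase table in Phoenix with a view.
# Column names/types are illustrative; VARCHAR columns map most safely.
echo 'CREATE VIEW "EXAMPLE" ("ID" VARCHAR PRIMARY KEY, "CF"."COL1" VARCHAR)' > /tmp/view.sql
/usr/hdp/current/phoenix-client/bin/psql.py zk1:2181 /tmp/view.sql
```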
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221380#M82231</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10159/snemuri.html" nodeid="10159"&gt;@Sandeep Nemuri&lt;/A&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks so much for the advice! My table occupies 1.5 terabytes in MS SQL server. This will be a one-time migration. I feel that exporting to a csv would take days of processing time. Your last approach sounds the best but I've heard that it is not possible based on &lt;A href="https://community.hortonworks.com/questions/88790/data-transfer-method-from-rdbms-to-phoenix.html"&gt;this post&lt;/A&gt; and &lt;A href="https://community.hortonworks.com/questions/15381/created-phoenix-view-to-map-existing-hbase-table-b.html"&gt;this post&lt;/A&gt; (my table does have a float column). &lt;/P&gt;&lt;P&gt;That being said, what is my best option? I'm thinking that I may try to split up my table into n csv files and load them sequentially into Phoenix. Would that be the best option for data at this size?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 21:55:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221380#M82231</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-16T21:55:58Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221381#M82232</link>
      <description>&lt;P&gt;In this case, the best option I see is to use Sqoop to load the data from the RDBMS into HDFS (this is a parallel copy and should be fast), then use the &lt;A href="https://phoenix.apache.org/bulk_dataload.html"&gt;Phoenix bulk-load MR job&lt;/A&gt; to load that HDFS data into Phoenix.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 22:17:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221381#M82232</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-16T22:17:51Z</dc:date>
    </item>
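The Sqoop half of that recommendation might look like the following sketch. The JDBC URL, credentials, table name, target directory, and mapper count are placeholders; the comma delimiter is chosen to match what the Phoenix CSV bulk loader expects by default:

```shell
# RDBMS > HDFS via Sqoop, writing comma-delimited text files that the
# Phoenix CsvBulkLoadTool MR job can later consume (names are placeholders)
sqoop import \
  --connect 'jdbc:sqlserver://db-host:1433;databaseName=mydb' \
  --username myuser -P \
  --table EXAMPLE \
  --target-dir /tmp/example_csv \
  --fields-terminated-by ',' \
  --num-mappers 8
```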
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221382#M82233</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel
 Zafar
&lt;/A&gt; Don't forget to Accept my answer if it helped you. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 23:28:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221382#M82233</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-16T23:28:33Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221383#M82234</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10159/snemuri.html" nodeid="10159"&gt;@Sandeep Nemuri&lt;/A&gt; I'm not sure if I follow what you are talking about. The page you pointed to shows a bulk load from CSV&amp;gt;Phoenix or HDFS JSON&amp;gt;Phoenix. Can you provide a link or command on how one would go from Sqoop's HDFS output to Phoenix directly?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 23:42:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221383#M82234</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-16T23:42:49Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221384#M82235</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel
 Zafar
&lt;/A&gt; Since Phoenix is on top of Hbase. Data will be in Hbase in anycase. It is just that Phoenix uses its encoding while writing the data into tables. More details in this &lt;A href="https://community.hortonworks.com/questions/15381/created-phoenix-view-to-map-existing-hbase-table-b.html"&gt;thread&lt;/A&gt;. &lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 23:52:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221384#M82235</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-16T23:52:58Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221385#M82236</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10159/snemuri.html" nodeid="10159"&gt;@Sandeep Nemuri&lt;/A&gt; I edited my above question, do you mind taking a look at it? I'm seeing a CsvBulkLoadTool and a JsonBulkLoadTool. How will I bulk load my sqoop-loaded data?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Aug 2018 00:03:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221385#M82236</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-17T00:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: Best practice to import data into HBase/Phoenix?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221386#M82237</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel
 Zafar
&lt;/A&gt; The doc is bit misleading, Below command should help you in loading the hdfs csv files to phoenix table. (Note that the input path is of HDFS)&lt;/P&gt;&lt;PRE&gt;HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /tmp/data.csv&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Aug 2018 01:17:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-practice-to-import-data-into-HBase-Phoenix/m-p/221386#M82237</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-08-17T01:17:14Z</dc:date>
    </item>
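If Sqoop wrote the files with a delimiter other than a comma, the bulk loader's -d flag can be set to match. A sketch, with the table name, input directory, and pipe delimiter all assumed:

```shell
# Same CsvBulkLoadTool invocation, pointed at a Sqoop output directory
# and overriding the field delimiter with -d (placeholders throughout)
HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE \
  --input /tmp/example_csv \
  -d '|'
```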
  </channel>
</rss>

