<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hbase Storing pdf and Retrieval in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134227#M96889</link>
    <description>&lt;P&gt;Please take a look at:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HBASE-11339" target="_blank"&gt;https://issues.apache.org/jira/browse/HBASE-11339&lt;/A&gt;&lt;/P&gt;&lt;P&gt;which would reduce I/O amplification incurred by medium objects.&lt;/P&gt;&lt;P&gt;This feature is in the upcoming HDP 2.5 release.&lt;/P&gt;</description>
    <pubDate>Tue, 19 Jul 2016 04:01:45 GMT</pubDate>
    <dc:creator>tyu</dc:creator>
    <dc:date>2016-07-19T04:01:45Z</dc:date>
    <item>
      <title>Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134225#M96887</link>
      <description>&lt;P&gt;We are planning to store PDF and Word documents in HBase. The storing part is fine; retrieval is the part I have questions about.&lt;/P&gt;&lt;P&gt;1. If we need to query this, is there a way to do it using a reporting tool? For example: HBase --&amp;gt; Hive external table --&amp;gt; JDBC/ODBC --&amp;gt; Excel or any BI tool. However, how will the consumer app know that the field is a PDF file and not just a text field?&lt;/P&gt;&lt;P&gt;2. Is there a way for HBase REST to handle this?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 02:36:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134225#M96887</guid>
      <dc:creator>ashok_padmanabh</dc:creator>
      <dc:date>2016-07-19T02:36:21Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134226#M96888</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/1457/ashokpadmanabhan.html" nodeid="1457"&gt;@Ash Pad&lt;/A&gt;, Phoenix has JDBC and REST APIs today. There is an ODBC driver under development which I believe is currently in beta. Thus you can run reporting-style queries against whatever document metadata you store in "normal" column types.&lt;/P&gt;&lt;P&gt;To access the PDF object itself, you can use the JDBC/ODBC/REST APIs to read/write the column as raw bytes. See the &lt;A href="http://phoenix.apache.org/language/datatypes.html#binary_type"&gt;Phoenix DataTypes&lt;/A&gt; page to understand the various column types which support binary values.&lt;/P&gt;&lt;P&gt;Re: HBase REST: you could use this if desired, though I don't see why you would rather than using the built-in JDBC/ODBC capabilities.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 02:56:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134226#M96888</guid>
      <dc:creator>rgelhausen</dc:creator>
      <dc:date>2016-07-19T02:56:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134227#M96889</link>
      <description>&lt;P&gt;Please take a look at:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HBASE-11339" target="_blank"&gt;https://issues.apache.org/jira/browse/HBASE-11339&lt;/A&gt;&lt;/P&gt;&lt;P&gt;which would reduce I/O amplification incurred by medium objects.&lt;/P&gt;&lt;P&gt;This feature is in the upcoming HDP 2.5 release.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 04:01:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134227#M96889</guid>
      <dc:creator>tyu</dc:creator>
      <dc:date>2016-07-19T04:01:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134228#M96890</link>
      <description>&lt;P style="margin-left: 80px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/1457/ashokpadmanabhan.html" nodeid="1457"&gt;@Ash Pad&lt;/A&gt; I personally like &lt;A rel="user" href="https://community.cloudera.com/users/532/tyu.html" nodeid="532"&gt;@Ted Yu&lt;/A&gt;'s answer. Until that feature is released, I don't recommend storing these files in HBase. Instead, a common practice is to save the file on HDFS and store a "pointer" to it in HBase. &lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 09:26:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134228#M96890</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-07-19T09:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134229#M96891</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/45736/hbase-storing-pdf-and-retrieval.html#"&gt;@Ash Pad&lt;/A&gt;, how big are your PDFs?&lt;/P&gt;&lt;P&gt;As in all things, it depends on your use case. If your PDFs are not in the multi-megabyte range, you may be fine storing them in a second column family today. This has the advantage of letting you query against doc metadata very quickly without needing to load full file contents into RegionServer memory. In most document management systems, this is highly desirable, as there is far more searching/querying than there is actual full content access.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 22:24:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134229#M96891</guid>
      <dc:creator>rgelhausen</dc:creator>
      <dc:date>2016-07-19T22:24:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134230#M96892</link>
      <description>&lt;P&gt;PDFs are 50KB max, and each rowkey can have up to 5 PDFs associated with it. The total volume of records would be in the 500K range. As you suggest, we have two column families: one for the metadata and one for the documents. Your suggestion gives a vote of confidence to our thought process.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 00:51:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134230#M96892</guid>
      <dc:creator>ashok_padmanabh</dc:creator>
      <dc:date>2016-07-20T00:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134231#M96893</link>
      <description>&lt;P&gt;Can you please tell me how you stored PDF and Word documents in HBase? @&lt;A href="https://community.hortonworks.com/users/1457/ashokpadmanabhan.html"&gt;Ash Pad&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 03:05:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134231#M96893</guid>
      <dc:creator>sabamu_lakhey</dc:creator>
      <dc:date>2016-12-16T03:05:46Z</dc:date>
    </item>
    <item>
      <title>Re: Hbase Storing pdf and Retrieval</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134232#M96894</link>
      <description>&lt;P&gt;HDP 2.5 has been released.&lt;/P&gt;&lt;P&gt;You can use the MOB feature now.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 03:13:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hbase-Storing-pdf-and-Retrieval/m-p/134232#M96894</guid>
      <dc:creator>tyu</dc:creator>
      <dc:date>2016-12-16T03:13:57Z</dc:date>
    </item>
  </channel>
</rss>

