<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hadoop tools/technology and design recommendation in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93517#M57220</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Have you tried doing this with Hive queries? I think it should be possible:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. CREATE a new, empty table whose columns and data types match the required structure of the final file.&lt;/P&gt;&lt;P&gt;2. INSERT data into this new table using a SELECT query that JOINs the two views.&lt;/P&gt;&lt;P&gt;3. The resulting files will be in the table's HDFS directory. That is your final output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Tue, 30 Jul 2019 10:46:09 GMT</pubDate>
    <dc:creator>Gomathinayagam</dc:creator>
    <dc:date>2019-07-30T10:46:09Z</dc:date>
    <item>
      <title>Hadoop tools/technology and design recommendation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93276#M57165</link>
      <description>&lt;P&gt;I have to read data from two different Hive views (probably in two different databases), extract the data from those views and write it to files, then join those files, format the data, and finally write the result to a final file.&lt;/P&gt;
&lt;P&gt;Could you please suggest some tools/technologies and a design on Hadoop for this requirement?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:32:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93276#M57165</guid>
      <dc:creator>iamsaaj</dc:creator>
      <dc:date>2022-09-16T14:32:08Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop tools/technology and design recommendation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93517#M57220</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Have you tried doing this with Hive queries? I think it should be possible:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. CREATE a new, empty table whose columns and data types match the required structure of the final file.&lt;/P&gt;&lt;P&gt;2. INSERT data into this new table using a SELECT query that JOINs the two views.&lt;/P&gt;&lt;P&gt;3. The resulting files will be in the table's HDFS directory. That is your final output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 10:46:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93517#M57220</guid>
      <dc:creator>Gomathinayagam</dc:creator>
      <dc:date>2019-07-30T10:46:09Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop tools/technology and design recommendation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93520#M57223</link>
      <description>&lt;P&gt;Instead of joining files, you can perform the operations/transformations with a Hive query and then write the final results to a file (for example, Parquet on HDFS, or a file on the local file system).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can also use Spark for the transformations and for joining the views. (Spark will generally give better performance than Hive SQL.)&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 14:51:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93520#M57223</guid>
      <dc:creator>vijayshc</dc:creator>
      <dc:date>2019-07-30T14:51:21Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop tools/technology and design recommendation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93738#M57349</link>
      <description>&lt;P&gt;Thanks for the reply!&lt;/P&gt;&lt;P&gt;The views are already created by joining many underlying tables, so joining the views again for data aggregation will result in performance issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here are the two approaches I came up with. The first:&lt;/P&gt;&lt;P&gt;1. Extract data from the Hive views into files.&lt;/P&gt;&lt;P&gt;2. Create intermediate Hive tables and load the data extracted from the views.&lt;/P&gt;&lt;P&gt;3. Join the new Hive tables to generate the final file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The other approach is to use PySpark to read data from the views directly, aggregate and transform the data, and generate the final output file.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2019 13:04:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hadoop-tools-technology-and-design-recommendation/m-p/93738#M57349</guid>
      <dc:creator>iamsaaj</dc:creator>
      <dc:date>2019-08-06T13:04:24Z</dc:date>
    </item>
  </channel>
</rss>

