<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Need to bring prod hive table data into test environment using distcp. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190982#M153071</link>
    <description>&lt;P&gt;I need to bring prod Hive table data into a test Hive table. Since this is a Hadoop-to-Hadoop transfer, I can't use Sqoop, so I plan to use distcp to copy the data across the clusters. But there is one more scenario to handle while bringing the data over: filtering. Say I have 10 million records in the prod Hive table; I want to filter them using some criteria and bring only the matching records into the test table. Is there a way to pass filter parameters to the distcp command on the fly? Or any other suggestions? Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Fri, 07 Apr 2017 00:31:04 GMT</pubDate>
    <dc:creator>muthaiyaprabhu</dc:creator>
    <dc:date>2017-04-07T00:31:04Z</dc:date>
    <item>
      <title>Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190982#M153071</link>
      <description>&lt;P&gt;I need to bring prod Hive table data into a test Hive table. Since this is a Hadoop-to-Hadoop transfer, I can't use Sqoop, so I plan to use distcp to copy the data across the clusters. But there is one more scenario to handle while bringing the data over: filtering. Say I have 10 million records in the prod Hive table; I want to filter them using some criteria and bring only the matching records into the test table. Is there a way to pass filter parameters to the distcp command on the fly? Or any other suggestions? Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 00:31:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190982#M153071</guid>
      <dc:creator>muthaiyaprabhu</dc:creator>
      <dc:date>2017-04-07T00:31:04Z</dc:date>
    </item>
    <item>
      <title>Re: Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190983#M153072</link>
      <description>&lt;P&gt;You can use the distcp -filters option to exclude certain paths or path patterns from the copy.&lt;/P&gt;&lt;P&gt;Refer to this:&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.ericlin.me/how-to-use-filters-to-exclude-files-when-in-distcp" target="_blank"&gt;http://www.ericlin.me/how-to-use-filters-to-exclude-files-when-in-distcp&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;hadoop distcp -filters /path/to/filterfile.txt hdfs://source/path hdfs://destination/path
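
# Hypothetical example of a filter file for the command above (a sketch, not
# part of the original answer). Assumptions: the file is a local plain-text
# file holding one Java regular expression per line, and any source path that
# matches a pattern is skipped. The file name and both patterns are made up.
# Note that this only excludes whole files; it cannot filter rows in a table.
printf '%s\n' '.*/_temporary/.*' '.*\.tmp' | tee /tmp/distcp-filters.txt
# The file is then passed to distcp via -filters, as in the command above.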
&lt;/PRE&gt;</description>
      <pubDate>Fri, 07 Apr 2017 00:41:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190983#M153072</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-04-07T00:41:58Z</dc:date>
    </item>
    <item>
      <title>Re: Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190984#M153073</link>
      <description>&lt;P&gt;Thanks &lt;A href="https://community.hortonworks.com/users/102/nmaheshwari.html"&gt;Namit Maheshwari&lt;/A&gt;. The data I am bringing into test is Hive data, and I need to filter it using some criteria, like a WHERE condition in a Hive query. distcp -filters only excludes whole files, right, not records at the data level? I want to filter the Hive data in production using some criteria and then bring the filtered data into the test environment.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 01:51:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190984#M153073</guid>
      <dc:creator>muthaiyaprabhu</dc:creator>
      <dc:date>2017-04-07T01:51:52Z</dc:date>
    </item>
    <item>
      <title>Re: Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190985#M153074</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/3847/muthaiyaprabhu.html" nodeid="3847"&gt;@Prabhu  Muthaiyan&lt;/A&gt;&lt;P&gt;Filter the data from hive prod and load it into a file and then as mentioned by &lt;A rel="user" href="https://community.cloudera.com/users/102/nmaheshwari.html" nodeid="102"&gt;@Namit Maheshwari&lt;/A&gt; use distcp to transfer between different environments. If you want to limit the data without  any filters being applied filter only a set of files under a HDFS folder.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 02:12:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190985#M153074</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2017-04-07T02:12:31Z</dc:date>
    </item>
    <item>
      <title>Re: Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190986#M153075</link>
      <description>&lt;P&gt;Thank you, Bala!&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 19:50:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190986#M153075</guid>
      <dc:creator>muthaiyaprabhu</dc:creator>
      <dc:date>2017-04-07T19:50:44Z</dc:date>
    </item>
    <item>
      <title>Re: Need to bring prod hive table data into test environment using distcp.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190987#M153076</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3847/muthaiyaprabhu.html" nodeid="3847"&gt;@Prabhu  Muthaiyan&lt;/A&gt; Glad that it helped you. Happy Hadooping!!&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 19:58:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Need-to-bring-prod-hive-table-data-into-test-environment/m-p/190987#M153076</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2017-04-07T19:58:59Z</dc:date>
    </item>
  </channel>
</rss>

