<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187384#M75926</link>
    <description>&lt;P style="margin-left: 40px;"&gt;&lt;A rel="user" href="https://community.cloudera.com/users/70417/chaimajandoubi576.html" nodeid="70417"&gt;@Jandoubi Chaima&lt;/A&gt; 1. when you say, EvaluateJSONPath doesn't works, what is the issue that you are facing? 2. Do you need your final data on HDFS in JSON format?  3. What is the query that you want in ReplaceText? DDL? DML?&lt;/P&gt;</description>
    <pubDate>Sat, 17 Mar 2018 14:55:27 GMT</pubDate>
    <dc:creator>RahulSoni</dc:creator>
    <dc:date>2018-03-17T14:55:27Z</dc:date>
    <item>
      <title>What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187383#M75925</link>
      <description>&lt;P&gt;I'll try to explain the problem. My flow is:&lt;STRONG&gt;
GetHDFS--&amp;gt;ValidateXML--&amp;gt;TransformXMLToJson--&amp;gt;JoltTransformation--&amp;gt;SplitJson--&amp;gt;EvaluateJsonPath--&amp;gt;ReplaceText--&amp;gt;PutHiveQL&lt;/STRONG&gt;.
Up to the fifth step (SplitJson) it works perfectly, but EvaluateJsonPath doesn't work.
How can I evaluate the whole content of the flowfile? I need to push all of it into Hive. Also, how can I use ReplaceText to build an insert query for Hive?
Thanks for your help!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="64625-dataprovenance.png" style="width: 938px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18728i352F3B72BB0A03E9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="64625-dataprovenance.png" alt="64625-dataprovenance.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:58:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187383#M75925</guid>
      <dc:creator>chaimajandoubi5</dc:creator>
      <dc:date>2022-09-16T12:58:41Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187384#M75926</link>
      <description>&lt;P style="margin-left: 40px;"&gt;&lt;A rel="user" href="https://community.cloudera.com/users/70417/chaimajandoubi576.html" nodeid="70417"&gt;@Jandoubi Chaima&lt;/A&gt; 1. when you say, EvaluateJSONPath doesn't works, what is the issue that you are facing? 2. Do you need your final data on HDFS in JSON format?  3. What is the query that you want in ReplaceText? DDL? DML?&lt;/P&gt;</description>
      <pubDate>Sat, 17 Mar 2018 14:55:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187384#M75926</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-03-17T14:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187385#M75927</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/70417/chaimajandoubi576.html" nodeid="70417" target="_blank"&gt;@Jandoubi  Chaima&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;By using EvaluateJsonPath processor you can extract all the values of the json keys and keep them as attributes of the flowfile.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;EvaluatejsonPath configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="64665-evaljson.png" style="width: 1822px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18727iA01A3637EE2D1EB6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="64665-evaljson.png" alt="64665-evaljson.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Keep the&lt;STRONG&gt; Destination&lt;/STRONG&gt; property as &lt;STRONG&gt;flowfile-attribute&lt;/STRONG&gt; and&lt;/P&gt;&lt;P&gt;add all the json keys that you are having in the json message same as shown in the above screenshot.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Change the below property:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Destination&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;flowfile-attribute&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;Add these new properties to the processor:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;header_noun
&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;$.header_noun&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;header_verb
&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;$.header_verb&lt;/PRE&gt;
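&lt;P&gt;As a rough illustration of what this configuration does, here is a small Python model of EvaluateJsonPath with Destination set to flowfile-attribute. It is a hypothetical sketch, not NiFi code: it assumes each flowfile holds one flat JSON object (for example a single SplitJson output) and that every property is a simple top-level path such as $.header_noun. The handling of missing keys (empty string), non-string scalars (JSON notation) and object/array results (route to failure) is my assumption about the processor's scalar behaviour, so please verify it against the EvaluateJsonPath documentation for your NiFi version. The function name and sample values are made up for the example.&lt;/P&gt;&lt;PRE&gt;
```python
import json


def evaluate_json_path(content, properties):
    """Hypothetical model of EvaluateJsonPath with Destination = flowfile-attribute.

    Assumes every JsonPath is a simple top-level '$.key' expression and that the
    content is a single JSON object. Returns a (relationship, attributes) tuple.
    """
    doc = json.loads(content)
    attributes = {}
    for attr_name, path in properties.items():
        if not path.startswith("$."):
            raise ValueError("this sketch only handles '$.key' paths: " + path)
        # Assumption: a missing key produces an empty-string attribute
        value = doc.get(path[2:], "")
        if isinstance(value, (dict, list)):
            # Assumption: a non-scalar result routes the flowfile to failure
            return "failure", {}
        # Assumption: non-string scalars are rendered in JSON notation (true, 42)
        attributes[attr_name] = value if isinstance(value, str) else json.dumps(value)
    return "matched", attributes


# Usage with hypothetical sample data
flowfile = '{"header_verb": "create", "header_noun": "order"}'
props = {"header_noun": "$.header_noun", "header_verb": "$.header_verb"}
print(evaluate_json_path(flowfile, props))
# ('matched', {'header_noun': 'order', 'header_verb': 'create'})
```
&lt;/PRE&gt;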
&lt;/DIV&gt;&lt;P&gt;Once you have configured EvaluateJsonPath, you can build the insert statement in a ReplaceText processor.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;ReplaceText configuration:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Search Value&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;(?s)(^.*$)&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Replacement Value&lt;/P&gt;&lt;PRE&gt;insert into &amp;lt;db-name&amp;gt;.&amp;lt;table-name&amp;gt; values('${header_verb}','${header_noun}'...);&lt;/PRE&gt;&lt;P&gt;Maximum Buffer Size&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;1 MB&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;(Increase this if the flowfile is larger than 1 MB.)&lt;/P&gt;&lt;P&gt;Replacement Strategy&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Always Replace&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Evaluation Mode&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Entire text&lt;/PRE&gt;
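&lt;P&gt;Similarly, the ReplaceText settings above can be modelled with a hypothetical Python sketch (not NiFi code). My understanding is that with Replacement Strategy = Always Replace and Evaluation Mode = Entire text, the Search Value is not used at all: the whole flowfile content is replaced by the Replacement Value after its ${...} references are filled in from the flowfile attributes. The sketch handles only plain ${attribute} references, not Expression Language functions, and it substitutes values verbatim, so an attribute value containing a single quote would break the generated insert statement. The table name mydb.mytable and the sample attribute values are hypothetical.&lt;/P&gt;&lt;PRE&gt;
```python
import re

# Plain ${name} references; names may contain dots, e.g. hive.ddl
ATTRIBUTE_REF = re.compile(r"\$\{([^}]+)\}")


def replace_text_always(content, replacement, attributes):
    """Hypothetical model of ReplaceText with Replacement Strategy = Always Replace
    and Evaluation Mode = Entire text.

    The content argument is deliberately ignored: the new content is the
    replacement value with each ${name} reference replaced by that attribute
    (empty string if the attribute is absent).
    """
    return ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), replacement)


# Usage with hypothetical attribute values and table name
attrs = {"header_noun": "order", "header_verb": "create"}
template = "insert into mydb.mytable values('${header_verb}','${header_noun}');"
print(replace_text_always('{"header_verb": "create"}', template, attrs))
# insert into mydb.mytable values('create','order');
```
&lt;/PRE&gt;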
&lt;/DIV&gt;&lt;P&gt;This processor builds an insert statement for the Hive table from all the attribute values extracted by EvaluateJsonPath.&lt;/P&gt;&lt;P&gt;Then use a PutHiveQL processor to execute the insert statements.&lt;/P&gt;&lt;P&gt;(or)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Method 2:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;An easier, and in my view better, way is to use the &lt;STRONG&gt;ConvertRecord processor&lt;/STRONG&gt;, which also accepts an array of JSON messages/objects.&lt;/P&gt;&lt;P&gt;Since your data is JSON, configure ConvertRecord with a &lt;STRONG&gt;JSON record reader (e.g. JsonTreeReader) and an AvroRecordSetWriter&lt;/STRONG&gt;; ConvertRecord then converts the &lt;STRONG&gt;JSON array of messages/objects into Avro&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Then use the &lt;STRONG&gt;ConvertAvroToORC processor&lt;/STRONG&gt; to convert the Avro data to ORC (ORC is optimized for the Tez execution engine).&lt;/P&gt;&lt;P&gt;Use a &lt;STRONG&gt;PutHDFS processor&lt;/STRONG&gt; to store the data in an &lt;STRONG&gt;HDFS directory&lt;/STRONG&gt;, and create a Hive table on top of that directory (you can use the hive.ddl attribute written by ConvertAvroToORC to create the table).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Flow:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;GetHDFS--&amp;gt;ValidateXML--&amp;gt;TransformXMLToJson--&amp;gt;JoltTransformation--&amp;gt;ConvertRecord--&amp;gt;ConvertAvroToORC--&amp;gt;PutHDFS&lt;/PRE&gt;&lt;P&gt;If you want to create the Hive table within the flow itself, add these processors after PutHDFS:&lt;/P&gt;&lt;PRE&gt;ReplaceText--&amp;gt;PutHiveQL&lt;/PRE&gt;&lt;P&gt;ConvertRecord processor reference:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive table creation reference:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Reference for loading data into Hive:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/81749/what-is-the-best-approach-to-load-data-into-hive-u.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.hortonworks.com/questions/81749/what-is-the-best-approach-to-load-data-into-hive-u.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Let us know if you run into any issues!&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 08:03:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187385#M75927</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T08:03:07Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187386#M75928</link>
      <description>&lt;P&gt;hello &lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; it works perfectly &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt; &lt;/P&gt;&lt;P&gt;big thanks for you ^^&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 17:36:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187386#M75928</guid>
      <dc:creator>chaimajandoubi5</dc:creator>
      <dc:date>2018-03-19T17:36:36Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187387#M75929</link>
      <description>&lt;P&gt;Good morning, sir &lt;A rel="user" href="https://community.cloudera.com/users/66220/rsoni.html" nodeid="66220"&gt;@Rahul Soni&lt;/A&gt; ^^&lt;/P&gt;&lt;P&gt;Thanks for replying. Mr @Shu gave me the solution I was looking for!&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 17:39:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187387#M75929</guid>
      <dc:creator>chaimajandoubi5</dc:creator>
      <dc:date>2018-03-19T17:39:54Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to ingest data from HDFS that contains XML files and then push it into Hive using an Apache NiFi workflow?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187388#M75930</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; Hello &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt; from the other side XD &lt;/P&gt;&lt;P&gt;Am so grateful sir , you helped me out !!&lt;/P&gt;&lt;P&gt;
Thanks a lot :D
I highly recommend you &lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 15:39:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-best-way-to-ingest-data-from-HDFS-witch-contain/m-p/187388#M75930</guid>
      <dc:creator>chaimajandoubi5</dc:creator>
      <dc:date>2018-03-27T15:39:55Z</dc:date>
    </item>
  </channel>
</rss>

