Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11113 | 04-15-2020 05:01 PM |
| | 7010 | 10-15-2019 08:12 PM |
| | 3061 | 10-12-2019 08:29 PM |
| | 11231 | 09-21-2019 10:04 AM |
| | 4182 | 09-19-2019 07:11 AM |
10-05-2017
11:12 PM
Hi @Foivos A, I tried a simple flow passing attributes to ExecuteStreamCommand. Here is a sample shell script that lists the files in a directory:

bash# cat hadoop.sh
for filename in $(hadoop fs -ls ${1}${2} | awk '{print $NF}' | grep '\.txt$')
do
  echo $filename
done

Run the script:-

bash# ./hadoop.sh /user/yashu/ folder2
/user/yashu/folder2/part1.txt
/user/yashu/folder2/part1_sed.txt

The script takes two arguments, substitutes them into ${1} and ${2} above, and lists the files in the folder2 directory. To run this script from the NiFi ExecuteStreamCommand processor, configure the processor as follows. In my flow every flowfile has dir_name and folder_name as attributes, with dir_name having the value /user/yashu/ and folder_name having the value folder2.

Attributes for the ff:-
Processor configs:-

Make sure the script NiFi runs is a wrapper that calls the original shell script (hadoop.sh) with the two arguments, as shown below:

bash# cat hadoop_run.sh
/tmp/hadoop.sh $1 $2

In my case I'm calling hadoop_run.sh from NiFi, which in turn calls hadoop.sh with the two arguments as mentioned above. Only then does the ExecuteStreamCommand processor take the two arguments you specified in the Command Arguments property (in our case ${dir_name};${folder_name}) and pass them to the script.

How NiFi executes the script:- when NiFi executes hadoop_run.sh, it calls hadoop.sh, which accepts the two arguments we pass from NiFi: ${dir_name} is treated as $1 and ${folder_name} as $2, and the script lists all the files.
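To test the same two-argument call outside NiFi, here is a minimal sketch, assuming the scripts live at /tmp as in this answer and using the sample paths above:

```bash
# Simulate what ExecuteStreamCommand does with Command Arguments ${dir_name};${folder_name}:
# it runs the wrapper with two arguments, which the wrapper forwards to hadoop.sh.
dir_name=/user/yashu/
folder_name=folder2

bash /tmp/hadoop_run.sh "$dir_name" "$folder_name"
# expected output (from the example above):
# /user/yashu/folder2/part1.txt
# /user/yashu/folder2/part1_sed.txt
```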
10-05-2017
07:41 PM
@Sumit Sharma I think right now your flow looks like GetFile --> SplitText (splits each line into a separate flowfile) --> ReplaceText (to prepare your content). You need the processors below to get your desired result.

Final flow:-
GetFile --> SplitText (splits each line into a separate flowfile) --> ReplaceText (to prepare your content) --> ExtractText (to get the contents as attributes) --> ReplaceText (to replace the content of the ff with the attribute values) --> MergeContent (to merge the ff into one with a header).

ExtractText processor:-
After looking at your output, you just want the values of the content stored separately. For this we first need to extract the contents of the ff as attributes of the ff, by adding new properties to the processor:
date as Date:\s+(.*)\s+(?=,)
Message as Message:\s+(.*?)$
Receve as Receve:\s+(.*?)(,)
sender as sender:\s+(.*?)(,)

ReplaceText processor:-
Once we have extracted the contents of the ff as attributes, change the Replacement Value to
${date} ${Receve} ${Message} ${sender}
(the attribute names must match the property names used in ExtractText) and change the Replacement Strategy property to Always Replace.
Config screenshot:-

Input:-
Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

Output:-
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]

MergeContent processor:-
Once we have replaced the values, use this processor to merge the flowfiles into one (depends on your requirement). Change the below properties:
Delimiter Strategy to Text
Header to the following (as per your requirements; press Shift+Enter to insert a new line)
Date : Sender: Receiver Node Message:
In my processor I kept Minimum Group Size at 500 B, so the processor waits until the queue reaches 500 B, then merges all the ff into one and gives the merged ff.

Input:- in my case every ff is 170 B, so the processor waits for 3 ff, at which point the queue size is 520 B:
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]

Output:- your desired output 🙂
Date : Sender: Receiver Node Message:
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]

Configs:-
You can refer to the below links to configure the MergeContent processor:
https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html
https://community.hortonworks.com/questions/88199/issue-with-nifi-merge-content-files-stay-in-the-qu.html
https://stackoverflow.com/questions/34958347/mergecontent-with-nifi-inconsistent-length

Flow screenshot:-
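If you want to sanity-check the ExtractText regexes outside NiFi before wiring them in, here is a quick sketch against the sample line above; it assumes GNU grep built with PCRE support (-P), and \K stands in for the capture group:

```bash
line='Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]'

echo "$line" | grep -oP 'Date:\s+\K.*(?=\s+,)'    # [16 Aug 2017 12:13:50,665]
echo "$line" | grep -oP 'sender:\s+\K.*?(?=,)'    # [ 20f:feb:1:0:0:0:0:10e ]
echo "$line" | grep -oP 'Receve:\s+\K.*?(?=,)'    # [ 0.0.0.0/3333 ]
echo "$line" | grep -oP 'Message:\s+\K.*$'        # [ <30>Aug 16 12:13:50 ... up(1)]
```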
10-05-2017
06:29 PM
@Sumit Sharma, use a ListFile processor, configure it the same way as GetFile, and connect its success relationship to a FetchFile processor. The ListFile processor keeps state of the timestamp up to which it has already pulled files from the directory, so it only pulls files newly created in that directory. If you want to see the state of the ListFile processor, right-click on the processor and click the View state button; if you want to clear the state, click Clear state at the right of the screen. Keep the FetchFile processor at its default configuration, as it gets the ${absolute.path} and ${filename} attribute values from the ListFile processor. The flow should be:-
ListFile (success) ---> FetchFile ---> SplitText ---> ReplaceText
10-05-2017
03:08 PM
Hi @Sumit Sharma,
you can use the ReplaceText processor to extract and replace text as per your requirement.

Change the Search Value property to:-
(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$

Change the Replacement Value property to:-
Date: $1 ,sender: $3,Receve: $2, Message: $4

ReplaceText processor configs:-

Input:-
[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

Output:-
Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

This processor works dynamically on each ff and replaces the content according to these settings.
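If you want to try the same search/replace locally before configuring the processor, here is a rough one-liner using perl; PCRE handles the lookaheads in this pattern the same way Java regex does, but this is only a local sanity check, not how NiFi itself runs the replacement:

```bash
line='[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]'

# Apply the Search Value / Replacement Value pair from the processor config above.
echo "$line" | perl -pe 's/(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$/Date: $1 ,sender: $3,Receve: $2, Message: $4/'
```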
10-04-2017
10:52 PM
1 Kudo
Hi @Shailesh Nookala, keep the Search Value config as is: (?s)(^.*$)

ReplaceText configs:-
We prepare the insert statement in this processor. Change the Replacement Strategy property to Always Replace, use an insert statement with the destination table name, and use the extracted attributes to fill in the values dynamically.

Change the Replacement Value property to
insert into sqlserver_tablename (id,name,salary) values (${id},'${name}',${salary})
(note the single quotes around ${name}, since name is a string column).

The above statement works dynamically with the attributes of the ff. If we have an id attribute value of 1, name abc, and salary 1000, then the insert statement becomes
insert into sqlserver_tablename (id,name,salary) values (1,'abc',1000)
In the same way this ReplaceText processor prepares an insert statement from the attributes associated with each ff.

ReplaceText configs:-

Let me know if you are still facing issues, and please share more screenshots with configs and flow...!!!
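As a rough illustration of what that Replacement Value expands to for a single flowfile, here is a sketch using shell variables in place of the flowfile attributes (the values are just the example values above):

```bash
# Stand-ins for the flowfile attributes extracted earlier in the flow.
id=1
name=abc
salary=1000

# Same template as the Replacement Value property.
echo "insert into sqlserver_tablename (id,name,salary) values (${id},'${name}',${salary})"
# prints: insert into sqlserver_tablename (id,name,salary) values (1,'abc',1000)
```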
10-03-2017
05:54 PM
@dhieru singh, another way is to use Ambari: click on Hive View as shown in the screenshot below, then click on Upload Table. If your CSV file is on your local machine, click Choose File. If you want to take the column names from the file headers, click the gear symbol after the File Type dropdown; the table will then get all of its column names from the CSV headers. Select the database where you want to create the table and change the table name if needed, then click the Upload Table button at the left of the screen.
10-03-2017
12:29 PM
1 Kudo
Hi @Narasimma varman, if we create files or directories whose names contain characters such as / or :, HDFS won't allow it, although the local filesystem does allow colons. You can refer to the similar issue below.
https://community.hortonworks.com/questions/139512/create-hdfs-folder-having-namecreate-hdfs-folder-h.html?childToView=139555#answer-139555
To resolve your issue:- replace the colons with some other character and try to run again:
${filename}.${now():format("yyyy-MM-dd-HH_mm_ss.SSS'z'")}
(or)
${filename}.${now():format("yyyy-MM-dd-HHmmss.SSS'z'")}
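For comparison, the same colon-free timestamp can be produced from a shell; a small sketch assuming GNU date (the %3N millisecond specifier is a GNU extension):

```bash
# Colon-free timestamp, matching the NiFi expression
# ${now():format("yyyy-MM-dd-HH_mm_ss.SSS'z'")}
date +"%Y-%m-%d-%H_%M_%S.%3Nz"
```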
10-03-2017
12:11 PM
1 Kudo
@Yair Ogen, if you are using the Hortonworks distribution, it's better to use the ORC format, as it is optimized for the Tez execution engine; and if file size matters to you, ORC typically compresses better than Parquet.

Best practices for ORC in HCC:-
https://community.hortonworks.com/articles/75501/orc-creation-best-practices.html
Pros and cons of the Parquet format:-
https://stackoverflow.com/questions/36822224/what-are-the-pros-and-cons-of-parquet-format-compared-to-other-formats
ORC vs Parquet:-
https://community.hortonworks.com/questions/2067/orc-vs-parquet-when-to-use-one-over-the-other.html
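If you decide to go with ORC, here is a minimal sketch of converting an existing text-format table; the database and table names are hypothetical, and ZLIB is just one compression choice:

```bash
# Create an ORC copy of a text-backed table via the Hive CLI.
hive -e "
CREATE TABLE mydb.events_orc
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB')
AS SELECT * FROM mydb.events_text;
"
```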
10-02-2017
12:34 PM
Thanks @Yair Ogen,
1. Please share the output of the below command:
hadoop fs -ls /apps/hive/
2. Run the insert statement again and attach hiveserver2.log.
10-01-2017
04:29 PM
@Yair Ogen, can you share a screenshot after executing the below command:
hadoop fs -ls /apps/hive/warehouse/
and attach the Hive logs from the /var/log/hive/ directory, (or) go to the Resource Manager UI and attach the error logs.
For reference:-
https://community.hortonworks.com/questions/49759/viewing-logs-for-hive-query-executions.html