Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 11304 | 04-15-2020 05:01 PM |
|  | 7198 | 10-15-2019 08:12 PM |
|  | 3172 | 10-12-2019 08:29 PM |
|  | 11678 | 09-21-2019 10:04 AM |
|  | 4408 | 09-19-2019 07:11 AM |
03-02-2018
10:28 AM
2 Kudos
@Simon Jespersen
You have two forward slashes in your curl call after https://localhost:9091; use a single slash and run your curl call again:
curl -k -i -H 'Content-Type: application/json' -X PUT -d '{"id":"cdb54c9a-0158-1000-5566-c45ca9692f85","state":"RUNNING"}' https://localhost:9091/nifi-api/flow/process-groups/a9d5c45f-015b-1000-0000-00006d9844d3
If you are still facing issues, follow the steps below to start/stop process groups on Kerberized HDF 2.1.1. When HDF is Kerberized, we need to pass an access token with the curl API call.
Steps to start/stop a process group:
1. First do a kinit on your NiFi node:
bash$ kinit
2. Check the validity of the Kerberos ticket and make sure your ticket is valid:
bash$ klist
3. Now create an access token:
bash$ token=`curl -k -X POST --negotiate -u : https://localhost:9091/nifi-api/access/kerberos`
4. Use the created token in your curl call to start the process group:
bash$ curl -k --header "Authorization: Bearer $token" -i -H 'Content-Type: application/json' -X PUT -d '{"id":"cdb54c9a-0158-1000-5566-c45ca9692f85","state":"RUNNING"}' https://localhost:9091/nifi-api/flow/process-groups/cdb54c9a-0158-1000-5566-c45ca9692f85
5. Use the created token in your curl call to stop the process group:
bash$ curl -k --header "Authorization: Bearer $token" -i -H 'Content-Type: application/json' -X PUT -d '{"id":"cdb54c9a-0158-1000-5566-c45ca9692f85","state":"STOPPED"}' https://localhost:9091/nifi-api/flow/process-groups/cdb54c9a-0158-1000-5566-c45ca9692f85
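If you do this often, the two calls are easy to wrap in a small script. A minimal sketch, assuming the same Kerberized setup as above; the host, port, and the file name pg-state.sh are placeholders for your environment:

#!/bin/bash
# pg-state.sh -- set a NiFi process group to RUNNING or STOPPED via the REST API.
# Usage: ./pg-state.sh <process-group-id> <RUNNING|STOPPED>
# Requires a valid Kerberos ticket (run kinit first).
NIFI_URL="https://localhost:9091"   # adjust to your NiFi node
PG_ID="$1"
STATE="$2"
# Fetch a fresh access token using the Kerberos ticket
TOKEN=$(curl -s -k -X POST --negotiate -u : "$NIFI_URL/nifi-api/access/kerberos")
# PUT the desired state to the flow endpoint for the process group
curl -k -i -H "Authorization: Bearer $TOKEN" \
     -H 'Content-Type: application/json' \
     -X PUT \
     -d "{\"id\":\"$PG_ID\",\"state\":\"$STATE\"}" \
     "$NIFI_URL/nifi-api/flow/process-groups/$PG_ID"

For example: ./pg-state.sh cdb54c9a-0158-1000-5566-c45ca9692f85 STOPPED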
02-28-2018
12:46 PM
2 Kudos
@Gayathri Devi
Could you try the query below? We read the 2018-02-27T02:00 value and convert it to a timestamp value.
Query:
hive> select from_unixtime(unix_timestamp('2018-02-27T02:00',"yyyy-MM-dd'T'HH:mm"),'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2018-02-27 02:00:00 |
+----------------------+--+

(or)

Using the regexp_replace function, we can replace the T in your timestamp value:
hive> select regexp_replace('2018-02-27T02:00','T',' ');
+-------------------+--+
| _c0 |
+-------------------+--+
| 2018-02-27 02:00 |
+-------------------+--+

Then use the concat function to append the missing :00, turning the above value into a valid Hive timestamp:
hive> select concat(regexp_replace('2018-02-27T02:00','T',' '),":00");
+----------------------+--+
| _c0 |
+----------------------+--+
| 2018-02-27 02:00:00 |
+----------------------+--+
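The same conversion works on a table column rather than a literal; a minimal sketch, assuming a table events with a string column event_ts holding values like 2018-02-27T02:00 (the table and column names are hypothetical):

hive> select cast(from_unixtime(unix_timestamp(event_ts,"yyyy-MM-dd'T'HH:mm"),'yyyy-MM-dd HH:mm:ss') as timestamp) event_time from events;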
02-28-2018
03:39 AM
1 Kudo
@Gayathri Devi
You can use the INPUT__FILE__NAME virtual column (it gives the input file name for every row of the table), construct your query around it, and then store the results of that query into the final table. You need to create a temp table and keep your akolp9app1a_170905_0000.txt file in that table's location. Then use:
hive> select INPUT__FILE__NAME from table; //this statement returns your akolp9app1a_170905_0000.txt file name
+---------------------------------------------------------------------------------+--+
| input__file__name |
+---------------------------------------------------------------------------------+--+
| /apps/hive/warehouse/sales/akolp9app1a_170905_0000.txt |
+---------------------------------------------------------------------------------+--+

You can then use all the string functions, like substring, on the input__file__name field and extract the hostname and date fields from it:
hive> select substring(INPUT__FILE__NAME,20,30) hostname,substring(INPUT__FILE__NAME,40,50) `date` from table;
Then you can have a final table into which you insert the hostname and date values from the above select statement:
hive> insert into finaltable select substring(INPUT__FILE__NAME,20,30) hostname,substring(INPUT__FILE__NAME,40,50) `date` from table;
For more reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
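Fixed substring offsets will break if the directory depth or file-name length ever changes, so a regex is usually sturdier. A minimal sketch, assuming file names shaped like the akolp9app1a_170905_0000.txt above, i.e. <hostname>_<yymmdd>_<sequence>.txt (finaltable is hypothetical):

hive> insert into finaltable
    > select regexp_extract(INPUT__FILE__NAME,'([^/]+)_(\\d{6})_\\d+\\.txt$',1) hostname,
    >        regexp_extract(INPUT__FILE__NAME,'([^/]+)_(\\d{6})_\\d+\\.txt$',2) `date`
    > from table;

The first capture group grabs everything between the last / and the date, and the second grabs the six-digit date, regardless of how deep the file sits in the warehouse path.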
02-28-2018
02:14 AM
@Jyoti Ambi Instead of doing all of this with a ConvertRecord processor, choose either of the methods below.
Method 1:
ExecuteSQL processor properties:
SQL Query: select MAX(CREATE_DATE) CREATE_DATE from your table
Then use a ConvertAvroToJSON processor to convert the Avro data to JSON. Your data would then look like this:
[{"CREATE_DATE": "2018-10-12 09:09:09"}]
Then use an EvaluateJsonPath processor to extract the create_date value as the flowfile content. The output flowfile content from EvaluateJsonPath would be:
2018-10-12 09:09:09
Then you can use a PutFile processor to store your file into a local directory.
Flow: ExecuteSQL --> ConvertAvroToJSON --> EvaluateJsonPath --> PutFile
(or)
Method 2:
If you want a header when keeping the file in your directory, then in the EvaluateJsonPath processor change the Destination property to flowfile-attribute, then add a ReplaceText processor (a sketch of its configuration follows below). We create new flowfile content by keeping CREATE_DATE as the header and, on a new line, our create_date attribute value (i.e. 2018-10-12 09:09:09).
Output:
CREATE_DATE
2018-10-12 09:09:09
Then use a PutFile processor to store the above file locally.
Flow: ExecuteSQL --> ConvertAvroToJSON --> EvaluateJsonPath (Destination as flowfile-attribute) --> ReplaceText --> PutFile
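A minimal sketch of the ReplaceText configuration for Method 2, assuming EvaluateJsonPath stored the value in an attribute named create_date (the attribute name is whatever you used as the property name in EvaluateJsonPath):

Replacement Strategy: Always Replace
Replacement Value:
CREATE_DATE
${create_date}

With Always Replace, the incoming content is discarded and the two-line Replacement Value (the header plus the attribute resolved through expression language) becomes the new flowfile content.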
02-27-2018
03:37 AM
@Jyoti Ambi I think your AvroReader configs look correct; in the CSVRecordSetWriter, change the configs as described here (a sketch of the likely settings follows below). I have set the Schema Access Strategy to inherit the record schema, since we are inheriting the schema from the content of the flowfile. I haven't mentioned any Schema Registry value because we don't want to fetch the schema from any registry: the schema is available with the record, so we inherit it. If you are still having issues, share some sample data with us, say 10 records in CSV (with header) or JSON format, so that we can recreate your scenario and help you solve your issue.
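A minimal sketch of the settings described above (property names as in stock NiFi; adjust to your version):

CSVRecordSetWriter:
Schema Access Strategy: Inherit Record Schema
Include Header Line: true

AvroReader:
Schema Access Strategy: Use Embedded Avro Schema

Use Embedded Avro Schema works here because ExecuteSQL embeds the schema in the Avro it emits, so no registry lookup is needed.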
02-26-2018
08:39 PM
@YoungHeon Kim The ExecuteSQL processor's SQL select query property supports expression language, so you can use:
select * from tmp_table where std_date='${now():format('yyyy-MM-dd')}'
This query is equivalent to:
select * from tmp_table where std_date='2018-02-26'
For more details refer to https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
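The same expression-language chain can be shifted for relative dates. A small sketch for yesterday's date: now() is converted to epoch milliseconds, one day (86400000 ms) is subtracted, and the result is reformatted:

select * from tmp_table where std_date='${now():toNumber():minus(86400000):format('yyyy-MM-dd')}'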
02-23-2018
01:52 PM
@Jyoti Ambi
1. The ExecuteSQL processor always returns its results in Avro format, so once you get results in Avro you need to use a ConvertRecord processor:
Record Reader --> AvroReader //reads the incoming Avro-format flowfile contents
Record Writer --> CSVRecordSetWriter //writes the output results in CSV format
Then use a PutFile processor to store the success flowfiles output by ConvertRecord. To configure the ConvertRecord processor, please refer to the links below:
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html
https://community.hortonworks.com/questions/167066/how-to-split-the-csv-files-into-multiple-files-bas.html
2. The Wait and Notify processors work together: the Wait processor routes incoming flowfiles to the 'wait' relationship until a matching release signal is stored in the distributed cache by a corresponding Notify processor. When a matching release signal is identified, the waiting flowfile is routed to the 'success' relationship, with attributes copied from the flowfile that produced the release signal in the Notify processor. For how the Wait and Notify processors work in practice, see:
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
02-21-2018
05:52 AM
1 Kudo
@Gopal Mehakare
To execute Linux commands you can use either the ExecuteStreamCommand processor, which accepts incoming connections, or the ExecuteProcess processor, which won't accept incoming connections, depending on your requirements (a sketch of both configurations follows below).
ExecuteStreamCommand configs: let's say your input flowfile content is as follows:
hi
hcc
nifi
The output from the ExecuteStreamCommand processor would be just hi, because we are executing the bash command head -1 on the input flowfile content, and the output stream relationship will transfer hi as the new flowfile content.
Output: hi
ExecuteProcess configs: the success relationship flowfile content will have just hi in it, as we are executing echo, and this processor can run on its own (no need for any incoming connections).
Output: hi
For more reference:
https://community.hortonworks.com/questions/150122/zip-folder-using-nifi.html
https://stackoverflow.com/questions/42443101/nifi-how-to-reference-a-flowfile-in-executestreamcommand
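A minimal sketch of the two configurations described above (property names from stock NiFi; the binary paths may differ on your system):

ExecuteStreamCommand:
Command Path: /usr/bin/head
Command Arguments: -1

ExecuteProcess:
Command: echo
Command Arguments: hi

ExecuteStreamCommand pipes the incoming flowfile content to the command's stdin and writes its stdout to the output stream relationship; ExecuteProcess simply captures the command's stdout as new flowfiles on a schedule.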
02-21-2018
12:04 AM
2 Kudos
@Mark You need to use the Regex SerDe while creating the Hive table, with a regex whose capture groups match the fields you want: each field you need must be captured in its own group. Some references on how to create Regex SerDe tables:
https://community.hortonworks.com/articles/58591/using-regular-expressions-to-extract-fields-for-hi.html
https://stackoverflow.com/questions/31008371/hive-using-regexserde-to-define-input-format
https://stackoverflow.com/questions/9102184/regex-for-access-log-in-hive-serde
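As a minimal sketch, assuming log lines shaped like host level message (a hypothetical three-field layout; swap in your own pattern and columns):

hive> CREATE TABLE raw_logs (host STRING, level STRING, message STRING)
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES ("input.regex" = "(\\S+) (\\S+) (.*)")
    > STORED AS TEXTFILE;

Each capturing group in input.regex maps, in order, to a column of the table; lines that don't match the regex come back as NULLs.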
02-18-2018
04:16 PM
@Mark
Since you are trying to insert values that have a semicolon (;) in them, Hive treats the semicolon as the end of the statement even if you escape it with a backslash (\). To insert values containing a semicolon, use the Unicode escape \u003B for the semicolon in your insert values statement, and a backslash to escape /, space, and ).
Insert statement:
hive> insert into semicolon values ('Mozilla\/5\.0\ \(iPhone\u003B\ CPU\ iPhone\ OS\ 5_0\)');
hive> select * from semicolon;
+------------------------------------------+--+
| a |
+------------------------------------------+--+
| Mozilla/5.0 (iPhone; CPU iPhone OS 5_0) |
+------------------------------------------+--+

(or)

Keep your data file in an HDFS directory and create the semicolon table with a string datatype, pointing at that HDFS directory.
Semicolon table reading from the HDFS directory:
hive> select * from semicolon;
+------------------------------------------+--+
| a |
+------------------------------------------+--+
| Mozilla/5.0 (iPhone; CPU iPhone OS 5_0) |
+------------------------------------------+--+

The results will be the same either way.
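A minimal sketch of the second (HDFS directory) option described above, assuming the raw file sits in a directory such as /tmp/semicolon_data (the path is a placeholder):

hive> create external table semicolon (a string)
    > location '/tmp/semicolon_data';

Since the file is read as-is, no escaping is needed: the semicolons inside the data never pass through the statement parser.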