Created 08-08-2017 06:04 PM
Hi,
Step 1: we ingest syslog messages with ListenSyslog and PutHDFS into our data lake.
Step 2: we ingest them into HAWQ using the PXF external table technology.
CREATE EXTERNAL TABLE ext_syslog_hist
(message TEXT)
LOCATION ('pxf://xxx.xxx.com:8020/apps/nifi/syslog/history/*?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E'\t');
Every minute, thousands of small (~100 byte) files are created in HDFS.
So I'm looking for a way to ingest the incoming ListenSyslog data directly into a HAWQ table.
Regards
Timo
Created 08-08-2017 06:13 PM
You could use PutSQL to issue INSERT statements directly against HAWQ. Assign the data item values to attributes, as described in the PutSQL documentation. Then put the INSERT statement in the content of the Flow File, with '?' placeholders for the individual values.
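A minimal sketch of that setup, assuming a single-column target table named syslog_hist (the table and column names here are hypothetical): PutSQL reads the statement from the FlowFile content and binds each '?' from numbered sql.args attributes.

```sql
-- FlowFile content (the statement PutSQL will execute):
INSERT INTO syslog_hist (message) VALUES (?);

-- FlowFile attributes PutSQL expects for each '?' placeholder:
--   sql.args.1.type  = 12      (JDBC type code; 12 = VARCHAR)
--   sql.args.1.value = <the syslog line>
```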
Created 08-09-2017 04:02 PM
Thanks,
this was really easy.
I just generate "insert into table" statements containing the FlowFile content with the ReplaceText processor and connect it to the PutSQL processor.
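For reference, the ReplaceText configuration for this approach could look like the sketch below (table name hypothetical). Note one caveat: syslog lines containing a single quote will break the generated statement, which the parameterized '?' approach avoids.

```sql
-- ReplaceText (Replacement Strategy = Regex Replace):
--   Search Value:      (?s)^(.*)$
--   Replacement Value: INSERT INTO syslog_hist (message) VALUES ('$1')
-- The resulting FlowFile content is then routed to PutSQL.
```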
regards
Timo
Created 08-09-2017 07:18 PM
Is this equivalent to: INSERT INTO <table> (col1, col2, col3) VALUES (?, ?, ?);
Created 08-08-2017 07:47 PM
You could also use MergeContent before PutHDFS so that you can merge together all of the small files based on size or time before writing to HDFS.
Created 08-11-2017 11:52 AM
Hi Bryan,
how can I merge the FlowFile contents into a multiline file?
Using the MergeContent processor I get a single concatenated line.
input:
FlowFile 1: <30>Aug 11 06:27:26 xxx.xxx.com systemd[28892]: Stopping Timers.
FlowFile 2: <30>Aug 11 06:27:15 xxx.xxx.com systemd[24517]: Stopping Paths
etc...
MergeContent output:
<30>Aug 11 06:27:26 xxx.xxx.com systemd[28892]: Stopping Timers.<30>Aug 11 06:27:15 xxx.xxx.com systemd[24517]:Stopping Paths
I need the following multiline structure:
<30>Aug 11 06:27:26 xxx.xxx.com systemd[28892]: Stopping Timers.
<30>Aug 11 06:27:15 xxx.xxx.com systemd[24517]:Stopping Paths
any solution?
regards
Timo
Created 08-11-2017 12:22 PM
Just set the "Delimiter Strategy" in MergeContent to "Text" and then set the "Demarcator" property to shift+enter to make a new line. It will insert the demarcator between each flow file that it merges.
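Summarizing the relevant MergeContent properties as a sketch:

```sql
-- MergeContent settings (sketch):
--   Delimiter Strategy: Text
--   Demarcator:         a newline (press shift+enter in the property editor)
-- MergeContent then joins the FlowFiles with a newline between each one.
```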
Created 08-08-2017 08:18 PM
You definitely do not want to execute an INSERT for every row of data. That will be really, really slow.
I suggest you use the gpfdist protocol to load the log files directly into HAWQ. You start a gpfdist process where the files are located, create an external table that uses the gpfdist location, and then execute a single INSERT statement to load all of the data from the files into your HAWQ table.
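A hedged sketch of the gpfdist approach (host names, port, and paths below are hypothetical; the target table syslog_hist is assumed from the earlier posts):

```sql
-- On the host holding the log files (shell command, not SQL):
--   gpfdist -d /data/syslog -p 8081 &

-- In HAWQ, point an external table at the gpfdist endpoint:
CREATE EXTERNAL TABLE ext_syslog_gpfdist
(message TEXT)
LOCATION ('gpfdist://etl-host.xxx.com:8081/*.log')
FORMAT 'TEXT' (delimiter=E'\t');

-- A single INSERT ... SELECT then loads all files in parallel:
INSERT INTO syslog_hist SELECT * FROM ext_syslog_gpfdist;
```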
Created 08-09-2017 07:51 PM
The PutSQL option is by far the worst option. This is the wrong answer. Please, don't do that. Hellmer clearly doesn't use HAWQ.
I wrote a blog post over a year ago on this subject. The singleton INSERT solution that is common for OLTP databases is the absolute worst solution for HAWQ. In other words, "INSERT INTO table_name (columns...) VALUES (...);"
Here is my blog post covering loading data into HAWQ: http://www.pivotalguru.com/?p=1425
Created 08-11-2017 01:44 PM
Great, works !!!
Thanks!