Member since: 06-06-2016
Posts: 13
Kudos Received: 4
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1267 | 06-23-2016 06:26 PM
04-13-2020 04:31 AM
You have to split the content line by line first, using the SplitText processor. Then use a regex in the ExtractText processor to extract values; the matches become attributes on each flow file. Finally, use the ReplaceText processor to replace the flow file's content with those attributes.
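For illustration, here is a minimal sketch of that three-processor flow; the sample regex and the attribute name (myvalue) are assumptions, not values from the original thread:

SplitText    -> Line Split Count: 1                  (emit one flow file per input line)
ExtractText  -> dynamic property myvalue: ^(\w+),.*  (capture group 1 lands in attribute myvalue.1)
ReplaceText  -> Replacement Strategy: Always Replace
                Replacement Value: ${myvalue.1}      (the attribute becomes the flow file content)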
03-17-2017 01:27 AM
2 Kudos
Hi @Sean Byrne, I also had a similar question, but it's common within distributed systems to see many "part" file outputs. This is because you will typically have many partitions, across multiple nodes, writing to the same output directory (so interference is reduced). However, you can run a Spark job against this directory in order to create one single CSV file. Here's the code:
# Use PySpark to read in all "part" files
allfiles = spark.read.option("header","false").csv("/destination_path/part-*.csv")
# Output as CSV file
allfiles.coalesce(1).write.format("csv").option("header", "false").save("/destination_path/single_csv_file/")
Another option would be to use format("memory") with a query name, and then execute periodic in-memory queries against the Spark stream. These queries could save the in-memory table to a single CSV (or another format). If I come across any way to output a single CSV directly from Structured Streaming, I will be sure to post it. Hope this is helpful.
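For what it's worth, here is a minimal PySpark sketch of that memory-sink approach; the streaming DataFrame stream_df, the query name "events", and the output path are assumptions for illustration, not code from the original post:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-sink-sketch").getOrCreate()

# stream_df is assumed to be an existing streaming DataFrame (e.g. built via spark.readStream)
query = (stream_df.writeStream
    .format("memory")       # keep results in an in-memory table
    .queryName("events")    # the name of the in-memory table
    .outputMode("append")
    .start())

# Periodically snapshot the in-memory table out to a single CSV file
snapshot = spark.sql("SELECT * FROM events")
(snapshot.coalesce(1)
    .write.format("csv")
    .option("header", "false")
    .mode("overwrite")
    .save("/destination_path/single_csv_file/"))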
08-22-2016 05:01 PM
Here's the fix to the timestamp value truncation that I attempted to explain above:

public static ColumnDescription from(final ResultSet resultSet) throws SQLException {
    final ResultSetMetaData md = resultSet.getMetaData();
    final List<String> columns = new ArrayList<>();
    // NEW - used to store column sizes, as reported by the database service
    final HashMap<String, Integer> columncache = new HashMap<String, Integer>();
    for (int i = 1; i < md.getColumnCount() + 1; i++) {
        columns.add(md.getColumnName(i));
        columncache.put(md.getColumnName(i), md.getPrecision(i)); // NEW - get physical column size
    }
    final String columnName = resultSet.getString("COLUMN_NAME");
    final int dataType = resultSet.getInt("DATA_TYPE");
    //final int colSize = resultSet.getInt("COLUMN_SIZE");
    final int colSize = columncache.get(columnName); // NEW - use the cached size instead
    // ... the rest of the constructor method's code has been omitted.
07-13-2016 09:15 PM
1 Kudo
With MergeContent, you can specify a Max Bin Age, which prevents a data-starvation condition in which the latest data is held in limbo. That way, you can make a best effort to build an appropriately sized file to place in HDFS, but not at the cost of data being held indefinitely.
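As a rough illustration (the sizes and age below are assumptions, not values from the original post), the relevant MergeContent properties might look like this:

Merge Strategy:     Bin-Packing Algorithm
Minimum Group Size: 128 MB   (target lower bound for a merged file)
Maximum Group Size: 1 GB     (upper bound per merged file)
Max Bin Age:        5 min    (flush a bin after 5 minutes even if undersized)

With Max Bin Age set, an undersized bin is merged and released once it ages out, so data never waits indefinitely for enough siblings to arrive.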
06-06-2016 10:10 PM
Thanks a lot. I never noticed the "Prepend" setting in ReplaceText before, but this worked a treat. Appreciate it!