Member since
06-06-2016
13
Posts
4
Kudos Received
1
Solution
04-13-2020
04:31 AM
You first have to split the content line by line using the SplitText processor. Then use a regex in the ExtractText processor to extract the values; it outputs the values as attributes on each flow file. Finally, use the ReplaceText processor to replace the content of the flow file with those attributes.
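To make the three steps concrete, here is a rough plain-Python analogy of the flow. It is not NiFi configuration: the sample input, the regex, the output template, and all function names are hypothetical and only illustrate the split, extract, replace sequence.

```python
import re

# Hypothetical input: one "name,age" record per line (not from the original post).
content = "alice,30\nbob,25\n"

# Hypothetical pattern; each named group plays the role of an extracted attribute.
PATTERN = re.compile(r"^(?P<name>[^,]+),(?P<age>\d+)$")


def split_text(text):
    """Step 1 (SplitText analogue): one 'flow file' per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


def extract_text(line):
    """Step 2 (ExtractText analogue): regex groups become attributes."""
    match = PATTERN.match(line)
    return match.groupdict() if match else {}


def replace_text(attributes, template="{name} is {age} years old"):
    """Step 3 (ReplaceText analogue): rebuild the content from the attributes."""
    return template.format(**attributes)


def run_flow(text):
    """Chain the three steps; lines that do not match are skipped in this sketch."""
    results = []
    for line in split_text(text):
        attributes = extract_text(line)
        if attributes:
            results.append(replace_text(attributes))
    return results


print(run_flow(content))
```

In NiFi itself, the regex would go into a property on ExtractText and the template into the replacement value of ReplaceText, with the attributes referenced through expression language.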
03-17-2017
01:27 AM
2 Kudos
Hi @Sean Byrne, I had a similar question. It's common in distributed systems to see many "part" file outputs. This is because you will typically have many partitions, across multiple nodes, writing to the same output directory (so interference is reduced). However, you can run a Spark job against this directory to create one single CSV file. Here's the code:
# Use PySpark to read in all "part" files
allfiles = spark.read.option("header","false").csv("/destination_path/part-*.csv")
# Output as CSV file
allfiles.coalesce(1).write.format("csv").option("header", "false").save("/destination_path/single_csv_file/")
Another option would be to use format("memory"), and then you could execute periodic in-memory queries against the Spark Stream. These queries could save the in-memory table to a single CSV (or another format). If I come across any way to output to a single CSV from Structured Streaming, I will be sure to post it. Hope this is helpful.
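If the part files sit on a local filesystem and are small enough for one machine, the same merge can also be sketched without Spark. This is only an illustration under those assumptions: the helper name `merge_part_files`, the demo file names, and the target file name are hypothetical, and it relies on the part files having no header row and ending with a newline.

```python
import glob
import os
import shutil
import tempfile


def merge_part_files(source_dir, target_path, pattern="part-*.csv"):
    """Concatenate header-less part files into one CSV, in sorted name order.

    Assumes every part file ends with a newline; otherwise the last row of
    one file would run into the first row of the next.
    """
    part_paths = sorted(glob.glob(os.path.join(source_dir, pattern)))
    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, target)
    return len(part_paths)


# Small demo with made-up part files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, rows in [("part-00000.csv", "a,1\n"), ("part-00001.csv", "b,2\n")]:
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(rows)
    single_csv = os.path.join(demo_dir, "single.csv")
    merged_count = merge_part_files(demo_dir, single_csv)
    with open(single_csv) as handle:
        merged_text = handle.read()

print(merged_count)
```

The target file name should not match the part-file pattern, so that a second run does not pick up the merged file as input.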
08-22-2016
05:01 PM
Here's the fix to the timestamp value truncations that I attempted to explain above:
public static ColumnDescription from(final ResultSet resultSet) throws SQLException {
    final ResultSetMetaData md = resultSet.getMetaData();
    List<String> columns = new ArrayList<>();
    // NEW - used to store column size, as per database service
    HashMap<String, Integer> columncache = new HashMap<String, Integer>();
    for (int i = 1; i < md.getColumnCount() + 1; i++) {
        columns.add(md.getColumnName(i));
        columncache.put(md.getColumnName(i), md.getPrecision(i)); // NEW - get physical column size
    }
    final String columnName = resultSet.getString("COLUMN_NAME");
    final int dataType = resultSet.getInt("DATA_TYPE");
    //final int colSize = resultSet.getInt("COLUMN_SIZE");
    final int colSize = columncache.get(columnName); // NEW
The rest of the method's code has been omitted.
07-13-2016
09:15 PM
1 Kudo
With MergeContent, you can specify a Max Bin Age to prevent a data starvation condition in which the latest data is held in limbo. That way, you can make a best effort to get an appropriately sized file to place in HDFS, but not at the cost of data being held indefinitely.
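The trade-off between file size and bin age can be sketched in a few lines of Python. This is not how NiFi implements MergeContent; the `Bin` class, its `min_size` and `max_age_seconds` parameters, and the explicit `now` timestamps are hypothetical and only show the idea that a bin is released when it is big enough or when it has waited too long.

```python
class Bin:
    """Toy bin that is released when it is large enough OR old enough."""

    def __init__(self, min_size, max_age_seconds):
        self.min_size = min_size        # desired merged size in bytes
        self.max_age = max_age_seconds  # plays the role of Max Bin Age
        self.items = []
        self.size = 0
        self.created_at = None          # time the first item arrived

    def add(self, data, now):
        """Add a chunk; the age clock starts with the first chunk."""
        if not self.items:
            self.created_at = now
        self.items.append(data)
        self.size += len(data)

    def ready(self, now):
        """True if the bin should be merged and written out."""
        if not self.items:
            return False
        full = self.size >= self.min_size
        expired = now - self.created_at >= self.max_age
        return full or expired

    def flush(self):
        """Return the merged content and reset the bin."""
        merged = b"".join(self.items)
        self.items, self.size, self.created_at = [], 0, None
        return merged


# A small trickle of data never reaches the size target...
trickle = Bin(min_size=10, max_age_seconds=60)
trickle.add(b"abc", now=0)
still_waiting = trickle.ready(now=30)   # too small and too young
released_by_age = trickle.ready(now=60) # age limit reached
merged = trickle.flush()
```

Without the age check, the three bytes in this example would wait until enough further data arrived to reach the size target, which is the starvation case described above.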
06-06-2016
10:10 PM
Thanks a lot. I never noticed the "Prepend" setting in ReplaceText before but this worked a treat! Appreciate it!