Member since: 06-09-2017
Posts: 6
Kudos Received: 0
Solutions: 0
08-31-2017
10:22 PM
Hello, I can query Redshift through our JDBC driver just fine, but when I try a TRUNCATE using the same connection pool, the statement is rejected with an error saying it is not possible with "auto commit turned on". I cannot see how to turn auto commit off.
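For reference, this is what toggling autocommit looks like at the plain JDBC level, outside of any pooling. This is only a sketch: the connection URL, credentials, and table name below are placeholders, not the actual setup.

```
// Minimal JDBC sketch (Scala, plain java.sql): turn autocommit off, run the
// TRUNCATE, then commit explicitly. All connection details are placeholders.
import java.sql.DriverManager

object TruncateSketch extends App {
  val conn = DriverManager.getConnection(
    "jdbc:redshift://example-cluster.example.com:5439/dev", "user", "password")
  try {
    conn.setAutoCommit(false)            // autocommit off for this session
    val stmt = conn.createStatement()
    stmt.executeUpdate("TRUNCATE TABLE staging_table")
    conn.commit()                        // explicit commit since autocommit is off
    stmt.close()
  } finally {
    conn.close()
  }
}
```

In a pooled setup the switch is usually exposed as a pool or processor property rather than a direct call, but the underlying flag is the same `setAutoCommit(false)`.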
08-30-2017
04:26 AM
Found the answer here: https://www.youtube.com/watch?v=bRKw5y_tx5E
08-29-2017
09:06 PM
And put it in what setting? I put the public DNS in the advertised.listeners setting, and it didn't work.
08-29-2017
08:46 PM
Hello, I have read every piece of documentation on this topic and can't seem to find the answer. I have a Kafka 0.10.2 cluster (currently 3 nodes, soon to be 40) installed via Ambari. It works when connecting via the local/private subnet (NiFi -> Kafka). When I try to run a producer from our office (the cluster is in EC2), it doesn't work, because the bootstrap server metadata list returns the internal hostnames of the brokers.

From what I have read, this requires the setting 'advertised.listeners', but when I add it like this: ```advertised.listeners="PLAINTEXT://host.name.here:port,PLAINTEXT://host2.name.here:port,PLAINTEXT://host3.name.here:port"``` the Kafka brokers will no longer start.

I simply need to be able to produce to and consume from Kafka from outside of EC2. What am I overlooking? Thank you!
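For comparison, a per-broker sketch of how advertised.listeners is usually declared in server.properties: each broker advertises only its own externally reachable endpoint, the value is not quoted, and every other broker gets its own hostname in its own file. The hostnames and ports below are placeholders.

```
# server.properties on broker 1 only (placeholder hostnames/ports)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ec2-broker1.example.com:9092
# broker 2 would advertise ec2-broker2.example.com:9092 in its own file, and so on
```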
06-20-2017
04:18 PM
Hello, I am trying to get this expression to work and am having a little trouble: ```${sourcefilename:toLower():replaceAll(' ','_'):replaceAll('.xlsx',${allDelineatedValues("_",${sheetname},".csv"):join('')})}``` Can anyone spot the error offhand? The error log doesn't give me a specific point of failure. Thank you.
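Assuming the goal is to turn `name.xlsx` into `name_<sheetname>.csv`, a sketch of how this concatenation is usually written in NiFi Expression Language: prepend/append build the suffix instead of allDelineatedValues, and replace (a literal match) is used so the dot in '.xlsx' is not treated as a regex wildcard.

```
${sourcefilename:toLower():replaceAll(' ','_'):replace('.xlsx', ${sheetname:prepend('_'):append('.csv')})}
```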
06-09-2017
09:38 PM
Hello. I am new to NiFi and Spark, specifically as it relates to this request, so be kind 🙂. I have dozens of files every week with 1-2 million rows each. The rows in the files are sometimes new and sometimes updates of previous records (with no unique key, but a compound key can be created from multiple columns). I have this situation for 3 groups of files, which at the end of consumption are joined together into one large table. For the sake of this post, let's call those groups of files SOLD, PLACED, and COMPLETE, each group having multiple files with millions of rows.

I do not have any control over the build/structure (column name/order) of the inbound files; they come from a 3rd party. Within each group, the file columns are frequently not consistently named or in the same order, but they always contain the same fields contextually (meaning the column may be named wrong, but the data in it is correct).

Using NiFi I can move the files from FTP to S3 easily. Planning to "upsert" the rows of each group into their own table and then join the 3 tables at the end, I started with a Redshift COPY command to ingest the files into Redshift. This works until the columns of the files are out of order or incomplete. I've tried branching on "retry" in NiFi to adapt to each file, but the variations will be too many to manage that way.

I need a way to populate a complete "object" prior to the "upsert", so that I can match columns/attributes with a regex (to handle the varying column names) against a fixed schema. I was also thinking that perhaps I eliminate Redshift, "upsert" the files with Spark, then join them with Spark and save the output to S3 as one large file. Any suggestion on how you would tackle this problem as efficiently as possible? Thank you in advance for your help!
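To make the "match varying columns to a fixed schema, then upsert" idea concrete, here is a rough Spark (Scala) sketch. The regex-to-canonical-name map, the compound key columns, the `updated_at` ordering column, and the S3 paths are all hypothetical placeholders; the real files would need their own mappings.

```
// Sketch: normalize 3rd-party column names to a fixed schema with regexes,
// then keep only the newest row per compound key (a window-based "upsert").
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

object NormalizeAndUpsertSketch extends App {
  val spark = SparkSession.builder().appName("normalize-upsert-sketch").getOrCreate()

  // hypothetical regex -> canonical column name mapping for the SOLD group
  val canonical = Seq(
    "(?i)^cust(omer)?_?id$" -> "customer_id",
    "(?i)^sale.*date$"      -> "sale_date",
    "(?i)^(amount|amt)$"    -> "amount",
    "(?i)^updated.*$"       -> "updated_at"
  )

  val raw = spark.read.option("header", "true").csv("s3a://example-bucket/sold/*.csv")

  // rename whatever the 3rd party called each column to the fixed schema name
  val renamed = raw.columns.foldLeft(raw) { (df, name) =>
    canonical.collectFirst { case (pattern, target) if name.matches(pattern) => target }
      .map(df.withColumnRenamed(name, _))
      .getOrElse(df)
  }

  // "upsert": for each compound key, keep only the most recent version of the row
  val byKey = Window.partitionBy("customer_id", "sale_date").orderBy(col("updated_at").desc)
  val latest = renamed.withColumn("rn", row_number().over(byKey))
                      .filter(col("rn") === 1)
                      .drop("rn")

  latest.write.mode("overwrite").parquet("s3a://example-bucket/sold_clean/")
}
```

The same pattern would run once per group (SOLD, PLACED, COMPLETE), with a join of the three cleaned outputs as the final step.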