Member since: 01-23-2016
Posts: 51
Kudos Received: 41
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2059 | 02-18-2016 04:34 PM
03-21-2018
02:01 PM
I ended up not using NiFi for this. Looking back, I was trying to force a solution out of NiFi that wasn't a good fit. I spent several weeks, entirely too long, trying to solve the simplest case of this project (formatting some text and dumping it to a db). I could certainly see NiFi being useful for moving source data files around between the folders I'm working with (copying, moving, etc.), but doing any amount of logic or manipulation beyond the happy path is extremely tedious and seemingly difficult to do.

Knowing that I was going to have to do a lot more work on the data to make it even close to usable, I scrapped NiFi and implemented it in Python. After dealing with this data and repeatedly running into edge cases I wasn't even aware of when I wrote this topic, the data was, in my opinion, just too dirty and had too many exceptions to handle in NiFi. On top of that, this flow only covered importing the data, not actually using it, so I would have needed another tool to process the data into a usable form anyway.

Appreciate the response. You took the time to answer, so I figured it was reasonable to reply even though I didn't end up using the solution.
01-05-2018
08:51 PM
I've googled everywhere for this, and everything I run across is super complicated. It should be relatively simple to do. The recommendations say to look at the "Example_With_CSV.xml" template from https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates, but it doesn't have anything in there related to actually getting each column's value out.

So given a flowfile that's a CSV:

2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"

I need:

$date = 2017-09-20 23:49:38.637
$id = 162929511757
...
$instanceid = 36095
$comment = "Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"

OR

$csv.date = ...
$csv.id = ...
...
$csv.instanceid = ...
$csv.comment = ...

Is there an easier option for this besides regex? I can't stand doing anything with regex given how unreadable and overly complicated regexes are. To me there should be a significantly easier way of doing this.
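One regex-free possibility is to parse the line with a real CSV parser rather than ExtractText; Python's csv module already understands the quoted last column. A minimal sketch in plain Python (the attribute names are my own placeholders, not anything NiFi defines):

    import csv
    import io

    line = ('2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
            '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"')

    # csv honors the quotes, so the commas inside the last column stay together
    row = next(csv.reader(io.StringIO(line)))

    attributes = {
        'csv.date': row[0],        # 2017-09-20 23:49:38.637
        'csv.id': row[1],          # 162929511757
        'csv.instanceid': row[3],  # 36095
        'csv.comment': row[-1],    # the full quoted string, commas intact
    }
    print(attributes)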
Labels:
- Apache NiFi
01-05-2018
08:34 PM
There is no example in the "Working_With_CSV" template of how to extract each individual field into attributes.
01-04-2018
05:25 PM
1 Kudo
Thanks! That seems to work correctly. I'll mark this as the answer since it produces the result I'm looking for.
01-03-2018
10:25 PM
2 Kudos
@Shu Thank you for the great detailed response. The first part does work, but I don't think the regex will work for my case. (Side note, no fault of yours: I just absolutely despise regex, as it's unreadable to me and extremely difficult to debug, if it can be debugged at all.)

I should have mentioned this, but the only thing I know about the CSV file is that there are X number of columns before the string. So I could see something like:

23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max, attempts,(1=>3), fit failing entirely,(FitFailure=True),

The only things I know are that there are 13 columns (commas) before the string, and that the string will always have a trailing "," (it has always been the last column in the row from what I have seen).

The other issue is that I tried using

(.*),

for all of the columns, so I could then put the data into a database insert query, but the regex seems to blow up and stop functioning with so many columns (the original data has about 150 columns in it and I just truncated it down here).
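Since the only invariant stated above is "13 commas, then the free-text string with a trailing comma", one way to sidestep a 150-group regex is a bounded split: take the first 13 comma-separated fields literally and treat the remainder as the message. A minimal sketch under that assumption (plain Python, names illustrative):

    line = ('23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
            'Failed to fit max, attempts,(1=>3), fit failing entirely,(FitFailure=True),')

    # Split on the first 13 commas only; everything after them is the message
    parts = line.split(',', 13)
    fixed_columns = parts[:13]
    message = parts[13].rstrip(',')  # drop the trailing comma

    print(fixed_columns)
    print(message)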
01-03-2018
04:46 PM
1 Kudo
I have a CSV file that is messy. I need to:

1. Get the date from the filename, use that as my date, and append it to one of the columns.
2. Parse the CSV file to get the columns; the very last column is a string which contains the separator "," within the string itself.

The data looks like this.

Filename: ExampleFile_2017-09-20.LOG

Content:

23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True),
23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,Command Measure, Targets complete - Elapsed: 76064 ms,

The following is what will need to be inserted into the database:

2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
2017-09-20 23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,"Command Measure, Targets complete - Elapsed: 76064 ms"

Would I need to do this inside of NiFi, or in some external script called via something like ExecuteScript?
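A sketch of that transformation in plain Python, e.g. for an external script or something adapted into NiFi's ExecuteScript. It assumes the layout shown above (date embedded in the filename, 13 fixed columns, message last with a trailing comma); the filename parsing and variable names are illustrative:

    filename = 'ExampleFile_2017-09-20.LOG'
    # Pull the date out of the filename (no regex needed for this shape)
    file_date = filename.split('_')[1].rsplit('.', 1)[0]  # '2017-09-20'

    lines = [
        '23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
        'Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True),',
    ]

    for line in lines:
        parts = line.split(',', 13)            # 13 fixed columns, then the message
        message = parts[13].rstrip(',')        # strip the trailing comma
        parts[0] = file_date + ' ' + parts[0]  # prepend the file's date to the time
        print(','.join(parts[:13]) + ',"' + message + '"')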
Labels:
- Apache NiFi
05-25-2016
05:34 AM
What do I need to set in hive-env.sh? It seems that anything I touch gets overwritten. This has to be a bug in Ambari where it won't save the hive.heapsize value. How can I get it to persist?
05-25-2016
04:49 AM
The hive.heapsize configuration does not exist in my hive-site.xml for some reason, and whenever I add it to the file it keeps getting overwritten.
05-25-2016
04:09 AM
@Divakar Annapureddy Correct, but if you look at my comments, I posted a picture showing it is changed to 12GB in the UI. The services have been restarted (the complete server has been restarted).
05-25-2016
03:27 AM
hive 24964 0.2 1.7 2094636 566148 ? Sl 17:03 0:56 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64/bin/java -Xmx1024m -Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -XX:MaxPermSize=512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.3.2.0-2950/hive/lib/hive-service-1.2.1.2.3.2.0-2950.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive

So I can see that the process is running with 1024m, even though in the UI it is set to a much larger value (12GB): http://imgur.com/3oXfpPj
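One aside that may save someone time: -Xmx appears twice in that command line, and the JVM honors the last occurrence it is given, so the effective heap here really is 1024m regardless of what the UI reports. A quick illustrative check (plain Python; the cmdline value is a truncated stand-in for the ps output above):

    cmdline = '/usr/bin/java -Xmx1024m -Dhdp.version=2.3.2.0-2950 -Xmx1024m -XX:MaxPermSize=512m'

    # The JVM uses the last -Xmx flag on the command line
    xmx_flags = [tok for tok in cmdline.split() if tok.startswith('-Xmx')]
    print(xmx_flags[-1])  # -Xmx1024m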