Member since: 05-27-2016
Posts: 15
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1135 | 03-28-2017 05:55 PM |
09-06-2017 07:59 PM
1 Kudo
This worked nicely. Thanks Yash!
08-29-2017 12:12 AM
1 Kudo
I have a text file I'm reading into a NiFi flow, which consists of key/value pairs that look like the following:

status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"
status:"400" body_bytes_sent:"172" referer:"yahoo.com" user_agent:"Chrome" host:"8.8.4.3" query_string:"books"

Currently the TailFile processor is successfully reading these files as they are created and appended to. However, I want to output them to Kafka as Avro files. Any idea what processor(s) I need to convert these text files into Avro format in my flow? What would the configuration look like for those processors?
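For illustration, here is a minimal Python sketch, assuming the fastavro library, of the record shape and Avro schema such a conversion would target; the field names come from the sample lines above, and every field is kept as a string since the source quotes all values. Inside NiFi the equivalent parse and serialize steps would live in processors; the sketch only pins down the target record shape.

```python
# Sketch: parse one key/value line and serialize it as an Avro record.
import re
from io import BytesIO
from fastavro import writer, reader

schema = {
    "type": "record",
    "name": "LogEntry",
    "fields": [
        {"name": "status", "type": "string"},
        {"name": "body_bytes_sent", "type": "string"},
        {"name": "referer", "type": "string"},
        {"name": "user_agent", "type": "string"},
        {"name": "host", "type": "string"},
        {"name": "query_string", "type": "string"},
    ],
}

line = 'status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"'
record = dict(re.findall(r'(\w+):"([^"]*)"', line))  # {'status': '400', ...}

buf = BytesIO()
writer(buf, schema, [record])  # produces Avro object-container-file bytes
buf.seek(0)
print(next(reader(buf)))       # round-trip check
```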
Labels:
- Apache NiFi
04-18-2017 02:26 PM
Does anybody have a workaround for this issue? Restarting the NiFi processor manually is not feasible for a flow that should run "lights out".
03-28-2017 05:55 PM
1 Kudo
Answering my own question: this was happening because of the load balancer between NiFi and Splunk. Hitting Splunk directly resolved the issue.
03-28-2017 03:21 PM
1 Kudo
Is it possible that complex Splunk queries are too much for the GetSplunk processor? These queries are a few hundred lines long, but run fine in the Splunk GUI. The error is below; any recommendations on how to troubleshoot?

10:11:06 EST ERROR f46c3d86-5571-146c-a8ef-071da5f520e6
denatb3wlwbl08.cloud.myco.org:8443 GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6]: java.net.SocketException: Connection reset: org.apache.nifi.processor.exception.ProcessException: IOException thrown from GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6]: java.net.SocketException: Connection reset
Labels:
- Apache NiFi
09-28-2016 06:47 PM
I'm attempting to write a Parquet file to an S3 bucket, but am getting the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o36.parquet.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:453)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

The line of Python code that fails is:

df.write.parquet("s3a://myfolder/myotherfolder")

The same line of code works successfully if I write to HDFS instead of S3:

df.write.parquet("hdfs://myfolder/myotherfolder")

I'm using the spark-2.0.2-bin-hadoop2.7 and aws-java-sdk-1.11.38 binaries. Right now I'm running it interactively in PyCharm on my Mac.
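For context, a minimal PySpark sketch of the same write path, with the s3a path copied from the post and placeholder credential keys. A plausible cause of this particular NoSuchMethodError is an SDK version mismatch: hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, and later SDK versions dropped the TransferManager(AmazonS3, ThreadPoolExecutor) constructor that S3AFileSystem.initialize calls, so pairing Hadoop 2.7 binaries with aws-java-sdk 1.11.x can fail exactly at that frame.

```python
# Sketch: write a small DataFrame to S3 via the s3a connector.
# Credential values are placeholders; the s3a path is from the post.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-parquet-sketch")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder
    .getOrCreate()
)

df = spark.range(10)  # trivial stand-in for the real DataFrame
df.write.parquet("s3a://myfolder/myotherfolder")
```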
Labels:
- Apache Spark
08-15-2016 02:12 PM
I have a read-only NFS share that contains thousands of unique files (all with unique names) which I want to process with NiFi. New files are constantly added to this share by another process. Is there any way to keep track of which files NiFi already processed in a prior run, so I don't process them again? I cannot make any changes to the files on this NFS share (cannot delete or rename them).
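As a point of reference, a minimal Python sketch (outside NiFi; the paths and state-file name are hypothetical) of the bookkeeping a read-only source like this requires: persist the set of filenames already handled and skip them on the next run.

```python
# Sketch: remember processed filenames across runs in a local state file.
import json
import os

STATE_FILE = "processed_files.json"  # hypothetical state location
SHARE_DIR = "/mnt/nfs_share"         # hypothetical mount point

seen = set()
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        seen = set(json.load(f))

for name in sorted(os.listdir(SHARE_DIR)):
    if name in seen:
        continue  # already handled in a prior run
    print("would process:", name)  # hand the file to the flow here
    seen.add(name)

with open(STATE_FILE, "w") as f:
    json.dump(sorted(seen), f)
```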
Labels:
- Apache NiFi
08-01-2016 09:20 PM
I'm building a NiFi flow with the NiFi GUI. As part of the flow I have a series of flat files I'm ingesting, which contain lines that I don't want in my data flow. These lines all start with the hash/pound symbol (#). Any ideas how to filter these lines out? I was thinking of a RouteOnContent processor, but I'm not sure how to make it filter out individual lines.
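For illustration, a minimal Python sketch of the filtering rule itself: drop any line whose first non-blank character is #. The sample content is hypothetical; inside NiFi the same line-by-line regex could drive a content-rewriting processor.

```python
# Sketch: remove comment lines (those starting with '#') from flat-file content.
import re

content = """# header comment
real,data,line
  # indented comment
another,data,line
"""

filtered = "\n".join(
    line for line in content.splitlines()
    if not re.match(r"\s*#", line)  # skip lines whose first non-blank char is '#'
)
print(filtered)
```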
Labels:
- Apache NiFi