Member since: 05-27-2016
Posts: 15
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1135 | 03-28-2017 05:55 PM |
09-06-2017 07:59 PM
1 Kudo
This worked nicely. Thanks Yash!
08-29-2017 12:12 AM
1 Kudo
I have a text file I'm reading into a NiFi flow, which consists of key/value pairs that look like the following:

status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"
status:"400" body_bytes_sent:"172" referer:"yahoo.com" user_agent:"Chrome" host:"8.8.4.3" query_string:"books"

Currently the TailFile processor is successfully reading these files as they are created and appended to. However, I want to output them to Kafka as Avro files. Any idea what processor(s) I need to convert these text files into Avro format in my flow? What would the configuration look like for those processors?
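For illustration, here is a minimal Python sketch, assuming the fastavro library, of the record shape and Avro schema such a conversion would target; the field names come from the sample lines above, and every field is kept as a string since the source quotes all values. Inside NiFi the equivalent parse and serialize steps would live in processors; the sketch only pins down the target record shape.

```python
# Sketch: parse one key/value line and serialize it as an Avro record.
import re
from io import BytesIO
from fastavro import writer, reader

schema = {
    "type": "record",
    "name": "LogEntry",
    "fields": [
        {"name": "status", "type": "string"},
        {"name": "body_bytes_sent", "type": "string"},
        {"name": "referer", "type": "string"},
        {"name": "user_agent", "type": "string"},
        {"name": "host", "type": "string"},
        {"name": "query_string", "type": "string"},
    ],
}

line = 'status:"400" body_bytes_sent:"174" referer:"google.com" user_agent:"safari" host:"8.8.4.4" query_string:"devices"'
record = dict(re.findall(r'(\w+):"([^"]*)"', line))  # {'status': '400', ...}

buf = BytesIO()
writer(buf, schema, [record])  # produces Avro object-container-file bytes
buf.seek(0)
print(next(reader(buf)))       # round-trip check
```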
Labels:
- Apache NiFi
04-18-2017 02:26 PM
Does anybody have a workaround for this issue? Restarting the NiFi processor manually is not feasible for a flow that should run "lights out".
03-28-2017 05:55 PM
1 Kudo
Answering my own question: this was happening because of the load balancer between NiFi and Splunk. Hitting Splunk directly resolved the issue.
03-28-2017 03:21 PM
1 Kudo
Is it possible that complex Splunk queries are too much for the GetSplunk processor? These queries are a few hundred lines long, but run fine in the Splunk GUI. The error is below; any recommendations on how to troubleshoot?

10:11:06 EST ERROR f46c3d86-5571-146c-a8ef-071da5f520e6
denatb3wlwbl08.cloud.myco.org:8443 GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6]: java.net.SocketException: Connection reset: org.apache.nifi.processor.exception.ProcessException: IOException thrown from GetSplunk[id=f46c3d86-5571-146c-a8ef-071da5f520e6]: java.net.SocketException: Connection reset
Labels:
- Apache NiFi
09-28-2016 06:47 PM
I'm attempting to write a Parquet file to an S3 bucket, but am getting the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o36.parquet.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:453)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

The line of Python code that fails is:

df.write.parquet("s3a://myfolder/myotherfolder")

The same line of code works successfully if I write to HDFS instead of S3:

df.write.parquet("hdfs://myfolder/myotherfolder")

I'm using the spark-2.0.2-bin-hadoop2.7 and aws-java-sdk-1.11.38 binaries. Right now I'm running it interactively in PyCharm on my Mac.
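For context, a minimal PySpark sketch of the same write path, with the s3a path copied from the post and placeholder credential keys. A plausible cause of this particular NoSuchMethodError is an SDK version mismatch: hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, and later SDK versions dropped the TransferManager(AmazonS3, ThreadPoolExecutor) constructor that S3AFileSystem.initialize calls, so pairing Hadoop 2.7 binaries with aws-java-sdk 1.11.x can fail exactly at that frame.

```python
# Sketch: write a small DataFrame to S3 via the s3a connector.
# Credential values are placeholders; the s3a path is from the post.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-parquet-sketch")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder
    .getOrCreate()
)

df = spark.range(10)  # trivial stand-in for the real DataFrame
df.write.parquet("s3a://myfolder/myotherfolder")
```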
Labels:
- Apache Spark
08-15-2016 02:12 PM
I have a read-only NFS share that contains thousands of unique files (all with unique names) which I want to process with NiFi. New files are constantly added to this share by another process. Is there any way to keep track of which files NiFi already processed in a prior run, so I don't process them again? I cannot make any changes to the files on this NFS share (cannot delete or rename them).
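As a point of reference, a minimal Python sketch (outside NiFi; the paths and state-file name are hypothetical) of the bookkeeping a read-only source like this requires: persist the set of filenames already handled and skip them on the next run.

```python
# Sketch: remember processed filenames across runs in a local state file.
import json
import os

STATE_FILE = "processed_files.json"  # hypothetical state location
SHARE_DIR = "/mnt/nfs_share"         # hypothetical mount point

seen = set()
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        seen = set(json.load(f))

for name in sorted(os.listdir(SHARE_DIR)):
    if name in seen:
        continue  # already handled in a prior run
    print("would process:", name)  # hand the file to the flow here
    seen.add(name)

with open(STATE_FILE, "w") as f:
    json.dump(sorted(seen), f)
```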
Labels:
- Apache NiFi
08-01-2016 09:20 PM
I'm building a NiFi flow with the NiFi GUI. As part of the flow I have a series of flat files I'm ingesting, which contain lines that I don't want in my data flow. These lines all start with the hash/pound symbol (#). Any ideas how to filter these lines out? I was thinking of a RouteOnContent processor, but I'm not sure how to make it filter out individual lines.
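For illustration, a minimal Python sketch of the filtering rule itself: drop any line whose first non-blank character is #. The sample content is hypothetical; inside NiFi the same line-by-line regex could drive a content-rewriting processor.

```python
# Sketch: remove comment lines (those starting with '#') from flat-file content.
import re

content = """# header comment
real,data,line
  # indented comment
another,data,line
"""

filtered = "\n".join(
    line for line in content.splitlines()
    if not re.match(r"\s*#", line)  # skip lines whose first non-blank char is '#'
)
print(filtered)
```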
Labels:
- Apache NiFi