
[Nifi ] [Ceph] [S3] Nifi miss some file when connect Ceph by S3 interface


New Contributor

NiFi: 1.11.3

Ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)

 

Hello, I'm using ListS3 and FetchS3 to get objects from a Ceph cluster, with the following configuration:
ListS3:

  • Bucket: test-empty-bucket01.
  • Region: US West (Oregon).
  • Write Object Tags: False.
  • Write User Metadata: False.
  • Communications Timeout: 30 secs.
  • Endpoint Override URL: http://localhost:12345/ (My Ceph cluster)
  • Use Versions: false.
  • List Type: List Objects V1.
  • Minimum Object Age: 0 sec.
  • Requester Pays: False.

FetchS3:

  • Bucket: test-empty-bucket01.
  • Object Key: ${filename}.
  • Region: AWS GovCloud (US).
  • Communications Timeout: 30 secs.
  • Endpoint Override URL: http://localhost:12345/ (My Ceph cluster).
  • Requester Pays: False.

My issues:

  • When I use List Objects V2, ListS3 does not save its current state.
  • With the configuration above (List Objects V1), it runs very fast, but after 1-2 hours ListS3 reports this error:

2020-03-04 13:30:52,521 ERROR [Timer-Driven Process Thread-3] org.apache.nifi.processors.aws.s3.ListS3 ListS3[id=a3675f45-0170-1000-a9c4-825011327395] ListS3[id=a3675f45-0170-1000-a9c4-825011327395] failed to process session due to com.amazonaws.SdkClientException: Unable to execute HTTP request: Software caused connection abort: recv failed; Processor Administratively Yielded for 1 sec: com.amazonaws.SdkClientException: Unable to execute HTTP request: Software caused connection abort: recv failed
com.amazonaws.SdkClientException: Unable to execute HTTP request: Software caused connection abort: recv failed
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:881)
at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:464)
at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:308)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1297)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
... 25 common frames omitted

Can you help me?

Thank you very much.

4 REPLIES

Re: [Nifi ] [Ceph] [S3] Nifi miss some file when connect Ceph by S3 interface

Master Guru

@LunaLua 

 

I know next to nothing about Ceph, but the exception thrown by the client (the ListS3 processor) identifies the issue as:

Caused by: java.net.SocketException: Software caused connection abort: recv failed


This points at the server side having closed the connection unexpectedly. I would suggest looking at the Ceph logs to see what exception(s) are being thrown on that side around the same time as you see the exception in NiFi. Perhaps that can provide you with more context around what is going wrong here.
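
One way to line up the two sides is to pull the timestamps of the failing NiFi log lines and derive a time window to search for in the Ceph logs. The following is only a minimal Python sketch, assuming the NiFi log keeps the timestamp prefix seen in the log quoted above. The helper name `failure_windows` and the 30-second margin are illustrative assumptions, not something NiFi or Ceph provides.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the NiFi log line quoted above,
# e.g. "2020-03-04 13:30:52,521 ERROR ...".
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def failure_windows(log_lines, needle="SdkClientException", margin_seconds=30):
    """Return (start, end) datetime pairs around each timestamped line
    that contains `needle`.

    Hypothetical helper: the margin is an arbitrary starting point.
    """
    margin = timedelta(seconds=margin_seconds)
    windows = []
    for line in log_lines:
        match = TS_PATTERN.match(line)
        # Stack-trace continuation lines carry no timestamp and are skipped.
        if match and needle in line:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            windows.append((ts - margin, ts + margin))
    return windows


if __name__ == "__main__":
    sample = [
        "2020-03-04 13:30:52,521 ERROR [Timer-Driven Process Thread-3] "
        "failed to process session due to com.amazonaws.SdkClientException",
        "com.amazonaws.SdkClientException: Unable to execute HTTP request",
    ]
    for start, end in failure_windows(sample):
        print(start.isoformat(), "to", end.isoformat())
```

Each window can then be used to filter the server-side logs by time. Keep in mind that the two machines may not share the same clock or time zone, so the margin may need to be widened.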

 

Hopefully there are other community members who know more about Ceph or maybe have used the Amazon SDK to interface with Ceph who can provide even more insight.

 

Hope this helps you,

Matt


Re: [Nifi ] [Ceph] [S3] Nifi miss some file when connect Ceph by S3 interface

Super Guru

Re: [Nifi ] [Ceph] [S3] Nifi miss some file when connect Ceph by S3 interface

New Contributor

I ran into a new issue with a MinIO server.

NiFi ListS3:

- Bucket: test-minio

- Region: US West (Oregon)

- Communications Timeout: 30 secs

- Endpoint Override URL: http://107.113.193.160:9000

- List Type: List Objects V1

 

Issue (in the NiFi logs):

2020-03-30 10:22:43,453 WARN [Timer-Driven Process Thread-11] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ListS3[id=15549dd4-0171-1000-5c78-0061e4b538a5] due to uncaught Exception: com.amazonaws.SdkClientException: Unable to execute HTTP request: Read timed out
com.amazonaws.SdkClientException: Unable to execute HTTP request: Read timed out
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)

....

Caused by: java.net.SocketTimeoutException: Read timed out

 


Re: [Nifi ] [Ceph] [S3] Nifi miss some file when connect Ceph by S3 interface

Super Guru

Was MinIO running? Did it crash? Is it running on a supported port? Does it need admin permissions? Was it rebooted? Is a firewall or something else blocking it?

 

This error means either the server is down or it needs HTTPS.

 

Can you connect from an Amazon S3 client, or with telnet on that port? Have you tried debugging with Wireshark?
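
If telnet is not available on the NiFi host, the same port check can be approximated in a few lines of Python. This is a minimal sketch using only the standard library. The function name `can_connect` and the 5-second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return (True, None) if a TCP connection opens, else (False, reason).

    Hypothetical helper that mimics a telnet-style port check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # OSError covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `can_connect("107.113.193.160", 9000)` from the NiFi host would test the endpoint from the ListS3 configuration above. A successful connection only shows that something is listening on the port. It does not rule out a slow or stalled response later in the request, which is what a read timeout reports.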
