Member since: 12-14-2018
Posts: 9
Kudos Received: 1
Solutions: 0
05-04-2022
07:29 AM
When I run the above Spark application with Zeppelin on a YARN cluster in cluster mode, I get the following error: Where might the problem be? Thanks
Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin
04-26-2022
05:00 AM
Hi guys, I am using NiFi to connect to Netezza, pull data from it, and save it to HDFS. I'm configuring the DBCPConnectionPool as below, and getting the data with the following statement, but I get the following error:
2022-04-26 17:03:06,530 WARN [Timer-Driven Process Thread-6] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ExecuteSQLRecord[id=504843e5-0180-1000-0000-00006597605a] due to uncaught Exception: java.lang.AbstractMethodError: org.netezza.sql.NzConnection.isValid(I)Z
java.lang.AbstractMethodError: org.netezza.sql.NzConnection.isValid(I)Z
at org.apache.commons.dbcp2.DelegatingConnection.isValid(DelegatingConnection.java:897)
at org.apache.commons.dbcp2.PoolableConnection.validate(PoolableConnection.java:270)
at org.apache.commons.dbcp2.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:630)
at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:118)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:665)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:544)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:753)
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:440)
at org.apache.nifi.dbcp.DBCPService.getConnection(DBCPService.java:55)
at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:87)
at com.sun.proxy.$Proxy131.getConnection(Unknown Source)
at org.apache.nifi.processors.standard.AbstractExecuteSQL.onTrigger(AbstractExecuteSQL.java:236)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I don't know why this happens or how to fix it. Can anyone help me? Thank you very much.
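A hedged aside, not from the original post: an AbstractMethodError on NzConnection.isValid usually means the Netezza JDBC driver predates JDBC 4 and does not implement Connection.isValid(), which Commons DBCP2 calls to validate pooled connections when no validation query is configured. One commonly suggested workaround is to set a validation query on the DBCPConnectionPool controller service so DBCP2 executes that statement instead of calling isValid(); the value below is only an illustration and assumes Netezza accepts a FROM-less SELECT:
Validation query = SELECT 1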
Labels:
- Apache Hadoop
- Apache NiFi
12-24-2018
12:00 PM
Thank you very much for helping me, but I have some follow-up questions:
1. If the files are not moved to another folder (as in questions 1 and 2 I mentioned), what do I have to do once the folder holds too many files, for example 1 billion files, and the server is full? Maybe I have to reconfigure with another spool folder?
2. This is the configuration file I wanted to mention in question 5:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set
# standard properties.
# source details
tier1.sources.source1.type = spooldir
tier1.sources.source1.spoolDir = /data/diem
tier1.sources.source1.fileHeader = false
tier1.sources.source1.fileSuffix = .COMPLETED
tier1.sources.source1.channels = channel1
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = regex_extractor
tier1.sources.source1.interceptors.i1.regex = \\[(.*?)\\]
tier1.sources.source1.interceptors.i1.serializers = s1
tier1.sources.source1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
tier1.sources.source1.interceptor.serializers.s1.name = timestamp
tier1.sources.source1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss
# channel details
tier1.channels.channel1.type = file
tier1.channels.channel1.capacity = 200000
tier1.channels.channel1.transactionCapacity = 1000
# sink details
tier1.sinks.sink1.type = HDFS
tier1.sinks.sink1.fileType = DataStream
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/testFolder/%y-%m-%d/%H%M/%S
tier1.sinks.sink1.round = true
tier1.sinks.sink1.roundValue = 10
tier1.sinks.sink1.roundUnit = minute
tier1.sinks.sink1.hdfs.rollSize = 268435456
tier1.sinks.sink1.rollInterval = 0
tier1.sinks.sink1.hdfs.batchSize = 10000
And this is the error in the log file:
2018-12-24 11:56:03,065 ERROR org.apache.flume.sink.hdfs.HDFSEventSink: process failed
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:251)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:460)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:368)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:745)
2018-12-24 11:56:03,069 ERROR org.apache.flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:451)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:251)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:460)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:368)
... 3 more
And once again, thank you for helping me answer these questions.
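A hedged observation on the config above, based on the Flume documentation for the regex_extractor interceptor rather than on anything else in this thread: the serializer name and pattern are normally set under the interceptor's fully qualified key, as in the two illustrative lines below. If the shortened prefixes used in the posted config are not recognized, the timestamp header never gets populated, and the HDFS sink's date escaping in the %y-%m-%d path then fails with exactly the "Expected timestamp in the Flume event headers" error shown in the log.
tier1.sources.source1.interceptors.i1.serializers.s1.name = timestamp
tier1.sources.source1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss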
12-23-2018
11:29 AM
Hi, I want to use Flume to send a large number of files to Hadoop and I had the idea of using the spooling directory source, but I have some questions:
1. When sending files to Hadoop, the files in the spool directory are not moved anywhere, which makes me wonder: if a new file appears in the spool directory, how does Flume tell old files from new ones?
2. After Flume uploads a file to Hadoop, are the files in the spool directory moved to another folder? Or does Flume have a mechanism to back up files? (See the property sketch below.)
3. I know that Flume has some properties for working with regexes, but I don't know whether Flume supports sending files to Hadoop and sorting them into directories based on a regex. If so, how do I do it?
4. Does Flume support sending files to Hadoop and categorizing them into directories based on the date they were sent? (I have read about that for the HDFS sink, but when I tried it, it failed.)
5. While using Flume to send files to Hadoop, can I modify the file contents, such as adding the file name into the data stream, or changing ";" into "|"?
6. Can I use any API or tool to monitor Flume's file transfers to Hadoop? For example, during transfer, see how many files have been transferred to Hadoop, how many were submitted successfully, and how many failed.
7. Does Flume record transaction logs with Hadoop, for example how many files have been uploaded to Hadoop?
I know I have asked a lot, but I am really confused by Flume and I really need your help. I look forward to your help. Thanks
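For reference, a minimal sketch of the spooling directory source properties that govern what happens to files after ingestion, as documented for the Flume spooldir source; the tier1/source1 names reuse the naming from the configs elsewhere in this thread, and the values are illustrative only:
# leave completed files in place, renamed with a marker suffix (default behaviour)
tier1.sources.source1.deletePolicy = never
tier1.sources.source1.fileSuffix = .COMPLETED
# or delete each file as soon as it has been fully ingested
# tier1.sources.source1.deletePolicy = immediate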
Labels:
- Apache Flume
- HDFS
12-18-2018
12:49 AM
1 Kudo
Yeah, I did, tks 😄
12-17-2018
07:40 AM
Thank you very much, I solved my problem
12-14-2018
11:45 AM
You mean the log in the flume.log file in the flume-ng folder? Because I don't see the flume-ng folder.
12-14-2018
10:12 AM
Hi, I want to use Flume to send a text file to HDFS. I changed the configuration file of the Flume service in Cloudera Manager as follows:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set
# standard properties.
# source details
tier1.sources.source1.type = spooldir
tier1.sources.source1.spoolDir = /data/diem
tier1.sources.source1.fileHeader = false
tier1.sources.source1.basenameHeader = true
tier1.sources.source1.fileSuffix = .COMPLETED
tier1.sources.source1.thread = 4
tier1.sources.source1.interceptors = newint
tier1.sources.source1.interceptors.newint.type = timestamp
tier1.sources.source1.channels = channel1
# channel details
tier1.channels.channel1.type = file
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 10000
tier1.channels.channel1.write-timeout = 60
tier1.channels.channel1.checkpointDir = /data
tier1.channels.channel1.dataDirs = /data
# sink details
tier1.sinks.sink1.type = HDFS
tier1.sinks.sink1.fileType = DataStream
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.filePrefix = %{basename}
tier1.sinks.sink1.threadsPoolSize = 4
tier1.sinks.sink1.hdfs.idleTimeout = 60
tier1.sinks.sink1.hdfs.batchSize = 100000
Then, I don't know how to start Flume from the terminal to send the file into HDFS. Can someone help me? And can someone look at the configuration file and correct it for me if there are errors?
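For reference, a minimal sketch of starting a Flume agent by hand with the flume-ng command-line tool, assuming the configuration above is saved to a hypothetical file /path/to/tier1.conf and that the agent name matches the tier1 prefix used in the properties (on a Cloudera Manager managed cluster, the Flume service can instead be restarted from the CM UI so it picks up the edited configuration):
flume-ng agent --name tier1 --conf /etc/flume-ng/conf --conf-file /path/to/tier1.conf -Dflume.root.logger=INFO,console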
Labels:
- Apache Flume
- Cloudera Manager
- HDFS