Member since: 10-20-2016
Posts: 28
Kudos Received: 9
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2279 | 07-13-2017 12:47 PM
 | 3095 | 06-30-2017 01:37 PM
 | 3432 | 06-30-2017 05:18 AM
 | 1409 | 06-29-2017 03:15 PM
 | 2697 | 06-23-2017 01:51 PM
11-17-2022
09:20 AM
I followed all the suggestions and tried all the steps, but when I run the command:
./kafka-console-producer.sh --broker-list host.kafka:6667 --topic cleanCsv --producer-property security.protocol=SASL_PLAINTEXT < /tmp/clean_csv_full.csv
I get:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
followed by a long run of '>' producer prompts and then:
[2022-11-17 17:06:35,693] WARN [Principal=null]: TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.KerberosLogin)
There are no issues with creating topics or anything else; I only get this error when trying to push the CSV in. Without Kerberos, everything uploads smoothly. Any help is much appreciated; thank you in advance.
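A likely direction to investigate (an assumption on my part, not confirmed in this thread): Principal=null in the warning suggests the console producer never completed a Kerberos login, which typically happens when no client JAAS configuration is supplied. A minimal client JAAS sketch, using a hypothetical keytab path and principal:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/kafka_client.keytab"
  principal="user@EXAMPLE.COM"
  serviceName="kafka";
};
```

The producer would then be pointed at it before re-running the same command, e.g. export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf".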
01-14-2021
11:54 PM
Hi ravikirandasar1, I also have the same query. Could you please let me know how you automated this job using crontab for an everyday download of the files to HDFS?
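For anyone landing here later, the usual pattern (a sketch under assumptions, not the original poster's setup) is a wrapper script that downloads the files and copies them into HDFS with hdfs dfs -put, scheduled from crontab; the script and log paths below are hypothetical:

```
# crontab entry: run the hypothetical download script daily at 02:00
0 2 * * * /opt/scripts/fetch_to_hdfs.sh >> /var/log/fetch_to_hdfs.log 2>&1
```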
07-04-2017
05:16 AM
@Adda Fuentes Awesome, good to hear. You can mark the answer as "Accepted" so that if someone faces this issue in the future, they can try to debug along the same lines.
06-30-2017
03:00 PM
2 Kudos
PROBLEM DESCRIPTION: A Flume agent configured without any sources fails to start in Ambari, even though a message in the service status log indicates that the Flume agent started successfully.
The following sample configuration works on the Flume node when run manually with the flume-ng command. However, the same configuration fails when deployed through Ambari.
# Flume agent config
agent1.sinks = HdfsSink1
agent1.channels = channel1
agent1.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.channel1.brokerList=node11.openstacklocal:6667
agent1.channels.channel1.kafka.topic=test
agent1.channels.channel1.zookeeperConnect=node11.openstacklocal:2181
agent1.channels.channel1.capacity=10000
agent1.channels.channel1.transactionCapacity=1000
agent1.channels.channel1.parseAsFlumeEvent=false
agent1.channels.channel1.kafka.consumer.group.id=test.hdfs-c
agent1.sinks.HdfsSink1.channel=channel1
agent1.sinks.HdfsSink1.hdfs.appendTimeout=10000
agent1.sinks.HdfsSink1.hdfs.batchSize=1000
agent1.sinks.HdfsSink1.hdfs.callTimeout=10000
agent1.sinks.HdfsSink1.hdfs.filePrefix=xrs-SegmentEventData
agent1.sinks.HdfsSink1.hdfs.fileSuffix=.avro
agent1.sinks.HdfsSink1.hdfs.fileType=DataStream
agent1.sinks.HdfsSink1.hdfs.maxOpenFiles=50
##agent1.sinks.HdfsSink1.hdfs.path=/data/%{topic}/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.path=/tmp/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.rollCount=1000
agent1.sinks.HdfsSink1.hdfs.rollInterval=60
agent1.sinks.HdfsSink1.hdfs.rollSize=0
agent1.sinks.HdfsSink1.hdfs.rollTimerPoolSize=1
agent1.sinks.HdfsSink1.hdfs.threadsPoolSize=100
agent1.sinks.HdfsSink1.hdfs.txnEventMax=40000
agent1.sinks.HdfsSink1.hdfs.useLocalTimeStamp=true
agent1.sinks.HdfsSink1.hdfs.writeFormat=Text
agent1.sinks.HdfsSink1.type=hdfs
The Ambari service startup log shows that the command ran successfully: [..]
2016-11-09 09:25:11,411 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-11-09 09:25:11,715 - File['/var/run/flume/ambari-state.txt'] {'content': 'INSTALLED'}
2016-11-09 09:25:11,719 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
2016-11-09 09:25:11,723 - Directory['/var/run/flume'] {'owner': 'flume', 'group': 'hadoop'}
2016-11-09 09:25:11,723 - Directory['/usr/hdp/current/flume-server/conf'] {'owner': 'flume', 'create_parents': True}
2016-11-09 09:25:11,724 - Directory['/var/log/flume'] {'owner': 'flume', 'group': 'hadoop', 'create_parents': True,
'mode': 0755, 'cd_access': 'a'}
2016-11-09 09:25:11,726 - File['/var/run/flume/ambari-state.txt'] {'content': 'STARTED'}
2016-11-09 09:25:11,726 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
Command completed successfully!
ROOT CAUSE: This issue occurs because the flume.py script detects agent names by parsing only the lines containing a "sources" definition in flume.conf; it ignores the other definitions, such as channels and sinks.
# vim /var/lib/ambari-server/resources/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py
[..]
def build_flume_topology(content):
  result = {}
  agent_names = []
  for line in content.split('\n'):
    rline = line.strip()
    if 0 != len(rline) and not rline.startswith('#'):
      pair = rline.split('=')
      lhs = pair[0].strip()
      # workaround for properties that contain '='
      rhs = "=".join(pair[1:]).strip()
      part0 = lhs.split('.')[0]
      if lhs.endswith(".sources"):
        agent_names.append(part0)
      if not result.has_key(part0):
        result[part0] = {}
      result[part0][lhs] = rhs
  # trim out non-agents
  for k in result.keys():
    if not k in agent_names:
      del result[k]
  return result
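To see the effect concretely, here is a sketch of the same parsing logic ported to Python 3 (has_key replaced with the in operator; otherwise unchanged), run against a config with and without a sources line:

```python
# Python 3 port of Ambari's build_flume_topology, used here only to
# demonstrate why an agent with no "sources" line is dropped.
def build_flume_topology(content):
    result = {}
    agent_names = []
    for line in content.split('\n'):
        rline = line.strip()
        if rline and not rline.startswith('#'):
            pair = rline.split('=')
            lhs = pair[0].strip()
            # properties may themselves contain '='
            rhs = "=".join(pair[1:]).strip()
            part0 = lhs.split('.')[0]
            if lhs.endswith(".sources"):
                agent_names.append(part0)
            if part0 not in result:
                result[part0] = {}
            result[part0][lhs] = rhs
    # trim out agents that never declared a sources line
    for k in list(result.keys()):
        if k not in agent_names:
            del result[k]
    return result

no_sources = "agent1.sinks = HdfsSink1\nagent1.channels = channel1"
print(build_flume_topology(no_sources))   # {} - agent1 is silently dropped
print(build_flume_topology("agent1.sources = dummysource\n" + no_sources))
```

With no sources line the agent is trimmed out entirely, so Ambari believes there is nothing to start; adding the dummy source makes agent1 appear in the returned topology.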
[..]
SOLUTION: Add a dummy source definition at the beginning of flume.conf. This ensures that Ambari detects the Flume agent and adds it to the array of Flume agents.
Sample source definition:
# Flume agent config
agent1.sources = dummysource
agent1.sinks = HdfsSink1
agent1.channels = channel1
07-05-2017
06:48 PM
@Bharadwaj Bhimavarapu The general guidance is that these values should be set to 2 times the number of available cores, and no more than 4 times the number of available cores, on a single NiFi instance. If you are running a NiFi cluster, these values are enforced per node, so a setting of 16 in a 4-node cluster equates to a total of 64 threads across the cluster. Setting the values too high just results in many more threads in CPU wait and will not help performance at all. Beyond increasing these values, you need to be mindful of how many concurrent tasks you assign to each of your processors. Some processors are more CPU intensive than others (meaning they take longer to complete a job, holding a thread much longer). You can look at the "Tasks/Time" stats on a processor to see whether its threads are long or short running. For processors with long-running threads, be extra careful about how many concurrent tasks you assign. Thanks, Matt
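The sizing rule above can be sketched as a quick calculation; the core and node counts here are illustrative assumptions, not values from the thread:

```python
# Illustrative sizing check for NiFi's Max Timer Driven Thread Count.
cores = 8    # cores per NiFi node (assumed value)
nodes = 4    # cluster size (assumed value)

low = 2 * cores    # recommended starting point: 2x cores -> 16
high = 4 * cores   # recommended ceiling: 4x cores -> 32

# The setting is enforced per node, so a per-node value of 16 in a
# 4-node cluster means 64 threads across the whole cluster.
per_node = 16
cluster_total = per_node * nodes
print(low, high, cluster_total)   # 16 32 64
```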
06-29-2017
07:12 PM
@Sandeep Nemuri Awesome. That worked. Thanks.