Member since: 10-20-2016
Posts: 28
Kudos Received: 9
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2279 | 07-13-2017 12:47 PM
 | 3095 | 06-30-2017 01:37 PM
 | 3432 | 06-30-2017 05:18 AM
 | 1409 | 06-29-2017 03:15 PM
 | 2697 | 06-23-2017 01:51 PM
11-17-2022
09:20 AM
I followed all the suggestions and tried all the steps, but when I run the command:
./kafka-console-producer.sh --broker-list host.kafka:6667 --topic cleanCsv --producer-property security.protocol=SASL_PLAINTEXT < /tmp/clean_csv_full.csv
I get:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
followed by a long run of '>' producer prompts and then:
[2022-11-17 17:06:35,693] WARN [Principal=null]: TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.KerberosLogin)
There are no issues with creating topics or anything else; I only get this error when trying to push the CSV in. Without Kerberos, everything uploads smoothly. Any help is much appreciated; thank you in advance.
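A likely direction to investigate (an assumption on my part, not confirmed in this thread): Principal=null in the warning suggests the console producer never completed a Kerberos login, which typically happens when no client JAAS configuration is supplied. A minimal client JAAS sketch, using a hypothetical keytab path and principal:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/kafka_client.keytab"
  principal="user@EXAMPLE.COM"
  serviceName="kafka";
};
```

The producer would then be pointed at it before re-running the same command, e.g. export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf".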
01-14-2021
11:54 PM
Hi ravikirandasar1, I also have the same query. Could you please let me know how you automated this job using crontab for an everyday download of the files to HDFS?
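For anyone landing here later, the usual pattern (a sketch under assumptions, not the original poster's setup) is a wrapper script that downloads the files and copies them into HDFS with hdfs dfs -put, scheduled from crontab; the script and log paths below are hypothetical:

```
# crontab entry: run the hypothetical download script daily at 02:00
0 2 * * * /opt/scripts/fetch_to_hdfs.sh >> /var/log/fetch_to_hdfs.log 2>&1
```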
07-04-2017
05:16 AM
@Adda Fuentes Awesome, good to hear. You can mark the answer as "Accepted" so that if someone faces this issue in the future, they can try to debug along the same lines.
06-30-2017
03:00 PM
2 Kudos
PROBLEM DESCRIPTION: A Flume agent configured without any sources fails to start in Ambari, even though a message in the service status log indicates that the Flume agent started successfully.
The following sample configuration works on the Flume node when run manually with the flume-ng command. However, the same configuration fails when deployed through Ambari.
# Flume agent config
agent1.sinks = HdfsSink1
agent1.channels = channel1
agent1.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.channel1.brokerList=node11.openstacklocal:6667
agent1.channels.channel1.kafka.topic=test
agent1.channels.channel1.zookeeperConnect=node11.openstacklocal:2181
agent1.channels.channel1.capacity=10000
agent1.channels.channel1.transactionCapacity=1000
agent1.channels.channel1.parseAsFlumeEvent=false
agent1.channels.channel1.kafka.consumer.group.id=test.hdfs-c
agent1.sinks.HdfsSink1.channel=channel1
agent1.sinks.HdfsSink1.hdfs.appendTimeout=10000
agent1.sinks.HdfsSink1.hdfs.batchSize=1000
agent1.sinks.HdfsSink1.hdfs.callTimeout=10000
agent1.sinks.HdfsSink1.hdfs.filePrefix=xrs-SegmentEventData
agent1.sinks.HdfsSink1.hdfs.fileSuffix=.avro
agent1.sinks.HdfsSink1.hdfs.fileType=DataStream
agent1.sinks.HdfsSink1.hdfs.maxOpenFiles=50
##agent1.sinks.HdfsSink1.hdfs.path=/data/%{topic}/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.path=/tmp/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.rollCount=1000
agent1.sinks.HdfsSink1.hdfs.rollInterval=60
agent1.sinks.HdfsSink1.hdfs.rollSize=0
agent1.sinks.HdfsSink1.hdfs.rollTimerPoolSize=1
agent1.sinks.HdfsSink1.hdfs.threadsPoolSize=100
agent1.sinks.HdfsSink1.hdfs.txnEventMax=40000
agent1.sinks.HdfsSink1.hdfs.useLocalTimeStamp=true
agent1.sinks.HdfsSink1.hdfs.writeFormat=Text
agent1.sinks.HdfsSink1.type=hdfs
The Ambari service startup log shows that the command ran successfully: [..]
2016-11-09 09:25:11,411 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-11-09 09:25:11,715 - File['/var/run/flume/ambari-state.txt'] {'content': 'INSTALLED'}
2016-11-09 09:25:11,719 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
2016-11-09 09:25:11,723 - Directory['/var/run/flume'] {'owner': 'flume', 'group': 'hadoop'}
2016-11-09 09:25:11,723 - Directory['/usr/hdp/current/flume-server/conf'] {'owner': 'flume', 'create_parents': True}
2016-11-09 09:25:11,724 - Directory['/var/log/flume'] {'owner': 'flume', 'group': 'hadoop', 'create_parents': True,
'mode': 0755, 'cd_access': 'a'}
2016-11-09 09:25:11,726 - File['/var/run/flume/ambari-state.txt'] {'content': 'STARTED'}
2016-11-09 09:25:11,726 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
Command completed successfully!
ROOT CAUSE: This issue occurs because the flume.py script detects agent names by parsing only the lines containing a "sources" definition in flume.conf; it ignores the other definitions, such as channels and sinks.
# vim /var/lib/ambari-server/resources/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py
[..]
def build_flume_topology(content):
  result = {}
  agent_names = []
  for line in content.split('\n'):
    rline = line.strip()
    if 0 != len(rline) and not rline.startswith('#'):
      pair = rline.split('=')
      lhs = pair[0].strip()
      # workaround for properties that contain '='
      rhs = "=".join(pair[1:]).strip()
      part0 = lhs.split('.')[0]
      if lhs.endswith(".sources"):
        agent_names.append(part0)
      if not result.has_key(part0):
        result[part0] = {}
      result[part0][lhs] = rhs
  # trim out non-agents
  for k in result.keys():
    if not k in agent_names:
      del result[k]
  return result
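To see the effect concretely, here is a sketch of the same parsing logic ported to Python 3 (has_key replaced with the in operator; otherwise unchanged), run against a config with and without a sources line:

```python
# Python 3 port of Ambari's build_flume_topology, used here only to
# demonstrate why an agent with no "sources" line is dropped.
def build_flume_topology(content):
    result = {}
    agent_names = []
    for line in content.split('\n'):
        rline = line.strip()
        if rline and not rline.startswith('#'):
            pair = rline.split('=')
            lhs = pair[0].strip()
            # properties may themselves contain '='
            rhs = "=".join(pair[1:]).strip()
            part0 = lhs.split('.')[0]
            if lhs.endswith(".sources"):
                agent_names.append(part0)
            if part0 not in result:
                result[part0] = {}
            result[part0][lhs] = rhs
    # trim out agents that never declared a sources line
    for k in list(result.keys()):
        if k not in agent_names:
            del result[k]
    return result

no_sources = "agent1.sinks = HdfsSink1\nagent1.channels = channel1"
print(build_flume_topology(no_sources))   # {} - agent1 is silently dropped
print(build_flume_topology("agent1.sources = dummysource\n" + no_sources))
```

With no sources line the agent is trimmed out entirely, so Ambari believes there is nothing to start; adding the dummy source makes agent1 appear in the returned topology.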
[..]
SOLUTION: Add a dummy source definition at the beginning of flume.conf. This ensures that Ambari detects the Flume agent and adds it to the array of Flume agents.
Sample source definition:
# Flume agent config
agent1.sources = dummysource
agent1.sinks = HdfsSink1
agent1.channels = channel1
07-05-2017
06:48 PM
@Bharadwaj Bhimavarapu The general guidance is that these values should be set to 2 times the number of available cores, and no more than 4 times the number of available cores, on a single NiFi instance. If you are running a NiFi cluster, these values are enforced per node, so a setting of 16 in a 4-node cluster equates to a total of 64 threads across the cluster. Setting the values too high just results in many more threads in CPU wait and will not help performance at all. Beyond increasing these values, you need to be mindful of how many concurrent tasks you assign to each of your processors. Some processors are more CPU intensive than others (meaning they take longer to complete a job, holding a thread much longer). You can look at the "Tasks/Time" stats on a processor to see whether its threads are long or short running. For processors with long-running threads, be extra careful about how many concurrent tasks you assign. Thanks, Matt
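The sizing rule above can be sketched as a quick calculation; the core and node counts here are illustrative assumptions, not values from the thread:

```python
# Illustrative sizing check for NiFi's Max Timer Driven Thread Count.
cores = 8    # cores per NiFi node (assumed value)
nodes = 4    # cluster size (assumed value)

low = 2 * cores    # recommended starting point: 2x cores -> 16
high = 4 * cores   # recommended ceiling: 4x cores -> 32

# The setting is enforced per node, so a per-node value of 16 in a
# 4-node cluster means 64 threads across the whole cluster.
per_node = 16
cluster_total = per_node * nodes
print(low, high, cluster_total)   # 16 32 64
```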
06-29-2017
07:12 PM
@Sandeep Nemuri Awesome. That worked. Thanks.