Member since
10-20-2016
28
Posts
9
Kudos Received
7
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2279 | 07-13-2017 12:47 PM | |
3095 | 06-30-2017 01:37 PM | |
3433 | 06-30-2017 05:18 AM | |
1409 | 06-29-2017 03:15 PM | |
2697 | 06-23-2017 01:51 PM |
02-20-2018
11:50 AM
1 Kudo
@Ravikiran Dasari Have you tried looking at NiFi and its capabilities. NiFi provides a lot of processors which can help you automate your tasks and to create a flow for performing those tasks. You can create a flow to pickup data from a source and dump it in a different location. You can check the following example written by one of our NiFi experts @Matt Clarke on - How you can use NiFi to pull data from an FTP server. How-to: Retrieve files from a SFTP server using NiFi (GetSFTP vs. ListSFTP) Along with the processors mentioned in the article above, you can use PutHDFS processors explained in below docs to dump the data in HDFS. PutHDFS - NiFi docs Hope this helps
... View more
07-13-2017
12:47 PM
@Mehedi Hasan SASL_SSL is not available in the older kafka consumer api. The new kafka consumer api integration is in Flume v.1.7.0 as noted here: https://issues.apache.org/jira/browse/FLUME-2821. What is the version of Flume you are using?
... View more
07-04-2017
07:12 PM
@Simran Kaur I believe currently oozie does not support the functionality of tagging jobs with prioirities Check the following links for more details Oozie Workflow Functional Spec OOZIE-2892
... View more
07-04-2017
06:28 AM
@Simran Kaur Oozie does not handle workflow jobs priority. As soon as a workflow job is ready to do a transition, Oozie will trigger the transition. Workflow transitions and action triggering are assumed to be fast and lightweight operations.
In the capacity scheduler you could set a high priority queue providing the maximum resources of the cluster with extension( max. Capacity) to 100%. And while running the oozie workflow, you can provide the queue name in Job.properties and workflow.xml so that the job is always submitted to a high priority queue. In Job.Properties File queueName=<queue-name> Use the queueName in workflow.xml <configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
... View more
07-04-2017
05:16 AM
@Adda FuentesAwesome. Good to hear. You can mark your answer as "Accepted" so that if someone faces this issue in future, the can try and debug around the same lines.
... View more
07-03-2017
05:14 AM
@Adda Fuentes What is the kafka version on the other end? Please add a screenshot of your configuration of PutKafka, Jaas.conf file and the entry you have added in bootstrap.conf
... View more
06-30-2017
03:09 PM
@Bharadwaj Bhimavarapu There are no attachments to the question. Can you please check again? Do you see any error message on the processor or in the bulletin board, or in nifi-app.log?
... View more
06-30-2017
03:00 PM
2 Kudos
PROBLEM DESCRIPTION: Flume agent configured without any sources fails to start in Ambari. However, a message in the service status log indicates that the flume agent has started successfully.
The following sample configuration file works in flume node if ran manually with flume-ng command. However, the same configuration fails with Ambari. # Flume agent config
agent1.sinks = HdfsSink1
agent1.channels = channel1
agent1.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.channel1.brokerList=node11.openstacklocal:6667
agent1.channels.channel1.kafka.topic=test
agent1.channels.channel1.zookeeperConnect=node11.openstacklocal:2181
agent1.channels.channel1.capacity=10000
agent1.channels.channel1.transactionCapacity=1000
agent1.channels.channel1.parseAsFlumeEvent=false
agent1.channels.channel1.kafka.consumer.group.id=test.hdfs-c
agent1.sinks.HdfsSink1.channel=channel1
agent1.sinks.HdfsSink1.hdfs.appendTimeout=10000
agent1.sinks.HdfsSink1.hdfs.batchSize=1000
agent1.sinks.HdfsSink1.hdfs.callTimeout=10000
agent1.sinks.HdfsSink1.hdfs.filePrefix=xrs-SegmentEventData
agent1.sinks.HdfsSink1.hdfs.fileSuffix=.avro
agent1.sinks.HdfsSink1.hdfs.fileType=DataStream
agent1.sinks.HdfsSink1.hdfs.maxOpenFiles=50
##agent1.sinks.HdfsSink1.hdfs.path=/data/%{topic}/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.path=/tmp/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.rollCount=1000
agent1.sinks.HdfsSink1.hdfs.rollInterval=60
agent1.sinks.HdfsSink1.hdfs.rollSize=0
agent1.sinks.HdfsSink1.hdfs.rollTimerPoolSize=1
agent1.sinks.HdfsSink1.hdfs.threadsPoolSize=100
agent1.sinks.HdfsSink1.hdfs.txnEventMax=40000
agent1.sinks.HdfsSink1.hdfs.useLocalTimeStamp=true
agent1.sinks.HdfsSink1.hdfs.writeFormat=Text
agent1.sinks.HdfsSink1.type=hdfs Ambari logs for service startup shows that command ran successfully. [..]
2016-11-09 09:25:11,411 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-11-09 09:25:11,715 - File['/var/run/flume/ambari-state.txt'] {'content': 'INSTALLED'}
2016-11-09 09:25:11,719 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
2016-11-09 09:25:11,723 - Directory['/var/run/flume'] {'owner': 'flume', 'group': 'hadoop'}
2016-11-09 09:25:11,723 - Directory['/usr/hdp/current/flume-server/conf'] {'owner': 'flume', 'create_parents': True}
2016-11-09 09:25:11,724 - Directory['/var/log/flume'] {'owner': 'flume', 'group': 'hadoop', 'create_parents': True,
'mode': 0755, 'cd_access': 'a'}
2016-11-09 09:25:11,726 - File['/var/run/flume/ambari-state.txt'] {'content': 'STARTED'}
2016-11-09 09:25:11,726 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
Command completed successfully! ROOT CAUSE: This issue occurs because the flume.py script is designed to checks for the agent name by parsing the line containing "sources" definition in flume.conf. And it ignores the other parameters like channels and sinks. # vim /var/lib/ambari-server/resources/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py
[..]
def build_flume_topology(content):
result = {}
agent_names = []
for line in content.split('\n'):
rline = line.strip()
if 0 != len(rline) and not rline.startswith('#'):
pair = rline.split('=')
lhs = pair[0].strip()
# workaround for properties that contain '='
rhs = "=".join(pair[1:]).strip()
part0 = lhs.split('.')[0]
if lhs.endswith(".sources"):
agent_names.append(part0)
if not result.has_key(part0):
result[part0] = {}
result[part0][lhs] = rhs
# trim out non-agents
for k in result.keys():
if not k in agent_names:
del result[k]
return result
[..] SOLUTION Add a dummy source definition in the starting of the flume.conf. This will ensure that Ambari detects the flume agent and adds the same in the array of flume agents.
Sample source definition: # Flume agent config
agent1.sources = dummysource
agent1.sinks = HdfsSink1
agent1.channels = channel1
... View more
Labels:
06-30-2017
02:57 PM
PROBLEM DESCRIPTION: Flume agent configured without any sources fails to start in Ambari. However, a message in the service status log indicates that the flume agent has started successfully.
The following sample configuration file works in flume node if ran manually with flume-ng command. However, the same configuration fails with Ambari. # Flume agent config
agent1.sinks = HdfsSink1
agent1.channels = channel1
agent1.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.channel1.brokerList=node11.openstacklocal:6667
agent1.channels.channel1.kafka.topic=test
agent1.channels.channel1.zookeeperConnect=node11.openstacklocal:2181
agent1.channels.channel1.capacity=10000
agent1.channels.channel1.transactionCapacity=1000
agent1.channels.channel1.parseAsFlumeEvent=false
agent1.channels.channel1.kafka.consumer.group.id=test.hdfs-c
agent1.sinks.HdfsSink1.channel=channel1
agent1.sinks.HdfsSink1.hdfs.appendTimeout=10000
agent1.sinks.HdfsSink1.hdfs.batchSize=1000
agent1.sinks.HdfsSink1.hdfs.callTimeout=10000
agent1.sinks.HdfsSink1.hdfs.filePrefix=xrs-SegmentEventData
agent1.sinks.HdfsSink1.hdfs.fileSuffix=.avro
agent1.sinks.HdfsSink1.hdfs.fileType=DataStream
agent1.sinks.HdfsSink1.hdfs.maxOpenFiles=50
##agent1.sinks.HdfsSink1.hdfs.path=/data/%{topic}/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.path=/tmp/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.rollCount=1000
agent1.sinks.HdfsSink1.hdfs.rollInterval=60
agent1.sinks.HdfsSink1.hdfs.rollSize=0
agent1.sinks.HdfsSink1.hdfs.rollTimerPoolSize=1
agent1.sinks.HdfsSink1.hdfs.threadsPoolSize=100
agent1.sinks.HdfsSink1.hdfs.txnEventMax=40000
agent1.sinks.HdfsSink1.hdfs.useLocalTimeStamp=true
agent1.sinks.HdfsSink1.hdfs.writeFormat=Text
agent1.sinks.HdfsSink1.type=hdfs Ambari logs for service startup shows that command ran successfully [..]
2016-11-09 09:25:11,411 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-11-09 09:25:11,715 - File['/var/run/flume/ambari-state.txt'] {'content': 'INSTALLED'}
2016-11-09 09:25:11,719 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
2016-11-09 09:25:11,723 - Directory['/var/run/flume'] {'owner': 'flume', 'group': 'hadoop'}
2016-11-09 09:25:11,723 - Directory['/usr/hdp/current/flume-server/conf'] {'owner': 'flume', 'create_parents': True}
2016-11-09 09:25:11,724 - Directory['/var/log/flume'] {'owner': 'flume', 'group': 'hadoop', 'create_parents': True,
'mode': 0755, 'cd_access': 'a'}
2016-11-09 09:25:11,726 - File['/var/run/flume/ambari-state.txt'] {'content': 'STARTED'}
2016-11-09 09:25:11,726 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
Command completed successfully! ROOT CAUSE: This issue occurs because the flume.py script is designed to checks for the agent name by parsing the line containing "sources" definition in flume.conf. And it ignores the other parameters like channels and sinks. # vim /var/lib/ambari-server/resources/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py
[..]
def build_flume_topology(content):
result = {}
agent_names = []
for line in content.split('\n'):
rline = line.strip()
if 0 != len(rline) and not rline.startswith('#'):
pair = rline.split('=')
lhs = pair[0].strip()
# workaround for properties that contain '='
rhs = "=".join(pair[1:]).strip()
part0 = lhs.split('.')[0]
if lhs.endswith(".sources"):
agent_names.append(part0)
if not result.has_key(part0):
result[part0] = {}
result[part0][lhs] = rhs
# trim out non-agents
for k in result.keys():
if not k in agent_names:
del result[k]
return result
[..] SOLUTION/WORKAROUND Add a dummy source definition in the starting of the flume.conf. This will ensure that Ambari detects the flume agent and adds the same in the array of flume agents.
Sample source definition: # Flume agent config
agent1.sources = dummysource
agent1.sinks = HdfsSink1
agent1.channels = channel1
... View more
Labels:
06-30-2017
01:37 PM
1 Kudo
@rpulluru This issue occurs when one of the following is true: If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing. If a file name is reused at a later time, Flume will print an error to its log file and stop processing. If you are copying the files in your /data/src/input directory, change the operation to ‘mv’, Or you can copy the files as .tmp and then 'mv' the '.tmp' file to the same spooling directory with the actual name. Add the following line in flume.conf to ignore .tmp files in SpoolDir: Agent1.sources.spooldir-source.ignorePattern=^.*\.tmp$
... View more