Member since
10-20-2016
28
Posts
9
Kudos Received
7
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1257 | 07-13-2017 12:47 PM |
| | 1693 | 06-30-2017 01:37 PM |
| | 1941 | 06-30-2017 05:18 AM |
| | 677 | 06-29-2017 03:15 PM |
| | 1471 | 06-23-2017 01:51 PM |
02-20-2018
11:50 AM
1 Kudo
@Ravikiran Dasari Have you tried looking at NiFi and its capabilities? NiFi provides many processors that can help you automate your tasks and build a flow for performing them. You can create a flow that picks up data from a source and lands it in a different location. Check the following example, written by one of our NiFi experts @Matt Clarke, on how to use NiFi to pull data from an SFTP server: How-to: Retrieve files from a SFTP server using NiFi (GetSFTP vs. ListSFTP). Along with the processors mentioned in that article, you can use the PutHDFS processor, described in the docs below, to write the data to HDFS. PutHDFS - NiFi docs. Hope this helps.
... View more
07-13-2017
12:47 PM
@Mehedi Hasan SASL_SSL is not available in the older Kafka consumer API. The new Kafka consumer API integration was added in Flume v1.7.0, as noted here: https://issues.apache.org/jira/browse/FLUME-2821. Which version of Flume are you using?
... View more
07-06-2017
09:26 AM
@siva karna It appears that your processor is configured to run only on the 'Primary node'. Please check the Scheduling tab in the processor configuration; if it is set to run only on 'Primary node', change it to run on 'All nodes'. See the attached screenshot.
... View more
07-04-2017
07:12 PM
@Simran Kaur I believe Oozie does not currently support tagging jobs with priorities. Check the following links for more details: Oozie Workflow Functional Spec, OOZIE-2892.
... View more
07-04-2017
06:28 AM
@Simran Kaur Oozie does not handle workflow job priorities. As soon as a workflow job is ready to make a transition, Oozie triggers the transition. Workflow transitions and action triggering are assumed to be fast and lightweight operations.
In the Capacity Scheduler you could define a high-priority queue that is given the bulk of the cluster's resources, with its maximum capacity extended to 100%. When running the Oozie workflow, supply the queue name in job.properties and workflow.xml so that the job is always submitted to that high-priority queue. In job.properties:
queueName=<queue-name>
Then reference the queue name in workflow.xml:
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
... View more
07-04-2017
05:16 AM
@Adda Fuentes Awesome, good to hear. You can mark the answer as "Accepted" so that if someone faces this issue in the future, they can try debugging along the same lines.
... View more
07-03-2017
05:14 AM
@Adda Fuentes What is the Kafka version on the other end? Please add a screenshot of your PutKafka configuration, your jaas.conf file, and the entry you have added in bootstrap.conf.
... View more
06-30-2017
03:09 PM
@Bharadwaj Bhimavarapu There are no attachments to the question. Can you please check again? Do you see any error message on the processor or in the bulletin board, or in nifi-app.log?
... View more
06-30-2017
03:00 PM
2 Kudos
PROBLEM DESCRIPTION: A Flume agent configured without any sources fails to start in Ambari, even though a message in the service status log indicates that the agent started successfully.
The following sample configuration works on the Flume node when run manually with the flume-ng command, but the same configuration fails when started through Ambari. # Flume agent config
agent1.sinks = HdfsSink1
agent1.channels = channel1
agent1.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.channel1.brokerList=node11.openstacklocal:6667
agent1.channels.channel1.kafka.topic=test
agent1.channels.channel1.zookeeperConnect=node11.openstacklocal:2181
agent1.channels.channel1.capacity=10000
agent1.channels.channel1.transactionCapacity=1000
agent1.channels.channel1.parseAsFlumeEvent=false
agent1.channels.channel1.kafka.consumer.group.id=test.hdfs-c
agent1.sinks.HdfsSink1.channel=channel1
agent1.sinks.HdfsSink1.hdfs.appendTimeout=10000
agent1.sinks.HdfsSink1.hdfs.batchSize=1000
agent1.sinks.HdfsSink1.hdfs.callTimeout=10000
agent1.sinks.HdfsSink1.hdfs.filePrefix=xrs-SegmentEventData
agent1.sinks.HdfsSink1.hdfs.fileSuffix=.avro
agent1.sinks.HdfsSink1.hdfs.fileType=DataStream
agent1.sinks.HdfsSink1.hdfs.maxOpenFiles=50
##agent1.sinks.HdfsSink1.hdfs.path=/data/%{topic}/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.path=/tmp/%y-%m-%d
agent1.sinks.HdfsSink1.hdfs.rollCount=1000
agent1.sinks.HdfsSink1.hdfs.rollInterval=60
agent1.sinks.HdfsSink1.hdfs.rollSize=0
agent1.sinks.HdfsSink1.hdfs.rollTimerPoolSize=1
agent1.sinks.HdfsSink1.hdfs.threadsPoolSize=100
agent1.sinks.HdfsSink1.hdfs.txnEventMax=40000
agent1.sinks.HdfsSink1.hdfs.useLocalTimeStamp=true
agent1.sinks.HdfsSink1.hdfs.writeFormat=Text
agent1.sinks.HdfsSink1.type=hdfs
The Ambari log for the service startup shows that the command ran successfully: [..]
2016-11-09 09:25:11,411 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-11-09 09:25:11,715 - File['/var/run/flume/ambari-state.txt'] {'content': 'INSTALLED'}
2016-11-09 09:25:11,719 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
2016-11-09 09:25:11,723 - Directory['/var/run/flume'] {'owner': 'flume', 'group': 'hadoop'}
2016-11-09 09:25:11,723 - Directory['/usr/hdp/current/flume-server/conf'] {'owner': 'flume', 'create_parents': True}
2016-11-09 09:25:11,724 - Directory['/var/log/flume'] {'owner': 'flume', 'group': 'hadoop', 'create_parents': True,
'mode': 0755, 'cd_access': 'a'}
2016-11-09 09:25:11,726 - File['/var/run/flume/ambari-state.txt'] {'content': 'STARTED'}
2016-11-09 09:25:11,726 - Writing File['/var/run/flume/ambari-state.txt'] because contents don't match
Command completed successfully! ROOT CAUSE: This issue occurs because the flume.py script determines the agent names by parsing the lines containing a "sources" definition in flume.conf, and it ignores other parameters such as channels and sinks. # vim /var/lib/ambari-server/resources/common-services/FLUME/1.4.0.2.0/package/scripts/flume.py
[..]
def build_flume_topology(content):
  result = {}
  agent_names = []
  for line in content.split('\n'):
    rline = line.strip()
    if 0 != len(rline) and not rline.startswith('#'):
      pair = rline.split('=')
      lhs = pair[0].strip()
      # workaround for properties that contain '='
      rhs = "=".join(pair[1:]).strip()
      part0 = lhs.split('.')[0]
      if lhs.endswith(".sources"):
        agent_names.append(part0)
      if not result.has_key(part0):
        result[part0] = {}
      result[part0][lhs] = rhs
  # trim out non-agents
  for k in result.keys():
    if not k in agent_names:
      del result[k]
  return result
[..] SOLUTION: Add a dummy source definition at the beginning of flume.conf. This ensures that Ambari detects the Flume agent and adds it to the list of agents.
Sample source definition: # Flume agent config
agent1.sources = dummysource
agent1.sinks = HdfsSink1
agent1.channels = channel1
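To see why the dummy source matters, here is a minimal sketch (a Python 3 adaptation of the build_flume_topology logic quoted above, for illustration only) showing that an agent with no ".sources" line is dropped, while the same agent with a dummy source is detected:
# Python 3 sketch of the parsing behaviour in Ambari's flume.py (illustrative only).
def build_flume_topology(content):
    result = {}
    agent_names = []
    for line in content.split('\n'):
        rline = line.strip()
        if rline and not rline.startswith('#'):
            pair = rline.split('=')
            lhs = pair[0].strip()
            rhs = "=".join(pair[1:]).strip()       # handles values containing '='
            part0 = lhs.split('.')[0]
            if lhs.endswith(".sources"):
                agent_names.append(part0)          # only ".sources" lines register an agent
            result.setdefault(part0, {})[lhs] = rhs
    # trim out non-agents: anything without a ".sources" line is discarded
    return {k: v for k, v in result.items() if k in agent_names}

without_source = "agent1.sinks = HdfsSink1\nagent1.channels = channel1"
with_dummy = "agent1.sources = dummysource\n" + without_source

print(list(build_flume_topology(without_source)))  # [] - Ambari sees no agent to start
print(list(build_flume_topology(with_dummy)))      # ['agent1'] - agent detected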
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- Flume
- Issue Resolution
06-30-2017
01:37 PM
1 Kudo
@rpulluru This issue occurs when one of the following is true:
- If a file is written to after being placed into the spooling directory, Flume prints an error to its log file and stops processing.
- If a file name is reused at a later time, Flume prints an error to its log file and stops processing.
If you are copying the files into your /data/src/input directory, change the operation to 'mv'. Alternatively, copy the files with a '.tmp' extension and then 'mv' the '.tmp' file to its actual name within the same spooling directory. Add the following line in flume.conf so the spooling directory source ignores .tmp files:
Agent1.sources.spooldir-source.ignorePattern=^.*\.tmp$
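As a quick sanity check of that pattern (a Python sketch with hypothetical file names; Flume itself uses Java regex, but this simple pattern behaves the same way in both):
import re

# Hypothetical file names, for illustration only.
pattern = re.compile(r'^.*\.tmp$')
print(bool(pattern.match('events-2017-06-30.avro.tmp')))  # True  -> ignored while still being copied
print(bool(pattern.match('events-2017-06-30.avro')))      # False -> picked up once renamed via 'mv'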
... View more
06-30-2017
05:18 AM
1 Kudo
@mliem Did you copy the flow.xml.gz from your old installation to this one after you wiped everything 2.x related and installed 3.x? All the sensitive properties inside the flow.xml.gz file are encrypted using the sensitive properties key defined in the nifi.properties file (if blank, NiFi uses an internal default value). If you move your flow.xml.gz file to another NiFi, the same sensitive properties key must be used, or NiFi will fail to start because it cannot decrypt the sensitive properties in the file.
... View more
06-29-2017
07:12 PM
@Sandeep Nemuri Awesome. That worked. Thanks.
... View more
06-29-2017
07:03 PM
@Sandeep Nemuri Here are the properties from my cluster:
# Generated by Apache Ambari. Mon Jun 12 12:33:51 2017
atlas.authentication.method.kerberos=False
atlas.cluster.name=mycluster
atlas.jaas.KafkaClient.option.renewTicket=true.
atlas.jaas.KafkaClient.option.useTicketCache=true
atlas.kafka.bootstrap.servers=lab1.hwx.com:6667
atlas.kafka.hook.group.id=atlas
atlas.kafka.zookeeper.connect=lab1.hwx.com:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.notification.create.topics=True
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.rest.address=http://lab1.hwx.com:21000
... View more
06-29-2017
06:53 PM
While running the Sqoop command, I get the following error message:
ERROR security.InMemoryJAASConfiguration: Unable to add JAAS configuration for client [KafkaClient] as it is missing param [atlas.jaas.KafkaClient.loginModuleName]. Skipping JAAS config for [KafkaClient]
Any pointers are appreciated.
... View more
Labels:
- Apache Sqoop
06-29-2017
03:15 PM
1 Kudo
The error message indicates that the permissions on the worker-launcher binary are not correct. Please check the permissions on the worker-launcher binary: it should be owned by root:hadoop, or by root and the group configured in worker-launcher.cfg. From the log message, it appears the group ownership is currently root.
Also note that the permissions on the binary should be 6550, otherwise it will fail again after you change the group ownership. Here is the output from my test system:
# stat /usr/hdp/2.5.0.0-1133/storm/bin/worker-launcher
File: `/usr/hdp/2.5.0.0-1133/storm/bin/worker-launcher'
Size: 56848 Blocks: 112 IO Block: 4096 regular file
Device: fc01h/64513d Inode: 1444319 Links: 1
Access: (6550/-r-sr-s---) Uid: ( 0/ root) Gid: ( 501/ hadoop)
Access: 2016-08-03 13:25:37.000000000 +0000
Modify: 2016-08-03 13:25:37.000000000 +0000
Change: 2016-11-19 13:23:02.764000118 +0000
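If it helps, here is a small Python sketch to verify the ownership and mode in one go (the path is taken from the example above; adjust it to your HDP version and install location):
import os, stat, pwd, grp

# Path from the stat output above; adjust to your own HDP version/location.
path = "/usr/hdp/2.5.0.0-1133/storm/bin/worker-launcher"

st = os.stat(path)
print("owner:", pwd.getpwuid(st.st_uid).pw_name)   # expected: root
print("group:", grp.getgrgid(st.st_gid).gr_name)   # expected: hadoop (or the group from worker-launcher.cfg)
print("mode :", oct(stat.S_IMODE(st.st_mode)))     # expected: 0o6550 (setuid + setgid, r-sr-s---)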
... View more
06-27-2017
09:26 AM
@mayki wogno What is the NiFi version, and can you please upload a screenshot of the ExecuteScript processor configuration?
... View more
06-26-2017
02:36 PM
@Raj B Please check the timestamps of the files remaining in the directory: whether they are being added while the processor is running, or whether their timestamps are older than the CRON run time of the processor.
... View more
06-26-2017
10:08 AM
1 Kudo
@Raj B This looks similar to NIFI-4069. As a workaround, please try changing the cron schedule to 0,30 30 0 * * so that it runs twice in the same minute. Let us know if that helps.
... View more
06-26-2017
09:57 AM
@mayki wogno This looks like a known issue I have seen earlier. Could you please try the following workaround?
1. Copy hive-site.xml, core-site.xml and hdfs-site.xml to the conf directory of NiFi.
2. Clear the Hive Configuration Resources property in the PutHiveStreaming processor.
3. Create an ExecuteScript processor on the canvas, scheduled to run as often as the ticket needs to be refreshed (every hour should do for most setups), with the following Groovy script, replacing nifi@HDF.COM with your principal and /etc/nifi.headless.keytab with your keytab:
import org.apache.nifi.nar.NarClassLoader
import org.apache.nifi.nar.NarClassLoaders
NarClassLoaders.instance.extensionClassLoaders.each { c ->
    if (c instanceof NarClassLoader && c.workingDirectory.absolutePath.contains('nifi-hive')) {
        def originalClassloader = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(c);
        try {
            def configClass = c.loadClass('org.apache.hadoop.conf.Configuration', true)
            def hiveConfigurator = c.loadClass('org.apache.nifi.util.hive.HiveConfigurator', true).newInstance();
            def config = hiveConfigurator.getConfigurationFromFiles('')
            hiveConfigurator.preload(config)
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true).getMethod('setConfiguration', configClass).invoke(null, config)
            c.loadClass('org.apache.hadoop.security.UserGroupInformation', true).getMethod('loginUserFromKeytab', String.class, String.class).invoke(null, 'nifi@HDF.COM', '/etc/nifi.headless.keytab')
            log.info('Successfully logged in')
            session.transfer(session.create(), REL_SUCCESS)
        } catch (Exception e) {
            log.error('Unable to login with keytab', e)
            session.transfer(session.create(), REL_FAILURE)
        } finally {
            Thread.currentThread().setContextClassLoader(originalClassloader);
        }
    }
}
... View more
06-23-2017
01:51 PM
1 Kudo
@AViradia I suspect that the Windows file-locking mechanism prevents the original file from being renamed (see http://dev.eclipse.org/mhonarc/lists/jetty-users/msg03222.html for details). You can omit the file property in logback.xml; the active log file is then computed anew for each period based on the value of fileNamePattern.
A working rolling policy for a NiFi node on Windows is as follows:
<appender name="APP_FILE">
<!-- <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file> -->
<rollingPolicy>
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<timeBasedFileNamingAndTriggeringPolicy>
<maxFileSize>100MB</maxFileSize>
</timeBasedFileNamingAndTriggeringPolicy>
<!-- keep 30 log files worth of history -->
<maxHistory>30</maxHistory>
</rollingPolicy>
Ref: https://logback.qos.ch/manual/appenders.html
... View more
06-23-2017
10:59 AM
1 Kudo
@subash sharma The error messages you are seeing indicate some issues with your SSH configuration.
Error: "host key verification failed". This means the SSH client is not able to verify 'edge-node-hostname'. It is possible that the entry for 'edge-node-hostname' is not present in the known_hosts file in the home directory of the user NiFi runs as. If you are running NiFi as root, check /root/.ssh/known_hosts; if NiFi runs as a non-root user, check the '.ssh/known_hosts' file in that user's home directory. You can also run the following command, which will ask for a yes/no confirmation to add the entry for 'edge-node-hostname' to the known_hosts file:
sudo -u <NiFi_user> ssh username@edge-node-hostname
Error: "Pseudo-terminal will not be allocated because stdin is not a terminal". This occurs because you are trying to obtain a remote shell from the ExecuteProcess processor, which is not possible. You can either add a command to be run after ssh succeeds, or add the -t -t arguments to the ssh command:
ssh username@edge-node-hostname date
OR ssh -t -t username@edge-node-hostname
... View more
06-10-2017
03:48 PM
@JT Ng Check whether your Oozie sharelib is up to date. Also, try adding --hive-home as a Sqoop <arg>: <arg>--hive-home</arg>
<arg>path/to/hive_home</arg>
... View more
06-10-2017
02:52 PM
@Anishkumar Valsalam Have you added a policy granting access to /proxy for all the NiFi nodes? Check the audit tab in Ranger to see which resource the access was forbidden for. You can also check the following resources on integrating NiFi with Ranger: https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range.html https://community.hortonworks.com/articles/57980/hdf-20-apache-nifi-integration-with-apache-ambarir.html http://bryanbende.com/development/2016/08/22/apache-nifi-1.0.0-using-the-apache-ranger-authorizer
... View more
06-08-2017
07:00 PM
@Karan Alang If you are using the new Kafka Consumer, try bootstrap.servers and see if that helps. Check https://community.hortonworks.com/articles/24599/kafka-mirrormaker.html for more details.
... View more
06-08-2017
02:18 PM
@Anishkumar Valsalam You would need to set up identity mappings for the users. Set the following parameters in your NiFi configuration (nifi.properties) and restart NiFi:
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?), OU=(.*?), OU=(.*?), OU=(.*?), DC=(.*?), DC=(.*?), DC=(.*?)$
nifi.security.identity.mapping.value.dn = $1
You can read the following knowledge article for more details: https://community.hortonworks.com/articles/81184/understanding-the-initial-admin-identity-access-po.html
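To illustrate what this mapping does, here is a Python sketch with a hypothetical DN (NiFi applies the same idea using Java regex): the pattern captures the DN components, and the $1 mapping value keeps only the CN.
import re

# Hypothetical certificate DN, for illustration only.
dn = "CN=nifi-node1.example.com, OU=Hosts, OU=Hadoop, OU=IT, DC=example, DC=hwx, DC=com"

pattern = r'^CN=(.*?), OU=(.*?), OU=(.*?), OU=(.*?), DC=(.*?), DC=(.*?), DC=(.*?)$'
match = re.match(pattern, dn)

# The mapping value "$1" corresponds to the first capture group, i.e. the CN,
# so this DN is mapped to the identity "nifi-node1.example.com".
print(match.group(1))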
... View more
03-19-2017
12:37 PM
@Micaël Dias The error messages in controller.log appear to be from a different timestamp than the timestamps of your producers and consumers. Even if we ignore the mismatched timestamps, I see
Controller1003's connection to broker kafka.host.com:6667 (id: 1003 rack: null) was unsuccessful
so we should check server.log for the same timestamp. Please perform the following steps, which might help in debugging the issue. Also, if you do not see any directories created for the topic and its partitions, that itself might be the issue.
- Restart the brokers.
- Keep monitoring server.log and controller.log for any error messages.
- Create a topic while monitoring the logs and see whether the topic and partitions are created successfully. Additionally, check state-change.log.
- If there are no errors, try producing and consuming while monitoring the logs.
... View more