Member since: 04-20-2016
Posts: 27
Kudos Received: 5
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3941 | 09-30-2016 03:06 PM
09-30-2016 09:42 PM
@Bryan Bende I appreciate the help immensely; I will get back to you on Monday with feedback. Cheers to a good weekend!
09-30-2016 03:06 PM
@Bryan Bende your take on my comments above would be much appreciated. Thanks
09-29-2016 09:13 PM
@Bryan Bende let's say I want to go the route where I use a different ExtractText processor for each delimiter; how do I go about that? I am quite confused here (a vivid example would be helpful). From my understanding, the ExtractText processor will parse a file regardless of the file's delimiters, and what actually matters is the regular expression used to extract the data; correct me if I am wrong. I also tried replicating your example above: the flow file was ingested successfully, but no data appeared in the database tables.
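For anyone reading along, a minimal sketch of the per-delimiter route as I understand it (my assumption, not a confirmed flow; RouteOnContent and ExtractText are standard NiFi processors, and the patterns assume five-column lines like my sample data, with no field containing the other delimiter):
  RouteOnContent (Match Requirement = content must contain match), one dynamic property per delimiter:
    comma = ,
    tab = \t
  Each relationship ("comma", "tab") then feeds its own ExtractText. For the comma route, one dynamic property per column, so the attribute names match the ${Column1}..${Column5} used later in ReplaceText:
    Column1 = ^([^,]*)
    Column2 = ^(?:[^,]*,){1}([^,]*)
    Column3 = ^(?:[^,]*,){2}([^,]*)
    Column4 = ^(?:[^,]*,){3}([^,]*)
    Column5 = ^(?:[^,]*,){4}([^,]*)
  The tab route uses the same patterns with \t in place of the comma.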
09-29-2016 03:27 PM
@Bryan Bende your input on this question would be much appreciated. Thanks
09-29-2016 12:34 AM
@Bryan Bende Thanks once again for getting back to me. I will answer your questions in order: (1) Since your flow was working, you must have already configured ExtractText with a pattern to parse the line, right? Yes, I used a regular expression (config image attached below) to parse the line, but I do not think that is the best way to handle this; I would rather parse the lines using the delimiter. (2) So are you just asking how to handle more delimiters? Yes, that would be helpful.
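If a single ExtractText has to cope with either delimiter, a character class can stand in for the delimiter (again only a sketch on my part, valid only when no field itself contains a comma or a tab):
    Column1 = ^([^,\t]*)
    Column2 = ^(?:[^,\t]*[,\t]){1}([^,\t]*)
    Column3 = ^(?:[^,\t]*[,\t]){2}([^,\t]*)
    Column4 = ^(?:[^,\t]*[,\t]){3}([^,\t]*)
    Column5 = ^(?:[^,\t]*[,\t]){4}([^,\t]*)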
09-28-2016 03:46 PM
Hi @Bryan Bende, thanks for getting back to me. To answer your questions: What do you want to do with your text files? I want to move the delimited flat files from SFTP into a SQL database using NiFi. How NiFi handles the delimiters in the flat file is my main concern: which of the processors mentioned above handles the delimiters in the text file? I posted a sample data flow of what I want to achieve, and you answered it here: https://community.hortonworks.com/questions/57779/how-to-preventing-duplicates-when-ingesting-into-m.html#comment-57785 Do you want to convert it to another format, and if so, what format? No, I do not want to convert it to another format.
09-27-2016 10:28 PM
I have text files that use various delimiters, such as quotation marks, commas, and tabs. Which processor can I use to handle such delimiters, and how do I configure it to handle them in my file? The "Properties" section of ConvertCSVToAvro has properties similar to what I want to achieve. Thanks
Labels: Apache NiFi
09-21-2016 09:25 PM
@Bryan Bende thanks a lot for the help; it works perfectly. I attached a snippet of the updated workflow in case anyone experiences such an issue in the future. Thanks again.
09-21-2016 09:01 PM
1 Kudo
I have a dataflow that ingests files from SFTP into MySQL, and I would like to know how to prevent the enormous number of duplicates NiFi is ingesting into MySQL. Details attached below. Thanks. (1) Data flow (2) Count after NiFi ingest into MySQL (3) Original data on SFTP
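For reference, one database-side guard (a sketch only, assuming the NiFiUsecase001 table and Column1..Column5 from my flow, and that Column1 uniquely identifies a row; substitute your real key):
    -- reject replays at the database: duplicate rows are silently skipped
    ALTER TABLE NiFiUsecase001 ADD UNIQUE KEY uq_column1 (Column1);
    INSERT IGNORE INTO NiFiUsecase001 (Column1, Column2, Column3, Column4, Column5)
    VALUES ('v1', 'v2', 'v3', 'v4', 'v5');
With the unique key in place, a re-ingested row no longer inflates the count.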
Labels: Apache NiFi
09-21-2016 01:35 AM
@Pierre Villard sorry for the late reply. How do I verify that I am not ingesting the same file multiple times with ListSFTP/FetchSFTP?
09-07-2016 05:35 PM
1 Kudo
Hi All, @Pierre Villard, @mclark, @Bryan Bende, @Brandon Wilson, @jfrazee, @Andrew Grande, @Matt Burgess, could you please assist me with the above? I would greatly appreciate the help; I have explained everything thoroughly.
09-02-2016 09:21 PM
Hi All, @Pierre Villard, @mclark, @Bryan Bende, @Brandon Wilson, @jfrazee, @Andrew Grande, @Matt Burgess, I have been able to insert into MySQL by setting "Obtain Generated Keys = true" in the PutSQL configuration (pics below), but the problem now is that an insane number of duplicates got ingested into the MySQL table (pics below). I would like to know what in my flow might be causing this and how to fix it. Thanks a lot!! (i) PutSQL configuration (ii) ExtractText (iii) Original data to be ingested into MySQL (iv) Table count after the PutSQL NiFi ingest
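One NiFi-side guard that could sit in front of PutSQL (a sketch, not something I have verified; DetectDuplicate is a standard processor, but the identifier choice is my assumption):
    DetectDuplicate (between ExtractText and ReplaceText):
      Cache Entry Identifier = ${Column1}    <- assumes Column1 uniquely identifies a line
      Distributed Cache Service = a DistributedMapCacheClientService pointing at a DistributedMapCacheServer
      Age Off Duration = 1 day
Route the "duplicate" relationship away from PutSQL so only "non-duplicate" FlowFiles reach the database.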
09-02-2016 05:27 PM
Hi All, @mclark, @Bryan Bende, @Brandon Wilson, @jfrazee, @Pierre Villard, @Andrew Grande, I would appreciate it if you could assist me with the above. Please refer to the above for details and explanation. Thanks a lot!
09-02-2016 04:41 PM
Hi @Matt Burgess, could there be other reasons, such as the available RAM on this NiFi node? I am currently using a t2.medium in AWS.
09-01-2016 09:12 PM
Hi @Matt Burgess, thanks a lot for responding. I updated my workflow with the specifics you suggested, but I still cannot insert into the MySQL DB; I am getting the same error as mentioned previously. I attached my configs as well. PutSQL ERROR: failed to update database due to a failed batch update. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not executed and will be routed to retry. (i) Updated workflow (ii) SplitText config (iii) ExtractText config (iv) Both ReplaceText and PutSQL remain unchanged. PS: What is the tiny number "1" that appears on processors while they are running? Cheers
09-01-2016 05:29 PM
1 Kudo
Hi community, I would like to leverage NiFi to ingest a file from SFTP and insert its data into a MySQL database. I have so far been unsuccessful and would appreciate any assistance pointing me in the right direction. Much appreciated in advance; the specifics are below. (1) I should be able to list the files on SFTP (2) Select a particular file or files (3) Fetch the desired file (4) Ingest the desired file into the database. More detail below. (i) Sample content of the SFTP file to be ingested via the NiFi SFTP processors and later inserted into MySQL (ii) Currently designed NiFi workflow (iii) Errors I am getting with this workflow: ConvertJSONToSQL ERROR: failed to parse StandardFlowFileRecord due to a processor exception: JSON unexpected character (''' (code 39)); expected a valid value (number, string, array, object, 'true', 'false' or 'null'). PutSQL ERROR: failed to update database due to a failed batch update. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not executed and will be routed to retry. (iv) Processor configs: 1. ReplaceText SQL statement: INSERT INTO NiFiUsecase001 (Column1, Column2, Column3, Column4, Column5)
VALUES ('${Column1}', '${Column2}', '${Column3}', '${Column4}', '${Column5}') 2. ConvertJSONToSQL 3. PutSQL
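If I read the processors correctly (my interpretation, not confirmed): ConvertJSONToSQL expects flat JSON as input, but the ReplaceText above already emits a complete SQL statement, which would explain the ''' (code 39) parse error. A sketch of the simpler wiring, dropping ConvertJSONToSQL so PutSQL executes the FlowFile content directly:
    ListSFTP -> FetchSFTP -> SplitText -> ExtractText -> ReplaceText -> PutSQL
    ReplaceText:
      Search Value = (?s)(^.*$)    <- the default, matching the whole line
      Replacement Strategy = Regex Replace
      Replacement Value = INSERT INTO NiFiUsecase001 (Column1, Column2, Column3, Column4, Column5) VALUES ('${Column1}', '${Column2}', '${Column3}', '${Column4}', '${Column5}')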
Labels: Apache NiFi
04-29-2016 03:22 PM
@Timothy Spann should the local master be set to "yarn-client", as it was set in the "spark-yarn-interpreter"? The cluster is running Spark 1.6 and works perfectly from the command line, and yes, the green connected light is on in the upper right corner.
04-27-2016 07:49 PM
@Timothy Spann Yes, Spark is running in the cluster on its default port (I never changed it). I attached the configuration screen for the Spark interpreter. I can also access Spark from the command line and from the UI; both work perfectly. Thanks!
04-27-2016 04:34 PM
@Timothy Spann nothing is conflicting with the firewall settings, and no services other than Zeppelin are running on port 9995. Since the server is in use, I will restart it at the next down-time.
04-27-2016 03:20 PM
@Yogeshprabhu I attached the spark-yarn-client interpreter settings. Thanks
04-26-2016 11:12 PM
@vshukla I restarted the Zeppelin interpreter and zeppelin-daemon.sh but am still getting the same error. Thanks
04-26-2016 09:40 PM
@Yogeshprabhu thanks very much for getting back to me. I redid the steps listed in the link above and am still experiencing the same errors. I attached photos of my configs. Thanks a lot
04-26-2016 08:20 PM
2 Kudos
I installed Zeppelin manually on my node (not the sandbox), but after following the instructions on configuring the Spark notebook, I noticed that running "sc.version" throws the error below (some generic checks are sketched after the trace): sc.version java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:142)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:271)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:199)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:326)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
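A couple of generic checks that might narrow this down (standard Zeppelin CLI and log locations, run from ZEPPELIN_HOME; the exact interpreter log file name varies by user and host):
    bin/zeppelin-daemon.sh status                       # is the Zeppelin server itself up?
    ls logs/                                            # look for zeppelin-interpreter-spark-*.log
    tail -n 50 logs/zeppelin-interpreter-spark-*.log    # the remote interpreter's own error usually lands here
A connection refused on this Thrift socket generally means the remote Spark interpreter process failed to start, so its log is the place to look.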
Labels: Apache Zeppelin
04-21-2016 02:55 PM
@Arvind Kandaswamy Thanks a lot, it works now.
04-20-2016 05:06 AM
I need help with Zeppelin. I am getting resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install python-pip' returned 1. Error: Nothing to do (below is my log from Ambari; a possible fix is sketched after the log). Zeppelin is installed on a cluster, not the sandbox. Thanks. stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.4/services/ZEPPELIN/package/scripts/master.py", line 235, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.4/services/ZEPPELIN/package/scripts/master.py", line 54, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 404, in install_packages
Package(name)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 49, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install python-pip' returned 1. Error: Nothing to do
stdout:
2016-04-19 23:09:14,624 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.0.0-169
2016-04-19 23:09:14,624 - Checking if need to create versioned conf dir /etc/hadoop/2.4.0.0-169/0
2016-04-19 23:09:14,624 - call['conf-select create-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-04-19 23:09:14,647 - call returned (1, '/etc/hadoop/2.4.0.0-169/0 exist already', '')
2016-04-19 23:09:14,647 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-04-19 23:09:14,669 - checked_call returned (0, '/usr/hdp/2.4.0.0-169/hadoop/conf -> /etc/hadoop/2.4.0.0-169/0')
2016-04-19 23:09:14,669 - Ensuring that hadoop has the correct symlink structure
2016-04-19 23:09:14,670 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-04-19 23:09:14,671 - Group['hadoop'] {}
2016-04-19 23:09:14,672 - Group['users'] {}
2016-04-19 23:09:14,673 - Group['zeppelin'] {}
2016-04-19 23:09:14,673 - Group['knox'] {}
2016-04-19 23:09:14,673 - Group['spark'] {}
2016-04-19 23:09:14,673 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-04-19 23:09:14,674 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,675 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,675 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-04-19 23:09:14,676 - User['flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,677 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,677 - User['knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,678 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,679 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,679 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,680 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-04-19 23:09:14,681 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,681 - User['mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,682 - User['falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2016-04-19 23:09:14,683 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,683 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,684 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,685 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,685 - User['atlas'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2016-04-19 23:09:14,686 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-04-19 23:09:14,690 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-04-19 23:09:14,694 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-04-19 23:09:14,695 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-04-19 23:09:14,695 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-04-19 23:09:14,696 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-04-19 23:09:14,701 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-04-19 23:09:14,701 - Group['hdfs'] {}
2016-04-19 23:09:14,701 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': [u'hadoop', u'hdfs']}
2016-04-19 23:09:14,702 - Directory['/etc/hadoop'] {'mode': 0755}
2016-04-19 23:09:14,714 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-04-19 23:09:14,714 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0777}
2016-04-19 23:09:14,728 - Repository['HDP-2.4'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.0.0', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2016-04-19 23:09:14,735 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.4]\nname=HDP-2.4\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.0.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-04-19 23:09:14,735 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2016-04-19 23:09:14,740 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.20]\nname=HDP-UTILS-1.1.0.20\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-04-19 23:09:14,740 - Package['unzip'] {}
2016-04-19 23:09:14,867 - Skipping installation of existing package unzip
2016-04-19 23:09:14,867 - Package['curl'] {}
2016-04-19 23:09:14,906 - Skipping installation of existing package curl
2016-04-19 23:09:14,906 - Package['hdp-select'] {}
2016-04-19 23:09:14,945 - Skipping installation of existing package hdp-select
2016-04-19 23:09:15,189 - Execute['find /var/lib/ambari-agent/cache/stacks/HDP/2.4/services/ZEPPELIN/package -iname "*.sh" | xargs chmod +x'] {}
2016-04-19 23:09:15,197 - Execute['echo platform.linux_distribution:Red Hat Enterprise Linux Server+7.2+Maipo'] {}
2016-04-19 23:09:15,201 - Package['gcc-gfortran'] {}
2016-04-19 23:09:15,332 - Skipping installation of existing package gcc-gfortran
2016-04-19 23:09:15,333 - Package['blas-devel'] {}
2016-04-19 23:09:15,372 - Skipping installation of existing package blas-devel
2016-04-19 23:09:15,373 - Package['lapack-devel'] {}
2016-04-19 23:09:15,412 - Skipping installation of existing package lapack-devel
2016-04-19 23:09:15,412 - Package['python-devel'] {}
2016-04-19 23:09:15,452 - Skipping installation of existing package python-devel
2016-04-19 23:09:15,453 - Package['python-pip'] {}
2016-04-19 23:09:15,492 - Installing package python-pip ('/usr/bin/yum -d 0 -e 0 -y install python-pip')
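For what it is worth, yum's "Nothing to do" here means no enabled repository provides python-pip; on RHEL 7 (the platform shown in the log) that package normally lives in the EPEL repository. This is my reading of the log, not a confirmed diagnosis. A sketch of the check and fix:
    yum list available python-pip        # does any enabled repo provide it?
    rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm    # enable EPEL on RHEL 7
    /usr/bin/yum -d 0 -e 0 -y install python-pip        # re-run the command that failed
After that, retrying the Zeppelin install from Ambari should get past this step.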