Member since
07-09-2018
19
Posts
4
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3158 | 08-21-2018 01:19 PM
 | 501 | 08-20-2018 01:39 AM
 | 718 | 07-12-2018 08:26 PM
 | 524 | 07-09-2018 07:07 PM
08-24-2018
03:58 PM
Thanks Shu, that seems to work! Is that Expression Language, or is that basically doing regular SQL functions on the DB? Thanks!
08-23-2018
08:46 PM
1 Kudo
Hi, I'm getting some data from a Postgres database, and it has 2 "text"-type columns, which I think are my problem. When I go to insert those into a Redshift table, I'm getting a JDBC error: "given type does not match given object". I think I need to use NiFi Expression Language to convert the text columns to strings so they work with varchar, but I'm not sure. Any help appreciated. Thanks, Ron
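One common workaround for this kind of type mismatch (a sketch only, not necessarily the fix Shu suggested) is to cast the Postgres "text" columns to a bounded varchar in the source SELECT, so the JDBC driver reports a type Redshift's varchar accepts. The table and column names here are hypothetical:

```sql
-- Hypothetical table/column names; ::varchar(65535) caps the value at
-- Redshift's maximum varchar length so the JDBC type maps cleanly.
SELECT id,
       notes::varchar(65535)    AS notes,
       comments::varchar(65535) AS comments,
       updated_at
FROM   source_table;
```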
Tags:
- expression-language
Labels:
- Apache NiFi
08-21-2018
04:13 PM
I actually think the pull is not my problem; it's the inserts to Redshift, which are slow. I think I need to save this data to files and use COPY commands to load it into Redshift. Or I guess I can manually load the tables and then set an initial max-value column, so the NiFi job starts from there and continues updating. These large tables are challenging.
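For the bulk-load step described above, a Redshift COPY from S3 might look like the following; the table name, bucket, prefix, and IAM role ARN are all placeholders:

```sql
-- Hypothetical bucket and role; COPY loads files from S3 in parallel,
-- which is far faster than row-by-row INSERTs for an initial bulk load.
COPY my_table
FROM 's3://my-bucket/exports/my_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
GZIP;
```

After the one-time COPY, the incremental NiFi flow can take over from a max-value column set to the high-water mark of the loaded data.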
08-21-2018
02:15 PM
Thanks, I was just reading something about GenerateTableFetch (GTF). Much appreciated.
08-21-2018
01:19 PM
Hi @Matt Burgess, I was just reading QueryDatabaseTable's spec and didn't realize it says it's intended to be run on the primary node only. Is that correct? Thanks. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html
08-20-2018
08:46 PM
My JVM heap usage is really high (85%), and I have fairly large servers. How can I clear these queues and clean up space? Thanks
08-20-2018
06:25 PM
Hi, never mind on that; I just created a new process group and tried again. It's running now, but still seems slow. The PutDatabaseRecord has 3 pending actions but no other stats yet (e.g., in/out), and there is 1.79 GB pending in the queue. I set the Output Batch Size of QueryDatabaseTable to 5,000, and I also set the Fetch Size and Max Rows Per Flow File to 5,000. Does that make sense? Thanks
08-20-2018
05:40 PM
Hey @Matt Burgess, I made a new process group with this but forgot to use the Output Batch Size property, so I stopped it, changed it, and am trying to clear the queue and restart, but the flowfiles are stuck in the queue. I've had this happen a few times; any suggestions here? I can't stop, start, terminate, or anything. When I hit Empty Queue, it says zero out of 3 flowfiles were removed, as if it can't remove them. The processor after the queue has no option to start it. Thanks screen-shot-2018-08-20-at-14100-pm.png
08-20-2018
03:44 PM
Thanks Matt! Will give it a shot.
08-20-2018
02:59 PM
Hi, I have a basic ETL flow (screenshot below) where I'm trying to load a full table from one DB to another, and then incrementally load any changes to the table based on an updated date. It runs, but part of it is slow. I think ConvertJSONToSQL is a bottleneck: the table has 2M+ records, and they queue up because it can only create INSERT statements so fast, and then the inserts to Redshift (the target) are not the fastest.
If you look closely at the screenshot below, right before ConvertJSONToSQL you can see a queue of 3 GB (about the size of the table), waiting to be converted, I believe. I read some of the articles and set Translate Field Names to false; I also set it to ignore unmatched columns and fields.
So the initial loading of data is a bit slow. Any suggestions, workarounds, or ideas are appreciated for a process that I know I can start from scratch, load a table fully, and then keep loading. Or maybe I do have to do one initial large load somehow and then start the incremental process? I know COPY commands in Redshift are way better; perhaps I have to do that first and then start incremental? screen-shot-2018-08-20-at-104837-am.png Thanks,
Ron
Labels:
- Apache NiFi
08-20-2018
01:39 AM
2 Kudos
I think I found it: the initial memory allocation was only at the default 512 MB. I upped it to 25 GB, and it seems to be running fine.
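For reference, NiFi's JVM heap is set in conf/bootstrap.conf; a sketch of the change described above (the 25 GB figure comes from this post, not a general recommendation — size the heap to your machine and leave room for the OS):

```properties
# conf/bootstrap.conf — NiFi JVM memory settings
# Ships with 512 MB defaults; raised here per the fix in this thread.
java.arg.2=-Xms25g
java.arg.3=-Xmx25g
```

NiFi needs a restart for the new heap settings to take effect.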
08-17-2018
08:03 PM
Hi, I'm getting started with NiFi, and I have a table that's about 2 million records. Here is my data flow: screen-shot-2018-08-17-at-35302-pm.png I have a QueryDatabaseTable, which uses the table's updated_at date for syncing. I limited the fetch size to 100 and scheduled the job every minute, thinking it would take its time and sync over a few days. I ran this with another, smaller table and it's fine; it syncs data over to my Redshift DB. I've tried twice now, and it seems like when I start this, the NiFi interface freezes, and then if I refresh the browser I can't get to the NiFi interface at all. It then takes an hour of restarting all the services and NiFi services over and over again. Once it came back up, there was an error about a JVM stack memory overflow. The table is about 3 GB. My cluster is a 3-node cluster: I have Ambari on its own server, plus 4 NiFi nodes, one of which is the manager and 3 of which are the "workers", I guess. Each server has 30 GB of RAM and is an AWS EC2 r5.xlarge. Any thoughts or ideas? I think it's using all of the memory, but if someone could help me with how to check that, maybe that's the problem. Thanks, Ron
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
07-12-2018
08:26 PM
Well, I took an image of the server just after setup, tried again on a new Ubuntu server, and it worked! I think NiFi is running fine; I'm about to check it out. I removed installing the ambari-metrics; I'm not sure what the difference is, because I didn't change much other than that. Oh, one part did complain about disk space when I went back and tried to reinstall, so I bumped up from 8 to 32 GB. ZooKeeper and NiFi seem to have installed OK.
07-12-2018
07:54 PM
Hi, I am getting failures when deploying to localhost; I'm mainly trying to test out NiFi. I'm getting failures on ZooKeeper and the NiFi certificate auth. I did add a password for the Advanced nifi-ambari-ssl-config, but it still seems to fail. The logs are below, and a screenshot of the failures is attached. My server setup is good, I believe: Ubuntu 16, Build # 3.1.2.0-7. After I launched the wizard and went through the install options, it gave me 2 warnings about things I missed in my server setup: installing/enabling NTP and disabling THP. I also changed the hostname to localhost. It seems like ZooKeeper might be the problem; I'm not sure. I tried the fix in this article, but I don't have a /usr/hdp/ folder: https://community.hortonworks.com/questions/33519/could-not-determine-hdp-version-for-component-zook.html I'm following this tutorial as closely as I can: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_installing-hdf/content/ch_install-ambari.html Thanks for any help, Ron
2018-07-12 19:33:25,796 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1 User Group mapping (user_group) is missing in the hostLevelParams
2018-07-12 19:33:25,799 - Group['hadoop'] {}
2018-07-12 19:33:25,800 - Group['nifi'] {}
2018-07-12 19:33:25,800 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-07-12 19:33:25,800 - call['/var/lib/ambari-agent/tmp/changeUid.sh zookeeper'] {}
2018-07-12 19:33:25,806 - call returned (0, '1001')
2018-07-12 19:33:25,807 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': 1001}
2018-07-12 19:33:25,807 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-07-12 19:33:25,808 - call['/var/lib/ambari-agent/tmp/changeUid.sh ams'] {}
2018-07-12 19:33:25,813 - call returned (0, '1002')
2018-07-12 19:33:25,814 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': 1002}
2018-07-12 19:33:25,814 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users'], 'uid': None}
2018-07-12 19:33:25,815 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-07-12 19:33:25,815 - call['/var/lib/ambari-agent/tmp/changeUid.sh nifi'] {}
2018-07-12 19:33:25,821 - call returned (0, '1004')
2018-07-12 19:33:25,821 - User['nifi'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': 1004}
2018-07-12 19:33:25,821 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-07-12 19:33:25,822 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) 
-gt 1000) || (false)'} 2018-07-12 19:33:25,826 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if 2018-07-12 19:33:25,837 - Repository['HDF-3.1-repo-4'] {'append_to_file': False, 'base_url': 'http://public-repo-1.hortonworks.com/HDF/ubuntu16/3.x/updates/3.1.2.0', 'action': ['create'], 'components': [u'HDF', 'main'], 'repo_template': '{{package_type}} {{base_url}} {{components}}', 'repo_file_name': 'ambari-hdf-4', 'mirror_list': None} 2018-07-12 19:33:25,843 - File['/tmp/tmp0JlPhQ'] {'content': 'deb http://public-repo-1.hortonworks.com/HDF/ubuntu16/3.x/updates/3.1.2.0 HDF main'} 2018-07-12 19:33:25,843 - Writing File['/tmp/tmp0JlPhQ'] because contents don't match 2018-07-12 19:33:25,843 - File['/tmp/tmphevGen'] {'content': StaticFile('/etc/apt/sources.list.d/ambari-hdf-4.list')} 2018-07-12 19:33:25,843 - Writing File['/tmp/tmphevGen'] because contents don't match 2018-07-12 19:33:25,844 - File['/etc/apt/sources.list.d/ambari-hdf-4.list'] {'content': StaticFile('/tmp/tmp0JlPhQ')} 2018-07-12 19:33:25,844 - Writing File['/etc/apt/sources.list.d/ambari-hdf-4.list'] because contents don't match 2018-07-12 19:33:25,844 - checked_call[['apt-get', 'update', '-qq', '-o', u'Dir::Etc::sourcelist=sources.list.d/ambari-hdf-4.list', '-o', 'Dir::Etc::sourceparts=-', '-o', 'APT::Get::List-Cleanup=0']] {'sudo': True, 'quiet': False} 2018-07-12 19:33:25,975 - checked_call returned (0, '') 2018-07-12 19:33:25,976 - Repository['HDP-UTILS-1.1.0.21-repo-4'] {'append_to_file': True, 'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/ubuntu16', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '{{package_type}} {{base_url}} {{components}}', 'repo_file_name': 'ambari-hdf-4', 'mirror_list': None} 2018-07-12 19:33:25,978 - File['/tmp/tmpdUcPvV'] {'content': 'deb 
http://public-repo-1.hortonworks.com/HDF/ubuntu16/3.x/updates/3.1.2.0 HDF main\ndeb http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/ubuntu16 HDP-UTILS main'} 2018-07-12 19:33:25,978 - Writing File['/tmp/tmpdUcPvV'] because contents don't match 2018-07-12 19:33:25,978 - File['/tmp/tmpQlo4lv'] {'content': StaticFile('/etc/apt/sources.list.d/ambari-hdf-4.list')} 2018-07-12 19:33:25,979 - Writing File['/tmp/tmpQlo4lv'] because contents don't match 2018-07-12 19:33:25,979 - File['/etc/apt/sources.list.d/ambari-hdf-4.list'] {'content': StaticFile('/tmp/tmpdUcPvV')} 2018-07-12 19:33:25,979 - Writing File['/etc/apt/sources.list.d/ambari-hdf-4.list'] because contents don't match 2018-07-12 19:33:25,979 - checked_call[['apt-get', 'update', '-qq', '-o', u'Dir::Etc::sourcelist=sources.list.d/ambari-hdf-4.list', '-o', 'Dir::Etc::sourceparts=-', '-o', 'APT::Get::List-Cleanup=0']] {'sudo': True, 'quiet': False} 2018-07-12 19:33:26,207 - checked_call returned (0, 'W: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/ubuntu16/dists/HDP-UTILS/InRelease: Signature by key DF52ED4F7A3A5882C0994C66B9733A7A07513CAD uses weak digest algorithm (SHA1)') 2018-07-12 19:33:26,207 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2018-07-12 19:33:26,221 - Skipping installation of existing package unzip 2018-07-12 19:33:26,221 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2018-07-12 19:33:26,234 - Skipping installation of existing package curl 2018-07-12 19:33:26,234 - Package['hdf-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2018-07-12 19:33:26,246 - Skipping installation of existing package hdf-select 2018-07-12 19:33:26,285 - call[('ambari-python-wrap', u'/usr/bin/hdf-select', 'versions')] {} 2018-07-12 19:33:26,301 - call returned (1, 'Traceback (most recent call last):\nFile "/usr/bin/hdf-select", line 403, in <module>\nprintVersions()\nFile "/usr/bin/hdf-select", line 248, in 
printVersions\nfor f in os.listdir(root):\nOSError: [Errno 2] No such file or directory: \'/usr/hdf\'') 2018-07-12 19:33:26,425 - Could not determine stack version for component zookeeper by calling '/usr/bin/hdf-select status zookeeper > /tmp/tmpgMWFwS'. Return Code: 1, Output: ERROR: Invalid package - zookeeper Packages: accumulo-client accumulo-gc accumulo-master accumulo-monitor accumulo-tablet accumulo-tracer atlas-client atlas-server falcon-client falcon-server flume-server hadoop-client hadoop-hdfs-datanode hadoop-hdfs-journalnode hadoop-hdfs-namenode hadoop-hdfs-nfs3 hadoop-hdfs-portmap hadoop-hdfs-secondarynamenode hadoop-httpfs hadoop-mapreduce-historyserver hadoop-yarn-nodemanager hadoop-yarn-resourcemanager hadoop-yarn-timelineserver hbase-client hbase-master hbase-regionserver hive-metastore hive-server2 hive-server2-hive2 hive-webhcat kafka-broker knox-server livy-server mahout-client nifi nifi-registry oozie-client oozie-server phoenix-client phoenix-server ranger-admin ranger-kms ranger-tagsync ranger-usersync registry slider-client spark-client spark-historyserver spark-thriftserver spark2-client spark2-historyserver spark2-thriftserver sqoop-client sqoop-server storm-client storm-nimbus storm-supervisor streamline zeppelin-server zookeeper-client zookeeper-server Aliases: accumulo-server all client hadoop-hdfs-server hadoop-mapreduce-server hadoop-yarn-server hive-server
Labels:
- Cloudera DataFlow (CDF)
07-09-2018
07:07 PM
1 Kudo
Ah, I think I figured it out: I used the hostname -f command to get the host, and using that as the hostname did the trick. On my way now!
07-09-2018
04:38 PM
Hi, I'm new to Hortonworks and HDF, really excited to use the product; I'm aiming to use NiFi. I went through the standard server setup here: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_installing-hdf/content/ch_install-ambari.html I got my Ubuntu 16 server running and logged into Ambari. I launched the install wizard, but I'm stuck on the cluster install/setup (https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_installing-hdf/content/ch_install-hdf.html). I just put 'localhost' in my target hosts, and also tried localhost.localdomain, but got the same error below. My hosts file looks like this: 127.0.0.1 localhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts I copied the id_rsa key from the server to a local file so I could add it to the Host registration information, and also made sure I ran ssh-keygen and added the key to authorized_keys. I changed my user account to ubuntu. But when I run the install, it fails with a bunch of "Connection to localhost closed" messages: ==========================
Creating target directory...
==========================
Command start time 2018-07-09 16:30:08
Connection to localhost closed.
SSH command execution finished
host=localhost, exitcode=0
Command end time 2018-07-09 16:30:08
==========================
Copying ambari sudo script...
==========================
Command start time 2018-07-09 16:30:08
scp /var/lib/ambari-server/ambari-sudo.sh
host=localhost, exitcode=0
Command end time 2018-07-09 16:30:09
==========================
Copying common functions script...
==========================
Command start time 2018-07-09 16:30:09
scp /usr/lib/python2.6/site-packages/ambari_commons
host=localhost, exitcode=0
Command end time 2018-07-09 16:30:09
Must be something in the setup I didn't do right; I'm also not the most adept Linux/Ubuntu person. Thanks for any help, Ron
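For anyone hitting the same wall, here is a minimal sketch of the passwordless-SSH setup that Ambari host registration relies on. It assumes the default ~/.ssh layout and should be run as the account used for registration (ubuntu in this post):

```shell
# Generate a key pair without a passphrase (skipped if one already exists),
# then authorize it for the local account so Ambari can SSH to localhost.
KEYFILE="$HOME/.ssh/id_rsa"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$KEYFILE" ] || ssh-keygen -t rsa -N "" -f "$KEYFILE" -q
cat "$KEYFILE.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

The private key file ($HOME/.ssh/id_rsa) is what gets pasted into the wizard's Host registration information box.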
Labels:
- Cloudera DataFlow (CDF)