About stevenmatison

stevenmatison · ‎08-13-2018

I would start by creating a database or data file that lists your 100 FTP servers and their credentials. Use NiFi to query this data and send the results downstream, where the processors are configured dynamically using ${attributes} from the flow files generated by the query. This keeps your data flow simple and dynamic. If this answer helps, please choose ACCEPT.
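As a rough illustration of the idea (outside NiFi), here is a small Python sketch. The CSV columns, the `ftp.*` attribute names and the `resolve` helper are all my own assumptions for the example; `resolve` only mimics plain `${name}` substitution and is not NiFi Expression Language. Credentials are left out on purpose.

```python
import csv
import io
import re

# Hypothetical inventory; in practice this would be a data file or a database table.
SERVERS_CSV = """host,port,username,remote_path
ftp1.example.com,21,user1,/outgoing
ftp2.example.com,2121,user2,/data/export
"""


def load_servers(text):
    """Return one attribute dict per server, keyed like flow file attributes."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"ftp." + key: value for key, value in row.items()} for row in reader]


def resolve(template, attributes):
    """Replace plain ${name} references with attribute values (no EL functions)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], template)


servers = load_servers(SERVERS_CSV)
for attrs in servers:
    # Each server row drives the same "processor configuration" template.
    print(resolve("${ftp.username}@${ftp.host}:${ftp.port}${ftp.remote_path}", attrs))
```

The point is that one template serves every server: adding an FTP server means adding a row of data, not another processor.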

stevenmatison · ‎08-10-2018

@Zach This is how I was able to get through my ExecuteScript project: https://community.hortonworks.com/content/kbentry/75032/executescript-cookbook-part-1.html Everything you need to know and more is included in that three-part series. My other advice is to create and test your script directly on the command line first, then start working with it in the context of ExecuteScript. While working with the NiFi processor you should also tail the NiFi log file(s), as they may contain more information than the errors you see in the NiFi UI. If this answer helps, please choose ACCEPT!

stevenmatison · ‎08-09-2018

@Saurabh Use your /etc/hosts file to map an FQDN to the IP address on the ambari-server and all nodes. Within the scope of your cluster, this does the same thing as actually registering the DNS name and IP publicly. Using /etc/hosts you can make the FQDN whatever you want:

[ambari@ks3503-D050 ~]$ cat /etc/hosts
108.100.46.50 ks3503-d050 ks3503-d050.domain.com
108.100.46.51 ks3503-d051 ks3503-d051.domain.com
108.100.46.52 ks3503-d052 ks3503-d052.domain.com
108.100.46.53 ks3503-d053 ks3503-d053.domain.com

Accept this answer if this /etc/hosts solution helps you.
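If you want to sanity-check a hosts file before registering the nodes, something like the following Python sketch could help. It parses hosts-style text and reports any IP that has no dotted (FQDN-looking) name. The function names are made up for this example, and "contains a dot" is only a crude stand-in for a real FQDN check.

```python
# Sample data copied from the /etc/hosts listing above.
HOSTS_TEXT = """\
108.100.46.50 ks3503-d050 ks3503-d050.domain.com
108.100.46.51 ks3503-d051 ks3503-d051.domain.com
108.100.46.52 ks3503-d052 ks3503-d052.domain.com
108.100.46.53 ks3503-d053 ks3503-d053.domain.com
"""


def parse_hosts(text):
    """Map each IP address to the list of names on its line, skipping comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        entries[ip] = names
    return entries


def missing_fqdn(entries):
    """Return the IPs that have no name containing a dot."""
    return [ip for ip, names in entries.items() if not any("." in n for n in names)]


entries = parse_hosts(HOSTS_TEXT)
print(missing_fqdn(entries))  # an empty list means every IP has a dotted name
```

To check the real file, pass the contents of /etc/hosts to `parse_hosts` instead of the sample text.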

stevenmatison · ‎08-06-2018

@Takefumi Oide When you install Ambari, it installs its own appropriate version of Postgres. You do not have to install Postgres separately. If you do want to control the Postgres version, you can complete your own Postgres install and choose the advanced database configuration during the ambari-server setup command. As always, please choose ACCEPT if this answer helps.

stevenmatison · ‎08-06-2018

@Gitanjali Bare - Please confirm you have the correct settings in the NiFi PublishKafka processor. Important settings from a working example:

Kafka Broker: hostname:6667 (documentation often shows a different port; use 6667)
Topic Name: mine is a variable, but you can enter anything here, e.g. "test"
Kafka Key: not required; you can leave it empty

If this answer helps, please click ACCEPT.
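Before digging into processor settings, it can be worth confirming that the broker port is reachable from the NiFi host at all. Here is a minimal Python sketch of that check; the function name is hypothetical and the default port 6667 is simply taken from the settings above. It only proves that a TCP connection opens, not that Kafka itself is healthy.

```python
import socket


def broker_reachable(host, port=6667, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (replace "hostname" with your broker's FQDN):
# print(broker_reachable("hostname"))
```

If this returns False from the NiFi node, look at the firewall, the hostname resolution and the broker's listener port before changing anything in PublishKafka.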

stevenmatison · ‎08-06-2018

You will need to add the Ambari server's public SSH key to ~/.ssh/authorized_keys (/root/.ssh/authorized_keys for root). This allows the server to ssh to itself and install the agent. Be sure to test first: run ssh root@localhost and accept the prompts the first time. If this answer helps, please ACCEPT.
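For anyone scripting this step, here is a hedged Python sketch that appends a public key to an authorized_keys file only if it is not already there. The function name and the example paths are assumptions for illustration; adjust them to wherever your key actually lives.

```python
import os
from pathlib import Path


def authorize_key(public_key, authorized_keys_path):
    """Append public_key to the file unless present; return True if it was added."""
    path = Path(authorized_keys_path)
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    text = path.read_text() if path.exists() else ""
    key = public_key.strip()
    if key in text.splitlines():
        return False
    with path.open("a") as handle:
        # Avoid gluing the new key onto a last line that lacks a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(path, 0o600)  # sshd commonly rejects group/world-writable key files
    return True


# Example call (both paths are assumptions):
# authorize_key(Path("/root/.ssh/id_rsa.pub").read_text(), "/root/.ssh/authorized_keys")
```

Running it twice with the same key is harmless, since the second call finds the key and returns False.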

stevenmatison · ‎08-01-2018

To properly troubleshoot Elasticsearch you first need to make sure that Elasticsearch is actually running correctly. Go to QuickLinks and open Elasticsearch Health. Status must be green.

Tail your Elasticsearch master node log file while restarting to see if there are any issues:

tail -f /var/log/elasticsearch/elasticsearch.log

Additionally, you can edit /etc/elasticsearch/log4j2.properties and set logger.action.level to debug for more verbose logging.

Based on the log output, you will likely need to adjust config settings in Advanced elastic-site from Ambari. Here are my settings for a Master Node + 2 Data Nodes:

bootstrap_memory_lock true
cluster_name elasticsearch
cluster_routing_allocation_disk_threshold_enabled true
cluster_routing_allocation_disk_watermark_high 0.99
cluster_routing_allocation_disk_watermark_low .97
cluster_routing_allocation_node_concurrent_recoveries 4
discovery_zen_fd_ping_interval 15s
discovery_zen_fd_ping_retries 5
discovery_zen_fd_ping_timeout 60s
discovery_zen_ping_timeout 3s
expected_data_nodes 0
gateway_recover_after_data_nodes 1
http_cors_enabled "true"
http_port 9200
index_merge_scheduler_max_thread_count 5
index_number_of_replicas 2
index_number_of_shards 4
index_refresh_interval 1s
index_translog_flush_threshold_size 5g
indices_cluster_send_refresh_mapping false
indices_fielddata_cache_size 25%
indices_memory_index_buffer_size 10%
indices_memory_index_store_throttle_type none
masters_also_are_datanodes "true"
network_host [ 0.0.0.0 ]
network_publish_host []
path_data "/hadoop/elasticsearch/es_data"
recover_after_time 15m
threadpool_bulk_queue_size 3000
threadpool_index_queue_size 1000
transport_tcp_port 9300
zen_discovery_ping_unicast_hosts [ "fqdn.hostname1.com", "fqdn.hostname2.com", "fqdn.hostname3.com" ]
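The "status must be green" check can also be done from a script by reading Elasticsearch's `_cluster/health` endpoint. Below is a minimal Python sketch. The function names are my own, the default URL assumes the http_port of 9200 from the settings above on localhost, and the sample JSON is an illustrative response written for this example, not output from a real cluster.

```python
import json
import urllib.request


def parse_health(body):
    """Extract (status, number_of_data_nodes) from a _cluster/health JSON body."""
    health = json.loads(body)
    return health["status"], health.get("number_of_data_nodes")


def fetch_health(base_url="http://localhost:9200", timeout=5.0):
    """Query the cluster health endpoint and return the parsed result."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))


# Illustrative response only; a real cluster returns more fields.
sample = '{"cluster_name": "elasticsearch", "status": "yellow", "number_of_data_nodes": 2}'
status, data_nodes = parse_health(sample)
print(status, data_nodes)
```

Against a live node you would call `fetch_health()` instead; anything other than green is a reason to go back to the log file before touching the Metron side.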

stevenmatison · ‎08-01-2018

See my answer below.

stevenmatison · ‎08-01-2018

I have 1 master node and 2 data nodes. My settings are:

expected_data_nodes: 0
gateway_recover_after_data_nodes: 1

stevenmatison · ‎07-27-2018

Please confirm that SELinux is disabled and the firewall is off/disabled.

Name	Steven Matison
Location	Florida
Member Since	‎07-19-2018 04:45 PM
Last Visited	‎06-01-2022 03:47 PM
Posts	613
Kudos received	101

Cloudera Community

Re: Apache nifi - how to convert a file .txt into ...

Re: Apache Nifi - Using PutParquet, the HDFS file ...

Re: How to extract csv column record and used it f...

Re: Could not connect to Distributed Map Cache ser...

Re: NiFi InvokeHTTP POST JSON

Re: Best approach for getting files from 100+ ftp ...

Re: executeScript not working

Re: can we register a new host with IP not a FQDN

Re: Version of PostgreSQL for Ambari 2.6.2.2 + HDP...

Re: Error while using publish kafka

Re: ambari agent failed to start Permission denie...

Re: Metron - Elasticseach service unavailable (mes...

Re: Metron - Elasticseach service unavailable (mes...

Re: Metron - Elasticseach service unavailable (mes...

Re: HDFS port 8020 not accessible from outside.