Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5687 | 01-11-2021 05:54 AM |
| | 3812 | 01-11-2021 05:52 AM |
| | 9487 | 01-08-2021 05:23 AM |
| | 9288 | 01-04-2021 04:08 AM |
| | 38604 | 12-18-2020 05:42 AM |
04-22-2020 05:14 AM
@stevenmatison Thank you very much!
04-21-2020 06:25 AM
Hi @cjervis, I resolved it by adding the parameter `hive.server2.parallel.ops.in.session=true` under Hive > Config > Advanced > "Custom hive-site".
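For reference, a property added under "Custom hive-site" is rendered by Ambari into hive-site.xml; the resulting entry would look roughly like this (a sketch of the generated config, not a file you edit by hand):

```xml
<!-- Rendered into hive-site.xml by Ambari from the "Custom hive-site" panel -->
<property>
  <name>hive.server2.parallel.ops.in.session</name>
  <value>true</value>
</property>
```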
04-17-2020 10:03 PM
Thank you for this reply! This was quite difficult for me to troubleshoot, but I finally figured it out. The machines I've been using had chrony on them all along, but the previous machines I set up did not have chrony installed. Chrony and ntpd were both enabled, and ntpd was being stopped on reboot. Because the Host Monitor issues `ntpq -np`, and ntpd was loaded but inactive, it would report a failure to query the server, even though chrony was running and the clock was synced.

I had no idea that chrony was installed, so the whole problem could have been solved by simply disabling or uninstalling ntpd. I spent WAY too many hours to come to such a simple solution. My cycle was: check, find ntpd dead, see no problems reported on Host Monitor, wonder why the hell ntpd died, kill ntpd, run ntpdate, restart ntpd, restart the scm-agent, and that would "fix" it. But on reboot the machine would go back to using chrony and exit ntpd, and Host Monitor would report a failure to query the NTP service, even though the machine was using chrony and synced just fine all along.

It might help someone who doesn't understand network time protocols very well if the documentation explained the potential conflicts between ntpd and chronyd, or suggested taking a second to check which (if any) you already have installed. Maybe it won't be an issue for most people, but for me, assuming that I didn't have chrony already running cost me a lot of time getting my cluster healthy. I appreciate your help!
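For anyone hitting the same symptom, a quick way to check which time daemon actually owns the clock (a minimal sketch, assuming a systemd host such as CentOS/RHEL 7):

```bash
# Which daemon is actually running? chronyd and ntpd conflict with each other.
systemctl is-active ntpd chronyd

# If chronyd is active, verify sync through chrony rather than ntpq
chronyc tracking

# Host Monitor queries ntpd with this; it fails when only chronyd is running
ntpq -np

# To standardize on chrony, stop ntpd and keep it from returning at boot
sudo systemctl disable --now ntpd
```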
04-16-2020 06:49 AM
@Udhav You need to map an FQDN (fully qualified domain name) to your "localhost" via /etc/hosts. For example, I often use "hdp.cloudera.com". Next, put the FQDN in the list of hosts during the Cluster Install Wizard, and be sure to complete the remaining required steps for the SSH key, agent setup, etc. When the Confirm Hosts step fails, you can click the Failed link, open the modals, and get to the full error.

The easiest way for me to spin up Ambari/Hadoop on my computer is AMBARI VAGRANT: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide This provides an easy way to spin up one to X nodes on my computer, and it handles all the SSH keys and host mappings. Using this, I can spin up Ambari on CentOS with just a few chained commands, shown in the script below.
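The same /etc/hosts check and chained install commands, restated as a script for readability (IPs masked as in the original; the repo URL pins Ambari 2.7.0.0 on CentOS 7):

```bash
# Confirm the FQDN mappings in /etc/hosts (IPs masked)
grep 'cloudera.com' /etc/hosts
# 1xx.xxx.xxx.xxx hdf.cloudera.com
# 1xx.xxx.xxx.xx  hdp.cloudera.com

# Install and start Ambari server and agent in one chain
wget -O /etc/yum.repos.d/ambari.repo \
  http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.0.0/ambari.repo \
  && yum --enablerepo=extras install epel-release -y \
  && yum install java java-devel ambari-server ambari-agent -y \
  && ambari-server setup -s \
  && ambari-server start \
  && ambari-agent start
```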
04-16-2020 06:35 AM
@Damian_S Yes, I always use MySQL/MariaDB for the Hive metastore. If you have the original data, can you just move it to the new location? That should be part of your migration steps regardless of the backend for the metastore.
04-13-2020 06:16 AM
@ForrestGump No, out of the box NiFi should work with all Processors and Controller Services. The stock configuration of NiFi should work without any issues; you should not see inconsistencies or stability problems unless you are exceeding the resources available on the NiFi node(s). If you are seeing specific issues with "Regex, prepend, and other items", each should give a very specific error. Sometimes the errors are not shown in the UI; you have to tail the NiFi app log to see the full error information.
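To follow those errors live, a minimal sketch (the path assumes a default tar/zip install layout; package-based installs often log under /var/log/nifi instead):

```bash
# Follow the main NiFi application log for full errors and stack traces
tail -f "$NIFI_HOME/logs/nifi-app.log"

# Or narrow the stream to error lines only
tail -f "$NIFI_HOME/logs/nifi-app.log" | grep -i error
```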
04-09-2020 11:52 AM · 1 Kudo
@bhara You do not have to use LDAP. You can create the users in the HUE admin, using the first admin user you created. If you want to configure LDAP, please see the official documentation here: https://docs.gethue.com/administrator/configuration/server/#ldap You will need to make the LDAP hue.ini changes via Ambari under HUE > Config > Advanced > "Advanced Hue-Ini" and restart Hue after each change.

In your error above, there are 2 issues I notice:

SSL Configuration for HDFS. Your HUE truststore must have the SSL certs for the HDFS hosts: https://gethue.com/configure-hue-with-https-ssl/ (bottom section) and https://docs.cloudera.com/documentation/enterprise/5-11-x/topics/cm_sg_ssl_hue.html (top section)

HDFS Configuration - Doc Here

The SSL example links above are not specific to your case (HDP) but still apply. I am also assuming you have HDFS secured. The links I shared for SSL outline the fundamentals required to put the right HDFS and SSL settings in hue.ini for secure access to HDFS. The HDFS Configuration link is the official gethue.com documentation for HDFS. As with LDAP, you will need to make the SSL hue.ini changes via Ambari under HUE > Config > Advanced > "Advanced Hue-Ini" and restart Hue after each change.
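As a rough sketch of the hue.ini sections involved (the hostnames, ports, and paths below are placeholders, not values from this thread; confirm the exact settings against the linked docs for your version):

```ini
[desktop]
  # Placeholder path: CA bundle that trusts your HDFS hosts' certificates
  ssl_cacerts=/etc/hue/conf/cacerts.pem
  ssl_validate=true

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # Placeholder NameNode host; use the HTTPS WebHDFS endpoint when HDFS is secured
      fs_defaultfs=hdfs://namenode.example.com:8020
      webhdfs_url=https://namenode.example.com:50470/webhdfs/v1
```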
04-08-2020 01:55 PM · 1 Kudo
Thanks for the effort, @stevenmatison. I imagine it does have to do with configuration.
04-08-2020 12:47 PM · 1 Kudo
@bhara Change line 193, shown below, and try to start again. File: /var/lib/ambari-agent/cache/common-services/HUE/4.6.0/package/scripts/params.py. That will give dfs_namenode_http_address a value and get past the error.
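Spelled out, the edit to params.py looks like this (line 193 per the note above):

```python
# Before (line 193): fails when dfs.namenode.http-address is missing
# from the hdfs-site configuration
dfs_namenode_http_address = config['configurations']['hdfs-site']['dfs.namenode.http-address']

# After: hardcode a value so startup can get past the error
dfs_namenode_http_address = 'localhost'
```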
04-08-2020 03:35 AM · 1 Kudo
The creation of large database schemas can be a very complicated task. In this article, I am going to share how I used NiFi to fully automate a monstrous one. For my project, I needed to create very large Avro schemas, and corresponding Hive tables, for five or more data sources, each having 400-500+ different CSV columns. Doing this manually would have been a nightmare just for the initial schema creation, and managing schema changes over time an even bigger task. My answer was a Schema Generator API built with NiFi and Schema Registry.
Please reference the following NiFi Template:
Schema Generator API Demo Template
Schema Generator API
The above NiFi template provides a NiFi API capable of the following:
Accepting a POST for a new table, given the table's data columns and data types in CSV format
Creating Schema Registry Entity (POST: create)
Creating Schema Registry Avro Schema (POST: parse)
Creating Hive HQL Schema
Executing Hive HQL Statement
A sample call to create a Schema Registry Entity (demo).
A sample call to parse Data Columns (22 string columns); a hypothetical sketch of such calls follows this list.
Lots of helpful labels with notes.
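A hypothetical sketch of what such calls could look like from the command line (the endpoint paths and payload format here are illustrative assumptions; the actual routes depend on how HandleHttpRequest is configured in the template):

```bash
# Hypothetical: create a Schema Registry entity for a new table
curl -X POST "http://${schemaGeneratorApiHost}:${schemaGeneratorApiPort}/create?table=demo"

# Hypothetical: parse a CSV list of column names and types
curl -X POST "http://${schemaGeneratorApiHost}:${schemaGeneratorApiPort}/parse" \
  -H 'Content-Type: text/csv' \
  --data 'id:int,name:string,amount:double'
```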
The following are the template setup instructions:
Download the template, upload it, and drag it onto your NiFi palette.
Make sure a Schema Registry is set up within reach of NiFi.
Edit the following variables on the Schema Generator Demo Process Group (illustrative placeholder values follow these setup steps):
schemaGeneratorApiHost
schemaGeneratorApiPort
schemaRegistryUrl
hiveDatabaseName
hiveDatabaseConnectionUrl (jdbc string)
hiveConfigurationResources (path to hive-site.xml)
Enable controller services in Schema Generator API process group:
StandardHttpContextMap for HandleHttpRequest & Response
HiveConnectionPool for PutHiveQl
Start Schema Generator API Processor group.
Navigate to the samples and execute Sample Call 1, then Sample Call 2, by switching the appropriate GenerateFlowFile processor On and Off. These two processors are disabled by default because they should be switched On and Off again immediately; they are the only two processors that should not run continuously. Disable them again when done.
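Illustrative values for the process group variables (every value below is a placeholder assuming a local single-node setup with Schema Registry on its default 7788 port; substitute your own hosts and paths):

```properties
schemaGeneratorApiHost=localhost
schemaGeneratorApiPort=9999
schemaRegistryUrl=http://localhost:7788/api/v1
hiveDatabaseName=demo_db
hiveDatabaseConnectionUrl=jdbc:hive2://localhost:10000/demo_db
hiveConfigurationResources=/etc/hive/conf/hive-site.xml
```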
This is just a basic demonstration to get you started with Schema Registry and data source schema automation. Parts of this template are also helpful for anyone who needs to automate creating Avro schemas and/or Hive schemas for large CSVs, which could still be done without Schema Registry. The demo above has been tested with up to 500 columns and includes mapping various column types to Hive data types, as sketched below.
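To make the type mapping concrete, here is a hypothetical three-column input and roughly what the generated artifacts look like (illustrative only; the table name, types, and storage format are assumptions, not output captured from the template):

```sql
-- Input column spec (CSV): id:int,name:string,amount:double
--
-- Generated Avro schema (roughly):
-- {
--   "type": "record",
--   "name": "demo",
--   "fields": [
--     {"name": "id",     "type": "int"},
--     {"name": "name",   "type": "string"},
--     {"name": "amount", "type": "double"}
--   ]
-- }
--
-- Hive HQL the flow would then execute (roughly):
CREATE TABLE IF NOT EXISTS demo_db.demo (
  id     INT,
  name   STRING,
  amount DOUBLE
)
STORED AS ORC;
```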
Important Information
The template was built and tested on NiFi 1.9, on a single-node NiFi cluster with a local Schema Registry installed.
The Schema Registry UI doesn't expose full capability; learn the API to work with your schemas directly (for example: delete). See my previous post, Using the Schema Registry API, for detailed API info.
Versioning schemas forward and backward can be very problematic. Be warned.
Use proper and consistent table and column naming conventions. Complicated column names will break Avro and Hive; problem characters include, but are not limited to: spaces, /, \, $, *, [, ], (, ), etc.
Schema Registry entities and their associated Avro schemas can be used in NiFi Record Readers via the HortonworksSchemaRegistry and other Controller Services.