Member since: 01-19-2017
Posts: 3681
Kudos Received: 633
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1640 | 06-04-2025 11:36 PM |
| | 2089 | 03-23-2025 05:23 AM |
| | 997 | 03-17-2025 10:18 AM |
| | 3776 | 03-05-2025 01:34 PM |
| | 2601 | 03-03-2025 01:09 PM |
12-07-2020
01:54 PM
@G-Tz Yes, NiFi is fault-tolerant when configured in a cluster of at least 3 nodes, with ZooKeeper coordinating the cluster. For a better understanding, please read through the Cloudera NiFi clustering configuration documentation. Happy hadooping
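As a minimal sketch, the cluster-related entries in nifi.properties on each node look roughly like this (the hostnames, ports, and ZooKeeper quorum are placeholders for your environment):

```
# nifi.properties -- per-node cluster settings (placeholder values)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=9998

# The 3-node ZooKeeper quorum that coordinates the cluster
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

# Flow election across the cluster on startup
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3
```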
12-07-2020
01:42 PM
@hayhapra Have a look at the Cloudera documentation on connecting Tableau to Druid. Happy hadooping
11-08-2020
03:30 AM
@Amn_468 The NameNode is the brain of the cluster: it stores the HDFS metadata, the directory tree of all files in the file system, the block locations, and the ACLs, and it tracks files across the cluster. It does not store the actual data; the data itself is stored on the DataNodes.

Your error:

```
2020-10-27 16:20:05,140 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1821ms
GC pool 'ParNew' had collection(s): count=1 time=2075ms
```

This indicates that the NameNode JVM paused because of garbage collection. Pauses like this explain why the DataNode did not get a response from the NameNode within the designated 60000ms, and they call for memory and GC tuning.

The NameNode knows the location and list of blocks, and with this information it knows how to construct each file. The fastest way to serve this information is to keep it in memory; that is why the NameNode is usually a high-end server configured with a lot of RAM, because the block locations are stored in RAM.

An ideal starter configuration in production would be:

NameNode:
- Processors: 2 quad-core CPUs running @ 2 GHz
- RAM: 128 GB
- Disk: 6 x 1TB SATA
- Network: 10 Gigabit Ethernet

DataNode:
- Processors: 2 quad-core CPUs running @ 2 GHz
- RAM: 64 GB
- Disk: 12-24 x 1TB SATA
- Network: 10 Gigabit Ethernet

A fundamental parameter to tune for garbage collection is the number of HDFS blocks stored in the cluster, in your case 23,326,719 files. The NameNode maintains the complete directory structure in memory, so more files mean more objects to manage. Most of the time, Hadoop clusters are configured without knowing the final workload in terms of the number of files that will be stored; keeping the strong connection between these two aspects in mind is crucial to anticipate future turbulence in HDFS quality of service.

You should analyze the output produced by the garbage collector, the gc.log files found in the NameNode logs directory, to check whether the available memory fills up before the garbage collector is able to release it.

Hope that helps
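One way to capture the GC evidence described above is to enable GC logging for the NameNode JVM in hadoop-env.sh; a sketch, assuming a JDK 7/8-style JVM (the heap size and log path are placeholders, and the flag names change on JDK 9+):

```
# hadoop-env.sh -- append GC logging to the NameNode JVM options (JDK 7/8 flags)
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop/hdfs/gc.log-namenode"
```

After a NameNode restart, the gc.log file will show each ParNew/CMS collection with its pause time, which is exactly the data you need for the tuning discussed above.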
11-03-2020
03:31 PM
@jlguti According to the log you shared, I think your problem is network-related. Check your /etc/hosts and ensure that the hostnames can be resolved by DNS:

```
Caused by: java.io.IOException: Failed to connect to bupry-dev-00:46319
Caused by: java.net.UnknownHostException: bupry-dev-00
```

Make sure the host entries are FQDNs and the first lines (the IPv4 and IPv6 loopback entries) are not tampered with:

```
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
##############################################
192.168.0.20    your_host_name Host_Alias
```

Or something like this:

```
127.0.0.1       localhost
127.0.1.1       techpiezo-pc
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
```

Please revert
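A quick way to check whether a hostname resolves the way the /etc/hosts entries above intend is a small Python sketch (the helper name is my own; test it against the hostname from your stack trace):

```python
import socket

def resolves(hostname):
    """Return the IPv4 address a hostname resolves to, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

# "localhost" should always resolve via /etc/hosts; a hostname missing from
# both DNS and /etc/hosts (like bupry-dev-00 in the error) returns None.
print(resolves("localhost"))
print(resolves("bupry-dev-00"))
```

If the second call prints None on the node that threw the UnknownHostException, the fix belongs in /etc/hosts or DNS, not in Hadoop.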
11-03-2020
01:00 PM
@jlguti Can you share the output of the below command?

```
yarn logs -applicationId application_1604418534431_0001
```

Container logs are stored under ${yarn.nodemanager.log-dirs}/application_${appid}; individual containers' log directories are below this, in directories named container_${contid}, and each container directory contains the stderr, stdin, and syslog files generated by that container.

That could give us pointers to the potential issue, either memory or some misconfiguration. Happy hadooping
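If the full application log is too noisy, the same CLI can be narrowed down; a sketch (the container ID and the NodeManager log path are placeholders for your cluster):

```
# Narrow the output to a single container once you know its ID
yarn logs -applicationId application_1604418534431_0001 \
          -containerId container_1604418534431_0001_01_000001

# Before log aggregation runs, the files live on each NodeManager host
# under yarn.nodemanager.log-dirs (path below is a placeholder):
ls /hadoop/yarn/log/application_1604418534431_0001/
```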
10-27-2020
12:58 AM
@Amn_468 The NameNode is solely responsible for the cluster metadata, so please increase the NameNode heap size and restart the services. Please revert
10-26-2020
11:55 PM
@Amn_468 Regarding increasing the Java heap size for the NameNode and Secondary NameNode services: you may be using the default 1 GB heap size. As a general rule of thumb, the heap should have at least 1 GB for every 1 million blocks in your cluster:
- 2 million blocks: 2 GB heap
- 3 million blocks: 3 GB heap
- ...
- n million blocks: n GB heap

After increasing the Java heap size, restart the HDFS services; that should resolve the issue. Please revert
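The rule of thumb above (at least 1 GB of heap per million blocks) can be sketched as a small helper; the function name and the 1 GB floor are my own additions:

```python
import math

def recommended_nn_heap_gb(block_count, gb_per_million=1.0):
    """Rule of thumb: at least 1 GB of NameNode heap per 1 million blocks."""
    millions = block_count / 1_000_000
    # Round up, and never recommend less than the 1 GB default
    return max(1, math.ceil(millions * gb_per_million))

print(recommended_nn_heap_gb(2_000_000))   # 2 GB for 2 million blocks
print(recommended_nn_heap_gb(23_326_719))  # 24 GB for the file count in the earlier post
```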
10-10-2020
11:35 AM
1 Kudo
@mike_bronson7 Always stick to the Cloudera documentation. Yes, there is no risk in running that command; I can understand your reservation.
10-10-2020
10:50 AM
1 Kudo
@bvishal SmartSense Tool (HST) gives all support-subscription customers access to a unique service that analyzes cluster diagnostic data, identifies potential issues, and recommends specific solutions and actions. These analytics proactively identify unseen issues and notify customers of potential problems before they occur.

That is okay since you are just testing; buying support is advised when running a production environment. To configure SmartSense you will need to edit /etc/hst/conf/hst-server.ini; the inputs/values come from Hortonworks support if you have paid for a subscription:
- customer.smartsense.id
- customer.account.name
- customer.notification.email
- customer.enable.flex.subscription

The error you are encountering is normal and won't impact your cluster. Hope that helps
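As a sketch, the subscription-related part of /etc/hst/conf/hst-server.ini ends up looking roughly like this (every value below is a placeholder; the real ones come from your support subscription):

```
# /etc/hst/conf/hst-server.ini -- placeholder values from Hortonworks support
customer.smartsense.id=A-00000000-C-00000000
customer.account.name=ExampleCorp
customer.notification.email=hadoop-admin@example.com
customer.enable.flex.subscription=false
```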
09-11-2020
04:22 PM
@wert_1311 Domain name changes will affect the KDC database. Kerberos is very sensitive to domain changes; in my experience you will have to recreate the KDC database and regenerate the keytabs/principals to enable your applications to reconnect.

Cluster hostnames: if the hosts in the cluster were renamed, i.e. host1.old.com to host1.new.com, then ensure those changes are also reflected or resolved by the DNS.

This is going to be a tricky one, but fortunately CM or Ambari will make your work easy. Now that your domain has changed, the earlier generated keytabs have the old domain name. A keytab contains a pair of principals and an encrypted copy of that principal's key; it's unique to each host since the principal names include the hostname concatenated with the domain name.

Delete the old KDC database. Usually, as the root user, call the Kerberos database utility kdb5_util destroy; assuming the old realm was OLD.COM, this should delete the keytabs and principals linked to the old realm:

```
# kdb5_util -r OLD.COM destroy
```

You will need to manually delete the keytabs linked to the old realm on the file system: /etc/security/keytabs/ [HDP] or /etc/hadoop/conf/ [CDH]. You will be prompted to confirm before destroying the database, which is the better option if you have second thoughts; kdb5_util destroy -f will not prompt you for a confirmation.

Recreate the new KDC database. Use the Kerberos database utility kdb5_util create [-s]; assuming the new realm is NEW.COM:

```
# kdb5_util -r NEW.COM create -s
```

With the -s option, kdb5_util will stash a copy of the master key in a stash file; this allows the KDC to authenticate itself to the database utilities such as kadmin, kadmind, krb5kdc, and kdb5_util, so it is the best option.

Update the Kerberos files. Make sure you update the files below to reflect the new realm, assuming your MIT KDC server's domain hasn't changed:
- krb5.conf
- kdc.conf
- kadm5.acl
- auth-to-local rules
- jaas.conf files [if being used by applications]

Enable Kerberos using CM or Ambari; the process is straightforward. Please let me know if you need more help
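The whole sequence, condensed into one sketch (the realm names and keytab path assume the examples in this post; run as root on the KDC host):

```
# 1. Destroy the old KDC database (prompts for confirmation; -f skips the prompt)
kdb5_util -r OLD.COM destroy

# 2. Remove the keytabs tied to the old realm (HDP path; CDH keeps them under /etc/hadoop/conf/)
rm -f /etc/security/keytabs/*.keytab

# 3. Create the new KDC database, stashing the master key (-s)
kdb5_util -r NEW.COM create -s

# 4. Update krb5.conf, kdc.conf, kadm5.acl, and the auth-to-local rules,
#    then re-enable Kerberos from Ambari/CM to regenerate principals and keytabs.
```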