Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 908 | 06-04-2025 11:36 PM |
|  | 1509 | 03-23-2025 05:23 AM |
|  | 744 | 03-17-2025 10:18 AM |
|  | 2683 | 03-05-2025 01:34 PM |
|  | 1786 | 03-03-2025 01:09 PM |
06-04-2018
11:54 AM
1 Kudo
@Michael Bronson That's an annoying message I once got too, but you can repost the same content with a changed header and delete the old posting... it worked for me. Try the hack 🙂
06-04-2018
10:53 AM
1 Kudo
@Michael Bronson Both XFS and ext4 are recommended for running Kafka. XFS typically performs well with little tuning compared to ext4, and it has become the default filesystem for many Linux distributions. XFS is a very high-performance, scalable filesystem that is routinely deployed in the most demanding applications; it is the default filesystem in RHEL 7 and is supported on all architectures. XFS has its advantages, but in a JBOD setup it doesn't really provide a lot of extra benefit. Ext4 does not scale to the same sizes as XFS, but it is also fully supported on all architectures and continues to see active development and support. See the HCC Kafka KB Article. Hope that helps!!!
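If you want to confirm what a broker host is actually using, here is a minimal sketch (assuming the Kafka log directory is /kafka-logs; adjust the path to whatever log.dirs points to in your server.properties):

```bash
# Show the filesystem type backing the Kafka log directory
df -T /kafka-logs

# Show the mount options in effect (noatime is a common recommendation for Kafka)
mount | grep kafka-logs
```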
06-04-2018
09:18 AM
Great to know your LLAP started !!!
06-04-2018
08:36 AM
@Erkan ŞİRİN A deeper dive into the setup could be worthwhile. Please have a look at these two resources; they could be of help: https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html https://community.hortonworks.com/articles/149899/investigating-when-llap-doesnt-start.html
06-04-2018
07:46 AM
@Erkan ŞİRİN Your real problem is "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed". On an unsecured Hadoop cluster with Python 2.7.9 installed, the Slider agent fails with SSL validation errors. Check whether your Python version is 2.7.9 and your Slider version is less than 0.92: https://issues.apache.org/jira/browse/SLIDER-942 If the above matches your setup, then download the patch and try again.
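To confirm the versions on the affected host, a quick sketch (the Slider client path below is the usual HDP location and is an assumption; it may differ in your install):

```bash
# Check the Python version the agent is running with
python -V

# Check the installed Slider version (path assumes a standard HDP layout)
/usr/hdp/current/slider-client/bin/slider version
```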
06-03-2018
11:06 PM
@Samant Thakur Have you configured your cluster for rack awareness?
Rack awareness prevents data loss and improves network performance. HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack, which provides data availability in the event of a network switch failure or partition within the cluster. You will need the help of your network/data center team to share the network topology and how the nodes are spread out across the racks. Once you know the subnets and DC setup, you can use Ambari UI --> Hosts to set the rack topology (see the sketch below). To understand better see HDP rack awareness, and also HCC rack-awareness-series-1 and HCC rack-awareness-series-2. Hope that helps
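For reference, what Ambari configures behind the scenes is a topology script referenced by net.topology.script.file.name in core-site.xml. A minimal sketch (the subnets and rack names are hypothetical examples):

```bash
#!/bin/bash
# Minimal HDFS topology script: maps each host/IP argument to a rack path,
# printing one line per argument and defaulting to /default-rack when unknown.
# Subnets and rack names below are hypothetical examples.
for host in "$@"; do
  case "$host" in
    10.0.1.*) echo "/dc1/rack1" ;;
    10.0.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done
```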
06-03-2018
06:41 PM
@Adi Jabkowsky Below is the procedure to remove corrupt blocks or files.

1. Locate the files that have corrupt blocks:
$ hdfs fsck / | egrep -v '^\.+$'
or
$ hdfs fsck hdfs://ip.or.host:50070/ | egrep -v '^\.+$'

This lists the affected files: instead of a bunch of dots, the output should include something like the following for each affected file.
Sample output:
/path/to/filename.file_extension: CORRUPT blockpool BP-1016133662-10.29.100.41-1415825958975 block blk_1073904305
/path/to/filename.file_extension: MISSING 1 blocks of total size 15620361 B

2. Determine the importance of each file: can it just be removed and copied back into place, or does it contain sensitive data that needs to be regenerated? You have a replication factor of 1, so analyze carefully.

3. Remove the corrupted file(s). This command moves the corrupted file to the trash, so in case you realize the file is important you still have the option of recovering it:
$ hdfs dfs -rm /path/to/filename.file_extension

Use -skipTrash to permanently delete it if you are sure you really don't need the file:
$ hdfs dfs -rm -skipTrash /path/to/filename.file_extension

4. How to repair a corrupted file if it is not easy to replace:
$ hdfs fsck /path/to/filename.file_extension -locations -blocks -files
or
$ hdfs fsck hdfs://ip.or.hostname.of.namenode:50070/path/to/filename.file_extension -locations -blocks -files

This lets you track down the datanode where the corruption is, look through its logs, and determine what the issue is. Please revert.
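As an alternative to grepping the full fsck output, fsck has options that report (and optionally delete) corrupt files directly; a minimal sketch:

```bash
# List only the corrupt blocks and the files they belong to
hdfs fsck / -list-corruptfileblocks

# Once you are sure the affected files are expendable, fsck can delete the
# corrupted files directly (irreversible, bypasses the trash, use with care)
hdfs fsck / -delete
```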
06-01-2018
09:54 PM
@Pankaj Singh Any updates? If you found an answer that addressed your question, please take a moment to log in and click the "accept" link on that answer.
06-01-2018
02:57 PM
@Pankaj Singh Not really. I usually set up the MySQL databases and test connectivity before the cluster setup.
06-01-2018
02:03 PM
@Pankaj Singh Setting up the cluster through the Ambari admin does NOT also create the MySQL server & Hive server for you. You will need an RDBMS for the Hive metastore service, which stores the metadata for Hive tables and partitions in a relational database. Hive is data warehouse software built on top of Hadoop for providing data summarization, query, and analysis; it gives a SQL-like interface to query data stored in HDFS. All queries go through the Hive metastore, which translates SQL access to this information using the metastore service API. When planning a robust (production) cluster you shouldn't use the Derby database but one of the following: Oracle, MySQL, MS SQL, MariaDB, etc. These databases should be set up before running Ambari or during the Ambari server setup. The following components need a relational database: Ambari, Hive, Oozie, Ranger. You can also enable Hive metastore high availability (HA), with each metastore instance being independent, so that your cluster is resilient to failures due to a metastore becoming unavailable. See the attached HiveMetaHA steps for setting up the metadata databases.
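For reference, a minimal sketch of pre-creating the Hive metastore database and user in MySQL and registering the JDBC driver with Ambari (the database name, user, password, and driver path are hypothetical examples; adjust to your environment):

```bash
# Pre-create the Hive metastore database and user (example values only)
mysql -u root -p <<'SQL'
CREATE DATABASE hive;
CREATE USER 'hive'@'%' IDENTIFIED BY 'StrongPassword1!';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL

# Make the MySQL JDBC driver known to Ambari so the Hive service can use it
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
```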