Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 869 | 06-04-2025 11:36 PM |
| | 1443 | 03-23-2025 05:23 AM |
| | 723 | 03-17-2025 10:18 AM |
| | 2597 | 03-05-2025 01:34 PM |
| | 1719 | 03-03-2025 01:09 PM |
04-27-2018
06:22 AM
@Sriram Hadoop The Hadoop Distributed File System was designed to hold and manage large amounts of data; typical HDFS block sizes are therefore significantly larger than the block sizes you would see on a traditional filesystem. The block size is specified in hdfs-site.xml. The default block size in Hadoop 2.0 is 128 MB. To change it to 256 MB, edit the parameter dfs.blocksize (dfs.block.size is the deprecated name) to the desired size, and restart all stale services for the change to take effect. It's recommended to always use the Ambari UI to make HDP/HDF configuration changes.

The block size of existing files can't be changed in place. To rewrite existing files with a new block size, the distcp utility can be used, or you can override the default block size at copy time, e.g. with 256 MB:

hadoop fs -D dfs.blocksize=268435456 -copyFromLocal /tmp/test/payroll-april10.csv blksize/payroll-april10.csv

Hope that helps
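To verify, hadoop fs -stat can print the block size actually recorded for the copied file; the path is the one from the example above:

# should print 268435456 for the file copied with the override
hadoop fs -stat "%o" blksize/payroll-april10.csv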
04-26-2018
03:25 PM
@raj pati Error "Causedby: org.apache.hadoop.hbase.ipc.CallTimeoutException:Call id=9, waitTime=10001, operationTimeout=10000 expired." is a timeout exception. Whats the value of hbase.rpc.timeout ? hbase.client.scanner.timeout.period is a timeout specifically for RPCs that come from the HBase Scanner classes (e.g. ClientScanner) while hbase.rpc.timeout is the default timeout for any RPC. I believe that the hbase.client.scanner.timeout.period is also used by the RegionServers to define the lifetime of the Lease (the cause of the LeaseException you're seeing). Generally, when you see these kinds of exceptions while scanning data in HBase, it is just a factor of your hardware and current performance (in other words, how long it takes to read your data). I can't really give a firm answer because it is dependent on your system's performance Could you adjust these parameters and restart the Hbase stale configs and test Change the below values through Ambari and test <property>
<name>hbase.client.scanner.timeout.period</name>
<value>70000</value>
</property> And also <property>
<name>hbase.rpc.timeout</name>
<value>70000</value>
</property> It should run successfully.
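A quick way to re-test after the restart is a bounded scan from the hbase shell; the table name 'mytable' below is just a placeholder for the table the scan was failing on:

# re-run a bounded scan that previously hit the 10s timeout ('mytable' is hypothetical)
echo "scan 'mytable', {LIMIT => 10}" | hbase shell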
04-26-2018
03:07 PM
@Saravana V In the examples below I used the hive user and password. Note that access to HiveServer2 in a kerberized cluster using the hive CLI has been deprecated.

Example one, connecting to the test database; here you MUST be the test database owner, as you will be prompted for a username/password:

$ beeline
beeline> !connect jdbc:hive2://{hive_host}:10000/test

Example two, HS2 in a kerberized cluster; note the hive principal and REALM in the JDBC URL:

$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://france.paris.fr:10000/;principal=hive/france.paris.fr@PARIS.FR
Connecting to jdbc:hive2://france.paris.fr:10000/;principal=hive/france.paris.fr@PARIS.FR
Enter username for jdbc:hive2://france.paris.fr:10000/;principal=hive/france.paris.fr@PARIS.FR: {username}
Enter password for jdbc:hive2://france.paris.fr:10000/;principal=hive/france.paris.fr@PARIS.FR: {password}
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://france.paris.fr:10000/> show databases;
+----------------+
| database_name |
+----------------+
| default |
| agricole |
| achats |
+----------------+
3 rows selected (2.863 seconds)
0: jdbc:hive2://france.paris.fr:10000/> use agricole;
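The same kerberized connection can be opened non-interactively with beeline's -u flag; quote the URL so the shell doesn't interpret the semicolon:

$ beeline -u "jdbc:hive2://france.paris.fr:10000/;principal=hive/france.paris.fr@PARIS.FR"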
04-25-2018
08:25 PM
1 Kudo
@Michael Bronson The disk is already unusable, so go ahead and run fsck with the -y option to repair it 🙂 see above. Either way, you will have to replace that dirty disk anyway!
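A minimal sketch, assuming the bad disk is the /dev/sdc discussed earlier in this thread; unmount it first, and -y answers yes to every repair prompt:

# unmount the filesystem before repairing it
umount /dev/sdc
# repair, answering yes to all prompts
e2fsck -y /dev/sdc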
04-25-2018
08:17 PM
1 Kudo
@Michael Bronson The Avahi daemon is present by default and allows you to discover network resources. No Hortonworks documentation talks about it, simply because I think it's trivial. Do you NEED it? No, AFAIK nobody absolutely needs this; it's a recent technology, and networks worked before it existed. It can be convenient on networks available to the public, but if you administer your own network, you know where your services are. The only host-level settings that affect HDP are the documented prerequisites below, especially the last four:

- Set Up Password-less SSH
- Set Up Service User Accounts
- Enable NTP on the Cluster and on the Browser Host
- Check DNS and NSCD
- Configuring iptables
- Disable SELinux and PackageKit and check the umask Value

DNS should work without Avahi; disable it on a couple of nodes and you will see the services still running fine, as in the sketch below.
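A quick check, with a hypothetical cluster hostname; if getent still resolves the name with Avahi stopped, your DNS/NSCD setup does not depend on it:

# stop Avahi on one node
systemctl stop avahi-daemon
# confirm ordinary DNS resolution still works (hostname is hypothetical)
getent hosts namenode.example.com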
04-25-2018
08:03 PM
@Michael Bronson Avahi is a system which facilitates service discovery on a local network via the mDNS/DNS-SD protocol suite. It enables you to plug your laptop or computer into a network and instantly see other people you can chat with, printers to print to, or files being shared. Compatible technology is found in Apple macOS (branded Bonjour, and sometimes Zeroconf). The two big benefits of Avahi are name resolution and finding printers, but on a server, in a managed environment, it's of little value.

Unmounting and mounting filesystems is a common thing, especially in Hadoop clusters; your SysOps team should have validated that, but all looks correct to me. Do a dry run with the command below to see what will be affected; that will give you a better picture:

# e2fsck -n /dev/sdc

The data will be reconstructed, as you have the default replication factor; you can later rebalance the HDFS data.
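Once the repaired disk is back in service, the rebalance mentioned above is a one-liner; the 10% threshold is a common starting point, not anything mandated here:

# move blocks between DataNodes until each is within 10% of average utilization
hdfs balancer -threshold 10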
04-25-2018
07:49 PM
@Michael Bronson Avahi is a system which facilitates service discovery on a local network via the mDNS/DNS-SD protocol suite. It enables you to plug your laptop or computer into a network and instantly see other people you can chat with, printers to print to, or files being shared. Compatible technology is found in Apple macOS (branded Bonjour, and sometimes Zeroconf). The two big benefits of Avahi are name resolution and finding printers, but on a server, in a managed environment, it's of little value. You may want to run the following:

systemctl disable avahi-daemon.socket avahi-daemon.service

Be aware, though, that the above will disable avahi only temporarily. To prevent automatic re-enabling, it needs to be masked:

systemctl mask avahi-daemon.socket avahi-daemon.service

Hope that helps
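To confirm the mask took effect, ask systemctl for the unit state; both units should report "masked":

# expected output: masked / masked
systemctl is-enabled avahi-daemon.socket avahi-daemon.service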
04-24-2018
08:48 PM
1 Kudo
@Rohit Sharma Please find below answers to your questions, though I didn't understand exactly what you meant!

1. What are kerberized zone advantages? You can't kerberize a zone, only a cluster; you can, however, create an encryption zone, and those are two different things. The primary design goal of Kerberos is to eliminate the transmission of unencrypted passwords across the network. Used properly, Kerberos effectively eliminates the threat that packet sniffers would otherwise pose on a network.

2. Which services should I consider keeping in that zone? Again, some confusion here. An encryption zone is a special directory whose contents are transparently encrypted on write and transparently decrypted on read. You can store there, for example, an HR salary scheme, or just about any document you deem needs protection (see the sketch after this list). You either kerberize the whole cluster or not.

3. Which approach is good to use other than this? You need Kerberos if you're serious about security. AD/LDAP covers only a fraction of the components; many other systems require Kerberos for identity. One can still keep users in LDAP, but the first line in the infrastructure will be Kerberos. Kerberos is the de facto standard for securing your Hadoop environment, coupled with SSL/SASL and the traditional firewalls and physical protection (caged nodes in a datacenter).

4. If it's an HA and prod environment, what are the best practices? HA and prod, I don't see the link. HA is basically having a redundant system which is fault tolerant, and a prod environment is self-explanatory.

5. How do I implement and configure it if I am planning to add Ranger policies? For authentication, there is no alternative to Kerberos. Once your cluster is kerberized, you can make certain access paths easier by using AD/LDAP, e.g. access to HS2 via AD/LDAP authentication, or accessing various services using Knox. Authorization can be done via Ranger or the natively supported ACLs. Except for Storm and Kafka, having Kerberos is not mandatory, but without reliable authentication, authorization and auditing are meaningless. A common use case like yours: user A logs into the system with his AD credentials, then HDFS or Hive ACLs kick in for authorization.

6. If it is integrated with HDP and HDF clusters, what would be good administrator practice? HDP and HDF are now both managed by Ambari, so that sort of simplifies some admin tasks.

7. Study materials, if any, on HDP? Best to start off with the HDP or HDF sandboxes.
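A minimal sketch of creating an encryption zone, assuming Ranger KMS (or Hadoop KMS) is already configured; the key name and path below are hypothetical:

# create an encryption key in the KMS (key name is hypothetical)
hadoop key create hrkey
# an encryption zone must start as an empty directory
hdfs dfs -mkdir -p /secure/hr
# turn the directory into an encryption zone tied to that key (run as the hdfs superuser)
hdfs crypto -createZone -keyName hrkey -path /secure/hr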
04-24-2018
09:09 AM
@Victor Hely Your Atlas is in maintenance mode! Go to the Ambari UI (see the attached screenshot), turn off maintenance mode, and restart Atlas; ensure HBase is running, and all should work. If it works, remember to accept the answer so others can also use it to resolve a similar issue.
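The same toggle can also be done through the Ambari REST API; the host, credentials, and cluster name below are placeholders:

# turn maintenance mode off for the ATLAS service (host/cluster/credentials are hypothetical)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Turn off maintenance mode for Atlas"},"Body":{"ServiceInfo":{"maintenance_state":"OFF"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/services/ATLAS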