11-12-2020 10:51 AM
Hello @lenu If you have replication enabled, WALs are persisted until they have been replicated. If you aren't using HBase replication, ensure there are no peers (via "list_peers") and that the "hbase.replication" property is false. If the oldWALs still aren't removed, enable TRACE logging for the HBase Master service; the log will then show the CleanerChore thread removing or skipping each entry. See the commands sketched below. - Smarak
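For reference, a minimal way to run those checks from the command line. This is only a sketch that assumes default paths and ports; <active-master>, the config path, and the log path are placeholders for your environment:

echo "list_peers" | hbase shell                               # expect an empty peer list
grep -A1 "hbase.replication" /etc/hbase/conf/hbase-site.xml   # expect false
# Raise the cleaner loggers to TRACE on the active Master via the /logLevel
# servlet (16010 is the default Master UI port), then watch the Master log
# for the CleanerChore removing or skipping oldWALs:
curl "http://<active-master>:16010/logLevel?log=org.apache.hadoop.hbase.master.cleaner&level=TRACE"
tail -f /var/log/hbase/hbase-hbase-master-*.log | grep -i CleanerChore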
10-28-2018 06:17 PM
1 Kudo
@Lenu K

1. Using the Spark-HBase connector: use the connector to pull data from the HBase table with Spark, and store the point in time up to which you have pulled records. On the next run, read that state, use it as the lower bound and the current time as the upper bound, pull that window from the HBase table, and insert it into the Hive table. This way you are not recreating a full snapshot of the HBase table as a Hive ORC table; instead you incrementally load the data into the Hive table and use the Hive table for analytics.

2. Using a Hive MERGE strategy: you can use the MERGE statement introduced in HDP-2.6, but your Hive table needs to be transactional for this: merge into transactional_table using <hbase_hive_table>... etc.; for more details refer to this link, and see the sketch after this list. Another Hive-based way is CTAS, as mentioned above in the comments: the first run takes more time, but from the second run onward you only pull the incremental records from the HBase table and load them into the Hive ORC table (if you follow this approach, the Spark-HBase connector will give better performance).

3. Using Apache Phoenix: a Phoenix table points to the HBase table and allows you to run SQL queries on top of the data stored in HBase. See: Difference between Hive-HBase integration vs Phoenix-HBase integration.
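For illustration, a minimal sketch of such a MERGE run through beeline. The table names (hive_tx_table, hbase_hive_table), the columns (rowkey, col1), and the JDBC URL are hypothetical placeholders, not your actual schema, and the target must already be an ACID (transactional) table:

beeline -u "jdbc:hive2://<hiveserver2-host>:10000" -e "
MERGE INTO hive_tx_table AS t   -- transactional ORC table (placeholder name)
USING hbase_hive_table AS s     -- Hive table mapped onto HBase (placeholder name)
ON t.rowkey = s.rowkey
WHEN MATCHED THEN UPDATE SET col1 = s.col1
WHEN NOT MATCHED THEN INSERT VALUES (s.rowkey, s.col1);"

Filtering the source side down to the incremental window (for example, on a timestamp column) keeps each MERGE run small.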
08-09-2019 06:19 PM
Sorry for the delay! Our moderators needed to remove some corporate-sensitive details from your post, but it is now published.
10-04-2018 11:50 PM
@Lenu K Unfortunately, prior to Ambari 2.7 there is no option in the Ambari UI to see which user performed an operation; Ambari 2.7 adds this to the UI (screenshot omitted). In previous versions such as Ambari 2.6, however, you can check "ambari-audit.log" to see the activities performed by each user. Example:

# tail -f /var/log/ambari-server/ambari-audit.log
2018-10-04T23:44:21.489Z, User(admin), RemoteIp(101.142.180.147), Operation(Request from server), RequestType(POST), url(http://hd1.example.com:8080/api/v1/clusters/TestCluster/requests), ResultStatus(202 Accepted), Command(RESTART), Cluster name(TestCluster)
2018-10-04T23:44:21.535Z, User(admin), Operation(Restart all components for SmartSense), Status(IN_PROGRESS), RequestId(852)
10-03-2018 09:01 PM
2 Kudos
@Lenu K Your question is rather broad; for a small cluster, everything depends on the manpower at hand. For HDF, remember to back up the flow files. Below is what immediately comes to mind.

Fresh install pros and cons:
- Better planned: you get a clean installation, properly configured this time, with mistakes learned from the current cluster setup.
- Straightforward: no upgrade surprises.
- You lose your customizations.

Upgrade pros and cons:
- Must be planned properly, with documented steps.
- Expect technical surprises and challenges; plan for support on the D-day if you don't already have it.
- Challenges mold you into a better Hadoopist! See Mandatory Post-Upgrade Tasks.

Best practices:
- Verify that the file system you selected is supported (HWX).
- Pre-create all the databases.
- Back up your cluster before either of the above.
- Plan for at least NN/RM HA (the NameNode is the brain, so allocate good memory).
- You MUST have 3 ZooKeepers.
- HDD planning is important: prefer SSD over SCSI. Think of SSD for ZK, HBase, and the OS; Hive can also use SSD acceleration for temp tables by exposing the SSD via HDFS.
- Restrict access to the cluster to the edge node ONLY.
- Kerberize the cluster and configure SSL.
- Plan the data center network well (backup lines).
- Size your nodes' memory and storage properly; beware if performance is a must, as Kafka and Storm especially are memory intensive.
- Delegate authorization to Ranger.
- Test upgrade procedures for new versions of existing components.
- Execute performance tests of custom-built applications.
- Allow end users to perform user acceptance testing.
- Execute integration tests where custom-built applications communicate with third-party software.
- Experiment with new software that is beta quality and may not be ready for use at all.
- Execute security penetration tests (typically done by an external company).
- Let application developers modify configuration parameters and restart services on short notice.
- Maintain a mirror image of the production environment to be activated in case of natural disaster or unforeseen events.
- Execute regression tests that compare the outputs of new application code with existing code running in production.

HTH