
This article walks you through the setup of a default 7-node CDP Operational Database (OpDB) Data Hub cluster for running operational workloads in the public cloud.

 

In this article, I have used our internal Sandbox environment, which was on the CDP 7.0.2 release and in which users are already set up and authenticated via Okta SSO. I am using AWS for the public cloud; you can use Microsoft Azure as well. Support for Google Cloud Platform is currently targeted for later this year.

 

  1. Navigate to Okta, or your SSO provider home page, if enabled. Search for the CDP Public Cloud tile:
    1.png
    Or, click directly on the CDP Public Cloud tile:
    2.png
  2. Click Data Hub Clusters in the portal:
    3.png
  3. Click Create Data Hub:
    4.png

     

  4. Select the following options to create an OpDB Data Hub cluster on AWS in the Sandbox environment. Feel free to enter your own cluster name under General Settings, as shown below:
    5.png

     

    Also, review the Advanced Options; you do not need to update anything under them for this exercise. Note the services included as part of this template. There is no Phoenix to work with yet in the Public Cloud Sandbox environment.
  5. Click Provision Cluster. This might take a few minutes to complete.
  6. While this is provisioning, you can explore the Data Lake and Summary tabs for interesting details about this environment. Note that some of the details have been masked in the screenshots below:

    6.png
    7.png

  7. Once provisioned, the status should be Running as shown below:

    8.png
  8. Click on your cluster name to view the event history, and review the additional details under each tab at the bottom. For example, the Hardware tab lists the 7 nodes in detail.

    9.png
  9. Navigate to the CM UI, HBase UI, and NameNode UI to explore additional details by clicking the links under the Services section. For example, the CM UI is installed on the gateway node:
    10.png
    If you are able to access the CM UI, please proceed to Step 10.
    If you are unable to access the CM UI, there might be access issues specific to your login on this environment or on the Public Cloud Data Lake named “sandbox-default-se”. Navigate to Environments in the left pane and filter down to the specific environment as shown below:
    11.png
    Click on the environment name “sandbox-default-se”. Note that some of the details are masked in the screenshot below.
    12.png
    Click on Actions | Manage Access. Filter on your name as shown below:
    13.png


    Select the appropriate role on this screen. I am assuming Admin for this environment; click on Update Roles as shown below:
    13.png
    Navigate back to the environment sandbox-default-se. Note that some of the details are masked in the screenshot below:

    14.png 

    Click on Actions | Synchronize Users to FreeIPA. This updates the access rights for all users in this Cloud Data Lake. Note that users can have different access to each Cloud Data Lake that is registered in this CDP Public Cloud environment.
    15.png

     

     16.png

     

  10. The user synchronization might take a few minutes to complete.

    17.png

     

     

  11. Once completed successfully, you should see the following message:

    18.png
    Try accessing the CM UI again. If you still have any issues accessing CM, submit a support ticket with the Cloudera team.
  12. Now that you are able to access the CM UI, log in to the gateway node using the CLI as shown below:
    Srikanths-MacBook-Pro:~ snadukudy$ ssh sritestnkudy-gateway2.sandbox.a465-9q4k.cloudera.site
    Password: (enter your workload password when prompted)

    Last login: Sun Mar  8 01:20:01 2020 from 76.126.250.37

         | |               | |
      ___| | ___  _   _  __| | ___ _ __ __ _
     / __| |/ _ \| | | |/ _` |/ _ \ '__/ _` |
    | (__| | (_) | |_| | (_| |  __/ | | (_| |
     \___|_|\___/ \__,_|\__,_|\___|_|  \__,_|

    =================================================

    [snadukudy@sritestnkudy-gateway2 ~]$

    If you need to reset your workload password, please navigate to:
    19.png
    And then click on Set Workload Password to reset it. Note that some of the details are masked in the screenshot below.
    20.png
  13. Open HBase Shell:

    [snadukudy@sritestnkudy-gateway2 ~]$ hbase shell
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
    Version 2.2.0.7.0.2.2-51, rUnknown, Thu Feb 13 19:38:56 UTC 2020
    Took 0.0009 seconds                                                                               
    hbase(main):001:0> 
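    Before creating any tables, you can optionally run a couple of quick sanity checks from the shell. This is an aside rather than part of the walkthrough; status and whoami are standard HBase 2.x shell commands, and their output will vary by cluster:

    status
    whoami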
  14. Create an HBase table named emp, or run your own HBase commands now:

    hbase(main):021:0> create 'emp', 'rowkey','name', 'email','address'
    Created table emp
    Took 8.3149 seconds                                                                                                                                                       
    => Hbase::Table - emp
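    As an optional aside (not part of this walkthrough), create also accepts per-column-family options. For example, assuming you wanted HBase to retain up to three versions of each cell in the name family, you could have created the table like this instead:

    create 'emp', 'rowkey', {NAME => 'name', VERSIONS => 3}, 'email', 'address'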
  15. Show the emp table details: 

    hbase(main):022:0> desc 'emp'
    Table emp is ENABLED
    emp
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'address', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'email', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'name', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'rowkey', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    4 row(s)
    QUOTAS
    0 row(s)
    Took 0.0440 seconds
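    Each attribute in these descriptors can also be changed after table creation with the standard alter command. As an illustrative sketch (not run in the original session), this would raise VERSIONS on the name column family:

    alter 'emp', {NAME => 'name', VERSIONS => 3}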
  16. Insert rows into the emp table by executing these commands:

    put 'emp', 'rowkey:1', 'name','rich'
    put 'emp', 'rowkey:1', 'email','rich1111@cloudera.com'
    put 'emp', 'rowkey:1', 'address','395 Page Mill Rd'

    put 'emp', 'rowkey:2', 'name','sri'
    put 'emp', 'rowkey:2', 'email','sri1111@cloudera.com'
    put 'emp', 'rowkey:2', 'address','395 Page Mill Rd'

    put 'emp', 'rowkey:3', 'name','brandon'
    put 'emp', 'rowkey:3', 'email','brandon1111@cloudera.com'
    put 'emp', 'rowkey:3', 'address','395 Page Mill Rd'
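    To read back a single row without scanning the whole table, you can use the standard get command (an optional check, not shown in the original session), with one of the row keys inserted above:

    get 'emp', 'rowkey:1'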
  17. Scan the emp table to view the inserted rows:

    hbase(main):079:0> scan 'emp'
    ROW                                    COLUMN+CELL
     rowkey:1                              column=address:, timestamp=1583635789786, value=395 Page Mill Rd
     rowkey:1                              column=email:, timestamp=1583635860821, value=rich1111@cloudera.com
     rowkey:1                              column=name:, timestamp=1583635732346, value=rich
     rowkey:2                              column=address:, timestamp=1583635521804, value=395 Page Mill Rd
     rowkey:2                              column=email:, timestamp=1583635402242, value=sri111@cloudera.com
     rowkey:2                              column=name:, timestamp=1583634956908, value=sri
     rowkey:3                              column=address:, timestamp=1583635819282, value=395 Page Mill Rd
     rowkey:3                              column=email:, timestamp=1583638851663, value=brandon1111@cloudera.com
     rowkey:3                              column=name:, timestamp=1583635898253, value=brandon
    3 row(s)
    Took 0.0088 seconds
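    scan also accepts options to narrow the output. As an illustrative aside (not run in the original session), the following restricts the scan to the email column family and the first two rows, using the standard COLUMNS and LIMIT scan options:

    scan 'emp', {COLUMNS => ['email'], LIMIT => 2}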
  18. Disable the emp table first in order to drop the test data (a table must be disabled before it can be dropped):

    hbase(main):080:0> disable 'emp'
    Took 4.3165 seconds
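    If you want to confirm the table state before dropping it, is_disabled is a standard shell command that returns true or false (an optional check, not part of the original session):

    is_disabled 'emp'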
  19. Drop the emp table to delete the test data, and check whether there are any other HBase tables using the list command:

    hbase(main):081:0> drop 'emp'
    Took 8.2271 seconds

    hbase(main):082:0> list
    TABLE
    0 row(s)
    Took 0.0019 seconds
    => []
    hbase(main):083:0>
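    Similarly, the standard exists command should now return false for the dropped table (another optional check, not part of the original session):

    exists 'emp'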
  20. Now, exit HBase Shell using the following command:

    hbase(main):083:0> exit
    20/03/08 03:44:01 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
    20/03/08 03:44:01 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
    20/03/08 03:44:01 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
    [snadukudy@sritestnkudy-gateway2 ~]$
  21. Log out of the gateway node:

    [snadukudy@sritestnkudy-gateway2 ~]$ exit
    logout
    Connection to sritestnkudy-gateway2.sandbox.a465-9q4k.cloudera.site closed.
    Srikanths-MacBook-Pro:~ snadukudy$
  22. Terminate the Data Hub cluster to free up the resources from the Actions menu:
    21.png

  23. Click on Terminate. This generates the following prompt:
    21.png

  24. Click on Yes to proceed. 

    22.png

 

This might take a few minutes to complete, and the Data Hub cluster will no longer be listed on the page. This is the end of the sneak peek into the OpDB Data Hub experience in CDP.

The following is a screenshot of the services included (with support for Apache Phoenix in CDP Data Hub 7.1, released on March 2, 2020):
23.png
OpDB Data Hub on Azure:
24.png

This is my first community article, and I hope you enjoy playing with this as much as I did. Special thanks to Krishna Maheshwari, Josiah Goodson, and Brandon Freeman for their guidance! Looking forward to your feedback.

Have a great one! 

Cheers,
Sri
