Member since: 10-03-2020
Posts: 235
Kudos Received: 15
Solutions: 17
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1222 | 08-28-2023 02:13 AM |
 | 1754 | 12-15-2021 05:26 PM |
 | 1636 | 10-22-2021 10:09 AM |
 | 4605 | 10-20-2021 08:44 AM |
 | 4620 | 10-20-2021 01:01 AM |
09-24-2021
10:09 PM
Hello @Clua,
It looks like you solved it. If possible, could you please share the code snippets showing how you added gssflags in authGSSClientInit, and which transport function you are using?
Thanks, Will
09-24-2021
08:15 PM
1 Kudo
Hi @drgenious,
1) Where can I run these kinds of queries?
In CM > Charts > Chart Builder you can run tsquery. Refer to this link: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_dg_chart_time_series_data.html
2) Where can I find attributes like category and clusterName in Cloudera?
In the Chart Builder text bar, write an incomplete query such as:
SELECT get_file_info_rate
Below the text bar there is a Facets section; click More and select any facets you want. For example, if you select clusterName, you will see the clusterName shown in the chart's title. Then you can complete your tsquery:
SELECT get_file_info_rate where clusterName=xxxxx
If you want to build Impala-related charts, I suggest first reviewing CM > Impala service > Charts Library; many charts are already there for common monitoring purposes. You can open any of the existing charts to learn how to construct the tsquery and then build your own charts. Another very good place to learn is CM > Charts > Chart Builder: on the right side you will see a "?" button; click it and you will see many examples that you can simply "try it" on.
Regards, Will
If the answer helps, please accept as solution and click thumbs up.
09-22-2021
06:30 AM
1 Kudo
Hi @doncucumber,
Now you can have a good rest 🙂 Please check this article for the detailed HA concepts: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_hag_hdfs_ha_intro.html
The active NN syncs edit logs to a majority of the JournalNodes, so the standby NN is able to read the edits from the JNs. From your NN log we can see that the recovery failed because the recovery time exceeded the 120000 ms timeout for a quorum of nodes to respond. That is why I requested the JN logs to check whether the edits sync-up had failed, and we found the problem on JN2 and JN3.
Regards, Will
09-22-2021
05:28 AM
Hi @doncucumber,
From these JN logs we can say edits_inprogress_0000000000010993186 was not the latest edit log: on JN1 the edits_inprogress number is 13015981, but the problematic JN2 and JN3 are still at 10993186. You could try the following steps:
1. Stop the whole HDFS service, including all NNs/JNs.
2. On JN2/JN3, which both have the same error, move the fsimage edits directory (/datos3/dfs/jn/nameservice1/current/) to another location, for example /tmp.
3. Copy the good fsimage edits directory (/datos3/dfs/jn/nameservice1/current/) from JN1 to the problematic JN nodes. Now you have manually synced up the fsimage edits directories on all JNs.
4. Start HDFS.
Please let us know if this solution helps.
Thanks, Will
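For example (a rough sketch only; "jn1-host" is a placeholder and the paths are the ones from your logs), steps 2 and 3 could look like this on JN2 and JN3 once HDFS is fully stopped:
# move the stale edits directory out of the way
mv /datos3/dfs/jn/nameservice1/current /tmp/jn_current_backup
# pull the healthy directory from JN1
scp -rp jn1-host:/datos3/dfs/jn/nameservice1/current /datos3/dfs/jn/nameservice1/
# make sure the copied files are still owned by the hdfs user
chown -R hdfs:hdfs /datos3/dfs/jn/nameservice1/current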
09-22-2021
04:42 AM
Hi @doncucumber,
Could you please share the errors that appear in the JN logs at the same time?
Thanks, Will
09-20-2021
01:30 AM
Hi @Tamiri,
I think @Shelton has already answered this in another post:
https://community.cloudera.com/t5/Support-Questions/Hortonworks-HDP-3-0-root-user-password-doesn-t-work/m-p/286034#
https://www.cloudera.com/tutorials/learning-the-ropes-of-the-hdp-sandbox.html
Please check if it helps.
Thanks, Will
09-19-2021
08:38 PM
1 Kudo
Introduction
The Thrift proxy is a modern micro-service framework compared to existing alternatives such as SOAP, JSON-RPC, and the REST proxy. The Thrift proxy API offers higher performance, better scalability, and support for many languages (C++, Java, Python, PHP, Ruby, Perl, C#, Objective-C, JavaScript, Node.js, and others).
Applications can interact with HBase via the Thrift proxy.
This article discusses how to use the correct libraries and methods to interact with HBase via the Thrift proxy.
Outline
The basic concept of Thrift proxy and how the thrift language bindings are generated.
How Python thrift functions align with the correct settings of HBase configurations from Cloudera Manager.
Sample client code for security-disabled and security-enabled HBase clusters.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions.
The basic concept of Thrift proxy and how the Thrift cross-language bindings are generated
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings. A Thrift binding is a client code generated by the Apache Thrift Compiler for a target language (such as Python) that allows communication between the Thrift server and clients using that client code. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
The IDL file named Hbase.thrift is in CDP parcels.
find / -name "Hbase.thrift"
The IDL compiler can be installed by following the steps in Building Apache Thrift on CentOS 6.5.
Follow this article to generate the Python language bindings (client and server stubs). Now, you should be able to import the Python libraries into your client code.
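As a quick sanity check (a minimal sketch, assuming the bindings were generated with something like "thrift --gen py Hbase.thrift" into a package named hbase, and that the Thrift Python runtime is installed, for example via "pip install thrift"):
# verify that the generated bindings and the Thrift runtime are importable
from thrift.protocol import TBinaryProtocol   # from the Thrift Python runtime
from hbase import Hbase                       # generated by the Thrift compiler from Hbase.thrift
print(Hbase.Client)                           # resolves only if the bindings are on your PYTHONPATH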
How Python functions align with the HBase Configurations from Cloudera Manager
In many examples, you will see several functions used to interact with Thrift. The concepts of transport, socket, and protocol are described in the book Programmer’s Guide to Apache Thrift.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
We will discuss how the functions work with HBase configurations.
These parameters are taken into consideration (a minimal sketch that turns these settings into Thrift objects follows the list):
Is SSL enabled? (Search "SSL" in CM > HBase configuration; it is usually auto-enabled by CM.)
Use an SSL socket (TSSLSocket); otherwise use a plain TSocket.
hbase.thrift.security.qop=auth-conf? This means Kerberos is enabled.
Use TSaslClientTransport.
hbase.regionserver.thrift.compact=true?
Use TCompactProtocol; otherwise use TBinaryProtocol.
hbase.regionserver.thrift.framed=true?
Use TFramedTransport; otherwise use TBufferedTransport.
hbase.regionserver.thrift.http=true and hbase.thrift.support.proxyuser=true?
This means a DoAs implementation is required. HTTP mode cannot coexist with framed mode; use THttpClient.
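As an illustration only (not from the original article), here is a minimal sketch of how those settings could translate into Python Thrift objects; the host, port, and flag values are placeholders you would replace with the real values from CM > HBase configuration:
# illustrative sketch: map the HBase Thrift settings above to Thrift classes
from thrift.transport import TSocket, TSSLSocket, TTransport
from thrift.protocol import TBinaryProtocol, TCompactProtocol

host, port = 'thrift-server.example.com', 9090   # placeholder host/port
ssl_enabled = False    # "Is SSL enabled?"
framed = False         # hbase.regionserver.thrift.framed
compact = False        # hbase.regionserver.thrift.compact

# socket choice
sock = TSSLSocket.TSSLSocket(host, port) if ssl_enabled else TSocket.TSocket(host, port)

# transport choice; with Kerberos (qop=auth-conf) wrap the socket in TSaslClientTransport
# instead, as in the first sample below, and with HTTP mode use THttpClient as in the second
transport = TTransport.TFramedTransport(sock) if framed else TTransport.TBufferedTransport(sock)

# protocol choice
protocol = (TCompactProtocol.TCompactProtocol(transport) if compact
            else TBinaryProtocol.TBinaryProtocol(transport))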
Sample client code for security-disabled and security-enabled HBase clusters
Kerberos enabled / SSL disabled:
Settings:
SSL disabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=false
hbase.thrift.support.proxyuser=false
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase
import kerberos
import sasl
from subprocess import call
thrift_host='<thrift host FQDN>'   # placeholder: replace with your Thrift server host
thrift_port=9090
# call kinit to obtain the Kerberos ticket
krb_service='hbase'                # SASL service name used by the Thrift server
principal='hbase/<host>'
keytab="/path/to/hbase.keytab"
kinitCommand="kinit"+" "+"-kt"+" "+keytab+" "+principal
call(kinitCommand,shell=True)
socket = TSocket.TSocket(thrift_host, thrift_port)
transport = TTransport.TSaslClientTransport(socket,host=thrift_host,service='hbase',mechanism='GSSAPI')
protocol = TBinaryProtocol.TBinaryProtocol(transport)
transport.open()
client = Hbase.Client(protocol)
print(client.getTableNames())
transport.close()
This works in CDH 6, but does not work in some CDP versions due to a known bug described in the next section.
Kerberos enabled /SSL enabled:
Settings:
SSL enabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=true
hbase.thrift.support.proxyuser=true
The following code was adapted and tested based on @manjilhk's post here.
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos

def kerberos_auth():
    call("kdestroy", shell=True)
    clientPrincipal='hbase@<DOMAIN.COM>'
    # the hbase client keytab is copied from /keytabs/hbase.keytab
    # you can find the location using "find"
    keytab="/path/to/hbase.keytab"
    kinitCommand="kinit"+" "+"-kt"+" "+keytab+" "+clientPrincipal
    call(kinitCommand, shell=True)
    # this is the HTTP service principal of the HBase Thrift server, check with
    # klist -kt /var/run/cloudera-scm-agent/process/<latest-thrift-process>/hbase.keytab
    hbaseService="HTTP/<host>@<DOMAIN.COM>"
    __, krb_context = kerberos.authGSSClientInit(hbaseService)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers

# cert_file is copied from CDP; use "find" to get the location and scp it to your app server.
# ssl_context=ssl._create_unverified_context() is passed because no SSL verification is required here.
httpClient = THttpClient.THttpClient('https://<thrift server fqdn>:9090/',
                                     cert_file='/root/certs/localhost.crt',
                                     key_file='/root/certs/localhost.key',
                                     ssl_context=ssl._create_unverified_context())
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
tables = client.getTableNames()
print(tables)
httpClient.close()
Nowadays, security (SSL/Kerberos) is very important when applications interact with databases, and many popular services such as Knox and Hue interact with HBase via the Thrift server over an HTTP client. Therefore, we recommend the second method.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions
Upstream Jira HBASE-21652 introduced a bug related to Kerberos principal handling.
When the Thrift server was refactored to make the thrift2 server inherit from the thrift1 server, ThriftServerRunner was merged into ThriftServer and the principal-switching step was omitted.
Before the refactoring, everything ran in a doAs() block in ThriftServerRunner.run().
References
Programmer’s Guide to Apache Thrift
Python3 connection to Kerberos Hbase thrift HTTPS
Use the Apache Thrift Proxy API
How-to: Use the HBase Thrift Interface, Part 1
How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows
Disclaimer
This article was not tested against all versions; both methods were tested with Python 2.7.5 and Python 3.6.8.
Change the code according to your needs. If you encounter an issue, posting a question to the Community or raising a case with Cloudera Support is recommended.
09-15-2021
08:23 AM
1 Kudo
Hi @Ellyly,
Here is an example.
(1) First, list recursively with -R and grep "^d" to show all the subdirectories in your path:
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d"
drwxr-xr-x - hdfs supergroup 0 2021-09-15 14:48 /folder1/folder2
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3/folder4
drwxr-xr-x - hdfs supergroup 0 2021-09-11 05:09 /folder1/subfolder1
(2) Then, use awk -F\/ '{print NF-1}' to calculate each directory's depth; in effect we print the number of fields separated by "/". Note that after -F it is a backslash followed by a slash, with no space in between; it is not the character "V"! 🙂
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}'
2
3
4
2
(3) Finally, sort and take the first value:
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}' | sort -rn | head -1
4
Regards, Will
If the answer helps, please accept as solution and click thumbs up.
09-12-2021
10:59 PM
1 Kudo
Introduction
Phoenix is a popular solution for providing low-latency OLTP and operational analytics on top of HBase. Hortonworks Data Platform (HDP) and Cloudera Data Platform (CDP) are the most popular platforms on which Phoenix interacts with HBase.
Nowadays, many customers choose to migrate to Cloudera Data Platform to better manage their Hadoop clusters and implement the latest big data solutions.
This article discusses how to migrate Phoenix data and index tables to the newer CDP Private Cloud Base.
Environment
Source cluster: HDP 2.6.5, HDP 3.1.5
Target cluster: CDP PvC 7.1.5, CDP PvC 7.1.6, CDP PvC 7.1.7
Migration steps
The SYSTEM tables are created automatically when phoenix-sqlline starts for the first time and contain the metadata of the Phoenix tables. In order to see the Phoenix data/index tables in the target cluster, we need to migrate the SYSTEM tables from the source cluster as well.
1. Stop the Phoenix service on the CDP cluster. You can stop the service in Cloudera Manager > Services > Phoenix Service > Stop.
2. Drop the SYSTEM.% tables on the CDP cluster (from HBase). In the HBase shell, disable and drop all the SYSTEM tables:
hbase:006:0> disable_all "SYSTEM.*"
hbase:006:0> drop_all "SYSTEM.*"
3. Copy the SYSTEM, data, and index tables to the CDP cluster. There are many methods to copy data between HBase clusters; I recommend using snapshots so the schema stays the same. On the source HBase:
Take snapshots of all SYSTEM tables and data tables:
hbase(main):020:0> snapshot "SYSTEM.CATALOG","CATALOG_snap"
Run ExportSnapshot to the target cluster:
sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot CATALOG_snap -copy-to hdfs://Target_Active_NameNode:8020/hbase -mappers 16 -bandwidth 200
Your HBase directory path may be different; check the HBase configuration in Cloudera Manager for the path.
On the target cluster, the owner may become the user who triggered the MapReduce job, so change the owner back to the default hbase:hbase:
sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase
In the HBase shell, use clone_snapshot to create the new tables:
clone_snapshot "CATALOG_snap","SYSTEM.CATALOG"
When you complete the above steps, you should have all the SYSTEM tables, data tables, and index tables in your target HBase. For example, the following list was copied from an HDP 2.6.5 cluster and created in CDP:
hbase:013:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST
4. Start the Phoenix service, enter phoenix-sqlline, and then check that you can query the tables.
5. (Optional) If NamespaceMapping was already enabled on HDP, we should also set isNamespaceMappingEnabled to true on the CDP cluster in both the client and service hbase-site.xml, and restart the Phoenix service.
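For reference, a sketch of the property as it would appear in the client and service hbase-site.xml safety valves (the property name below is the standard Phoenix one; verify it against your CDP version):
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>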
Known Bug of Migration Process
Starting from Phoenix 5.1.0 / CDP 7.1.6, there is a bug in the SYSTEM tables auto-upgrade. The fix will be included in a future CDP release. Customers should raise a case with Cloudera Support and apply a hotfix for this bug on top of CDP 7.1.6/7.1.7.
Refer to PHOENIX-6534
Disclaimer
This article does not cover all versions of HDP and CDP, nor every situation; it only covers the popular or latest versions. If you follow the steps but fail or run into a new issue, please feel free to ask in the Community or raise a case with Cloudera Support.
09-11-2021
10:00 PM
1 Kudo
Hi @DanHosier,
Here is a possible solution to bind the NameNode HTTP server to localhost. Add the following property to the service-side advanced hdfs-site.xml and restart HDFS.
HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml:
<property>
<name>dfs.namenode.http-bind-host</name>
<value>127.0.0.1</value>
</property>
The property is then added into /var/run/cloudera-scm-agent/process/<Latest process of NN>/hdfs-site.xml:
# grep -C2 "dfs.namenode.http-bind-host" hdfs-site.xml
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>127.0.0.1</value>
</property>
Then test with curl:
# curl `hostname -f`:9870
curl: (7) Failed connect to xxxx.xxxx.xxxx.com:9870; Connection refused
# curl localhost:9870
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=dfshealth.html" />
<title>Hadoop Administration</title>
</head>
</html>
Now the web UI is only served on the NameNode's localhost. However, you will see this alert in CM because the Service Monitor cannot reach the NN web UI:
NameNode summary: xxxx.xxxx.xxxx.com (Availability: Unknown, Health: Bad). This health test is bad because the Service Monitor did not find an active NameNode.
So this solution has a side effect on the Service Monitor, but HDFS itself is running well.
Regards, Will
If the answer helps, please accept as solution and click thumbs up.