Member since: 10-03-2020
Posts: 235
Kudos Received: 15
Solutions: 17
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1222 | 08-28-2023 02:13 AM |
 | 1754 | 12-15-2021 05:26 PM |
 | 1636 | 10-22-2021 10:09 AM |
 | 4605 | 10-20-2021 08:44 AM |
 | 4620 | 10-20-2021 01:01 AM |
09-24-2021
10:09 PM
Hello @Clua,
It looks like you solved it. If possible, could you please share the code snippets showing how you added gssflags in authGSSClientInit, and which transport function you are using?
Thanks, Will
09-24-2021
08:15 PM
1 Kudo
Hi @drgenious,
1) Where can I run these kinds of queries?
In CM > Charts > Chart Builder you can run tsquery. Refer to this link: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_dg_chart_time_series_data.html
2) Where can I find attributes like category and clusterName in Cloudera?
In the Chart Builder text bar, write an incomplete query such as:
SELECT get_file_info_rate
Below the text bar there is a Facets section; click More and select any facets you want. For example, if you select clusterName, you will see the clusterName shown in the chart's title. Then you can complete your tsquery:
SELECT get_file_info_rate where clusterName=xxxxx
If you want to build Impala-related charts, I suggest first reviewing CM > Impala service > Charts Library; many charts are already there for common monitoring purposes. You can open any of the existing charts to learn how to construct the tsquery and then build your own charts. Another very good place to learn is CM > Charts > Chart Builder: on the right side you will see a "?" button; click it and you will see many examples that you can simply "try it" on.
Regards, Will
If the answer helps, please accept as solution and click thumbs up.
09-22-2021
06:30 AM
1 Kudo
Hi @doncucumber,
Now you can have a good rest 🙂 Please check this article for the detailed HA concepts: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_hag_hdfs_ha_intro.html
The active NN syncs edit logs to a majority of the JournalNodes, so the standby NN is able to read the edits from the JNs. From your NN log we can see that the recovery failed because the recovery time exceeded the 120000 ms timeout for a quorum of nodes to respond. That is why I requested the JN logs to check whether the edits sync-up had failed, and we found the problem on JN2 and JN3.
Regards, Will
09-22-2021
05:28 AM
Hi @doncucumber,
From these JN logs we can say edits_inprogress_0000000000010993186 was not the latest edit log: on JN1 the edits_inprogress number is 13015981, but the problematic JN2 and JN3 are still at 10993186. You could try the following steps:
1. Stop the whole HDFS service, including all NNs/JNs.
2. On JN2/JN3, which both have the same error, move the fsimage edits directory (/datos3/dfs/jn/nameservice1/current/) to another location, for example /tmp.
3. Copy the good fsimage edits directory (/datos3/dfs/jn/nameservice1/current/) from JN1 to the problematic JN nodes. Now you have manually synced up the fsimage edits directories on all JNs.
4. Start HDFS.
Please let us know if this solution helps.
Thanks, Will
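For example (a rough sketch only; "jn1-host" is a placeholder and the paths are the ones from your logs), steps 2 and 3 could look like this on JN2 and JN3 once HDFS is fully stopped:
# move the stale edits directory out of the way
mv /datos3/dfs/jn/nameservice1/current /tmp/jn_current_backup
# pull the healthy directory from JN1
scp -rp jn1-host:/datos3/dfs/jn/nameservice1/current /datos3/dfs/jn/nameservice1/
# make sure the copied files are still owned by the hdfs user
chown -R hdfs:hdfs /datos3/dfs/jn/nameservice1/current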
09-22-2021
04:42 AM
Hi @doncucumber,
Could you please share the errors that appear in the JN logs at the same time?
Thanks, Will
09-20-2021
01:30 AM
Hi @Tamiri,
I think @Shelton has already answered this in another post:
https://community.cloudera.com/t5/Support-Questions/Hortonworks-HDP-3-0-root-user-password-doesn-t-work/m-p/286034#
https://www.cloudera.com/tutorials/learning-the-ropes-of-the-hdp-sandbox.html
Please check if it helps.
Thanks, Will
09-19-2021
08:38 PM
1 Kudo
Introduction
The Thrift proxy is a modern micro-service framework compared to existing alternatives such as SOAP, JSON-RPC, and the REST proxy. The Thrift proxy API offers higher performance, better scalability, and support for many languages (C++, Java, Python, PHP, Ruby, Perl, C#, Objective-C, JavaScript, Node.js, and others).
Applications can interact with HBase via the Thrift proxy.
This article discusses how to use the correct libraries and methods to interact with HBase via the Thrift proxy.
Outline
The basic concept of Thrift proxy and how the thrift language bindings are generated.
How Python thrift functions align with the correct settings of HBase configurations from Cloudera Manager.
Sample client code for security-disabled and security-enabled HBase clusters.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions.
The basic concept of Thrift proxy and how the Thrift cross-language bindings are generated
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings. A Thrift binding is a client code generated by the Apache Thrift Compiler for a target language (such as Python) that allows communication between the Thrift server and clients using that client code. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
The IDL file named Hbase.thrift is in CDP parcels.
find / -name "Hbase.thrift"
The IDL compiler can be installed by following the steps in Building Apache Thrift on CentOS 6.5.
Follow this article to generate the Python language bindings (client and server stubs). Now, you should be able to import the Python libraries into your client code.
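As a quick sanity check (a minimal sketch, assuming the bindings were generated with something like "thrift --gen py Hbase.thrift" into a package named hbase, and that the Thrift Python runtime is installed, for example via "pip install thrift"):
# verify that the generated bindings and the Thrift runtime are importable
from thrift.protocol import TBinaryProtocol   # from the Thrift Python runtime
from hbase import Hbase                       # generated by the Thrift compiler from Hbase.thrift
print(Hbase.Client)                           # resolves only if the bindings are on your PYTHONPATH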
How Python functions align with the HBase Configurations from Cloudera Manager
In many examples, you will see several functions used to interact with Thrift. The concepts of transport, socket, and protocol are described in the book Programmer’s Guide to Apache Thrift.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
We will discuss how the functions work with HBase configurations.
These parameters are taken into consideration (a minimal sketch that turns these settings into Thrift objects follows the list):
Is SSL enabled? (Search "SSL" in CM > HBase configuration; it is usually auto-enabled by CM.)
Use an SSL socket (TSSLSocket); otherwise use a plain TSocket.
hbase.thrift.security.qop=auth-conf? This means Kerberos is enabled.
Use TSaslClientTransport.
hbase.regionserver.thrift.compact=true?
Use TCompactProtocol; otherwise use TBinaryProtocol.
hbase.regionserver.thrift.framed=true?
Use TFramedTransport; otherwise use TBufferedTransport.
hbase.regionserver.thrift.http=true and hbase.thrift.support.proxyuser=true?
This means a DoAs implementation is required. HTTP mode cannot coexist with framed mode; use THttpClient.
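As an illustration only (not from the original article), here is a minimal sketch of how those settings could translate into Python Thrift objects; the host, port, and flag values are placeholders you would replace with the real values from CM > HBase configuration:
# illustrative sketch: map the HBase Thrift settings above to Thrift classes
from thrift.transport import TSocket, TSSLSocket, TTransport
from thrift.protocol import TBinaryProtocol, TCompactProtocol

host, port = 'thrift-server.example.com', 9090   # placeholder host/port
ssl_enabled = False    # "Is SSL enabled?"
framed = False         # hbase.regionserver.thrift.framed
compact = False        # hbase.regionserver.thrift.compact

# socket choice
sock = TSSLSocket.TSSLSocket(host, port) if ssl_enabled else TSocket.TSocket(host, port)

# transport choice; with Kerberos (qop=auth-conf) wrap the socket in TSaslClientTransport
# instead, as in the first sample below, and with HTTP mode use THttpClient as in the second
transport = TTransport.TFramedTransport(sock) if framed else TTransport.TBufferedTransport(sock)

# protocol choice
protocol = (TCompactProtocol.TCompactProtocol(transport) if compact
            else TBinaryProtocol.TBinaryProtocol(transport))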
Sample client code for security-disabled and security-enabled HBase clusters
Kerberos enabled / SSL disabled:
Settings:
SSL disabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=false
hbase.thrift.support.proxyuser=false
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase
import kerberos
import sasl
from subprocess import call
thrift_host='<thrift host FQDN>'   # placeholder: replace with your Thrift server host
thrift_port=9090
# call kinit to obtain the Kerberos ticket
krb_service='hbase'                # SASL service name used by the Thrift server
principal='hbase/<host>'
keytab="/path/to/hbase.keytab"
kinitCommand="kinit"+" "+"-kt"+" "+keytab+" "+principal
call(kinitCommand,shell=True)
socket = TSocket.TSocket(thrift_host, thrift_port)
transport = TTransport.TSaslClientTransport(socket,host=thrift_host,service='hbase',mechanism='GSSAPI')
protocol = TBinaryProtocol.TBinaryProtocol(transport)
transport.open()
client = Hbase.Client(protocol)
print(client.getTableNames())
transport.close()
This works in CDH 6, but does not work in some CDP versions due to a known bug described in the next section.
Kerberos enabled /SSL enabled:
Settings:
SSL enabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=true
hbase.thrift.support.proxyuser=true
The following code was adapted and tested based on @manjilhk's post here.
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos

def kerberos_auth():
    call("kdestroy", shell=True)
    clientPrincipal='hbase@<DOMAIN.COM>'
    # the hbase client keytab is copied from /keytabs/hbase.keytab
    # you can find the location using "find"
    keytab="/path/to/hbase.keytab"
    kinitCommand="kinit"+" "+"-kt"+" "+keytab+" "+clientPrincipal
    call(kinitCommand, shell=True)
    # this is the HTTP service principal of the HBase Thrift server, check with
    # klist -kt /var/run/cloudera-scm-agent/process/<latest-thrift-process>/hbase.keytab
    hbaseService="HTTP/<host>@<DOMAIN.COM>"
    __, krb_context = kerberos.authGSSClientInit(hbaseService)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers

# cert_file is copied from CDP; use "find" to get the location and scp it to your app server.
# ssl_context=ssl._create_unverified_context() is passed because no SSL verification is required here.
httpClient = THttpClient.THttpClient('https://<thrift server fqdn>:9090/',
                                     cert_file='/root/certs/localhost.crt',
                                     key_file='/root/certs/localhost.key',
                                     ssl_context=ssl._create_unverified_context())
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
tables = client.getTableNames()
print(tables)
httpClient.close()
Nowadays, security (SSL/Kerberos) is very important when applications interact with databases, and many popular services such as Knox and Hue interact with HBase via the Thrift server over an HTTP client. Therefore, we recommend the second method.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions
Upstream Jira HBASE-21652 introduced a bug related to Kerberos principal handling.
When the Thrift server was refactored to make the thrift2 server inherit from the thrift1 server, ThriftServerRunner was merged into ThriftServer and the principal-switching step was omitted.
Before the refactoring, everything ran in a doAs() block in ThriftServerRunner.run().
References
Programmer’s Guide to Apache Thrift
Python3 connection to Kerberos Hbase thrift HTTPS
Use the Apache Thrift Proxy API
How-to: Use the HBase Thrift Interface, Part 1
How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows
Disclaimer
This article was not tested against all versions; both methods were tested with Python 2.7.5 and Python 3.6.8.
Change the code according to your needs. If you encounter an issue, posting a question to the Community or raising a case with Cloudera Support is recommended.
09-15-2021
08:23 AM
1 Kudo
Hi @Ellyly,
Here is an example.
(1) First, list recursively with -R and grep "^d" to show all the subdirectories in your path:
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d"
drwxr-xr-x - hdfs supergroup 0 2021-09-15 14:48 /folder1/folder2
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3
drwxr-xr-x - hdfs supergroup 0 2021-09-15 15:01 /folder1/folder2/folder3/folder4
drwxr-xr-x - hdfs supergroup 0 2021-09-11 05:09 /folder1/subfolder1
(2) Then, use awk -F\/ '{print NF-1}' to calculate each directory's depth; in effect we print the number of fields separated by "/". Note that after -F it is a backslash followed by a slash, with no space in between; it is not the character "V"! 🙂
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}'
2
3
4
2
(3) Finally, sort and take the first value:
# sudo -u hdfs hdfs dfs -ls -R /folder1/ | grep "^d" | awk -F\/ '{print NF-1}' | sort -rn | head -1
4
Regards, Will
If the answer helps, please accept as solution and click thumbs up.
09-12-2021
10:59 PM
1 Kudo
Introduction
Phoenix is a popular solution for providing low-latency OLTP and operational analytics on top of HBase. Hortonworks Data Platform (HDP) and Cloudera Data Platform (CDP) are the most popular platforms on which Phoenix interacts with HBase.
Nowadays, many customers choose to migrate to Cloudera Data Platform to better manage their Hadoop clusters and implement the latest big data solutions.
This article discusses how to migrate Phoenix data and index tables to the newer CDP Private Cloud Base.
Environment
Source cluster: HDP 2.6.5, HDP 3.1.5
Target cluster: CDP PvC 7.1.5, CDP PvC 7.1.6, CDP PvC 7.1.7
Migration steps
The SYSTEM tables are created automatically when phoenix-sqlline starts for the first time and contain the metadata of the Phoenix tables. In order to see the Phoenix data/index tables in the target cluster, we need to migrate the SYSTEM tables from the source cluster as well.
1. Stop the Phoenix service on the CDP cluster. You can stop the service in Cloudera Manager > Services > Phoenix Service > Stop.
2. Drop the SYSTEM.% tables on the CDP cluster (from HBase). In the HBase shell, disable and drop all the SYSTEM tables:
hbase:006:0> disable_all "SYSTEM.*"
hbase:006:0> drop_all "SYSTEM.*"
3. Copy the SYSTEM, data, and index tables to the CDP cluster. There are many methods to copy data between HBase clusters; I recommend using snapshots so the schema stays the same. On the source HBase:
Take snapshots of all SYSTEM tables and data tables:
hbase(main):020:0> snapshot "SYSTEM.CATALOG","CATALOG_snap"
Run ExportSnapshot to the target cluster:
sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot CATALOG_snap -copy-to hdfs://Target_Active_NameNode:8020/hbase -mappers 16 -bandwidth 200
Your HBase directory path may be different; check the HBase configuration in Cloudera Manager for the path.
On the target cluster, the owner may become the user who triggered the MapReduce job, so change the owner back to the default hbase:hbase:
sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase
In the HBase shell, use clone_snapshot to create the new tables:
clone_snapshot "CATALOG_snap","SYSTEM.CATALOG"
When you complete the above steps, you should have all the SYSTEM tables, data tables, and index tables in your target HBase. For example, the following list was copied from an HDP 2.6.5 cluster and created in CDP:
hbase:013:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST
4. Start the Phoenix service, enter phoenix-sqlline, and then check that you can query the tables.
5. (Optional) If NamespaceMapping was already enabled on HDP, we should also set isNamespaceMappingEnabled to true on the CDP cluster in both the client and service hbase-site.xml, and restart the Phoenix service.
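For reference, a sketch of the property as it would appear in the client and service hbase-site.xml safety valves (the property name below is the standard Phoenix one; verify it against your CDP version):
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>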
Known Bug of Migration Process
Starting from Phoenix 5.1.0 / CDP 7.1.6, there is a bug in the SYSTEM tables auto-upgrade. The fix will be included in a future CDP release. Customers should raise a case with Cloudera Support and apply a hotfix for this bug on top of CDP 7.1.6/7.1.7.
Refer to PHOENIX-6534
Disclaimer
This article does not cover all versions of HDP and CDP, nor every situation; it only covers the popular or latest versions. If you follow the steps but fail or run into a new issue, please feel free to ask in the Community or raise a case with Cloudera Support.
09-11-2021
10:00 PM
1 Kudo
Hi @DanHosier,
Here is a possible solution to bind the NameNode HTTP server to localhost. Add the following property to the service-side advanced hdfs-site.xml and restart HDFS.
HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml:
<property>
<name>dfs.namenode.http-bind-host</name>
<value>127.0.0.1</value>
</property>
The property is then added into /var/run/cloudera-scm-agent/process/<Latest process of NN>/hdfs-site.xml:
# grep -C2 "dfs.namenode.http-bind-host" hdfs-site.xml
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>127.0.0.1</value>
</property>
Then test with curl:
# curl `hostname -f`:9870
curl: (7) Failed connect to xxxx.xxxx.xxxx.com:9870; Connection refused
# curl localhost:9870
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=dfshealth.html" />
<title>Hadoop Administration</title>
</head>
</html>
Now the web UI is only served on the NameNode's localhost. However, you will see this alert in CM because the Service Monitor cannot reach the NN web UI:
NameNode summary: xxxx.xxxx.xxxx.com (Availability: Unknown, Health: Bad). This health test is bad because the Service Monitor did not find an active NameNode.
So this solution has a side effect on the Service Monitor, but HDFS itself is running well.
Regards, Will
If the answer helps, please accept as solution and click thumbs up.