Member since
10-03-2020
235
Posts
15
Kudos Received
18
Solutions
My Accepted Solutions
Views | Posted
---|---
482 | 11-11-2024 09:31 AM
1338 | 08-28-2023 02:13 AM
1874 | 12-15-2021 05:26 PM
1714 | 10-22-2021 10:09 AM
4851 | 10-20-2021 08:44 AM
10-21-2021
10:30 AM
@dzbeda, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
10-21-2021
12:49 AM
@DA-Ka You can use the HDFS find tool "org.apache.solr.hadoop.HdfsFindTool" for that purpose. The link below suggests some methods for finding old files:
- http://35.204.180.114/static/help/topics/search_hdfsfindtool.html
However, the Search-based HDFS find tool has been removed in CDH 6 and is superseded by the native "hdfs dfs -find" command, documented here: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find
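If the native find expressions are not enough to select files by age, one possible workaround is to filter the output of "hdfs dfs -ls -R" yourself. The sketch below is only an illustration: the function name older_than is hypothetical, and it assumes each listing line has the usual eight columns (permissions, replication, owner, group, size, date, time, path) with the date printed as YYYY-MM-DD HH:MM.

```python
from datetime import datetime, timedelta


def older_than(ls_lines, days, now=None):
    """Return paths from `hdfs dfs -ls -R` output lines modified more than `days` ago.

    Assumption: each entry has eight whitespace-separated columns and the
    modification time is printed as 'YYYY-MM-DD HH:MM'.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    old_paths = []
    for line in ls_lines:
        # maxsplit=7 keeps paths that contain spaces in one piece.
        fields = line.rstrip("\n").split(None, 7)
        if len(fields) < 8:
            continue  # skips headers such as "Found 3 items"
        try:
            modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not a listing line in the assumed format
        if modified < cutoff:
            old_paths.append(fields[7])
    return old_paths


sample = [
    "Found 2 items",
    "-rw-r--r--   3 hdfs supergroup       1234 2021-01-15 10:30 /data/old/file1",
    "drwxr-xr-x   - hdfs supergroup          0 2021-10-01 08:00 /data/new",
]
old = older_than(sample, days=30, now=datetime(2021, 10, 21))
print(old)
```

In practice the lines could come from running "hdfs dfs -ls -R /some/path" and reading its standard output; check the column layout on your own cluster first.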
10-02-2021
04:19 AM
1 Kudo
@Tamiri, please click on your avatar and check My settings > SUBSCRIPTIONS & NOTIFICATIONS. Alternatively, when you reply to a post, select "Email me when someone replies" at the top right. Regards, Will
09-29-2021
09:50 AM
1 Kudo
Then the above solutions should meet your needs.
09-24-2021
10:09 PM
Hello @Clua, it looks like you solved it. If possible, could you please share the code snippets showing how you added gssflags in authGSSClientInit, and which Transport function you are using? Thanks, Will
09-24-2021
08:15 PM
1 Kudo
Hi @drgenious,
1) Where can I run these kinds of queries?
In CM > Charts > Chart Builder you can run a tsquery. Refer to this link: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_dg_chart_time_series_data.html
2) Where can I find attributes like category and clusterName in Cloudera?
In the Chart Builder text bar, write an incomplete query like:
SELECT get_file_info_rate
Below the text bar there is a Facets section. Click More and select any facet you want. For example, if you select clusterName, the clusterName appears in the chart's title. Then you can complete your tsquery:
SELECT get_file_info_rate where clusterName=xxxxx
If you want to build Impala-related charts, I suggest first reviewing CM > Impala service > Charts Library, where many charts already exist for common monitoring purposes. You can open any of the existing charts to learn how its tsquery is constructed and then build your own charts.
Another very good place to learn is CM > Charts > Chart Builder: on the right side there is a "?" button. Click it to see many examples, and you can just click "try it".
Regards, Will
If the answer helps, please accept it as the solution and click thumbs up.
09-22-2021
06:30 AM
1 Kudo
Hi @doncucumber, now you can have a good rest 🙂 Please check this article for the detailed HA concepts: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_hag_hdfs_ha_intro.html The active NN syncs its edit logs to a majority of the JournalNodes, so the standby NN can read the edits from the JNs. From your NN log we could see that the recovery failed because it exceeded the 120000 ms timeout for a quorum of nodes to respond. That is why I requested the JN logs, to check whether the edits sync-up had failed, and we found the problem with JN2 and JN3. Regards, Will
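As a small illustration of the "majority of JournalNodes" rule, the arithmetic can be sketched as follows. The function names are hypothetical and the node counts are only examples, not values taken from the cluster discussed above.

```python
def journal_quorum(num_journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    return num_journal_nodes // 2 + 1


def edits_can_commit(num_journal_nodes, healthy_nodes):
    """True if enough JournalNodes respond for an edit to reach a majority."""
    return healthy_nodes >= journal_quorum(num_journal_nodes)


# Example: with three JournalNodes, two must respond.
print(journal_quorum(3))
print(edits_can_commit(3, healthy_nodes=1))
```

With three JournalNodes, a single healthy node is not a majority, which is consistent with a quorum timeout when two of them have problems.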
09-19-2021
08:38 PM
1 Kudo
Introduction
Compared with other existing frameworks such as SOAP, JSON-RPC, and REST proxies, Thrift proxy is a modern microservice framework. The Thrift proxy API offers higher performance, scales better, and supports many languages (C++, Java, Python, PHP, Ruby, Perl, C#, Objective-C, JavaScript, Node.js, and others).
Applications can interact with HBase via the Thrift proxy.
This article discusses how to use the correct libraries and methods to interact with HBase via the Thrift proxy.
Outline
The basic concept of Thrift proxy and how the thrift language bindings are generated.
How Python thrift functions align with the correct settings of HBase configurations from Cloudera Manager.
Sample client code for security-disabled and security-enabled HBase clusters.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions.
The basic concept of Thrift proxy and how the Thrift cross-language bindings are generated
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs) using Thrift bindings. A Thrift binding is client code generated by the Apache Thrift compiler for a target language (such as Python); it allows clients written in that language to communicate with the Thrift server. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
The IDL file, Hbase.thrift, is shipped in the CDP parcels. To locate it, run:
find / -name "Hbase.thrift"
Install the IDL compiler by following the steps in Building Apache Thrift on CentOS 6.5.
Follow this article to generate the Python library bindings (server stubs). After that, you should be able to import the Python libraries into your client code.
How Python functions align with the HBase Configurations from Cloudera Manager
In many examples, you will see several functions used to interact with Thrift. The concepts of transport, socket, and protocol are described in the book Programmer’s Guide to Apache Thrift.
Image credits: The above figure is copied from Programmer’s Guide to Apache Thrift
We will discuss how the functions work with HBase configurations.
The following parameters determine which Thrift classes to use:
Is SSL enabled? (Search for “SSL” in CM > HBase configuration; it is usually auto-enabled by CM.) If yes, use TSSLSocket; otherwise use TSocket.
hbase.thrift.security.qop=auth-conf? This means Kerberos is enabled. Use TSaslClientTransport.
hbase.regionserver.thrift.compact=true? Use TCompactProtocol; otherwise use TBinaryProtocol.
hbase.regionserver.thrift.framed=true? Use TFramedTransport; otherwise use TBufferedTransport.
hbase.regionserver.thrift.http=true and hbase.thrift.support.proxyuser=true? This means a doAs implementation is required. Use THttpClient. HTTP mode cannot co-exist with framed mode.
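The checklist above can be sketched as a small lookup helper. This is only an illustration: the function name choose_thrift_classes is hypothetical, it returns class names as strings rather than importing thrift, and the precedence between options (HTTP first, then Kerberos, then framed) is an assumption rather than something the configuration reference states.

```python
def choose_thrift_classes(ssl_enabled=False, qop=None, compact=False,
                          framed=False, http=False):
    """Map HBase Thrift server settings to Thrift Python class names.

    Returns a dict with 'socket', 'transport' and 'protocol' keys.
    The precedence used here is an assumption made for illustration.
    """
    if http and framed:
        raise ValueError("HTTP mode cannot co-exist with framed mode")

    if http:
        # THttpClient is built from a URL, so no separate socket object is chosen.
        socket_cls = None
        transport_cls = "THttpClient"
    else:
        socket_cls = "TSSLSocket" if ssl_enabled else "TSocket"
        if qop:  # for example "auth-conf", meaning Kerberos is enabled
            transport_cls = "TSaslClientTransport"
        elif framed:
            transport_cls = "TFramedTransport"
        else:
            transport_cls = "TBufferedTransport"

    protocol_cls = "TCompactProtocol" if compact else "TBinaryProtocol"
    return {"socket": socket_cls, "transport": transport_cls, "protocol": protocol_cls}


# Kerberos enabled, SSL disabled, all other options false.
kerberos_only = choose_thrift_classes(qop="auth-conf")
# Kerberos and SSL enabled with HTTP mode.
kerberos_http = choose_thrift_classes(ssl_enabled=True, qop="auth-conf", http=True)
print(kerberos_only)
print(kerberos_http)
```

The two calls correspond to the two sets of settings used by the sample clients in this article.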
Sample client code for security-disabled and security-enabled HBase clusters
Kerberos enabled / SSL disabled:
Settings:
SSL disabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=false
hbase.thrift.support.proxyuser=false
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase
import kerberos
import sasl
from subprocess import call

# Replace the placeholder with the host name of your HBase Thrift server.
thrift_host = '<thrift host>'
thrift_port = 9090

# Call kinit to get the Kerberos ticket from the keytab.
krb_service = 'hbase'
principal = 'hbase/<host>'
keytab = '/path/to/hbase.keytab'
call(['kinit', '-kt', keytab, principal])

socket = TSocket.TSocket(thrift_host, thrift_port)
# Kerberos is enabled, so wrap the socket in a SASL (GSSAPI) transport.
transport = TTransport.TSaslClientTransport(socket, host=thrift_host, service=krb_service, mechanism='GSSAPI')
protocol = TBinaryProtocol.TBinaryProtocol(transport)
transport.open()
client = Hbase.Client(protocol)
print(client.getTableNames())
transport.close()
This works in CDH 6, but does not work in some CDP versions due to a known bug described in the known bugs section below.
Kerberos enabled / SSL enabled:
Settings:
SSL enabled
hbase.thrift.security.qop=auth-conf
hbase.regionserver.thrift.compact=false
hbase.regionserver.thrift.framed=false
hbase.regionserver.thrift.http=true
hbase.thrift.support.proxyuser=true
The following code is adapted from @manjilhk's post here and has been tested.
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos


def kerberos_auth():
    # Start from a clean ticket cache, then get a ticket from the keytab.
    call(['kdestroy'])
    clientPrincipal = 'hbase@<DOMAIN.COM>'
    # The hbase client keytab is copied from /keytabs/hbase.keytab;
    # you can find the location using "find".
    keytab = '/path/to/hbase.keytab'
    call(['kinit', '-kt', keytab, clientPrincipal])
    # This is the HTTP service principal of HBase; check with
    # klist -kt /var/run/cloudera-scm-agent/process/<latest-thrift-process>/hbase.keytab
    hbaseService = 'HTTP/<host>@<DOMAIN.COM>'
    __, krb_context = kerberos.authGSSClientInit(hbaseService)
    kerberos.authGSSClientStep(krb_context, '')
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    # Pass the SPNEGO token to the Thrift server in the Authorization header.
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers


# cert_file is copied from CDP; use "find" to get the location, then scp it to your app server.
# ssl._create_unverified_context() is used here only if no SSL verification is required.
httpClient = THttpClient.THttpClient('https://<thrift server fqdn>:9090/', cert_file='/root/certs/localhost.crt', key_file='/root/certs/localhost.key', ssl_context=ssl._create_unverified_context())
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
tables = client.getTableNames()
print(tables)
httpClient.close()
Security (SSL/Kerberos) is very important when applications interact with databases, and many popular services such as Knox and Hue interact with HBase via the Thrift server over an HTTP client. We therefore recommend the second method.
Some known bugs when using TSaslClientTransport with Kerberos enabled in some CDP versions
Upstream Jira HBASE-21652 introduced a bug related to Kerberos principal handling.
That change refactored the Thrift server so that the thrift2 server inherits from the thrift1 server. ThriftServerRunner and ThriftServer were merged, and the principal switching step was omitted.
Before the refactoring, everything ran in a doAs() block in ThriftServerRunner.run().
References
Programmer’s Guide to Apache Thrift
Python3 connection to Kerberos Hbase thrift HTTPS
Use the Apache Thrift Proxy API
How-to: Use the HBase Thrift Interface, Part 1
How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows
Disclaimer
This article has not been tested against all versions; both methods were tested with Python 2.7.5 and Python 3.6.8.
Change the code according to your needs. If you encounter an issue, posting questions to the Community and raising cases with Cloudera support are recommended.
09-15-2021
03:40 AM
1 Kudo
Often this happens because there is a "hidden" character at the end of the file or folder name, for example a line break (\n, \r, etc.). If you list the files, you can get a clue that this is the case, as the output will usually look strange, with an extra line or something similar.
You can try running a few commands like the following to see if one matches a file:
hdfs dfs -ls $'/path/to/folder\r'
hdfs dfs -ls $'/path/to/folder\n'
hdfs dfs -ls $'/path/to/folder\r\n'
If any of those match, you can delete the incorrect one with a similar command.
If you have no luck with that, pipe the ls output into "od -c" and it will show the special characters:
hdfs dfs -ls /path/to/folder | od -c
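The same idea can be sketched in Python when the names are already available as strings, for example from a script that collected them. The helper name reveal_hidden_chars is hypothetical; it relies on repr() to make control characters such as \r and \n visible.

```python
def reveal_hidden_chars(names):
    """Return (repr(name), has_hidden) pairs; repr() shows \\r, \\n and similar."""
    results = []
    for name in names:
        # str.isprintable() is False for control characters such as \r and \n.
        has_hidden = any(not ch.isprintable() for ch in name)
        results.append((repr(name), has_hidden))
    return results


names = ["/path/to/folder", "/path/to/folder\r"]
report = reveal_hidden_chars(names)
for shown, has_hidden in report:
    print(shown, "<- hidden character" if has_hidden else "")
```

A name flagged here can then be passed to the shell using the $'...' quoting shown above.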