Member since: 09-29-2021
14 Posts
1 Kudos Received
0 Solutions
10-02-2023
09:23 AM
Hi, thank you for the answers. @cravani unfortunately Impala is not used in our environment. As for pyODBC, @mszurap, it sounds like the best option to adopt. We will work with it and update you soon. Best regards, Andrea
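For reference, this is the kind of minimal pyODBC connection we plan to try (the DSN name "HiveDSN" is just a placeholder; it would have to be defined in odbc.ini against the Cloudera Hive ODBC driver with Kerberos authentication):

import pyodbc

# connect through an ODBC DSN; autocommit is enabled because Hive does not support transactions
conn = pyodbc.connect("DSN=HiveDSN", autocommit=True)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
conn.close()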
09-29-2023
07:14 AM
Hi, in a Unix environment I get the following error when connecting to Hive via Kerberos, using the pyHive library in a Python script:

thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server krbtgt/LOCAL.IT@EXAMPLE.IT not found in Kerberos database)'

I am able to authenticate to Kerberos with "kinit -kt user user.keytab", and I can also connect through the Hive ODBC driver. I use the same krb5.conf file, with default realm EXAMPLE.IT. With kinit, I correctly obtain:

Default principal: user@EXAMPLE.IT
Valid starting       Expires              Service principal
09/28/23 11:05:16    09/28/23 11:05:16    krbtgt/EXAMPLE.IT@EXAMPLE.IT

The error occurs only with the pyHive library: in the error message it uses the realm LOCAL.IT instead of the one specified in krb5.conf, which is EXAMPLE.IT. My pyHive connection:

conn = hive.Connection(host="host.domain.it",
                       port=10000,
                       auth="KERBEROS",
                       database="db_123",
                       kerberos_service_name="hive")

Note that LOCAL.IT corresponds to domain.it. Can you help me? Thank you
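In case it helps to frame the question: since the failing principal is krbtgt/LOCAL.IT@EXAMPLE.IT, it looks as if the client maps host.domain.it to the realm LOCAL.IT instead of EXAMPLE.IT. I am wondering whether an explicit mapping along these lines in krb5.conf would be the right direction (the values below are only illustrative, not our exact configuration):

[libdefaults]
  default_realm = EXAMPLE.IT
  rdns = false
  dns_canonicalize_hostname = false

[domain_realm]
  .domain.it = EXAMPLE.IT
  domain.it = EXAMPLE.IT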
Labels:
- Apache Hive
- Kerberos
06-08-2022
03:50 AM
Hi @mszurap, I agree with you about these numbers. Even if 60-100 TB is a large amount of data, the total number of blocks involved is not that high (close to 600k) compared to what each DataNode holds. Each DataNode reports about 9 million blocks, and we found that the problem is related to other directories that contain small files, where the block size is about 2-3 MB. Even though the total size of those directories is not that large, we expect the block count to decrease much more significantly once they are cleaned up. In short, we are facing a small-files problem, which drives the high number of blocks; the directory we deleted had large blocks, which is why the decrease in blocks was barely noticeable. Thank you for the support in the analysis!
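For example, directories with many small files show up clearly in a count report (the path below is only illustrative): a high FILE_COUNT together with a comparatively small CONTENT_SIZE points to the small-files issue.

hdfs dfs -count -v /warehouse/some_small_files_dir
# output columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME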
06-06-2022
05:07 AM
Hi, I'm still analyzing the output: the "fsck" command on the path where the deletions were made now reports just 1 block. Looking at the attached chart, you can see that on May 19th a lot of data (60 TB) was removed from HDFS, and the number of blocks decreased only on a single DataNode (bda1node02), by roughly 600,000 blocks (1 block = 256 MB). On the other DataNodes, the block count remained the same (or increased slightly).
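For completeness, this is roughly how the per-path block count was obtained (the path is illustrative); the fsck summary reports the "Total blocks (validated)" figure and the average block size for that subtree:

hdfs fsck /path/where/data/was/deleted -files -blocks | tail -n 20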
06-01-2022
03:04 AM
1 Kudo
Hi, thank you for the replies. @mszurap no upgrade has been made recently, and there are no pending steps. @Shelton the deleted files were kept in the Trash, but after 24h they were permanently removed from there as well. On the HDFS side, the used capacity has decreased, but the number of blocks is still high (and does not change). Thank you again
05-31-2022
02:09 AM
Hi Miklos, sorry for the typo. I have executed the command hdfs dfs -ls /snapshottable_path/.snapshot and got no output for that directory. The "du" commands ("du -x -h" and "du -h") report the same size. When I click on the block count alert on the HDFS service, I can see the number of blocks, which does not decrease: the DataNode has 8,743,931 blocks, against a critical threshold of 8,000,000 block(s). Thank you again.
05-31-2022
12:38 AM
Hi Miklos, thank you for the detailed answer. I found that the parent of the directory I removed has snapshots enabled, but there are no snapshots in it. The command hdfs dfs -du -x -h -v -s /snapshottable_path returns no lines, and the output of "du" without -x is the same. Should I disable snapshots on the parent directory? Is there any other configuration I should apply? Thank you again.
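For reference, these are the commands I would plan to use to double-check and, if appropriate, disable snapshots on the parent (to be run with HDFS superuser privileges; as far as I know, disallowing snapshots only works once no snapshots exist):

hdfs lsSnapshottableDir                                  # list directories with snapshots enabled
hdfs dfs -ls /snapshottable_path/.snapshot               # list existing snapshots of the parent
hdfs dfsadmin -disallowSnapshot /snapshottable_path      # disable snapshot support on the directory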
05-30-2022
08:36 AM
Hi, after deleting terabytes of data from HDFS (about 1/4 of the total capacity), the block count on the DataNodes did not decrease as expected. It is still over the critical threshold. How can this be resolved? Thank you
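For context, the block count I am referring to is the cluster-wide figure reported by fsck (and mirrored by the Cloudera Manager block count chart and alert):

hdfs fsck / | grep -i "total blocks"
# the fsck summary includes a "Total blocks (validated)" line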
Labels:
- HDFS
05-19-2022
11:32 PM
Hi Alex, thank you for confirming that. I'll proceed as you suggest. Regards, Andrea
05-16-2022
03:09 AM
Hi, is there any documentation on how to install R and SparkR on a gateway node of a DataHub? I have CM 7.5.2 and a CDP Public Cloud subscription. Spark versions currently configured: Spark 2.4.8 and Spark 3.1.2. Thank you, Andrea
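To give an idea of what I have in mind (only a rough sketch: the package name and parcel path below are assumptions for a RHEL/CentOS gateway node with the standard CDH parcel layout, not something I have verified on DataHub):

sudo yum install -y R                                    # install R from the OS repositories (assumed package name)
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark    # assumed Spark 2 parcel location
$SPARK_HOME/bin/sparkR                                   # launches an R shell with the SparkR package preloaded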
Labels:
- Cloudera Data Platform (CDP)