Member since: 09-16-2021
Posts: 357
Kudos Received: 53
Solutions: 28
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 99 | 02-04-2025 05:55 AM |
| | 510 | 11-22-2024 05:29 AM |
| | 248 | 11-15-2024 06:38 AM |
| | 528 | 11-13-2024 07:12 AM |
| | 554 | 11-10-2024 11:19 PM |
10-13-2022 02:46 AM
@Sunil1359 Compilation time might be higher if the table has a large number of partitions or if the HMS process is slow when the query runs. To find the root cause, please check the following for the corresponding time period:
- HS2 log
- HMS log
- HMS jstack

With the Tez engine, queries run in the form of a DAG. In the compilation phase, once semantic analysis is complete, a plan is generated for the query you submitted; `explain <your query>` shows that plan. Once the plan is generated, the DAG is submitted to YARN and runs according to the plan. As part of the DAG, split generation, input file reads, shuffle fetches, etc. are taken care of, and the end result is transferred to the client.
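One quick way to spot slow compilations is to scan the HS2 log for "Completed compiling command" lines, which carry a "Time taken" figure. Below is a minimal sketch; the sample log lines, helper name, and 2-second threshold are assumptions for illustration, not part of any Hive tooling:

```python
import re

# Matches HS2 log lines like:
# INFO : Completed compiling command(queryId=...); Time taken: 0.17 seconds
PATTERN = re.compile(
    r"Completed compiling command\(queryId=([^)]+)\); Time taken: ([\d.]+) seconds")

def slow_compilations(log_lines, threshold_secs=2.0):
    """Return (queryId, seconds) pairs for compilations slower than the threshold."""
    hits = []
    for line in log_lines:
        m = PATTERN.search(line)
        if m and float(m.group(2)) > threshold_secs:
            hits.append((m.group(1), float(m.group(2))))
    return hits

# Hypothetical log excerpt for illustration
log = [
    "INFO : Completed compiling command(queryId=hive_1_fast); Time taken: 0.17 seconds",
    "INFO : Completed compiling command(queryId=hive_2_slow); Time taken: 45.3 seconds",
]
print(slow_compilations(log))  # → [('hive_2_slow', 45.3)]
```

Queries that show up here with large "Time taken" values are the ones worth correlating against the HMS log and jstack output.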
04-25-2022 11:15 PM
Hi, from the shell, find the files that need to be deleted and save them to a temp file, like below:

```sh
#!/bin/sh
today=`date +'%s'`
hdfs dfs -ls /file/Path/ | while read line ; do
  dir_date=$(echo ${line} | awk '{print $6}')
  difference=$(( ( ${today} - $(date -d ${dir_date} +%s) ) / ( 24*60*60 ) ))
  filePath=$(echo ${line} | awk '{print $8}')
  if [ ${difference} -gt 3 ]; then
    echo -e "$filePath" >> toDelete
  fi
done
```

Then execute the delete as an external shell command, for example with subprocess.call or the sh library, something like below:

```python
import subprocess

with open('toDelete', 'r') as file:
    for each in file:
        subprocess.call(["hadoop", "fs", "-rm", "-f", each.strip()])
```

Also, you can use the Hadoop FileSystem API from PySpark, like below:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('abc').getOrCreate()

def delete_path(spark, path):
    sc = spark.sparkContext
    fs = (sc._jvm.org
          .apache.hadoop
          .fs.FileSystem
          .get(sc._jsc.hadoopConfiguration())
          )
    fs.delete(sc._jvm.org.apache.hadoop.fs.Path(path), True)

delete_path(spark, "Your/hdfs/path")
```
04-22-2022 04:24 AM
Hi, the script below removes files that are older than 3 days from the HDFS path:

```sh
#!/bin/sh
today=`date +'%s'`
hdfs dfs -ls /file/Path/ | while read line ; do
  dir_date=$(echo ${line} | awk '{print $6}')
  difference=$(( ( ${today} - $(date -d ${dir_date} +%s) ) / ( 24*60*60 ) ))
  filePath=$(echo ${line} | awk '{print $8}')
  if [ ${difference} -gt 3 ]; then
    hdfs dfs -rm -r $filePath
  fi
done
```

The `hdfs dfs -rm -r` command moves the data to the trash folder if the trash mechanism is configured. To skip moving the file to the trash folder, use the `-skipTrash` option.
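The age calculation in the script can also be sketched in Python. This is a minimal sketch; the helper name, the sample `hdfs dfs -ls` line, and the fixed "now" are assumptions for illustration:

```python
from datetime import datetime, timedelta

def is_older_than(ls_line, days, now=None):
    """Parse one `hdfs dfs -ls` output line and report whether the
    entry's modification date is more than `days` days in the past."""
    fields = ls_line.split()
    # Field 6 (awk's $6) is the modification date, e.g. 2022-04-19
    mod_date = datetime.strptime(fields[5], "%Y-%m-%d")
    now = now or datetime.now()
    return (now - mod_date) > timedelta(days=days)

# Hypothetical ls output line for illustration
line = "-rw-r--r--   3 hive hadoop   1024 2022-04-19 10:15 /file/Path/part-0000"
print(is_older_than(line, 3, now=datetime(2022, 4, 25)))  # → True
```

Parsing the listing in one process avoids spawning a `date -d` subprocess per file, which matters on directories with many entries.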
02-17-2022 09:22 AM
Hi, I tried the same on 3.1.0.0-78, and it works as described in the document: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive_surrogate_keys.html Could you please share the output of beeline -u <hiveserver jdbc uri> -e "set -v"? Also, the error stack trace looks like it is coming from the NameNode; please check your NN logs for further information. The following JIRA also appears related: https://issues.apache.org/jira/browse/HIVE-21238
... View more
02-15-2022 12:04 AM
Hi, HiveServer2 runs as a Java process. When running timestamp-related UDFs in Hive, the default behavior is to use the system's timezone to render timestamp values. To make Hive return a specific timezone from the timestamp functions, please follow the steps below:

1. Go to the Cloudera Manager home page > Hive > Configuration. Under "Client Java Configuration Options", append " -Duser.timezone=UTC" to the text string (be mindful of the leading space if you append to the end of existing options).
2. Under "Java Configuration Options for HiveServer2", append the same " -Duser.timezone=UTC" to the end of the text string.
3. Save the configuration, restart any HiveServer2 instances, and select Actions > "Deploy Client Configuration" through Cloudera Manager.

To confirm the new configuration is working, see the test outputs below.

Before the change (system's default timezone is IST):

```
INFO : Executing command(queryId=hive_20220215131716_33c6c90b-207a-43fe-9bcf-4a52cd04de3e): SELECT current_timestamp()
INFO : Completed executing command(queryId=hive_20220215131716_33c6c90b-207a-43fe-9bcf-4a52cd04de3e); Time taken: 0.092 seconds
INFO : OK
+--------------------------+
|           _c0            |
+--------------------------+
| 2022-02-15 13:17:16.204  |
+--------------------------+
1 row selected (2.413 seconds)
[hive@c2757-node3 ~]$ date
Tue Feb 15 13:17:26 IST 2022
[hive@c2757-node3 ~]$ timedatectl | grep "Time zone"
Time zone: Asia/Kolkata (IST, +0530)
```

After the change to UTC:

```
[hive@c2757-node3 ~]$ timedatectl | grep "Time zone"
Time zone: Asia/Kolkata (IST, +0530)
[hive@c2757-node3 ~]$ date
Tue Feb 15 13:28:05 IST 2022
INFO : Executing command(queryId=hive_20220215075824_6a58bdcb-b1b6-470d-9202-26ccfc60f521): SELECT current_timestamp()
INFO : Completed executing command(queryId=hive_20220215075824_6a58bdcb-b1b6-470d-9202-26ccfc60f521); Time taken: 0.079 seconds
INFO : OK
+--------------------------+
|           _c0            |
+--------------------------+
| 2022-02-15 07:58:24.785  |
+--------------------------+
1 row selected (2.24 seconds)
```

In the same way, you can configure Hive to use other timezones.
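The IST/UTC offset behind outputs like the ones above can be sketched in plain Python with the standard zoneinfo module; the timestamp value here is chosen to mirror the logs and is purely illustrative:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One instant, rendered in the system zone (IST) vs. a UTC override
instant = datetime(2022, 2, 15, 13, 17, 16, tzinfo=ZoneInfo("Asia/Kolkata"))
as_utc = instant.astimezone(timezone.utc)

print(instant.strftime("%Y-%m-%d %H:%M:%S"))  # → 2022-02-15 13:17:16 (IST)
print(as_utc.strftime("%Y-%m-%d %H:%M:%S"))   # → 2022-02-15 07:47:16 (UTC)
```

Both values name the same instant; only the rendering zone differs, which is exactly what the -Duser.timezone JVM flag changes for HiveServer2.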
02-14-2022 11:22 PM
Hi, I tried to replicate the same in one of my local clusters (HDP 3.1.5), and it works as expected. PFB:

```
INFO : Compiling command(queryId=hive_20220215071436_d95bcb83-aa21-4335-a119-1af67a162ad2): INSERT INTO students_v2 (row_id, name, dorm) SELECT * FROM students
INFO : Completed executing command(queryId=hive_20220215071436_d95bcb83-aa21-4335-a119-1af67a162ad2); Time taken: 7.261 seconds
INFO : Compiling command(queryId=hive_20220215071453_644b75f3-a6ca-44ed-8646-2fe1a2b7b3dc): SELECT * FROM students_v2
DEBUG : Shutting down query SELECT * FROM students_v2
+-----------------+---------------------+-------------------+-------------------+
| students_v2.id  | students_v2.row_id  | students_v2.name  | students_v2.dorm  |
+-----------------+---------------------+-------------------+-------------------+
| 1099511627776   | 1                   | fred flintstone   | 100               |
| 1099511627777   | 2                   | barney rubble     | 200               |
+-----------------+---------------------+-------------------+-------------------+
```

Could you please share your cluster version details so I can try to replicate the issue?
01-24-2022 10:39 PM
Yes, it exists in the Hive CDH source. The org.apache.hadoop.hive.ql.udf package is part of hive-exec.jar. PFB:

```
[hive@node3 lib]$ /usr/jdk64/jdk1.8.0_112/bin/jar -tvf hive-exec.jar | grep "org.apache.hadoop.hive.ql.udf" | wc -l
660
```
01-21-2022 04:35 AM
Hi @mala_etl, you can download the mentioned jar from the Maven repository below: https://mvnrepository.com/artifact/org.apache.hive/hive-exec/1.1.0-cdh5.7.0 For other versions: https://mvnrepository.com/artifact/org.apache.hive/hive-exec?repo=cloudera
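If you would rather pull it as a build dependency than download it manually, a pom.xml fragment would look roughly like the sketch below. The Cloudera repository URL is an assumption to verify against your environment; the artifact coordinates match the mvnrepository link above:

```xml
<!-- Cloudera artifacts are served from Cloudera's own repository, not
     Maven Central (repository URL below is an assumption; verify it). -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0-cdh5.7.0</version>
  </dependency>
</dependencies>
```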
01-20-2022 09:57 PM
With the cast function, you can convert a string to an integer: cast(str_column as int). Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions

cast(string_field as int) works if the value is between -2147483648 and 2147483647; out-of-range values cast to NULL. PFB:

```
0: jdbc:hive2://hive> select cast('2147483647' as int);
INFO : Compiling command(queryId=hive_20220121054634_f3485a1b-692a-412d-8fe4-1c616c21d0aa): select cast('2147483647' as int)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20220121054634_f3485a1b-692a-412d-8fe4-1c616c21d0aa); Time taken: 0.17 seconds
INFO : Executing command(queryId=hive_20220121054634_f3485a1b-692a-412d-8fe4-1c616c21d0aa): select cast('2147483647' as int)
INFO : Completed executing command(queryId=hive_20220121054634_f3485a1b-692a-412d-8fe4-1c616c21d0aa); Time taken: 0.009 seconds
INFO : OK
+-------------+
|     _c0     |
+-------------+
| 2147483647  |
+-------------+
1 row selected (0.296 seconds)
0: jdbc:hive2://hive> select cast('2147483648' as int);
INFO : Compiling command(queryId=hive_20220121054651_15cbd283-7de5-4310-bbe9-b262ebbd1e07): select cast('2147483648' as int)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20220121054651_15cbd283-7de5-4310-bbe9-b262ebbd1e07); Time taken: 0.059 seconds
INFO : Executing command(queryId=hive_20220121054651_15cbd283-7de5-4310-bbe9-b262ebbd1e07): select cast('2147483648' as int)
INFO : Completed executing command(queryId=hive_20220121054651_15cbd283-7de5-4310-bbe9-b262ebbd1e07); Time taken: 0.006 seconds
INFO : OK
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+
1 row selected (0.086 seconds)
```
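The NULL-on-overflow behavior demonstrated above can be mimicked in Python for intuition; `hive_cast_int` is a hypothetical helper written for this post, not a Hive API:

```python
INT_MIN, INT_MAX = -2147483648, 2147483647  # 32-bit signed int range

def hive_cast_int(s):
    """Mimic Hive's cast(string as int): return the value if it fits
    a 32-bit signed int, otherwise None (Hive's NULL)."""
    try:
        v = int(s)
    except ValueError:
        return None  # non-numeric strings also cast to NULL
    return v if INT_MIN <= v <= INT_MAX else None

print(hive_cast_int('2147483647'))  # → 2147483647
print(hive_cast_int('2147483648'))  # → None
```

If your values can exceed this range, cast to bigint instead, which covers -9223372036854775808 through 9223372036854775807.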
01-19-2022 02:01 AM
Hi, with beeline -e "query_string", the query string can contain multiple queries, and those queries are executed sequentially; in this case loadTable2.hql starts only after loadTable1.hql completes. One solution: merge both files into a single file and execute the target HQL, substituting the Hive variables, like below:

```
[hive@node ~]$ cat loadTable1.hql loadTable2.hql
#HQL file 1
select * from sample1 where rownum between ${hivevar:FROM} and ${hivevar:TO};
#HQL file 2
select * from sample2 where rownum between ${hivevar:FROM} and ${hivevar:TO};
[hive@node ~]$ cat loadTable1.hql loadTable2.hql > target.hql
[hive@node ~]$ beeline -u "${jdbc_url}" --hivevar FROM=10 --hivevar TO=30 -f target.hql
0: jdbc:hive2> #HQL file 1
0: jdbc:hive2> select * from sample1 where rownum between ${hivevar:FROM} and ${hivevar:TO};
DEBUG : Acquired the compile lock.
INFO : Compiling command(queryId=hive_20220119094048_055b8d53-a101-453f-902b-3b08b6ffd4d3): select * from sample1 where rownum between 10 and 30
...
0: jdbc:hive2> #HQL file 2
0: jdbc:hive2> select * from sample2 where rownum between ${hivevar:FROM} and ${hivevar:TO};
DEBUG : Acquired the compile lock.
INFO : Compiling command(queryId=hive_20220119094048_9271ad34-64d1-424b-ad09-f3ca8cf630f9): select * from sample2 where rownum between 10 and 30
...
```
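The ${hivevar:NAME} substitution that beeline performs before the query reaches HiveServer2 can be sketched in Python for illustration; the regex and helper name are assumptions, not beeline internals:

```python
import re

def substitute_hivevars(hql, hivevars):
    """Replace ${hivevar:NAME} placeholders the way beeline's --hivevar
    options do, leaving unknown variables untouched (illustrative sketch)."""
    def repl(match):
        name = match.group(1)
        return str(hivevars.get(name, match.group(0)))
    return re.sub(r"\$\{hivevar:(\w+)\}", repl, hql)

hql = "select * from sample1 where rownum between ${hivevar:FROM} and ${hivevar:TO};"
print(substitute_hivevars(hql, {"FROM": 10, "TO": 30}))
# → select * from sample1 where rownum between 10 and 30;
```

This matches what the compile log above shows: by the time the query is compiled, the placeholders have already been replaced with 10 and 30.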