Member since: 11-23-2021
Posts: 13
Kudos Received: 0
Solutions: 0
10-03-2023
02:07 AM
It appears that you're currently following a two-step process: writing data to a Parquet table, then reading that Parquet table to write to an ORC table. You can streamline this by writing the data directly into the ORC table, eliminating the need to write the same data to Parquet before reading it back.

Ref:
https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html
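As a hedged sketch of the one-step write (the table and column names below are placeholders, not from the original post), the direct ORC write could look like this in Spark SQL:

```sql
-- Hypothetical names: src_table and orc_target are placeholders.
-- Write the query result directly into an ORC-backed table,
-- skipping the intermediate Parquet table entirely.
CREATE TABLE IF NOT EXISTS orc_target STORED AS ORC AS
SELECT id, name, event_ts
FROM src_table;

-- Or, if the ORC table already exists:
INSERT INTO TABLE orc_target
SELECT id, name, event_ts
FROM src_table;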
09-15-2023
02:06 AM
I use CDH 6.3.2 with Hive 2.1, Hadoop 3.0, Hive on Spark, and YARN cluster mode, with:

hive.merge.sparkfiles=true;
hive.merge.orcfile.stripe.level=true;

This configuration merges the 1099 reduce output files into one file when the result is small. However, the merged file then contains about 1099 stripes, and reading it is very slow. I tried hive.merge.orcfile.stripe.level=false; and the result is what I want: one small file with one stripe that reads fast. Can anyone explain the difference between true and false, and why hive.merge.orcfile.stripe.level=true is the default?
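For reference, the two configurations being compared can be written as session-level settings. This is a sketch based on how the ORC fast merge is generally understood to behave; verify against your Hive version:

```sql
-- Fast stripe-level merge (default): existing stripes are concatenated
-- without being decoded and re-encoded, which is cheap but keeps
-- roughly one stripe per input file (~1099 stripes here).
SET hive.merge.sparkfiles=true;
SET hive.merge.orcfile.stripe.level=true;

-- Full rewrite: the merge task re-encodes the rows, producing a file
-- with a single stripe, which reads faster for small results.
SET hive.merge.sparkfiles=true;
SET hive.merge.orcfile.stripe.level=false;
```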
08-30-2022
11:18 PM
What is the HDP version? If it is HDP 3.x, then you need to use the Hive Warehouse Connector (HWC).
04-08-2022
07:10 AM
@mala_etl Well, I can't fill in all of them for you, since those would be your values, not mine, but to achieve the move you need to set:

Completion Strategy: Move File
Move Destination Directory: the directory to move the file to
Create Directory: enable (true) / disable (false)

Be sure to check the ? for each property; it will explain everything.
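As a sketch (the destination path below is a placeholder, not a value from the original post), the relevant FetchSFTP properties might end up looking like:

```
Completion Strategy         Move File
Move Destination Directory  /opt/landing/processed
Create Directory            true
```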
02-16-2022
05:46 AM
Hi @mala_etl,

This looks like the known issue below:

https://issues.cloudera.org/browse/HUE-8717

Please work with Cloudera Support to get a patch.

Regards,
Chethan YM
01-24-2022
10:39 PM
Yes, it exists in the Hive CDH source. The org.apache.hadoop.hive.ql.udf package is part of hive-exec.jar. PFB:

[hive@node3 lib]$ /usr/jdk64/jdk1.8.0_112/bin/jar -tvf hive-exec.jar | grep "org.apache.hadoop.hive.ql.udf" | wc -l
660
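One detail worth noting about the command above: entries inside the jar use slashes (org/apache/hadoop/hive/ql/udf/...), but the dotted grep pattern still matches, because an unescaped "." in a regex matches any character, including "/". A small demonstration:

```shell
# Jar entries are slash-separated; the regex dots match the slashes anyway.
echo 'org/apache/hadoop/hive/ql/udf/UDFLog.class' | grep -c "org.apache.hadoop.hive.ql.udf"
```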
12-10-2021
04:25 AM
Hi, please find a sample flow using the ListSFTP and FetchSFTP processors to put files into a target HDFS path.

1. ListSFTP: keeps listening to the input folder (for example /opt/landing/project/data on the file-share server). When a new file arrives, ListSFTP takes only the name of the file and passes it to the FetchSFTP processor to fetch the file from the source folder. Configure the ListSFTP connection and path properties for your environment.

2. FetchSFTP: once the latest file has been identified by ListSFTP, FetchSFTP fetches the file from the source path. Configure the FetchSFTP properties for your environment.

3. PutHDFS: configure the values for your project and the required target folder. If your cluster is Kerberos-enabled, add the Kerberos controller service so NiFi can access HDFS.

4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
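As a hedged sketch of the steps above (the host, credentials, and paths are placeholders, not values from the original post), the key properties for each processor might look like:

```
# ListSFTP (watch the landing folder)
Hostname        sftp.example.com
Username        nifi
Remote Path     /opt/landing/project/data

# FetchSFTP (pull the file named by ListSFTP)
Hostname        sftp.example.com
Username        nifi
Remote File     ${path}/${filename}

# PutHDFS (write into the target HDFS directory)
Hadoop Configuration Resources  /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Directory                       /data/project/incoming
```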