Member since: 11-23-2021
Posts: 13
Kudos Received: 0
Solutions: 0
10-03-2023
02:07 AM
It appears that you're currently following a two-step process: writing data to a Parquet table, then reading that Parquet table to write to an ORC table. You can streamline this by writing the data directly into the ORC table, eliminating the need to write the same data to Parquet before reading it back.

Ref:
https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html
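As a hedged sketch of the one-step write (the table and column names below are placeholders, not from the original post), the direct ORC write could look like this in Spark SQL:

```sql
-- Hypothetical names: src_table and orc_target are placeholders.
-- Write the query result directly into an ORC-backed table,
-- skipping the intermediate Parquet table entirely.
CREATE TABLE IF NOT EXISTS orc_target STORED AS ORC AS
SELECT id, name, event_ts
FROM src_table;

-- Or, if the ORC table already exists:
INSERT INTO TABLE orc_target
SELECT id, name, event_ts
FROM src_table;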
09-15-2023
02:06 AM
I use CDH 6.3.2 with Hive 2.1, Hadoop 3.0, Hive on Spark, and YARN cluster mode, with:

hive.merge.sparkfiles=true;
hive.merge.orcfile.stripe.level=true;

This configuration merges the 1099 reduce output files into one file when the result is small. However, the merged file then contains about 1099 stripes, and reading it is very slow. I tried hive.merge.orcfile.stripe.level=false; and the result is what I want: one small file with one stripe that reads fast. Can anyone explain the difference between true and false, and why hive.merge.orcfile.stripe.level=true is the default?
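For reference, the two configurations being compared can be written as session-level settings. This is a sketch based on how the ORC fast merge is generally understood to behave; verify against your Hive version:

```sql
-- Fast stripe-level merge (default): existing stripes are concatenated
-- without being decoded and re-encoded, which is cheap but keeps
-- roughly one stripe per input file (~1099 stripes here).
SET hive.merge.sparkfiles=true;
SET hive.merge.orcfile.stripe.level=true;

-- Full rewrite: the merge task re-encodes the rows, producing a file
-- with a single stripe, which reads faster for small results.
SET hive.merge.sparkfiles=true;
SET hive.merge.orcfile.stripe.level=false;
```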
08-30-2022
11:18 PM
What is the HDP version? If it is HDP 3.x, then you need to use the Hive Warehouse Connector (HWC).
04-08-2022
07:10 AM
@mala_etl Well, I can't fill in all of them for you, since those would be your values, not mine, but to achieve the move you need to set:

Completion Strategy: Move File
Move Destination Directory: the directory to move the file to
Create Directory: enable (true) / disable (false)

Be sure to check the ? for each property; it will explain everything.
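As a sketch (the destination path below is a placeholder, not a value from the original post), the relevant FetchSFTP properties might end up looking like:

```
Completion Strategy         Move File
Move Destination Directory  /opt/landing/processed
Create Directory            true
```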
02-16-2022
05:46 AM
Hi @mala_etl,

This looks like the known issue below:

https://issues.cloudera.org/browse/HUE-8717

Please work with Cloudera Support to get a patch.

Regards,
Chethan YM
01-24-2022
10:39 PM
Yes, it exists in the Hive CDH source. The org.apache.hadoop.hive.ql.udf package is part of hive-exec.jar. PFB:

[hive@node3 lib]$ /usr/jdk64/jdk1.8.0_112/bin/jar -tvf hive-exec.jar | grep "org.apache.hadoop.hive.ql.udf" | wc -l
660
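One detail worth noting about the command above: entries inside the jar use slashes (org/apache/hadoop/hive/ql/udf/...), but the dotted grep pattern still matches, because an unescaped "." in a regex matches any character, including "/". A small demonstration:

```shell
# Jar entries are slash-separated; the regex dots match the slashes anyway.
echo 'org/apache/hadoop/hive/ql/udf/UDFLog.class' | grep -c "org.apache.hadoop.hive.ql.udf"
```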
12-10-2021
04:25 AM
Hi, please find a sample flow using the ListSFTP and FetchSFTP processors to put files into a target HDFS path.

1. ListSFTP: keeps listening to the input folder (for example /opt/landing/project/data on the file-share server). When a new file arrives, ListSFTP takes only the name of the file and passes it to the FetchSFTP processor to fetch the file from the source folder. Configure the ListSFTP connection and path properties for your environment.

2. FetchSFTP: once the latest file has been identified by ListSFTP, FetchSFTP fetches the file from the source path. Configure the FetchSFTP properties for your environment.

3. PutHDFS: configure the values for your project and the required target folder. If your cluster is Kerberos-enabled, add the Kerberos controller service so NiFi can access HDFS.

4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
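As a hedged sketch of the steps above (the host, credentials, and paths are placeholders, not values from the original post), the key properties for each processor might look like:

```
# ListSFTP (watch the landing folder)
Hostname        sftp.example.com
Username        nifi
Remote Path     /opt/landing/project/data

# FetchSFTP (pull the file named by ListSFTP)
Hostname        sftp.example.com
Username        nifi
Remote File     ${path}/${filename}

# PutHDFS (write into the target HDFS directory)
Hadoop Configuration Resources  /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Directory                       /data/project/incoming
```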