Member since: 06-19-2017
Posts: 62
Kudos Received: 1
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2461 | 03-17-2022 10:37 AM |
| | 1681 | 12-10-2021 04:25 AM |
| | 2257 | 08-18-2021 02:20 PM |
| | 6067 | 07-16-2021 08:41 AM |
| | 1131 | 07-13-2021 07:03 AM |
02-02-2023
08:55 AM
Hello All, We have a Spark program that executes multiple queries against Hive tables. Currently the queries are executed using the Tez engine from Spark. I set sqlContext.sql("SET hive.execution.engine=spark") in the program, with the understanding that the queries would then run on Spark. We are using HDP 2.6.5 and Spark 2.3.0 in our cluster. Can someone confirm whether this is the correct approach? We do not need to run the queries using the Tez engine; Spark should run them as-is. In the config file /etc/spark2/conf/hive-site.xml we do not have any execution-engine property set up; we only have the Kerberos and metastore property details. Thanks
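A minimal PySpark sketch of the setup described in the question above (Spark 2.3 with Hive support); the database, table, and query are placeholders, not from the original post:

```python
# Sketch only: Spark 2.3 session reading Hive tables through the metastore.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-queries")
         .enableHiveSupport()   # use the Hive metastore for table metadata
         .getOrCreate())

# The setting from the question. Note that Spark SQL always executes queries
# with its own engine; hive.execution.engine is read by Hive itself
# (beeline/HiveServer2), so this SET is not required for Spark execution.
spark.sql("SET hive.execution.engine=spark")

df = spark.sql("SELECT COUNT(*) FROM some_db.some_table")  # placeholder query
df.show()
```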
Labels:
- Apache Spark
07-12-2022
03:08 PM
@Chakkara As far as I remember, the distributed map cache does not guarantee consistency. You could use HBase or HDFS to store the success or failure status of the processors for downstream applications. Once you have saved the success and failure status in HBase, you can retrieve it with the FetchHBaseRow processor using the row ID. Build a REST API NiFi flow to pull the status from HBase, for example HandleHttpRequest --> FetchHBaseRow --> HandleHttpResponse. You can then call the HTTP API via shell script/curl and call the script from Control-M, as in the sketch below.
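A hypothetical client for such a flow; the host, port, path, and row-id parameter name below are assumptions, not from the original post:

```python
# Sketch only: query the NiFi HandleHttpRequest endpoint that fronts
# FetchHBaseRow, passing the HBase row id and reading back the status.
import requests

NIFI_STATUS_URL = "http://nifi-host:9090/flow-status"  # assumed listener port/path

def get_flow_status(row_id: str) -> str:
    resp = requests.get(NIFI_STATUS_URL, params={"rowId": row_id}, timeout=30)
    resp.raise_for_status()
    return resp.text  # e.g. "SUCCESS" or "FAILURE" as stored by the flow

if __name__ == "__main__":
    print(get_flow_status("ingest-2022-07-12"))  # placeholder row id
```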
03-18-2022
12:08 AM
1 Kudo
Hi @VR46, Thanks for the analysis. As it did not work initially with Python requests, I tried the same call with a Java Spring Boot RestTemplate, and it started the Oozie workflow. The difference I found was that the username and password needed to be Base64-encoded. I applied the same in Python and it finally worked.
03-17-2022
10:37 AM
Hi @VR46, I solved the issue by encoding the username and password with the base64 module and passing them in the POST request. It worked; a sketch of the fix is below.
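A minimal sketch of the fix, assuming the default Oozie port 11000; the host, credentials, and workflow path are placeholders:

```python
# Sketch only: Base64-encode the credentials and send them as a Basic auth
# header on the Oozie submit-and-start request.
import base64
import requests

OOZIE_URL = "http://oozie-host:11000/oozie/v1/jobs?action=start"  # assumed host

creds = base64.b64encode(b"username:password").decode("ascii")  # placeholders
headers = {
    "Authorization": f"Basic {creds}",
    "Content-Type": "application/xml;charset=UTF-8",
}

# Oozie expects the workflow configuration as a Hadoop-style XML document.
config_xml = """<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs:///user/someuser/workflow.xml</value>
  </property>
  <property>
    <name>user.name</name>
    <value>someuser</value>
  </property>
</configuration>"""

resp = requests.post(OOZIE_URL, headers=headers, data=config_xml)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "0000001-...-oozie-oozi-W"}
```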
12-10-2021
04:25 AM
Hi, Please find the sample flow using the ListSFTP and FetchSFTP processors to put files into the target HDFS path.
1. ListSFTP processor: keeps listening to the input folder, for example /opt/landing/project/data on the file-share server. When a new file arrives, ListSFTP picks up only the name of the file and passes it to the FetchSFTP processor, which fetches the file from the source folder. (The ListSFTP properties to set were shown in screenshots in the original post.)
2. Once the latest file has been identified by ListSFTP, the FetchSFTP processor fetches the file from the source path. (The FetchSFTP properties to configure were likewise shown in screenshots.)
3. In the PutHDFS processor, configure the values for your project and the required target folder. If your cluster is Kerberos-enabled, add the Kerberos controller service so NiFi can access HDFS.
4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
11-24-2021
11:58 AM
Hi, Currently we use NiFi to pull files from the SFTP server and put them into HDFS using the ListSFTP and FetchSFTP processors. We can track whether the flow succeeded or failed, store the ingestion status in persistent storage such as HBase or HDFS, and query the status of the ingestion at any time. Another option, used in our previous projects, is to pull the files from NAS storage to the local file system (edge node or gateway node) and then into Hadoop, using a Unix SFTP copy command followed by hdfs commands to put the files into HDFS; the whole process was done in a shell script scheduled through Control-M (a sketch of this step follows below). By the way, what do you mean by "all flow in same place"? We can develop a single NiFi flow that pulls the files from the SFTP server and puts them into the target Hadoop file system path.
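A minimal sketch of the edge-node-to-HDFS step of that script-based option, written here in Python rather than shell; all paths are placeholders:

```python
# Sketch only: put a file that was already pulled to the edge node into HDFS
# and report success/failure, e.g. for Control-M to act on the exit code.
import subprocess
import sys

LOCAL_FILE = "/data/landing/input.csv"  # placeholder local path
HDFS_TARGET = "/project/raw/"           # placeholder HDFS directory

def put_to_hdfs(local_path: str, hdfs_dir: str) -> bool:
    # `hdfs dfs -put` exits non-zero on failure; that becomes our status.
    result = subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir])
    return result.returncode == 0

if __name__ == "__main__":
    ok = put_to_hdfs(LOCAL_FILE, HDFS_TARGET)
    print("SUCCESS" if ok else "FAILURE")
    sys.exit(0 if ok else 1)
```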
11-22-2021
01:46 AM
Hi, Since no JSON structure is mentioned in the question, could you please check the JoltTransformJSON NiFi processor? Refer to the Jolt specification: https://stackoverflow.com/questions/61697356/extract-first-element-array-with-jolt Using the Jolt NiFi processor, we can perform many transformations on a JSON file, as in the sketch below. Thanks
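As an illustration of the approach in the linked answer, a minimal Jolt shift spec that extracts the first element of an array; the field names (items, firstItem) are placeholders:

```json
[
  {
    "operation": "shift",
    "spec": {
      "items": {
        "0": "firstItem"
      }
    }
  }
]
```

Given the input {"items": [{"a": 1}, {"a": 2}]}, this spec produces {"firstItem": {"a": 1}}.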
10-23-2021
12:23 AM
Hi, Could you please check that the user has permission to trigger Oozie in the Ranger policy, and also check that your Oozie workflow XML file is present in the HDFS path? Normal Basic auth is fine for accessing the Oozie REST APIs. I am able to perform POST and GET requests for an Oozie workflow successfully and monitor the status of the workflow in the same script; a sketch of the status check is below.
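A minimal sketch of checking a workflow's status over the Oozie REST API with Basic auth; the host, credentials, and job id are placeholders:

```python
# Sketch only: GET /oozie/v1/job/<id>?show=info returns the job details as
# JSON, including the current status (RUNNING, SUCCEEDED, KILLED, ...).
import requests

OOZIE_HOST = "http://oozie-host:11000/oozie"       # assumed host/port
job_id = "0000001-220101000000000-oozie-oozi-W"    # placeholder job id

resp = requests.get(
    f"{OOZIE_HOST}/v1/job/{job_id}",
    params={"show": "info"},
    auth=("username", "password"),  # placeholder Basic auth credentials
)
resp.raise_for_status()
print(resp.json()["status"])
```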
10-14-2021
02:02 AM
Hi @ShankerSharma, Thank you for the confirmation. Yes, I mentioned one of the working DROP PARTITION queries in the post. We were in a situation where we needed to use functions inside the DROP PARTITION clause. Instead, we will do the 14-day calculation in the script and pass the resulting value to the DROP PARTITION statement, as sketched below.
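A minimal sketch of that calculation, assuming a string partition column holding dates; the table and column names are placeholders:

```python
# Sketch only: compute the 14-day cutoff outside Hive and substitute it
# into the DROP PARTITION statement.
from datetime import date, timedelta

cutoff = (date.today() - timedelta(days=14)).strftime("%Y-%m-%d")

# Hive accepts comparison operators in the partition spec, so one statement
# can drop every partition older than the cutoff.
query = (
    f"ALTER TABLE some_db.some_table "
    f"DROP IF EXISTS PARTITION (dt < '{cutoff}')"
)
print(query)  # e.g. pass to beeline: beeline -e "<query>"
```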
10-13-2021
05:47 AM
Hi @arunek95 Yes, the workaround has been applied by following the community posts. As of now, we don't have a root cause for why so many files were in the OPENFORWRITE state on those particular two days in our cluster. https://community.cloudera.com/t5/Support-Questions/Cannot-obtain-block-length-for-LocatedBlock/td-p/117517 Thanks