Member since: 06-19-2017
Posts: 62
Kudos Received: 1
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2461 | 03-17-2022 10:37 AM |
| | 1681 | 12-10-2021 04:25 AM |
| | 2257 | 08-18-2021 02:20 PM |
| | 6067 | 07-16-2021 08:41 AM |
| | 1131 | 07-13-2021 07:03 AM |
02-02-2023
08:55 AM
Hello All, We have a Spark program that executes multiple queries against Hive tables. Currently the queries are executed using the Tez engine from Spark. I set sqlContext.sql("SET hive.execution.engine=spark") in the program, with the understanding that the queries would then run on Spark. We are using HDP 2.6.5 and Spark 2.3.0 in our cluster. Can someone confirm whether this is the correct approach? We do not need to run the queries using the Tez engine; Spark should run them as-is. In the config file /etc/spark2/conf/hive-site.xml we do not have any execution-engine property set up; we only have the Kerberos and metastore property details. Thanks
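A minimal PySpark sketch of the setup described in the question above (Spark 2.3 with Hive support); the database, table, and query are placeholders, not from the original post:

```python
# Sketch only: Spark 2.3 session reading Hive tables through the metastore.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-queries")
         .enableHiveSupport()   # use the Hive metastore for table metadata
         .getOrCreate())

# The setting from the question. Note that Spark SQL always executes queries
# with its own engine; hive.execution.engine is read by Hive itself
# (beeline/HiveServer2), so this SET is not required for Spark execution.
spark.sql("SET hive.execution.engine=spark")

df = spark.sql("SELECT COUNT(*) FROM some_db.some_table")  # placeholder query
df.show()
```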
Labels:
- Apache Spark
07-12-2022
03:08 PM
@Chakkara As far as I remember, the distributed map cache does not guarantee consistency. You could use HBase or HDFS to store the success or failure status of the processors for downstream applications. Once you have saved the success and failure status in HBase, you can retrieve it with the FetchHBaseRow processor using the row ID. Build a REST API NiFi flow to pull the status from HBase, for example HandleHttpRequest --> FetchHBaseRow --> HandleHttpResponse. You can then call the HTTP API via shell script/curl and call the script from Control-M, as in the sketch below.
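A hypothetical client for such a flow; the host, port, path, and row-id parameter name below are assumptions, not from the original post:

```python
# Sketch only: query the NiFi HandleHttpRequest endpoint that fronts
# FetchHBaseRow, passing the HBase row id and reading back the status.
import requests

NIFI_STATUS_URL = "http://nifi-host:9090/flow-status"  # assumed listener port/path

def get_flow_status(row_id: str) -> str:
    resp = requests.get(NIFI_STATUS_URL, params={"rowId": row_id}, timeout=30)
    resp.raise_for_status()
    return resp.text  # e.g. "SUCCESS" or "FAILURE" as stored by the flow

if __name__ == "__main__":
    print(get_flow_status("ingest-2022-07-12"))  # placeholder row id
```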
03-18-2022
12:08 AM
1 Kudo
Hi @VR46, Thanks for the analysis. As it did not work initially with Python requests, I tried the same call with a Java Spring Boot RestTemplate, and it started the Oozie workflow. The difference I found was that the username and password needed to be Base64-encoded. I applied the same in Python and it finally worked.
03-17-2022
10:37 AM
Hi @VR46, I solved the issue by encoding the username and password with the base64 module and passing them in the POST request. It worked; a sketch of the fix is below.
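A minimal sketch of the fix, assuming the default Oozie port 11000; the host, credentials, and workflow path are placeholders:

```python
# Sketch only: Base64-encode the credentials and send them as a Basic auth
# header on the Oozie submit-and-start request.
import base64
import requests

OOZIE_URL = "http://oozie-host:11000/oozie/v1/jobs?action=start"  # assumed host

creds = base64.b64encode(b"username:password").decode("ascii")  # placeholders
headers = {
    "Authorization": f"Basic {creds}",
    "Content-Type": "application/xml;charset=UTF-8",
}

# Oozie expects the workflow configuration as a Hadoop-style XML document.
config_xml = """<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs:///user/someuser/workflow.xml</value>
  </property>
  <property>
    <name>user.name</name>
    <value>someuser</value>
  </property>
</configuration>"""

resp = requests.post(OOZIE_URL, headers=headers, data=config_xml)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "0000001-...-oozie-oozi-W"}
```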
12-10-2021
04:25 AM
Hi, Please find the sample flow using the ListSFTP and FetchSFTP processors to put files into the target HDFS path.
1. ListSFTP processor: keeps listening to the input folder, for example /opt/landing/project/data on the file-share server. When a new file arrives, ListSFTP picks up only the name of the file and passes it to the FetchSFTP processor, which fetches the file from the source folder. (The ListSFTP properties to set were shown in screenshots in the original post.)
2. Once the latest file has been identified by ListSFTP, the FetchSFTP processor fetches the file from the source path. (The FetchSFTP properties to configure were likewise shown in screenshots.)
3. In the PutHDFS processor, configure the values for your project and the required target folder. If your cluster is Kerberos-enabled, add the Kerberos controller service so NiFi can access HDFS.
4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
11-24-2021
11:58 AM
Hi, Currently we use NiFi to pull files from the SFTP server and put them into HDFS using the ListSFTP and FetchSFTP processors. We can track whether the flow succeeded or failed, store the ingestion status in persistent storage such as HBase or HDFS, and query the status of the ingestion at any time. Another option, used in our previous projects, is to pull the files from NAS storage to the local file system (edge node or gateway node) and then into Hadoop, using a Unix SFTP copy command followed by hdfs commands to put the files into HDFS; the whole process was done in a shell script scheduled through Control-M (a sketch of this step follows below). By the way, what do you mean by "all flow in same place"? We can develop a single NiFi flow that pulls the files from the SFTP server and puts them into the target Hadoop file system path.
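A minimal sketch of the edge-node-to-HDFS step of that script-based option, written here in Python rather than shell; all paths are placeholders:

```python
# Sketch only: put a file that was already pulled to the edge node into HDFS
# and report success/failure, e.g. for Control-M to act on the exit code.
import subprocess
import sys

LOCAL_FILE = "/data/landing/input.csv"  # placeholder local path
HDFS_TARGET = "/project/raw/"           # placeholder HDFS directory

def put_to_hdfs(local_path: str, hdfs_dir: str) -> bool:
    # `hdfs dfs -put` exits non-zero on failure; that becomes our status.
    result = subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir])
    return result.returncode == 0

if __name__ == "__main__":
    ok = put_to_hdfs(LOCAL_FILE, HDFS_TARGET)
    print("SUCCESS" if ok else "FAILURE")
    sys.exit(0 if ok else 1)
```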
11-22-2021
01:46 AM
Hi, Since no JSON structure is mentioned in the question, could you please check the JoltTransformJSON NiFi processor? Refer to the Jolt specification: https://stackoverflow.com/questions/61697356/extract-first-element-array-with-jolt Using the Jolt NiFi processor, we can perform many transformations on a JSON file, as in the sketch below. Thanks
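As an illustration of the approach in the linked answer, a minimal Jolt shift spec that extracts the first element of an array; the field names (items, firstItem) are placeholders:

```json
[
  {
    "operation": "shift",
    "spec": {
      "items": {
        "0": "firstItem"
      }
    }
  }
]
```

Given the input {"items": [{"a": 1}, {"a": 2}]}, this spec produces {"firstItem": {"a": 1}}.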
10-23-2021
12:23 AM
Hi, Could you please check that the user has permission to trigger Oozie in the Ranger policy, and also check that your Oozie workflow XML file is present in the HDFS path? Normal Basic auth is fine for accessing the Oozie REST APIs. I am able to perform POST and GET requests for an Oozie workflow successfully and monitor the status of the workflow in the same script; a sketch of the status check is below.
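A minimal sketch of checking a workflow's status over the Oozie REST API with Basic auth; the host, credentials, and job id are placeholders:

```python
# Sketch only: GET /oozie/v1/job/<id>?show=info returns the job details as
# JSON, including the current status (RUNNING, SUCCEEDED, KILLED, ...).
import requests

OOZIE_HOST = "http://oozie-host:11000/oozie"       # assumed host/port
job_id = "0000001-220101000000000-oozie-oozi-W"    # placeholder job id

resp = requests.get(
    f"{OOZIE_HOST}/v1/job/{job_id}",
    params={"show": "info"},
    auth=("username", "password"),  # placeholder Basic auth credentials
)
resp.raise_for_status()
print(resp.json()["status"])
```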
10-14-2021
02:02 AM
Hi @ShankerSharma, Thank you for the confirmation. Yes, I mentioned one of the working DROP PARTITION queries in the post. We were in a situation where we needed to use functions inside the DROP PARTITION clause. Instead, we will do the 14-day calculation in the script and pass the resulting value to the DROP PARTITION statement, as sketched below.
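A minimal sketch of that calculation, assuming a string partition column holding dates; the table and column names are placeholders:

```python
# Sketch only: compute the 14-day cutoff outside Hive and substitute it
# into the DROP PARTITION statement.
from datetime import date, timedelta

cutoff = (date.today() - timedelta(days=14)).strftime("%Y-%m-%d")

# Hive accepts comparison operators in the partition spec, so one statement
# can drop every partition older than the cutoff.
query = (
    f"ALTER TABLE some_db.some_table "
    f"DROP IF EXISTS PARTITION (dt < '{cutoff}')"
)
print(query)  # e.g. pass to beeline: beeline -e "<query>"
```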
10-13-2021
05:47 AM
Hi @arunek95 Yes, the workaround has been applied by following the community posts. As of now, we don't have a root cause for why so many files were in the OPENFORWRITE state on those particular two days in our cluster. https://community.cloudera.com/t5/Support-Questions/Cannot-obtain-block-length-for-LocatedBlock/td-p/117517 Thanks