Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1216 | 09-04-2020 12:33 AM
 | 7747 | 08-25-2020 12:39 AM
 | 2419 | 08-24-2020 02:40 AM
 | 2155 | 08-21-2020 01:06 AM
 | 1151 | 08-20-2020 02:46 AM
09-24-2020
12:12 AM
Hi... you can use CompressContent in decompress mode. I am not 100% sure whether it handles LZO files. If not, you can use ExecuteStreamCommand to run a shell command that uncompresses the files. Hope this helps.
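If CompressContent turns out not to support LZO, a minimal ExecuteStreamCommand sketch could look like this, assuming the lzop utility is installed on the NiFi host (the path and arguments are illustrative, not tested):

```
ExecuteStreamCommand
  Command Path:      /usr/bin/lzop
  Command Arguments: -dc
  Ignore STDIN:      false
```

ExecuteStreamCommand streams the flowfile content to the command's stdin, and the decompressed stdout becomes the content of the flowfile routed to the output stream relationship.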
09-22-2020
06:55 AM
Hi... you can use GetHDFS to retrieve the files from HDFS, RouteOnAttribute if you wish to filter on a flowfile attribute, and finally PutFile to save the files on the local machine. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
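A minimal sketch of that flow, with a hypothetical filter that keeps only .csv files (the directories and the pattern are placeholders):

```
GetHDFS              Directory: /data/source
  -> RouteOnAttribute  csv: ${filename:endsWith('.csv')}
  -> PutFile           Directory: /tmp/hdfs-export
```

The csv relationship carries the matching files; everything else goes to unmatched.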
09-15-2020
12:49 AM
Hi... I have a similar problem and haven't found the root cause, although we do have a workaround in place. Please check this post: https://community.cloudera.com/t5/Support-Questions/Ranger-installation-fails-with-0-status-code-received-on/m-p/300848#M220394 The reason behind the error is that Ambari cannot fetch the recommended settings for the change. This can happen if the API call never receives a reply because the connection is blocked. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
09-08-2020
07:34 AM
Then I'm afraid it's difficult to do so. I don't understand how you are feeding the queries to ExecuteSQL; it may be better to feed ExecuteSQL manageable queries. If you are using GenerateTableFetch, it allows you to break a big query into smaller queries, like you want, and feed them to ExecuteSQL. Hope this helps. Please do post back on how you managed to move forward.
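For illustration, GenerateTableFetch pages a table through its Partition Size property (the table and column names here are hypothetical):

```
GenerateTableFetch
  Table Name:            my_big_table
  Maximum-value Columns: id
  Partition Size:        10000
  -> ExecuteSQL
```

Each generated flowfile then contains one bounded SELECT instead of a single huge query.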
09-07-2020
04:33 AM
Why not use the second option I mentioned above: use SplitContent or SplitRecord, as sketched below, and then merge the pieces back later whenever you want.
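A sketch of that option, assuming Avro record reader/writer controller services are already configured (the split size is arbitrary):

```
SplitRecord
  Record Reader:     AvroReader
  Record Writer:     AvroRecordSetWriter
  Records Per Split: 50000
  ... downstream processing ...
MergeContent
  Merge Strategy:    Defragment
```

With the Defragment strategy, MergeContent reassembles the pieces using the fragment.* attributes that SplitRecord writes on each split.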
09-04-2020
04:20 AM
Hi... to know which flowfile completed, you can use a PutEmail processor to get an email when a particular flowfile is finished. You can make it dynamic using the db.table.name attribute added by GenerateTableFetch. If you have a lot of flowfiles for a single table, you can merge them with MergeContent, correlated on the table name, to give you periodic or batch completion status. Another way is to write successes and failures to, e.g., a Hive table and check that table for completions and failures. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
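A hedged sketch of the email path (the SMTP host and subject are placeholders, and the batch size is arbitrary):

```
MergeContent
  Correlation Attribute Name: db.table.name
  Minimum Number of Entries:  100
  -> PutEmail
       SMTP Hostname: smtp.example.com
       Subject:       Load completed for ${db.table.name}
```

Correlating on db.table.name keeps each bin to a single table, so one email summarises one table's batch.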
09-04-2020
12:33 AM
Hi... you can create a single flow as long as you can distinguish the files, e.g. by filename. You can use RouteOnAttribute and load each route into a different table. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
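For example, RouteOnAttribute with one dynamic property per file pattern (the property names and patterns below are hypothetical):

```
RouteOnAttribute
  Routing Strategy: Route to Property name
  orders:    ${filename:startsWith('orders')}
  customers: ${filename:startsWith('customers')}
```

Each relationship then feeds the load path for its own target table.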
09-04-2020
12:12 AM
You should use ListDatabaseTables and GenerateTableFetch to perform an incremental load. If you are joining the tables, you can add a ReplaceText after GenerateTableFetch to build the join query and then feed the flowfile to ExecuteSQL; you can also limit the amount of data per query in GenerateTableFetch. Alternatively, you can use SplitRecord / SplitContent to split the single Avro file into multiple smaller files and then use MergeContent to merge them back if required. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
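A hedged ReplaceText sketch for the join step, wrapping the statement that GenerateTableFetch produced (the join clause itself is purely illustrative):

```
ReplaceText
  Evaluation Mode:      Entire text
  Replacement Strategy: Regex Replace
  Search Value:         (?s)^(.*)$
  Replacement Value:    SELECT * FROM ($1) a JOIN other_table b ON a.id = b.id
```

The generated query is captured as $1 and embedded as a subquery, so the downstream ExecuteSQL runs the join instead of the plain fetch.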
09-04-2020
12:06 AM
Hi... you haven't mentioned what kind of script you are planning to use. Assuming you are using Groovy, here is a sample script that should work:

```groovy
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    // read the incoming flowfile content
    def br = new BufferedReader(new InputStreamReader(inputStream))
    String dummy = flowFile.getAttribute('dummy')
    // only needed if you want to process the content line by line
    br.eachLine { line ->
        // your logic here
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```

Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
06:33 AM
Unfortunately, I haven't used Parquet at all... I would assume 'not a data file' could mean either that the file doesn't have a schema embedded or that the file is not in the correct format (the conversion didn't work).