Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1227 | 09-04-2020 12:33 AM
 | 7790 | 08-25-2020 12:39 AM
 | 2450 | 08-24-2020 02:40 AM
 | 2170 | 08-21-2020 01:06 AM
 | 1160 | 08-20-2020 02:46 AM
09-24-2020
12:12 AM
Hi... you can use the CompressContent processor to decompress. I am not 100% sure it handles LZO files. If not, you can use ExecuteStreamCommand to run a shell command that decompresses the files. Hope this helps.
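For reference, a minimal ExecuteStreamCommand sketch for LZO, assuming the lzop utility is installed on the NiFi node (the path and flags are assumptions; adjust to your environment):

ExecuteStreamCommand
    Command Path:      /usr/bin/lzop   # assumed install location of lzop
    Command Arguments: -d;-c           # -d decompress, -c write to stdout; arguments are semicolon-delimited

The processor streams the flowfile content to the command's stdin and captures stdout, so the decompressed bytes come out on the 'output stream' relationship.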
09-22-2020
06:55 AM
Hi... you can use GetHDFS to retrieve the files from HDFS. Use RouteOnAttribute if you wish to filter on a flowfile attribute, and finally PutFile to save the files on the local machine. Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
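As an illustration, the RouteOnAttribute filter could be a single dynamic property using the Expression Language (the property name and the .csv condition are hypothetical):

RouteOnAttribute
    Routing Strategy: Route to Property name
    csv_files:        ${filename:endsWith('.csv')}   # dynamic property; matching flowfiles go to the 'csv_files' relationship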
09-15-2020
12:49 AM
Hi... I have a similar problem but haven't found the root cause. We do have a workaround in place, though. Please check this post: https://community.cloudera.com/t5/Support-Questions/Ranger-installation-fails-with-0-status-code-received-on/m-p/300848#M220394 The reason behind the error is that Ambari cannot fetch the recommended settings for the change. This can happen if the API call fails to receive any reply because the connection is blocked. Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
09-08-2020
07:34 AM
Then I'm afraid it's difficult to do so. I don't understand how you are feeding the queries to ExecuteSQL. It may be better to feed ExecuteSQL with manageable queries. If you are using GenerateTableFetch, it lets you break a big query into smaller queries, as you want, and feed them to ExecuteSQL. Hope this helps. Please do post back on how you managed to move forward.
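As a sketch, the paging is controlled by GenerateTableFetch's Partition Size property; the table and column names below are hypothetical:

GenerateTableFetch
    Table Name:            mytable   # hypothetical
    Maximum-value Columns: id        # hypothetical; tracks the high-water mark for incremental loads
    Partition Size:        10000     # each generated query covers at most 10000 rows

Each outgoing flowfile then carries one page-sized SELECT statement that ExecuteSQL can run independently.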
09-07-2020
04:33 AM
Why not use the 2nd option I mentioned above... use SplitContent or SplitRecord and then merge later whenever you want.
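For instance, a minimal SplitRecord/MergeContent pairing could look like this (the reader/writer services and the split size are assumptions):

SplitRecord
    Record Reader:     AvroReader            # assumed controller service
    Record Writer:     AvroRecordSetWriter   # assumed controller service
    Records Per Split: 10000

MergeContent
    Merge Strategy: Defragment   # reassembles the splits using the fragment.* attributes SplitRecord adds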
09-04-2020
04:20 AM
Hi... to know which flowfile completed, you can use a PutEmail processor to get an email when a particular flowfile is finished. You can make it dynamic using the db.table.name attribute, which is added by GenerateTableFetch... if you have a lot of flowfiles for a single table, you can merge the flowfiles using MergeContent on the table name to give you periodic or batch completion status. Another way could be to write successes and failures to, e.g., a Hive table, and check that table for completions and failures. Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
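A rough sketch of the two processors (SMTP settings omitted; the values below are assumptions):

MergeContent
    Merge Strategy:             Bin-Packing Algorithm
    Correlation Attribute Name: db.table.name   # bins flowfiles per table

PutEmail
    Subject: Load completed for ${db.table.name}
    Message: Table ${db.table.name} finished at ${now():format('yyyy-MM-dd HH:mm:ss')}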
09-04-2020
12:33 AM
Hi... you can create a single flow as long as you can distinguish the files, e.g. by using the filename. You can use RouteOnAttribute and load each route into a different table. Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
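For example, RouteOnAttribute with one dynamic property per target table (the route names and filename patterns are hypothetical):

RouteOnAttribute
    Routing Strategy: Route to Property name
    orders:    ${filename:contains('orders')}      # branch that loads the orders table
    customers: ${filename:contains('customers')}   # branch that loads the customers table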
09-04-2020
12:12 AM
You should use ListDatabaseTables and GenerateTableFetch to perform an incremental load. If you are joining the tables, you can do a ReplaceText after GenerateTableFetch to add the join query (see the sketch below) and then feed the flowfile to ExecuteSQL. You can limit the amount of data per query in GenerateTableFetch. Or you can use SplitRecord / SplitContent to split the single Avro file into multiple smaller files and then use MergeContent to merge them back if required. Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
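A minimal ReplaceText sketch for injecting the join (table and column names are hypothetical):

ReplaceText
    Evaluation Mode:      Entire text
    Replacement Strategy: Always Replace
    Replacement Value:    SELECT a.*, b.col FROM table_a a JOIN table_b b ON a.id = b.id

Note that replacing the whole statement discards the paging clauses GenerateTableFetch generated, so a Regex Replace that keeps the WHERE clause may suit better if you still want the smaller queries.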
09-04-2020
12:06 AM
Hi... you haven't mentioned what kind of script you are planning to use. Assuming you are using Groovy, here is a sample script that should work:

import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))
    String dummy = flowFile.getAttribute('dummy')   // read an attribute if you need it
    br.eachLine { line ->
        // your logic here; remember to write the result to outputStream
    }   // only if you want to process per line
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)

Hope this helps. If the comment helps you find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
06:33 AM
Unfortunately, I haven't used Parquet at all... I would assume 'not a data file' could mean either that the file doesn't have a schema embedded or that the file is not in the correct format (the conversion didn't work).