Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Views | Posted
---|---
643 | 09-04-2020 12:33 AM
4112 | 08-25-2020 12:39 AM
1119 | 08-24-2020 02:40 AM
1186 | 08-21-2020 01:06 AM
583 | 08-20-2020 02:46 AM
07-14-2021
12:36 AM
Is it possible to show exactly what the character is? With logical type string it should accept any character, as long as it is not an invalid or garbage value.
07-05-2021
11:13 PM
Hi, if you know the exact number of files to be transferred, then yes. But if the count is variable, then, since NiFi is a flow tool, there is no concept of a "last file". You can send an email whenever a file appears in the failure relationship; this way you will know whether the transfer went well or not. If this seems too obvious, please share a sketch of what you wish to implement for a better solution.
07-05-2021
05:13 AM
It seems DB2 is using a data type that Avro cannot map. You can try setting Use Avro Logical Types to false in QueryDatabaseTable and then parse the values correctly within the flow. If you find the answer helpful please accept this as a solution.
07-05-2021
04:55 AM
You can tail nifi-app.log from within NiFi (e.g. with TailFile) and use a multi-line regex to extract the response, as sketched below. If you find the answer helpful please accept this as a solution.
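A minimal ExecuteScript (Groovy) sketch of that extraction step, assuming the tailed log wraps the response between markers such as "Response begin" / "Response end"; those markers and the http.response.body attribute name are made up for illustration, and an ExtractText processor with a (?s) pattern would do the same without code:

```groovy
import java.nio.charset.StandardCharsets
import java.util.regex.Pattern
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (!flowFile) return

// Read the tailed log chunk delivered by TailFile
String text = ''
session.read(flowFile, { inputStream ->
    text = inputStream.getText(StandardCharsets.UTF_8.name())
} as InputStreamCallback)

// (?s) makes '.' match line breaks, so a multi-line response block can be captured
def matcher = Pattern.compile(/(?s)Response begin(.*?)Response end/).matcher(text)
if (matcher.find()) {
    flowFile = session.putAttribute(flowFile, 'http.response.body', matcher.group(1).trim())
}
session.transfer(flowFile, REL_SUCCESS)
```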
07-05-2021
04:52 AM
Hi. For a single email alert, use a MergeContent processor (with some merge condition) before PutEmail. If you have a definite number of files, then in MergeContent you can set Minimum Number of Entries to 10. If not, then the only way is time-bound (Max Bin Age). For the transfer check, you will have to provide more info here: where is the data transferred, and does NiFi handle the transfer? If NiFi handles it, failed files will appear in the failure relationship, so that should tell you whether the transfer of all files was successful or not. If you find the answer helpful please accept this as a solution.
07-05-2021
04:44 AM
This is a heap space problem. Check whether your NiFi cluster is sized to process the amount of data you are consuming from Kafka. Maybe try with a smaller dataset first. If you find the answer helpful please accept this as a solution.
07-05-2021
02:26 AM
It seems the Avro schema doesn't match the table structure. For example, if a field is a string in your flowfile, is it a string in the destination table as well? If not, you will get this error. The probable reason is that PutHiveStreaming doesn't implicitly convert data types. If you find the answer helpful please accept this as a solution.
07-05-2021
02:20 AM
You need to provide more info here. What is the data type of each column? How are you adding the data? What is the data format of the Hive table? When you do get the correct result, is it the same rows or different ones?
07-05-2021
02:17 AM
It seems that the Avro logical types are not matching. Please double-check the data types. If you find the answer helpful please accept this as a solution.
07-05-2021
02:09 AM
Check your Kerberos credential cache. Also note that the keyring cache type is not fully compatible; you may have to use a file-based credential cache. If this doesn't work, please share the stack trace so we can understand the problem.
09-24-2020
12:12 AM
Hi, you can use CompressContent to decompress. I am not 100% sure whether it handles LZO files; if not, you can use ExecuteStreamCommand to run a shell command that uncompresses the files. Hope this helps.
09-22-2020
06:55 AM
Hi, you can use GetHDFS to retrieve the files from HDFS, RouteOnAttribute if you wish to apply a filter on the flowfile attributes, and finally PutFile to save the files on the local machine (see the sketch below). Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
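If you prefer a script over RouteOnAttribute, here is a minimal ExecuteScript (Groovy) sketch of the filtering step; the ".csv" suffix is just an example condition, and RouteOnAttribute with ${filename:endsWith('.csv')} achieves the same without code:

```groovy
// ExecuteScript (Groovy): keep only flowfiles whose filename matches a condition
def flowFile = session.get()
if (!flowFile) return

def name = flowFile.getAttribute('filename') ?: ''
// '.csv' is an illustrative filter; adjust to your own attribute/condition
session.transfer(flowFile, name.endsWith('.csv') ? REL_SUCCESS : REL_FAILURE)
```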
09-15-2020
12:49 AM
Hi, I have a similar problem but haven't found the root cause, although we do have a workaround in place. Please check this post: https://community.cloudera.com/t5/Support-Questions/Ranger-installation-fails-with-0-status-code-received-on/m-p/300848#M220394 The reason behind the error is that Ambari cannot fetch the recommended settings for the change. This can happen if the API call fails to receive any reply because the connection is blocked. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
09-08-2020
07:34 AM
Then I'm afraid it's difficult to do so. I don't understand how you are feeding the queries to ExecuteSQL; it may be better to feed ExecuteSQL with manageable queries. If you are using GenerateTableFetch, it allows you to break a big query into smaller queries, like you want, and feed them to ExecuteSQL. Hope this helps. Please do post back on how you managed to move forward.
09-07-2020
04:33 AM
Why not use the second option I mentioned above: use SplitContent or SplitRecord, and then merge the pieces back later whenever you need them.
09-04-2020
04:20 AM
Hi, to know which flowfile completed, you can use a PutEmail processor to get an email when a particular flowfile is finished. You can make it dynamic using the db.table.name attribute added by GenerateTableFetch (see the sketch below). If you have a lot of flowfiles for a single table, you can merge them with MergeContent on the table name to get a periodic or batch completion status. Another way could be to write successes and failures to, e.g., a Hive table and check that table for completions and failures. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
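As a rough illustration of the PutEmail idea, here is an ExecuteScript (Groovy) sketch that stamps a status attribute, assuming the flowfile carries a db.table.name attribute; the load.status name is made up, and PutEmail could reference it as ${load.status} in its Subject or Message properties:

```groovy
// ExecuteScript (Groovy): build a completion-status attribute for PutEmail to reference
def flowFile = session.get()
if (!flowFile) return

def table = flowFile.getAttribute('db.table.name') ?: 'unknown-table'
// 'load.status' is an illustrative attribute name
flowFile = session.putAttribute(flowFile, 'load.status', "Load completed for ${table}".toString())
session.transfer(flowFile, REL_SUCCESS)
```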
09-04-2020
12:33 AM
Hi, you can create a single flow as long as you can distinguish the files, e.g. by filename. You can then route on that attribute and load each file into a different table. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
09-04-2020
12:12 AM
You should use ListDatabaseTables and GenerateTableFetch to perform an incremental load. If you are joining the tables, you can put a ReplaceText after GenerateTableFetch to add the join to the query and then feed the flowfile to ExecuteSQL (a scripted sketch of that step is below). You can also control how much data each generated query fetches in GenerateTableFetch. Alternatively, you can use SplitRecord / SplitContent to split the single Avro file into multiple smaller files and then use MergeContent to merge them back if required. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
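A rough ExecuteScript (Groovy) equivalent of that ReplaceText step, assuming the statement generated by GenerateTableFetch contains a WHERE clause to anchor on; other_table, o.id and t.id are placeholder names, and in practice a plain ReplaceText regex does the same without a script:

```groovy
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

// ExecuteScript (Groovy): inject a JOIN into the SQL generated by GenerateTableFetch
def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    String sql = inputStream.getText(StandardCharsets.UTF_8.name())
    // placeholder join clause; assumes the generated query has a WHERE clause
    String joined = sql.replaceFirst(/(?i)\bWHERE\b/, 'JOIN other_table o ON o.id = t.id WHERE')
    outputStream.write(joined.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```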
09-04-2020
12:06 AM
Hi, you haven't mentioned what kind of script you are planning to use. Assuming you are using Groovy, here is a sample script that should work:

```groovy
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
    String dummy = flowFile.getAttribute('dummy')   // example: read a flowfile attribute
    // only if you want to process the content per line
    reader.eachLine { line ->
        // your logic here
        outputStream.write((line + '\n').getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```

Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
06:33 AM
Unfortunately, I haven't used Parquet at all. I would assume 'not a data file' could mean either that the file doesn't have the schema embedded or that the file is not in the correct format (the conversion didn't work).
08-25-2020
04:03 AM
@Rohitravi If this has helped, please comment accordingly so that @PPB can mark this as a solution for other community members.
08-25-2020
04:01 AM
Can you provide more details on how you are trying to view the Avro file? It would also be good to share the stack trace. I am assuming you are using Hive, since you said that it is a column. Hive is schema-on-read, so it only evaluates the data when it is read, not when it is written to HDFS. It is also good to check the timezone for the timestamp format. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:58 AM
Please double-check the command: cd sandbox.repo /tmp. This will not work, so if you are getting "no such directory" that is expected. Please point to the link you followed to resolve the problem. The error here states that it cannot connect to the repository. If you are using the public repository, make sure you can reach the URL and that no firewall is blocking your connection. If you are using a local repo, please check the firewall and whether the user you are connecting with has sufficient privileges. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:54 AM
Hi, you need to check whether the service has started correctly and is listening on port 50070. If that is fine, then check whether there is a firewall in the way (I see it's a sandbox, so an external firewall may not be the cause, but check that the machine's firewall is stopped and disabled). It is also good to check that SELinux is disabled. If none of this works, try restarting the host machine on which the sandbox is running. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:50 AM
Hi, I'm extremely sorry for the earlier comment; I thought it would work without trying it out first. I looked at it again, and it seems it is not possible to do this without a script.
08-25-2020
03:46 AM
Hi, you should be able to see the Avro in NiFi. When you open the file, there is an option at the top left, "View as"; it is set to Original by default. Change it to Formatted and you should be able to see the Avro correctly. Hope this helps.
08-25-2020
12:39 AM
Where are you viewing the data? Is it in NiFi? Avro is a binary format, so it is not readable in a plain editor. If you are viewing it in NiFi, then check whether you embedded the schema or not. If the schema is embedded and it still looks like this, something in the flow is wrong; if not, I suppose it can look the way it does. A quick test could be to dump it in HDFS and create an external Hive table to check whether the data looks correct. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-24-2020
02:52 AM
1 Kudo
Hi, I see two possible reasons for your problem:
1. How much data are you selecting in SelectHiveQL? Keep in mind that NiFi has its own repositories, so when you run a select query it is not only executed, the result data is actually transported into the repository. So even though the query itself executes in a couple of seconds, depending on the amount of data and your network speed it can take anywhere from minutes to hours for the data to be transported. If you restart NiFi, it will hard-stop the processor and the query will start over after the restart, so that will not help you in any way.
2. PutHiveQL has a batch size property that controls commits. The smaller the batch, the more statements Hive has to process and commit. You should increase the batch size, since Hive is built for bulk insert and fetch; optimizing the batch size would help a lot. Try a small fetch, say a couple of hundred rows, and see whether NiFi is still stuck for hours. It should ideally finish in a couple of minutes, assuming your network is fast.
Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-24-2020
02:42 AM
I looked around a little for a Java implementation but didn't find any. But can't you do it as I mentioned above? Use the Groovy script to add the PG id as an attribute, and then in your custom processor just read the attribute containing the PG id and do whatever your goal is.
08-24-2020
02:40 AM
If your partitions are not big, roughly 100,000 rows each on average given 10,000 partitions over 1 billion rows, then it is OK to create a single bucket. Also, as long as the file size is greater than the block size, having multiple files does not degrade performance; too many small files below the block size is the real concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is; only then can you take a specific path, and you might want to run a small POC to do some statistical analysis. Hope this helps.