Member since: 07-13-2020
Posts: 58
Kudos Received: 2
Solutions: 10
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 644 | 09-04-2020 12:33 AM
 | 4119 | 08-25-2020 12:39 AM
 | 1122 | 08-24-2020 02:40 AM
 | 1189 | 08-21-2020 01:06 AM
 | 585 | 08-20-2020 02:46 AM
07-20-2021
01:18 AM
@Althotta Have any of the replies in this post helped you resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-14-2021
12:36 AM
Is it possible to show exactly what the character is? With logical type string it should accept any character, as long as it is not an invalid or garbage value.
07-13-2021
12:38 PM
@Ash1 It is not clear from your query how files are getting into or out of your NiFi. Assuming you have already received the day's set of FlowFiles into your NiFi dataflow, the best approach may be to notify if any of them fails to be written out/transferred at the end of your dataflow. In this manner you would not only know that not all files transferred, but also exactly which file failed to transfer.

There are numerous processors that handle writing out FlowFile content (transfer) to another system or the local file system. Those processing components typically have relationships for handling various types of failures. These relationships could be sent through a retry loop via the RetryFlowFile [1] processor back to the same transfer processor that failed. You define in the RetryFlowFile processor how many times you want a FlowFile to traverse this loop. After X number of loops it would get routed out of the loop to your PutEmail [2] processor, where you could dynamically set the email content to include attributes from that FlowFile such as the filename, the hostname of the NiFi node that failed to transfer it, etc. From the PutEmail processor you could send that FlowFile somewhere else for holding until manual intervention is taken in response to that email.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.RetryFlowFile/index.html
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apache.nifi.processors.standard.PutEmail/index.html

If you found any of the responses given here assisted with your query, please take a moment to log in and click "Accept" on each of those solutions.

Thank you,
Matt
07-08-2021
06:59 AM
Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
07-05-2021
04:55 AM
You can tail nifi-app.log from within NiFi (for example with the TailFile processor) and use a multi-line regex to extract the response. If you find the answer helpful, please accept this as a solution.
07-05-2021
04:44 AM
This is a heap space problem. Check whether your NiFi cluster has enough heap to process the amount of data you are consuming from Kafka, or try with a smaller dataset first. If you find the answer helpful, please accept this as a solution.
07-05-2021
02:20 AM
You need to provide more info here. What is the data type of each column? How are you adding the data? What is the data format of the Hive table? And when you do get the correct result, is it the same result or some other rows?
07-05-2021
02:09 AM
Check your Kerberos credentials cache. Also note that the keyring credential cache is not completely compatible; you may have to use a file-based credential cache. If this doesn't work, please share the stack trace so we can understand the problem.
04-13-2021
05:52 AM
The issue is caused by an error in network communication. The best way to solve it is to run the Ambari wizard on a machine in the same subnet as the Ambari server.
03-01-2021
06:58 PM
Hi, are there any other solutions for the 403 error? In my case I only use a local hadoop.tar.gz to initialize my cluster; I do not use a repository. Could you please give me some suggestions? Thanks a lot.
09-24-2020
12:12 AM
Hi, you can use the CompressContent processor to decompress. I am not 100% sure whether it decompresses LZO files; if not, you can use ExecuteStreamCommand to run a shell command that uncompresses the files. Hope this helps.
09-15-2020
12:49 AM
Hi, I have a similar problem but haven't found the root cause, although we do have a workaround in place. Please check this post: https://community.cloudera.com/t5/Support-Questions/Ranger-installation-fails-with-0-status-code-received-on/m-p/300848#M220394 The reason behind the error is that Ambari cannot fetch the recommended settings for the change. This can happen if the API call fails to receive any reply because the connection is blocked. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
09-11-2020
10:09 AM
@SagarKanani Thanks for your reply.
09-10-2020
12:54 PM
Maybe I have found a solution. I'm going to use ExecuteSQL to run an "INSERT ... SELECT" query. The query will perform the joins and load the data into a table, and then QueryDatabaseTable will read from the new table. That way I'll be able to use the "Max Rows Per Flow File" property.
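For illustration, a minimal sketch of such a statement (the table and column names here are hypothetical):

```sql
-- Pre-join the sources into one flat table so that QueryDatabaseTable
-- can page through it using "Max Rows Per Flow File".
INSERT INTO joined_orders
SELECT o.order_id,
       o.order_date,
       c.customer_name
FROM   orders o
JOIN   customers c
  ON   c.customer_id = o.customer_id;
```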
09-04-2020
06:25 AM
@P_Rat98 You need parquet-tools to read Parquet files from the command line; there is no built-in way to view Parquet content in NiFi. https://pypi.org/project/parquet-tools/
09-04-2020
06:20 AM
@DanMcCray1 Once you have the content from Kafka as a FlowFile, your options are not limited to just ExecuteScript. Depending on the type of content, you can use the following ideas:

EvaluateJsonPath - if the content is a single JSON object and you need one or more values inside it, this is an easy way to get those values into attributes.

ExtractText - if the content is text or some raw format, ExtractText allows you to regex match against the content to get values into attributes.

QueryRecord w/ Record Reader & Record Writer - this is the most recommended method. Assuming your data has structure (text, csv, json, etc.) and/or multiple rows/objects, you can define a reader (with schema) and an output format (record writer) and query the results very effectively; see the sketch at the end of this reply.

If you indeed want to work with ExecuteScript, you should start here:
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-2/ta-p/249018
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks,
Steven @ DFHZ
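As a follow-up illustration of the QueryRecord option above, here is a minimal sketch of the kind of query you could put in a QueryRecord dynamic property; the column names are hypothetical, and FLOWFILE is the table name QueryRecord exposes for the incoming records:

```sql
-- Keep only the rows and columns of interest from the records parsed
-- by the configured Record Reader; the Record Writer formats the output.
SELECT user_id,
       event_type
FROM   FLOWFILE
WHERE  event_type = 'purchase'
```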
09-04-2020
04:20 AM
Hi, to know which FlowFile completed, you can use a PutEmail processor to send an email when a particular FlowFile has finished. You can make it dynamic using the db.table.name attribute, which is added by GenerateTableFetch. If you have a lot of FlowFiles for a single table, you can merge the FlowFiles with MergeContent on the table name to give you a periodic or batch completion status. Another way would be to write successes and failures to, for example, a Hive table, and check that table for completions and failures. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
04:03 AM
@Rohitravi If this has helped please comment accordingly so that @PPB can mark this as a solution for other community members.
08-25-2020
04:01 AM
Can you provide more details on how you are trying to view the Avro file? It would also be good to share the stack trace. I am assuming you are using Hive, since you said that it is a column. Hive is schema-on-read, so it will only evaluate the data when it is read, not when it is written to HDFS. It is also good to check the timezone for the timestamp format. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:58 AM
Please double-check the command: cd sandbox.repo /tmp. This will not work, so if you are getting "no such directory" that is expected. Please point to the link you referred to when trying to resolve the problem. The error here states that it cannot connect to the repository. If you are using the public repository, make sure you can reach the link and that no firewall is blocking your connection. If you are using a local repo, check the firewall and whether the user you are using to connect to the repository has sufficient privileges. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:54 AM
Hi, you need to check whether the service has started correctly and is listening on port 50070. If that is fine, then check whether there is a firewall in the way (I see it is a sandbox, so an external firewall may not be the cause, but check that the machine's firewall is stopped and disabled). It is also good to check that SELinux is disabled. If none of this works, try restarting the host machine on which the sandbox is running. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-25-2020
03:50 AM
Hi, I'm extremely sorry for the earlier comment; I thought it would work without trying it out first. I looked at it again, and it seems it is not possible to do this without a script.
08-24-2020
02:42 AM
I tried a little to find a Java implementation but did not find any. But can't you use the approach I mentioned above? Use the Groovy script to add the PG ID, and then in your custom processor just get the attribute containing the PG ID and do whatever your goal is.
08-24-2020
02:40 AM
If your partitions are not that big, say under a couple of million rows each (which is the case here: 10,000 partitions over 1 billion rows works out to roughly 100,000 rows per partition), then it is OK to create a single bucket. Also, as long as the file size is greater than the block size, having multiple files does not degrade performance; too many small files below the block size is the concern. You should use compaction, since it makes it easier for Hive to skip a partition altogether. As I said earlier, there is no single best solution. You need to understand how the ad hoc queries are fired and what the common use case is. Only after that can you take a specific path, and you might want to run a small POC to do some statistical analysis. Hope this helps.
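For illustration, a minimal sketch of such a layout in Hive DDL (the table, column, and partition names are hypothetical):

```sql
-- A single bucket per partition keeps the file count low; ORC files
-- at or above the block size avoid the small-files concern above.
CREATE TABLE events_bucketed (
    event_id BIGINT,
    payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 1 BUCKETS
STORED AS ORC;
```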
08-21-2020
01:20 AM
Hi, you can use the datediff function like this: WHERE datediff(clinic_discharge_dt, clinic_admit_dt) <= 30. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
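For example, a minimal sketch of a complete query using that predicate (the table name is hypothetical; the column names come from the snippet above):

```sql
-- Visits where the discharge happened within 30 days of admission.
SELECT *
FROM   clinic_visits
WHERE  datediff(clinic_discharge_dt, clinic_admit_dt) <= 30;
```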
08-21-2020
01:06 AM
The error is here: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.DoubleWritable. This states that you are trying to cast Text to Double, which is not possible. Are you sure that the column is marked as String? Casting String to Double is always possible, so please double-check. Another solution is to drop and recreate the table with the correct data types; since it is an external table, there is no data loss anyway. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
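For illustration, a minimal sketch of dropping and recreating such an external table (the table name, columns, row format, and location are hypothetical):

```sql
-- Dropping an EXTERNAL table removes only the metadata; the data files stay in place.
DROP TABLE IF EXISTS measurements;

-- Recreate with the numeric column declared with the correct type.
CREATE EXTERNAL TABLE measurements (
    sensor_id STRING,
    reading   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/measurements';
```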
08-21-2020
12:56 AM
If you are using Kerberos for authentication, then when a job is submitted the user's permissions are evaluated first by Ranger, and only once the authorization succeeds is the Kerberos ticket delegated to the hive user, which then starts the execution. So, as long as the user who is submitting the job has a policy in Ranger, it should work as expected. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:46 AM
You need to provide more information here. Is the Updatedate processor an UpdateAttribute processor? What does the invokescriptor processor do? Are you storing an attribute called "state", or are you using the processor's state? Assuming it is just a variable on the FlowFile, your attempt 2 should work. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:22 AM
Assuming that you will create the table only once, doing this dynamically is very difficult since you need to parse the columns and data types, and the amount of effort is not worth it for a one-time thing. So I would suggest you create the table manually. If you are going to create a table dynamically every time, I would be interested in how you did it, and if you have made some progress I would be happy to help. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.
08-20-2020
02:04 AM
First, give the FlowFiles unique names with a prefix, especially the three FlowFiles in question. Then you can use RouteOnAttribute to route these specific FlowFiles to a separate parallel path, and use MergeContent when all the files come together. This is the easier way, assuming all three files arrive one after another and no two copies of the same FlowFile arrive at the same time. If that is not the case, you need some more advanced logic: use the Wait/Notify processors along with ControlRate to send only one FlowFile of each and then merge them together. Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.