Member since 07-19-2018
613 Posts
100 Kudos Received
117 Solutions
My Accepted Solutions
Views | Posted
---|---
3150 | 01-11-2021 05:54 AM
2250 | 01-11-2021 05:52 AM
6006 | 01-08-2021 05:23 AM
5576 | 01-04-2021 04:08 AM
25816 | 12-18-2020 05:42 AM
08-13-2020
07:58 AM
@stevenmatison Yes, I got your point! But when you create a Hive table with varchar columns (of sufficient length), can the column datatype change from varchar to string automatically? When I create a view from that table, the datatype changes to string.
08-13-2020
07:52 AM
@scotth1 You should be able to use the QueryRecord processor with some advanced SQL to extract what you need from any values in the underlying data result. Here is a great article about QueryRecord: https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Processor-Apache/ta-p/246671 If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
08-13-2020
07:44 AM
@devops There is some information around the internet about workarounds that make Ambari think Python 2 is really Python 3, but the short answer is above. Please accept the answer. The workaround creates more problems than it solves. Python 3 support was never finished in Ambari, and it doesn't look like it will ever be improved. In the same high-level conversation, the supported version of Java is now similarly antiquated.
08-13-2020
06:51 AM
@stevenmatison Thanks. I used QueryRecord, and it helped me get the count.
08-13-2020
05:11 AM
@ManuN Whichever way you go about this task, you will have to execute commands against the tables to get their sizes. With a large number of tables, this should be a script, program, or process. The common method is to query the table with Hive:
-- gives all properties
show tblproperties yourTableName;
-- show just the raw data size
show tblproperties yourTableName("rawDataSize");
Or, for the most accurate figure, look at the table location in HDFS:
hdfs dfs -du -s -h /path/to/table
There are also some methods to get this data directly from the Hive Metastore, assuming the table is an internal Hive table. In the past I have done this with a basic bash/shell script. I have also done something similar in NiFi, and I prefer that approach because it needs no coding.
Thanks, Steven @ DFHZ
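For a large number of tables, the HDFS approach above could be scripted roughly as follows. This is only a sketch under assumptions: the function names (`run_du`, `parse_du_size`, `table_sizes`) and the table-to-location mapping are hypothetical, the `hdfs` client is assumed to be on the PATH, and `-h` is dropped so the first output column is a plain byte count that is easy to parse.

```python
import subprocess
from typing import Callable, Dict


def run_du(path: str) -> str:
    """Run `hdfs dfs -du -s <path>` and return its raw stdout."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_du_size(output: str) -> int:
    """Return the first column of the du output, i.e. the size in bytes."""
    fields = output.split()
    if not fields:
        raise ValueError("empty du output")
    return int(fields[0])


def table_sizes(
    locations: Dict[str, str],
    runner: Callable[[str], str] = run_du,
) -> Dict[str, int]:
    """Map each table name to the size in bytes of its HDFS location.

    `locations` is a hypothetical mapping of table name -> HDFS path;
    `runner` can be swapped out so the parsing can be tried without a cluster.
    """
    return {table: parse_du_size(runner(path)) for table, path in locations.items()}
```

With a real cluster you would call `table_sizes({"yourTableName": "/path/to/table"})`; the table locations themselves would still have to come from Hive or the Metastore.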
08-13-2020
04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute plus Expression Language lets you create flowfile attributes based on the table results, in a manner I would call "manual". These attributes can be used for routing or for further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value plus the new value.
Additionally, during your flow you can programmatically change the flowfile content to add the new column, using the attribute from above or a fabricated query. In the latter case you would use UpdateRecord with a RecordReader and RecordWriter on your data. In a nutshell, you create a translation on the content that includes adding the new field.
This is a common use case for NiFi and there are many different ways to achieve it. For a more complete reply that better matches your use case, please provide more information: sample input data, the expected output data, your flow or a template of it, and what you have tried already.
Thanks, Steven @ DFHZ
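To make the "translation that adds a new field" idea concrete, here is a small Python sketch of the logic only. It is not NiFi code: the function `add_lookup_column` and the field names in the usage note are hypothetical stand-ins for what a record-based processor would do to each row.

```python
from typing import Any, Dict, Iterable, List


def add_lookup_column(
    rows: Iterable[Dict[str, Any]],
    lookup: Dict[Any, Any],
    key_field: str,
    new_field: str,
    default: Any = None,
) -> List[Dict[str, Any]]:
    """Return copies of the rows with `new_field` filled in from `lookup`.

    The value of `key_field` in each row is matched against the lookup keys;
    rows without a match get `default`. The input rows are not modified.
    """
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the original record stays untouched
        new_row[new_field] = lookup.get(row.get(key_field), default)
        enriched.append(new_row)
    return enriched
```

For example, `add_lookup_column(rows, {"A": "Alpha"}, "code", "label")` would add a `label` column to every row, set to `"Alpha"` where `code` is `"A"` and to `None` elsewhere.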
08-12-2020
06:53 AM
@Deenag Yes, this is a typical method to filter out flowfiles based on attributes matching Expression Language. You set up the routes you want and ignore the rest.
08-12-2020
06:42 AM
@Nidutt You should be able to use NiFi Expression Language in the flow to convert the date int to ISO timestamps. Here is a template you can use that shows many examples of timestamp formatting: https://github.com/steven-matison/NiFi-Templates/blob/master/Working_with_TimeStamps.xml
I think you may find that NiFi attributes remain strings in your flow, without a strict date type. After all, an ISO timestamp is really a string; your endpoint database just knows it is a "timestamp".
Thanks, Steven @ DFHZ
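As an illustration of the conversion itself (outside NiFi), here is a minimal Python sketch. It assumes the "date int" is a count of milliseconds since the Unix epoch, which the thread does not confirm; the function name `epoch_millis_to_iso` is hypothetical. The result is an ordinary string, which matches the point above about attributes staying strings.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based timestamps, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_iso(millis: int) -> str:
    """Format epoch milliseconds as an ISO 8601 UTC string ending in 'Z'."""
    # Integer milliseconds go through timedelta to avoid float rounding.
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

If the source value is in seconds rather than milliseconds, multiply it by 1000 before calling the function.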
08-09-2020
09:27 PM
I have flowfiles with different dimensions, but they share a common id column. I want to use that column to join the flowfiles and pick specific columns. How can I use MergeContent in this case?
08-07-2020
09:31 PM
Thanks for your point. If you have time, please read the solution that I found elsewhere on the internet.