Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117

My Accepted Solutions
Title | Views | Posted
---|---|---
| 4903 | 01-11-2021 05:54 AM
| 3341 | 01-11-2021 05:52 AM
| 8647 | 01-08-2021 05:23 AM
| 8161 | 01-04-2021 04:08 AM
| 36048 | 12-18-2020 05:42 AM
08-13-2020 08:31 AM
@yangcm Below is the terminal output I used to test your create statements. You may need to do some additional HQL for the Parquet formatting, but this should get you through to working with the tables:

```
[root@c7301 ~]# sudo su - hdfs
[hdfs@c7301 ~]$ hdfs dfs -mkdir -p /user/a/client
[hdfs@c7301 ~]$ hdfs dfs -mkdir -p /user/a/mysql
[hdfs@c7301 ~]$ hdfs dfs -chown -R hive:hive /user/a/
```

Then get to hive:

```
sudo su - hive
hive
```

And run your creation statements. I did them as single statements in my test, but you can run them together as an HQL file too. The key here is that the locations must be owned by hive, or by the user executing the hive command.

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://c7301.ambari.apache.org:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
20/08/13 15:23:16 [main]: INFO jdbc.HiveConnection: Connected to c7301.ambari.apache.org:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive

0: jdbc:hive2://c7301.ambari.apache.org:2181/> CREATE EXTERNAL TABLE default.a1 (
. . .> client_id smallint,
. . .> client_name varchar(255),
. . .> client_address varchar(1234)
. . .> )
. . .> PARTITIONED BY (month string)
. . .> stored as parquet
. . .> LOCATION '/user/a/client';
INFO  : Compiling command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323): CREATE EXTERNAL TABLE default.a1 ( client_id smallint, client_name varchar(255), client_address varchar(1234) ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/client'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323); Time taken: 1.44 seconds
INFO  : Executing command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323): CREATE EXTERNAL TABLE default.a1 ( client_id smallint, client_name varchar(255), client_address varchar(1234) ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/client'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323); Time taken: 0.584 seconds
INFO  : OK
No rows affected (2.923 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> CREATE EXTERNAL TABLE default.a2 (
. . .> mysql_12 double,
. . .> mysql_13 float
. . .> )
. . .> PARTITIONED BY (month string)
. . .> stored as parquet
. . .> LOCATION '/user/a/mysql';
INFO  : Compiling command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b): CREATE EXTERNAL TABLE default.a2 ( mysql_12 double, mysql_13 float ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/mysql'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b); Time taken: 0.047 seconds
INFO  : Executing command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b): CREATE EXTERNAL TABLE default.a2 ( mysql_12 double, mysql_13 float ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/mysql'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b); Time taken: 0.084 seconds
INFO  : OK
No rows affected (0.42 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> describe default.a1;
INFO  : Compiling command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa): describe default.a1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa); Time taken: 0.372 seconds
INFO  : Executing command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa): describe default.a1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa); Time taken: 0.295 seconds
INFO  : OK
+--------------------------+----------------+----------+
|         col_name         |   data_type    | comment  |
+--------------------------+----------------+----------+
| client_id                | smallint       |          |
| client_name              | varchar(255)   |          |
| client_address           | varchar(1234)  |          |
| month                    | string         |          |
|                          | NULL           | NULL     |
| # Partition Information  | NULL           | NULL     |
| # col_name               | data_type      | comment  |
| month                    | string         |          |
+--------------------------+----------------+----------+
8 rows selected (0.891 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> describe default.a2;
INFO  : Compiling command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60): describe default.a2
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60); Time taken: 0.293 seconds
INFO  : Executing command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60): describe default.a2
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60); Time taken: 0.179 seconds
INFO  : OK
+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| mysql_12                 | double     |          |
| mysql_13                 | float      |          |
| month                    | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| month                    | string     |          |
+--------------------------+------------+----------+
7 rows selected (0.778 seconds)
```

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here, or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
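Since the post notes the statements can also be run together as an HQL file, here is a minimal sketch of that variant. The file name `tables.hql` and the commented `beeline -f` invocation are assumptions; the DDL itself is taken from the session above.

```shell
# Write both CREATE statements from the session above into one HQL file.
cat > tables.hql <<'EOF'
CREATE EXTERNAL TABLE default.a1 (
  client_id smallint,
  client_name varchar(255),
  client_address varchar(1234)
)
PARTITIONED BY (month string)
STORED AS PARQUET
LOCATION '/user/a/client';

CREATE EXTERNAL TABLE default.a2 (
  mysql_12 double,
  mysql_13 float
)
PARTITIONED BY (month string)
STORED AS PARQUET
LOCATION '/user/a/mysql';
EOF

# On a live cluster you would then run the file in one shot (not executed here):
# beeline -u 'jdbc:hive2://c7301.ambari.apache.org:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' -f tables.hql

# Confirm both statements were written:
grep -c 'CREATE EXTERNAL TABLE' tables.hql   # prints 2
```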
08-13-2020 08:11 AM
@JonnyL I would highly recommend that you back up and create a small three-node NiFi cluster to test this feature. Putting two NiFi instances on a single node does not satisfy the test cases you really want to be experimenting with. Thanks, Steven @ DFHZ
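For reference, a real multi-node cluster mostly comes down to the clustering entries in each node's nifi.properties. This is a hedged sketch, not a complete configuration: the hostnames, ports, and ZooKeeper connect string are hypothetical placeholders.

```
# nifi.properties fragment (per node; hostnames and ports are hypothetical)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk-1.example.com:2181,zk-2.example.com:2181,zk-3.example.com:2181
```

Each of the three nodes gets its own `nifi.cluster.node.address`; the rest of the values are shared.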
08-13-2020 08:07 AM
@gakroman What platform are you using Hue with (CDx/HDx)? What do the metastore and user permissions look like? Without more information (you'll need to look under the hood at the Hue logs), I suspect you need to look into the permissions of the user you are connecting to the metastore with. Thanks, Steven @ DFHZ
08-13-2020 07:58 AM
@stevenmatison Yes, I got your point! But when you create a Hive table with varchar (of sufficient length), can the column's datatype be changed from varchar to string automatically? When I create a view out of that table, the datatype is getting changed to string.
08-13-2020 07:52 AM
@scotth1 You should be able to use the QueryRecord processor with some advanced SQL to extract what you need from any values in the underlying data result. Here is a great article about QueryRecord: https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Processor-Apache/ta-p/246671 Thanks, Steven @ DFHZ
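As a concrete illustration, the value of a QueryRecord dynamic property is just SQL run against the incoming records, which are exposed as a table named FLOWFILE. The column name `amount` below is hypothetical; substitute a field from your own record schema.

```sql
-- Hypothetical QueryRecord property value: keep only records above a threshold
SELECT *
FROM FLOWFILE
WHERE amount > 100
```

The property's name becomes the outgoing relationship, so matching records are routed there with the query result as their new content.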
08-13-2020 07:44 AM
@devops There is some information around the internet on workarounds that make Ambari think Python 3 is really Python 2, but the short answer is above. Please accept the answer. The workaround just creates more problems than it solves. Python 3 support was never finished in Ambari, and it doesn't look like it will ever be added. In the same high-level conversation, the supported Java version is now similarly antiquated.
08-13-2020 06:51 AM
@stevenmatison Thanks. I used QueryRecord; it helped to get the count.
08-13-2020 05:11 AM
@ManuN Any way you go about this task, you are going to have to execute commands against the tables to get their sizes. With a large number of tables this should be a script, program, or process. The common method is to query the table properties in Hive:

```
-- gives all properties
show tblproperties yourTableName;

-- show just the raw data size
show tblproperties yourTableName("rawDataSize");
```

Or, most accurately, look at the table location in HDFS:

```
hdfs dfs -du -s -h /path/to/table
```

There are also some methods to try to get this data directly from the Hive Metastore, assuming the table is an internal Hive table. In the past I have completed this with a basic bash/shell script. I have also done something similar in NiFi, and prefer to do it that way, without coding. Thanks, Steven @ DFHZ
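A minimal bash sketch of the scripted approach. The table names and paths are taken from the hypothetical `/user/a/` layout above; the `hdfs` loop is commented out because it needs a live cluster, so the parsing helper is demonstrated on a sample `hdfs dfs -du -s` output line instead.

```shell
# Extract the raw size (first column, in bytes) from one line of `hdfs dfs -du -s` output.
get_size_bytes() {
  echo "$1" | awk '{print $1}'
}

# On a real cluster you would loop over your tables (not executed here):
# for t in client mysql; do
#   line=$(hdfs dfs -du -s "/user/a/$t")
#   echo "$t: $(get_size_bytes "$line") bytes"
# done

# Local demonstration with a sample du output line:
sample='123456  370368  /user/a/client'
get_size_bytes "$sample"   # prints 123456
```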
08-13-2020 04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute plus expression language will allow you to create flowfile attributes based on the table results, in a manner I would call "manual". These can then be used for routing, or for further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value plus the new value.

Alternatively, during your flow you can programmatically change the content of the flowfile to add the new column, using the attribute from above or a fabricated query. In this approach you would use a RecordReader/RecordWriter with UpdateRecord on your data: in a nutshell, you define a translation on the content that includes adding the new field.

This is a common use case for NiFi and there are many different ways to achieve it. To get a more complete reply that better matches your use case, you should provide more information: sample input data, the expected output data, your flow, a template of your flow, and perhaps what you have tried already. Thanks, Steven @ DFHZ
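To make the ReplaceText idea concrete, a configuration along those lines might look like the fragment below. This is a sketch, not a definitive flow: the attribute name `lookup_value` is hypothetical, standing in for whatever attribute your UpdateAttribute step produced.

```
# Hypothetical ReplaceText settings: append an attribute's value to each line of content
Search Value:           (.*)
Replacement Value:      $1,${lookup_value}
Replacement Strategy:   Regex Replace
Evaluation Mode:        Line-by-Line
```

Each line of the original content is captured by `(.*)` and rewritten as itself plus a comma and the attribute value, effectively adding a new column.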
08-12-2020 06:53 AM
@Deenag Yes, this is a typical method for filtering flowfiles based on attributes matching expression language. You set up the routes you want and ignore the rest.
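For instance, a single dynamic property on RouteOnAttribute is enough to create a named route; the property name and the attribute expression below are hypothetical examples.

```
# RouteOnAttribute dynamic property: creates a relationship named "matched"
matched:    ${filename:endsWith('.csv')}
```

Flowfiles whose `filename` attribute ends in `.csv` go to the `matched` relationship; everything else goes to `unmatched`, which you can simply auto-terminate.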