Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4901 | 01-11-2021 05:54 AM |
| | 3337 | 01-11-2021 05:52 AM |
| | 8645 | 01-08-2021 05:23 AM |
| | 8158 | 01-04-2021 04:08 AM |
| | 36039 | 12-18-2020 05:42 AM |
09-10-2020
05:37 AM
@Francesco_Fa You will need to configure Hue with credentials and connection information for HBase in the [hbase] section of hue.ini:

[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname. If hbase.thrift.ssl.enabled in hbase-site is set to true, https will be used; otherwise it will use http.
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
## hbase_clusters=(Cluster|localhost:9090)

# HBase configuration directory, where hbase-site.xml is located.
## hbase_conf_dir=/etc/hbase/conf

# Hard limit of rows or columns per row fetched before truncating.
## truncate_limit = 500

# Should come from hbase-site.xml, do not set. 'framed' is used to chunk up responses, used with the nonblocking server in Thrift, but is not supported in Hue.
# 'buffered' used to be the default of the HBase Thrift Server. Default is buffered when not set in hbase-site.xml.
## thrift_transport=buffered

# Choose whether Hue should validate certificates received from the server.
## ssl_cert_ca_verify=true

Once you have the configuration working, you should be able to view HBase within the UI.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
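For reference, a minimal sketch of what the uncommented section might look like (the host name, port, and conf directory here are assumptions; substitute your own Thrift server and config path):

```
[hbase]
  # Assumed values - point these at your own HBase Thrift server and hbase-site.xml directory.
  hbase_clusters=(Cluster|hbase-thrift-host.example.com:9090)
  hbase_conf_dir=/etc/hbase/conf
```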
09-04-2020
06:25 AM
@P_Rat98 You need parquet-tools to read Parquet files from the command line. There is no method to view Parquet content in NiFi. https://pypi.org/project/parquet-tools/
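If it helps, a quick sketch of how that tool is typically used (assuming it is installed with pip and that example.parquet is a local file):

```
pip install parquet-tools
parquet-tools show example.parquet      # print rows as a table
parquet-tools inspect example.parquet   # print schema and file metadata
```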
09-04-2020
06:20 AM
@DanMcCray1 Once you have the content from Kafka as a flowfile, your options are not limited to ExecuteScript. Depending on the type of content, you can use the following ideas:

- EvaluateJsonPath - if the content is a single JSON object and you need one or more values inside it, this is an easy way to get those values into attributes.
- ExtractText - if the content is text or some raw format, ExtractText lets you regex-match against the content to get values into attributes.
- QueryRecord with Record Readers & Record Writers - this is the most recommended method. Assuming your data has structure (text, CSV, JSON, etc.) and/or multiple rows/objects, you can define a reader (with schema) and an output format (record writer) and query the results very effectively.

If you do want to work with ExecuteScript, you should start here (a minimal Jython sketch is included below):

https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-2/ta-p/249018
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
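Following up on the ExecuteScript option, here is a minimal Jython sketch in the spirit of the cookbook articles linked above. It assumes the flowfile content is a single JSON object and that 'status' is a hypothetical key you want copied to an attribute; session and REL_SUCCESS are provided by the processor.

```python
# ExecuteScript (Script Engine: python/Jython) - minimal sketch, not production code.
# Assumes the flowfile content is one JSON object; 'status' is a hypothetical key.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

class ReadJson(InputStreamCallback):
    def __init__(self):
        self.value = None
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        self.value = json.loads(text).get('status')

flowFile = session.get()
if flowFile is not None:
    callback = ReadJson()
    session.read(flowFile, callback)          # read content without rewriting it
    flowFile = session.putAttribute(flowFile, 'status', str(callback.value))
    session.transfer(flowFile, REL_SUCCESS)
```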
09-02-2020
01:37 PM
1 Kudo
@Tokolosk the solution you need is one of:

${date_posted:format('MM-dd-yyyy hh:mm:ss')}
${date_posted:multiply(1000):format('MM-dd-yyyy hh:mm:ss')}

Of course you can experiment with different formats. I created a test in a template I use (Working With Timestamps) where I set date_posted to your string value, then ran the two conversion tests above. If you are getting empty values, then I suspect you have an issue with ${date_posted}, not the expression language. Maybe also take a look at using a different attribute name for the formatted timestamp. Hope this helps...

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
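To illustrate the difference (the epoch value here is just an assumed example): format() treats the number as milliseconds since the epoch, which is why the multiply(1000) step matters when the attribute holds seconds.

```
# Hypothetical input: date_posted = 1600000000 (epoch seconds)
${date_posted:format('MM-dd-yyyy hh:mm:ss')}                -> a date in January 1970 (value treated as milliseconds)
${date_posted:multiply(1000):format('MM-dd-yyyy hh:mm:ss')} -> 09-13-2020 12:26:40 (GMT)
```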
08-28-2020
08:52 AM
@P_Rat98 The error above says there is an issue with the Schema Name in your record reader or writer. When inside the properties for ConvertRecord, click the --> arrow through to the reader/writer and make sure they are configured correctly. You will need to provide the correct schema name (if it is already an existing attribute) or provide the schema text (an example of schema text is sketched below).

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
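For the Schema Text property, a record reader/writer expects an Avro schema. A minimal sketch (the record and field names are hypothetical; match them to your actual data):

```
{
  "type": "record",
  "name": "example_record",
  "fields": [
    { "name": "id",    "type": "long" },
    { "name": "value", "type": ["null", "string"], "default": null }
  ]
}
```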
08-28-2020
08:49 AM
@ujay The solution you are looking for is the DetectDuplicate processor:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.DetectDuplicate/index.html

This processor is used with the Distributed Map Cache Client and Server (Controller Services) to deduplicate a flow based on your criteria. I have a template demo here:

https://github.com/steven-matison/NiFi-Templates/blob/master/DetectDuplicate_DistributedMapCache_Demo.xml

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
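At a high level, the configuration looks something like the sketch below (the cache identifier expression is an assumption; use whatever attribute uniquely identifies your records):

```
DetectDuplicate
  Cache Entry Identifier    : ${your.unique.key}              # hypothetical attribute
  Age Off Duration          : 1 hour                          # optional; how long entries count as "seen"
  Distributed Cache Service : DistributedMapCacheClientService
  # the client service in turn points at a DistributedMapCacheServer controller service
```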
08-28-2020
08:47 AM
@P_Rat98 You need to set the filename (Object Key) of each parquet file uniquely to save different S3 files. If that processor is configured to just ${filename}, then it will overwrite the object on subsequent executions. For the second option, if you have a split in your data flow, the split parts should have key/value pairs for the split index and total number of splits. Inspect your queue and list the attributes on the split flowfiles to find them. You use these attributes with MergeContent to re-merge everything back into a single flowfile. You need to do this before converting to parquet, not after.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
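A rough sketch of both ideas (the first line is just one way to build a unique key; fragment.* are the standard attributes written by the split processors):

```
# UpdateAttribute before PutS3Object - make the Object Key unique (example only):
filename = ${filename}-${uuid}.parquet

# MergeContent with Merge Strategy = Defragment - relies on the split attributes:
#   fragment.identifier, fragment.index, fragment.count
```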
08-27-2020
07:14 AM
Yes. Auto-termination is how you drop the whole flowfile and all of its attributes. If, for example, you have a giant attribute holding a SCHEMA or JSON value, UpdateAttribute could be used to empty that value. However, if you do not need the flowfile anymore at all, auto-terminate it rather than using UpdateAttribute and retaining the flowfile.
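If you do need to keep the flowfile but drop a heavy attribute, one approach is UpdateAttribute's Delete Attributes Expression (the attribute names below are just examples):

```
# UpdateAttribute - remove large attributes by regex (example attribute names):
Delete Attributes Expression : SCHEMA|avro\.schema
```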
08-27-2020
06:56 AM
@Jarinek Yes, this is totally possible in NiFi. Traditionally, one method would be something like EvaluateJsonPath to pull some payload JSON values into attributes, then AttributesToJSON to build a new JSON object from the attributes you have created. In newer versions of NiFi you have other options with QueryRecord, UpdateRecord and the JSON Record Readers/Writers. Further options are custom ExecuteScript and JoltTransformJSON.

My suggestion would be to research NiFi + JSON and begin with some simple examples. Once you have some experience with the basics, transform them to suit your use case. I also suggest trying it a few different ways before settling on one. For example, you may build a flow and get it working (EvaluateJsonPath), then improve it over time based on newer NiFi versions and their capabilities (JSON Record Readers/Writers). A small sketch of the first approach is included below.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
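A minimal sketch of the EvaluateJsonPath + AttributesToJSON approach (the JSON paths and attribute names are hypothetical; adjust them to your payload):

```
EvaluateJsonPath (Destination = flowfile-attribute)
  user.name  : $.user.name
  user.email : $.user.email

AttributesToJSON
  Attributes List : user.name,user.email
  Destination     : flowfile-content
```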
08-27-2020
06:45 AM
@Mike in Austin Thanks for sharing that link. I had lost the one I bookmarked last year. Since you are both Cloudera employees, can you comment on enterprise customers with HDP 2.6.5 who cannot or will not be able to upgrade or replace HDP for many years? That December 2020 date is right around the corner. Is Cloudera truly going to stop providing support, or are these customers just going to get support elsewhere? I know some customers on 2.6.5 that could still be on that platform as legacy 5+ years from now, simply because of how long it takes in the government sector to replace or upgrade technology.