Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5096 | 01-11-2021 05:54 AM |
|  | 3422 | 01-11-2021 05:52 AM |
|  | 8789 | 01-08-2021 05:23 AM |
|  | 8385 | 01-04-2021 04:08 AM |
|  | 36689 | 12-18-2020 05:42 AM |
05-16-2020
05:54 AM
@gbukovszki The behavior you are describing is just how NiFi escapes the string representation of the JSON inside the schema; the escaping is required so the schema can be passed between the different Avro processors. Assuming you have the schema in an attribute named JSONAttribute, you can unescape it in UpdateAttribute with the expression language below:

${JSONAttribute:unescapeJson()}

You can do the same when the escaped values are in a FlowFile's content, using ReplaceText with this Replacement Value:

${'$1':unescapeJson()}

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
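To illustrate what the escaping itself is (independent of NiFi internals), here is a minimal Python sketch with a made-up schema, showing how a JSON document turns into an escaped JSON string when embedded in another JSON value, and how unescaping recovers it; json.loads plays the role of unescapeJson():

```python
import json

# A made-up Avro-style schema, represented as a Python dict.
schema = {"type": "record", "name": "user",
          "fields": [{"name": "id", "type": "int"}]}

# First serialization: the plain JSON text of the schema.
plain = json.dumps(schema)

# Serializing that text again embeds it as an escaped JSON string,
# which is what a schema held inside another JSON document looks like.
escaped = json.dumps(plain)
print(escaped)  # "{\"type\": \"record\", \"name\": \"user\", ...}"

# Parsing reverses the escaping, analogous to unescapeJson() in NiFi EL.
assert json.loads(escaped) == plain
```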
05-16-2020
05:33 AM
@johndcal A namespace is not required within the Avro schema source in Schema Registry, even though the Avro spec defines one. To create an Avro schema in the Schema Registry, you first send a call that creates the schema entity; the next call then adds the actual Avro schema to that existing entity. This is just the behavior of the Schema Registry. You can find some lessons I created on how to use the registry:

https://community.cloudera.com/t5/Community-Articles/Using-the-Schema-Registry-API/ta-p/286194

I also have an article showing how to fully automate the creation of Avro schemas from a CSV file (column names and data types) using the Schema Registry, Hive, and NiFi:

https://community.cloudera.com/t5/Community-Articles/How-to-automate-creation-of-Avro-and-Hive-Schemas-using-NiFi/ta-p/293183

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
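For reference, the two calls can be scripted; below is a minimal Python sketch of the sequence. The host, port, schema name, and payload fields are assumptions based on my recollection of the registry's v1 REST API, so verify them against the API article linked above and your registry version:

```python
import requests

# Hypothetical registry endpoint; adjust host and port for your environment.
BASE = "http://registry-host:7788/api/v1/schemaregistry"

# Call 1: create the schema entity (metadata only, no schema text yet).
meta = {
    "name": "users",
    "type": "avro",
    "schemaGroup": "demo",
    "description": "user records",
    "compatibility": "BACKWARD",
}
requests.post(f"{BASE}/schemas", json=meta).raise_for_status()

# Call 2: add the actual Avro schema as a version of the existing entity.
version = {
    "schemaText": '{"type":"record","name":"users","fields":'
                  '[{"name":"id","type":"int"}]}',
    "description": "initial version",
}
requests.post(f"{BASE}/schemas/users/versions", json=version).raise_for_status()
```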
05-16-2020
05:21 AM
@Genentech I am not sure if this is the answer you are looking for, but my recommendation is to leave your original table as it is and select the results from it into the Parquet table. I am a firm believer in keeping backup, staging, or temporary copies of original data sources on the path from translation to final source. Make a new, empty table with the Parquet format you want; its columns must match the source table. Next execute:

INSERT INTO final_table SELECT * FROM source_table;

If you need to retain the original table name, you can alter or drop the original table and then execute a rename statement (ALTER TABLE final_table RENAME TO original_table;) on the final_table above.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
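If you drive Hive from scripts, the same sequence can be run over PyHive; here is a minimal sketch, where the connection details and the two example columns are hypothetical and must be replaced with your real schema:

```python
from pyhive import hive  # pip install pyhive

# Hypothetical HiveServer2 connection; adjust host, port, and username.
conn = hive.connect(host="hiveserver2-host", port=10000, username="hive")
cur = conn.cursor()

# Stage an empty Parquet table; the columns must match source_table.
cur.execute("""
    CREATE TABLE final_table (id INT, name STRING)
    STORED AS PARQUET
""")

# Copy the rows; Hive rewrites them into Parquet on the way in.
cur.execute("INSERT INTO final_table SELECT * FROM source_table")

# Optional swap so consumers keep using the original table name.
cur.execute("ALTER TABLE source_table RENAME TO source_table_backup")
cur.execute("ALTER TABLE final_table RENAME TO source_table")
```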
05-14-2020
04:34 AM
@Regis Previously open source, the newest versions of Ambari (2.7.5) and HDP (3.1.5) have moved behind a paywall as part of an open-core strategy; you cannot access them without a Cloudera subscription. I recommend using the last free versions, Ambari 2.7.4 and HDP 3.1.4. You can find those repos here:

https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/bk_ambari-installation/content/ambari_repositories.html
https://docs.cloudera.com/HDPDocuments/Ambari-2.7.4.0/bk_ambari-installation/content/hdp_314_repositories.html

Below is more information about the authorization needed for the paywalled repos:

https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/bk_ambari-installation/content/access_ambari_paywall.html

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
05-13-2020
08:27 AM
@satishjan1 The initial question is asking about setting the hostname. The information you reference is telling you to do the same thing, but for a different operating system; my first response told you how to do it for RHEL. For your next question: you do not have to set the hostname in /etc/sysconfig/network specifically, you have to set it the way your operating system requires (see above). The hostname must be set and must persist after a reboot. If you do not set the hostname before installing the cluster, you will have untold problems with services and components later on down the road.
05-12-2020
06:08 AM
@satishjan1 The command to set the hostname is:
hostnamectl set-hostname host.name.com
Depending on your OS configuration, you may also need to update anything else that manages the hostname and the hosts file, such as /etc/hosts (for example, a line like 192.168.1.10 host.name.com, where the address is the node's IP). You can confirm by running the command above and then rebooting; the hostname should persist after the reboot.
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.
Thanks,
05-12-2020
06:02 AM
@johndcal I am excited that you are learning NiFi. It is my favorite tool for solving any use case I have.

Problem 1: Your ExtractText logic is matching all of the values, not just one, and is putting them into attributes as an indexed series (0, 1, 2, 3, 4, etc.). It is okay to use it in this manner; just use the values you need and ignore the rest. There are things you can do to single out each value more precisely, but I would recommend using a CSV record reader controller service to parse the CSV instead, since it gives you much more control over the values as well as the schema (see the sketch below for how the capture groups fan out).

Problem 2: Can you send a screenshot of your flow? You said it doesn't stop inserting FlowFiles: is the first processor in your flow always on? What I mean is that if the first processor's run schedule is 0 sec, it will always run and continuously generate FlowFiles. When I am creating a flow for the first time, I set the first processor to some controllable run schedule, for example 30 seconds. I push play, immediately stop it, and then step the FlowFiles through each downstream processor one at a time, testing at each queue that the FlowFile attributes are as expected. Once I know I have a fully operating flow, I then address how to trigger the flow to start, or the appropriate timing for it to always run.
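As a rough illustration of problem 1 in plain Python (not NiFi itself): a regular expression with capture groups applied to a CSV line yields one value per group, which mirrors how ExtractText fans the matches out into numbered attributes. The CSV line and the pattern here are made up for the example:

```python
import re

# A made-up CSV line and a pattern with one capture group per field.
line = "alice,42,engineer,nyc"
pattern = re.compile(r"([^,]+),([^,]+),([^,]+),([^,]+)")

match = pattern.match(line)
if match:
    # group(0) is the entire match; groups 1..n correspond to the
    # numbered attributes (csv.1, csv.2, ...) ExtractText would create.
    for i in range(pattern.groups + 1):
        print(f"csv.{i} = {match.group(i)}")
```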
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.
Thanks,
05-09-2020
05:24 AM
@michaelli Not from the UI. You will need to go back to the actual metastore server and look at the command(s) that created the hive user. If it is a MySQL metastore, it is easy to go to the MySQL prompt and arrow up through the history to find the previously executed commands containing the hive user and password.
05-06-2020
04:51 PM
1 Kudo
@Udhav You will need to create permissions for the hive user to access the database you create in the metastore; the error shows the connection being rejected:

Underlying cause: java.sql.SQLException : Access denied for user 'hive'@'localhost' (using password: YES) SQL Error code: 1045

For example, in MySQL:

CREATE DATABASE hive;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;

In the Ambari Admin Hive Config Database tab, and during the Cluster Install Wizard for Hive, there should be a Test Connection button for the Hive Metastore; use this feature to test the connection during install. Also, just to make sure: Ambari requires the MySQL connector as well. To use MySQL with Hive, you must download the connector from https://dev.mysql.com/downloads/connector/j/. Once it is on the Ambari Server host, run:

ambari-server setup --jdbc-db=mysql --jdbc-driver=/path/to/mysql/mysql-connector-java.jar

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
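To verify the grants outside of Ambari, a quick connection test mimics what the Test Connection button does. Here is a minimal sketch using the mysql-connector-python package, reusing the example credentials from the GRANT statements above; adjust the host if the metastore database is not local:

```python
import mysql.connector  # pip install mysql-connector-python

try:
    # Credentials match the example CREATE USER / GRANT statements above.
    conn = mysql.connector.connect(
        host="localhost",
        user="hive",
        password="hive",
        database="hive",
    )
    print("Connection OK; the grants are in place.")
    conn.close()
except mysql.connector.Error as err:
    # Error 1045 here is the same access-denied failure Ambari reports.
    print(f"Connection failed: {err}")
```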
05-04-2020
07:40 AM
@arunnalpet Yes, that is correct. I wanted to avoid $set in basic testing, and I was also advocating for valid coding practice to avoid further issues with the data source.

If the answer above resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ