Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117

My Accepted Solutions
Title | Views | Posted
---|---|---
| 4903 | 01-11-2021 05:54 AM
| 3341 | 01-11-2021 05:52 AM
| 8647 | 01-08-2021 05:23 AM
| 8161 | 01-04-2021 04:08 AM
| 36048 | 12-18-2020 05:42 AM
08-13-2020 08:31 AM
@yangcm Below is the terminal output I used to test your create statements. You may need to do some additional HQL for the Parquet formatting, but this should get you through to working with the tables:

```
[root@c7301 ~]# sudo su - hdfs
[hdfs@c7301 ~]$ hdfs dfs -mkdir -p /user/a/client
[hdfs@c7301 ~]$ hdfs dfs -mkdir -p /user/a/mysql
[hdfs@c7301 ~]$ hdfs dfs -chown -R hive:hive /user/a/
```

Then get to hive:

```
sudo su - hive
hive
```

And run your creation statements. I did them as single statements in my test, but you can run them together as an HQL file too. The key here is that the locations must be owned by hive, or by the user executing the hive command.

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://c7301.ambari.apache.org:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
20/08/13 15:23:16 [main]: INFO jdbc.HiveConnection: Connected to c7301.ambari.apache.org:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive

0: jdbc:hive2://c7301.ambari.apache.org:2181/> CREATE EXTERNAL TABLE default.a1 (
. . .> client_id smallint,
. . .> client_name varchar(255),
. . .> client_address varchar(1234)
. . .> )
. . .> PARTITIONED BY (month string)
. . .> stored as parquet
. . .> LOCATION '/user/a/client';
INFO  : Compiling command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323): CREATE EXTERNAL TABLE default.a1 ( client_id smallint, client_name varchar(255), client_address varchar(1234) ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/client'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323); Time taken: 1.44 seconds
INFO  : Executing command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323): CREATE EXTERNAL TABLE default.a1 ( client_id smallint, client_name varchar(255), client_address varchar(1234) ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/client'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152342_b1597bb7-670b-4f18-9dbb-a9ee68cf1323); Time taken: 0.584 seconds
INFO  : OK
No rows affected (2.923 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> CREATE EXTERNAL TABLE default.a2 (
. . .> mysql_12 double,
. . .> mysql_13 float
. . .> )
. . .> PARTITIONED BY (month string)
. . .> stored as parquet
. . .> LOCATION '/user/a/mysql';
INFO  : Compiling command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b): CREATE EXTERNAL TABLE default.a2 ( mysql_12 double, mysql_13 float ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/mysql'
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b); Time taken: 0.047 seconds
INFO  : Executing command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b): CREATE EXTERNAL TABLE default.a2 ( mysql_12 double, mysql_13 float ) PARTITIONED BY (month string) stored as parquet LOCATION '/user/a/mysql'
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152356_48370efa-c51c-466f-8b2d-cf8ebb30d55b); Time taken: 0.084 seconds
INFO  : OK
No rows affected (0.42 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> describe default.a1;
INFO  : Compiling command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa): describe default.a1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa); Time taken: 0.372 seconds
INFO  : Executing command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa): describe default.a1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152409_da398c2e-edc1-4109-bb07-5577133b02aa); Time taken: 0.295 seconds
INFO  : OK
+--------------------------+----------------+----------+
|         col_name         |   data_type    | comment  |
+--------------------------+----------------+----------+
| client_id                | smallint       |          |
| client_name              | varchar(255)   |          |
| client_address           | varchar(1234)  |          |
| month                    | string         |          |
|                          | NULL           | NULL     |
| # Partition Information  | NULL           | NULL     |
| # col_name               | data_type      | comment  |
| month                    | string         |          |
+--------------------------+----------------+----------+
8 rows selected (0.891 seconds)

0: jdbc:hive2://c7301.ambari.apache.org:2181/> describe default.a2;
INFO  : Compiling command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60): describe default.a2
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60); Time taken: 0.293 seconds
INFO  : Executing command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60): describe default.a2
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20200813152426_a2916edf-e217-4b72-8398-6fd9d91dde60); Time taken: 0.179 seconds
INFO  : OK
+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| mysql_12                 | double     |          |
| mysql_13                 | float      |          |
| month                    | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| month                    | string     |          |
+--------------------------+------------+----------+
7 rows selected (0.778 seconds)
```

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here, or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks, Steven @ DFHZ
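Since the post notes the statements can also be run together as an HQL file, here is a minimal sketch of that variant. The file name `tables.hql` and the commented `beeline -f` invocation are assumptions; the DDL itself is taken from the session above.

```shell
# Write both CREATE statements from the session above into one HQL file.
cat > tables.hql <<'EOF'
CREATE EXTERNAL TABLE default.a1 (
  client_id smallint,
  client_name varchar(255),
  client_address varchar(1234)
)
PARTITIONED BY (month string)
STORED AS PARQUET
LOCATION '/user/a/client';

CREATE EXTERNAL TABLE default.a2 (
  mysql_12 double,
  mysql_13 float
)
PARTITIONED BY (month string)
STORED AS PARQUET
LOCATION '/user/a/mysql';
EOF

# On a live cluster you would then run the file in one shot (not executed here):
# beeline -u 'jdbc:hive2://c7301.ambari.apache.org:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' -f tables.hql

# Confirm both statements were written:
grep -c 'CREATE EXTERNAL TABLE' tables.hql   # prints 2
```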
08-13-2020 08:11 AM
@JonnyL I would highly recommend that you back up and create a small three-node NiFi cluster to test this feature. Putting two NiFi instances on a single node does not satisfy the test cases you really want to be experimenting with. Thanks, Steven @ DFHZ
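For reference, a real multi-node cluster mostly comes down to the clustering entries in each node's nifi.properties. This is a hedged sketch, not a complete configuration: the hostnames, ports, and ZooKeeper connect string are hypothetical placeholders.

```
# nifi.properties fragment (per node; hostnames and ports are hypothetical)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk-1.example.com:2181,zk-2.example.com:2181,zk-3.example.com:2181
```

Each of the three nodes gets its own `nifi.cluster.node.address`; the rest of the values are shared.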
08-13-2020 08:07 AM
@gakroman What platform are you using Hue with (CDx/HDx)? What do the metastore and user permissions look like? Without more information (you'll need to look under the hood at the Hue logs), I suspect you need to look into the permissions of the user you are connecting to the metastore with. Thanks, Steven @ DFHZ
08-13-2020 07:58 AM
@stevenmatison Yes, I got your point! But when you create a Hive table with varchar (of sufficient length), can the column's datatype be changed from varchar to string automatically? When I create a view out of that table, the datatype is getting changed to string.
08-13-2020 07:52 AM
@scotth1 You should be able to use the QueryRecord processor with some advanced SQL to extract what you need from any values in the underlying data result. Here is a great article about QueryRecord: https://community.cloudera.com/t5/Community-Articles/Running-SQL-on-FlowFiles-using-QueryRecord-Processor-Apache/ta-p/246671 Thanks, Steven @ DFHZ
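As a concrete illustration, the value of a QueryRecord dynamic property is just SQL run against the incoming records, which are exposed as a table named FLOWFILE. The column name `amount` below is hypothetical; substitute a field from your own record schema.

```sql
-- Hypothetical QueryRecord property value: keep only records above a threshold
SELECT *
FROM FLOWFILE
WHERE amount > 100
```

The property's name becomes the outgoing relationship, so matching records are routed there with the query result as their new content.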
08-13-2020 07:44 AM
@devops There is some information around the internet on workarounds that make Ambari think Python 3 is really Python 2, but the short answer is above. Please accept the answer. The workaround just creates more problems than it solves. Python 3 support was never finished in Ambari, and it doesn't look like it will ever be added. In the same high-level conversation, the supported Java version is now similarly antiquated.
08-13-2020 06:51 AM
@stevenmatison Thanks. I used QueryRecord; it helped to get the count.
08-13-2020 05:11 AM
@ManuN Any way you go about this task, you are going to have to execute commands against the tables to get their sizes. With a large number of tables this should be a script, program, or process. The common method is to query the table properties in Hive:

```
-- gives all properties
show tblproperties yourTableName;

-- show just the raw data size
show tblproperties yourTableName("rawDataSize");
```

Or, most accurately, look at the table location in HDFS:

```
hdfs dfs -du -s -h /path/to/table
```

There are also some methods to try to get this data directly from the Hive Metastore, assuming the table is an internal Hive table. In the past I have completed this with a basic bash/shell script. I have also done something similar in NiFi, and prefer to do it that way, without coding. Thanks, Steven @ DFHZ
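A minimal bash sketch of the scripted approach. The table names and paths are taken from the hypothetical `/user/a/` layout above; the `hdfs` loop is commented out because it needs a live cluster, so the parsing helper is demonstrated on a sample `hdfs dfs -du -s` output line instead.

```shell
# Extract the raw size (first column, in bytes) from one line of `hdfs dfs -du -s` output.
get_size_bytes() {
  echo "$1" | awk '{print $1}'
}

# On a real cluster you would loop over your tables (not executed here):
# for t in client mysql; do
#   line=$(hdfs dfs -du -s "/user/a/$t")
#   echo "$t: $(get_size_bytes "$line") bytes"
# done

# Local demonstration with a sample du output line:
sample='123456  370368  /user/a/client'
get_size_bytes "$sample"   # prints 123456
```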
08-13-2020 04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute plus expression language will allow you to create flowfile attributes based on the table results, in a manner I would call "manual". These can then be used for routing, or for further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value plus the new value.

Alternatively, during your flow you can programmatically change the content of the flowfile to add the new column, using the attribute from above or a fabricated query. In this approach you would use a RecordReader/RecordWriter with UpdateRecord on your data: in a nutshell, you define a translation on the content that includes adding the new field.

This is a common use case for NiFi and there are many different ways to achieve it. To get a more complete reply that better matches your use case, you should provide more information: sample input data, the expected output data, your flow, a template of your flow, and perhaps what you have tried already. Thanks, Steven @ DFHZ
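To make the ReplaceText idea concrete, a configuration along those lines might look like the fragment below. This is a sketch, not a definitive flow: the attribute name `lookup_value` is hypothetical, standing in for whatever attribute your UpdateAttribute step produced.

```
# Hypothetical ReplaceText settings: append an attribute's value to each line of content
Search Value:           (.*)
Replacement Value:      $1,${lookup_value}
Replacement Strategy:   Regex Replace
Evaluation Mode:        Line-by-Line
```

Each line of the original content is captured by `(.*)` and rewritten as itself plus a comma and the attribute value, effectively adding a new column.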
08-12-2020 06:53 AM
@Deenag Yes, this is a typical method for filtering flowfiles based on attributes matching expression language. You set up the routes you want and ignore the rest.
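For instance, a single dynamic property on RouteOnAttribute is enough to create a named route; the property name and the attribute expression below are hypothetical examples.

```
# RouteOnAttribute dynamic property: creates a relationship named "matched"
matched:    ${filename:endsWith('.csv')}
```

Flowfiles whose `filename` attribute ends in `.csv` go to the `matched` relationship; everything else goes to `unmatched`, which you can simply auto-terminate.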