Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar
Guru

Goals

  • Get basic understanding on how variables are set at HiveCLI and Beeline utilities
  • Retrieving values from table based on variabels
  • Creating views based on variables
  • Executing multiple statements using session variables via HiveCLI and Beeline
  • Manipulating variables at runtime within one session (possible only at hiveCLI)

HiveCLI

Variables can be set at the session level or while launching the hive shell.

  • Setting session variable while launching hive shell
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
hive> set myvar;
myvar=2016-01-01
hive>

OR

  • Setting the variable at the runtime within Hive CLI
hive> set myvar=2061-01-20
    > ;
hive> set myvar;
myvar=2061-01-20
hive>
  • Performing SELECTs based on variable to retrieve data
hive> select * from t_t1 limit 2;
OK
string_1	2016-01-01
string_2	2016-01-02
Time taken: 4.904 seconds, Fetched: 2 row(s)

hive> describe t_t1;
OK
f1                  	string
d1                  	date
Time taken: 0.62 seconds, Fetched: 2 row(s)

hive> set myvar=2016-01-02;
hive> select * from t_t1 where d1 = '${hiveconf:myvar}';
OK
string_2	2016-01-02
Time taken: 1.735 seconds, Fetched: 1 row(s)
  • Creating VIEWs based on tables using variables.
hive> create view v_t1 as select * from t_t1 where d1 = '${hiveconf:myvar}';
OK
Time taken: 2.823 seconds
hive> select * from v_t1;
OK
string_2	2016-01-02
Time taken: 0.699 seconds, Fetched: 1 row(s)
hive> set myvar;
myvar=2016-01-02
hive>

NOTE: Views are created only on static values, even if it is null or empty.

  • Using variables at session level to execute multiple statements.
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01 -e 'select * from t_t1 where d1="${hiveconf:myvar}";'
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 9.942 seconds, Fetched: 1 row(s)
  • Executing multiple statements at session level during runtime
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01 -e 'select * from t_t1 where d1="${hiveconf:myvar}"; select * from t_t1 where d1="${hiveconf:myvar}" AND d1 is not null limit 2'
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 7.965 seconds, Fetched: 1 row(s)
OK
string_1	2016-01-01
Time taken: 2.595 seconds, Fetched: 1 row(s
  • Manipulating variable with multiple values in one session
#Create a file query.sql with following entries

set myvar=2016-01-01;
select * from t_t1 where d1="${hiveconf:myvar}";
set myvar=2016-01-02;
select * from t_t1 where d1="${hiveconf:myvar}";

# Execute with "-f" option to execute a file via Hive Shell
[hive@sandbox ~]$ hive -f query.sql
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 7.969 seconds, Fetched: 1 row(s)
OK
string_2	2016-01-02
Time taken: 0.7 seconds, Fetched: 1 row(s)

Beeline

  • Setting session variable while launching the beeline client (Variables are represented using # in the JDBC url)
[hive@sandbox ~]$ beeline -u "jdbc:hive2://localhost:10000/default;#myvar=2016-01-01" -n hive -p ''
0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-01  |
+-------------------+--+
1 row selected (0.21 seconds)
  • Setting variable at Runtime is not an option with beeline
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-02;
Error: Error while processing statement: Cannot modify myvar at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)

NOTE: One can workaround this problem by setting the property "hive.security.authorization.sqlstd.confwhitelist.append" at customer hiveserver2-site via Ambari or in the xml file. For our variable, we can set this in the following way.

hive.security.authorization.sqlstd.confwhitelist.append=myvar

This change requires a restart. After restart of the services, a new beeline session will allow you to set the properties at runtime.

0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-03  |
+-------------------+--+
1 row selected (0.008 seconds)
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-04;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-04  |
+-------------------+--+
1 row selected (0.006 seconds)
0: jdbc:hive2://localhost:10000/default>
  • Performing SELECTs based on session variables;
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-01;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t_t1 where d1 = "${hiveconf:myvar}";
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_1  | 2016-01-01  |
+-----------+-------------+--+
1 row selected (1.61 seconds)

0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-03;
No rows affected (0.004 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t_t1 where d1 = '${hiveconf:myvar}';
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_3  | 2016-01-03  |
+-----------+-------------+--+
1 row selected (1.547 seconds)

NOTE:- To enforce the values set by you for the variable "myvar" above method can be used.

  • Creating VIEWS based on the variable (Same as in HiveCLI)
  • Executing multiple statements using variable at session level
[hive@sandbox ~]$ beeline -u "jdbc:hive2://localhost:10000/default;#myvar=2016-01-01" -n hive -p '' -e 'select * from t_t1 where d1 = "${myvar}"; select * from t_t1 where d1 != "${myvar}";'

Connecting to jdbc:hive2://localhost:10000/default;#myvar=2016-01-01
Connected to: Apache Hive (version 1.2.1.2.3.2.0-2950)
Driver: Hive JDBC (version 1.2.1.2.3.2.0-2950)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_1  | 2016-01-01  |
+-----------+-------------+--+
1 row selected (0.702 seconds)
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_2  | 2016-01-02  |
| string_3  | 2016-01-03  |
| string_4  | 2016-01-04  |
| string_5  | 2016-01-05  |
| string_6  | 2016-01-06  |
| string_7  | 2016-01-07  |
| string_8  | 2016-01-08  |
| string_9  | 2016-01-09  |
+-----------+-------------+--+
8 rows selected (1.595 seconds)
Beeline version 1.2.1.2.3.2.0-2950 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000/default;#myvar=2016-01-01
  • Multiple statements can be executed in the beeline by modifying the session variable at runtime in the same way as with Hive CLI

Try it yourselves

0: jdbc:hive2://localhost:10000/default> show create table t_t1;
+-----------------------------------------------------------------+--+
|                         createtab_stmt                          |
+-----------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `t_t1`(                                   |
|   `f1` string,                                                  |
|   `d1` date)                                                    |
| ROW FORMAT DELIMITED                                            |
|   FIELDS TERMINATED BY ','                                      |
| STORED AS INPUTFORMAT                                           |
|   'org.apache.hadoop.mapred.TextInputFormat'                    |
| OUTPUTFORMAT                                                    |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
| LOCATION                                                        |
|   'hdfs://sandbox.hortonworks.com:8020/Data2/testdata'          |
| TBLPROPERTIES (                                                 |
|   'COLUMN_STATS_ACCURATE'='false',                              |
|   'numFiles'='1',                                               |
|   'numRows'='-1',                                               |
|   'rawDataSize'='-1',                                           |
|   'totalSize'='162',                                            |
|   'transient_lastDdlTime'='1475686495')                         |
+-----------------------------------------------------------------+--+
18 rows selected (0.362 seconds)

0: jdbc:hive2://localhost:10000/default> insert into t_t1 values ('Name1','2016-01-01'), ('Name2','2016-01-02');
,
  • Goals
  • Get basic understanding on how variables are set at HiveCLI and Beeline utilities
  • Retrieving values from table based on variabels
  • Creating views based on variables
  • Executing multiple statements using session variables via HiveCLI and Beeline
  • Manipulating variables at runtime within one session (possible only at hiveCLI)

HiveCLI

Variables can be set at the session level or while launching the hive shell.

  • Setting session variable while launching hive shell
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
hive> set myvar;
myvar=2016-01-01
hive>

OR

  • Setting the variable at the runtime within Hive CLI
hive> set myvar=2061-01-20
    > ;
hive> set myvar;
myvar=2061-01-20
hive>
  • Performing SELECTs based on variable to retrieve data
hive> select * from t_t1 limit 2;
OK
string_1	2016-01-01
string_2	2016-01-02
Time taken: 4.904 seconds, Fetched: 2 row(s)

hive> describe t_t1;
OK
f1                  	string
d1                  	date
Time taken: 0.62 seconds, Fetched: 2 row(s)

hive> set myvar=2016-01-02;
hive> select * from t_t1 where d1 = '${hiveconf:myvar}';
OK
string_2	2016-01-02
Time taken: 1.735 seconds, Fetched: 1 row(s)
  • Creating VIEWs based on tables using variables.
hive> create view v_t1 as select * from t_t1 where d1 = '${hiveconf:myvar}';
OK
Time taken: 2.823 seconds
hive> select * from v_t1;
OK
string_2	2016-01-02
Time taken: 0.699 seconds, Fetched: 1 row(s)
hive> set myvar;
myvar=2016-01-02
hive>

NOTE: Views are created only on static values, even if it is null or empty.

  • Using variables at session level to execute multiple statements.
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01 -e 'select * from t_t1 where d1="${hiveconf:myvar}";'
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 9.942 seconds, Fetched: 1 row(s)
  • Executing multiple statements at session level during runtime
[hive@sandbox ~]$ hive -hiveconf myvar=2016-01-01 -e 'select * from t_t1 where d1="${hiveconf:myvar}"; select * from t_t1 where d1="${hiveconf:myvar}" AND d1 is not null limit 2'
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 7.965 seconds, Fetched: 1 row(s)
OK
string_1	2016-01-01
Time taken: 2.595 seconds, Fetched: 1 row(s
  • Manipulating variable with multiple values in one session
#Create a file query.sql with following entries

set myvar=2016-01-01;
select * from t_t1 where d1="${hiveconf:myvar}";
set myvar=2016-01-02;
select * from t_t1 where d1="${hiveconf:myvar}";

# Execute with "-f" option to execute a file via Hive Shell
[hive@sandbox ~]$ hive -f query.sql
Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties
OK
string_1	2016-01-01
Time taken: 7.969 seconds, Fetched: 1 row(s)
OK
string_2	2016-01-02
Time taken: 0.7 seconds, Fetched: 1 row(s)

Beeline

  • Setting session variable while launching the beeline client (Variables are represented using # in the JDBC url)
[hive@sandbox ~]$ beeline -u "jdbc:hive2://localhost:10000/default;#myvar=2016-01-01" -n hive -p ''
0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-01  |
+-------------------+--+
1 row selected (0.21 seconds)
  • Setting variable at Runtime is not an option with beeline
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-02;
Error: Error while processing statement: Cannot modify myvar at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)

NOTE: One can workaround this problem by setting the property "hive.security.authorization.sqlstd.confwhitelist.append" at customer hiveserver2-site via Ambari or in the xml file. For our variable, we can set this in the following way.

hive.security.authorization.sqlstd.confwhitelist.append=myvar

This change requires a restart. After restart of the services, a new beeline session will allow you to set the properties at runtime.

0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-03  |
+-------------------+--+
1 row selected (0.008 seconds)
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-04;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/default> set myvar;
+-------------------+--+
|        set        |
+-------------------+--+
| myvar=2016-01-04  |
+-------------------+--+
1 row selected (0.006 seconds)
0: jdbc:hive2://localhost:10000/default>
  • Performing SELECTs based on session variables;
0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-01;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t_t1 where d1 = "${hiveconf:myvar}";
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_1  | 2016-01-01  |
+-----------+-------------+--+
1 row selected (1.61 seconds)

0: jdbc:hive2://localhost:10000/default> set myvar=2016-01-03;
No rows affected (0.004 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t_t1 where d1 = '${hiveconf:myvar}';
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_3  | 2016-01-03  |
+-----------+-------------+--+
1 row selected (1.547 seconds)

NOTE:- To enforce the values set by you for the variable "myvar" above method can be used.

  • Creating VIEWS based on the variable (Same as in HiveCLI)
  • Executing multiple statements using variable at session level
[hive@sandbox ~]$ beeline -u "jdbc:hive2://localhost:10000/default;#myvar=2016-01-01" -n hive -p '' -e 'select * from t_t1 where d1 = "${myvar}"; select * from t_t1 where d1 != "${myvar}";'

Connecting to jdbc:hive2://localhost:10000/default;#myvar=2016-01-01
Connected to: Apache Hive (version 1.2.1.2.3.2.0-2950)
Driver: Hive JDBC (version 1.2.1.2.3.2.0-2950)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_1  | 2016-01-01  |
+-----------+-------------+--+
1 row selected (0.702 seconds)
+-----------+-------------+--+
|  t_t1.f1  |   t_t1.d1   |
+-----------+-------------+--+
| string_2  | 2016-01-02  |
| string_3  | 2016-01-03  |
| string_4  | 2016-01-04  |
| string_5  | 2016-01-05  |
| string_6  | 2016-01-06  |
| string_7  | 2016-01-07  |
| string_8  | 2016-01-08  |
| string_9  | 2016-01-09  |
+-----------+-------------+--+
8 rows selected (1.595 seconds)
Beeline version 1.2.1.2.3.2.0-2950 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000/default;#myvar=2016-01-01
  • Multiple statements can be executed in the beeline by modifying the session variable at runtime in the same way as with Hive CLI
65,085 Views
Comments

Shouldn't you be using the hivevar namespace everywhere instead of the hiveconf namespace?