Member since: 01-24-2019
49 Posts
4 Kudos Received
0 Solutions
11-16-2017
01:41 PM
1 Kudo
@Gayathri Devi Instead of cast you need to use from_unixtime. The queries below produce the outputs you are expecting; note the output pattern: HH is the 24-hour clock while hh is the 12-hour clock, which is why the second query prints the midnight hour as 12.
hive# select from_unixtime(unix_timestamp('161223000001' ,'yyMMddhhmmss'), 'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2016-12-23 00:00:01 |
+----------------------+--+
hive# select from_unixtime(unix_timestamp('161223000001' ,'yyMMddhhmmss'), 'yyyy-MM-dd hh:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2016-12-23 12:00:01 |
+----------------------+--+
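If the value lives in a table column rather than a literal, the same expression works per row. A minimal sketch, assuming a hypothetical table raw_events with a string column ts holding yyMMddhhmmss values:
hive# select from_unixtime(unix_timestamp(ts, 'yyMMddhhmmss'), 'yyyy-MM-dd HH:mm:ss') as event_time from raw_events;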
11-10-2017
08:33 AM
1 Kudo
@Gayathri Devi, Assume your initial data has an Index column as the row key, which keeps increasing as new data is inserted:
Index, value ---> col names
1,a
2,b
3,c
You can run a normal Sqoop command, which will import the complete data to the destination. Now let's say you have added a few more rows to the source, so your input becomes:
Index, value ---> col names
1,a
2,b
3,c
4,d
5,e
6,f
Now you can use Sqoop's incremental mode to import only the new rows, as sketched below. Use "--incremental <mode> --check-column <column name> --last-value <last check column value>", i.e. "--incremental append --check-column Index --last-value 3". This command will import only the last 3 rows. You can also do incremental imports based on a lastmodified column. https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
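A minimal sketch of the full command under those assumptions (connection string, credentials, table name, and target directory are hypothetical placeholders):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username dbuser -P \
  --table mytable \
  --target-dir /user/hive/mytable \
  --incremental append \
  --check-column Index \
  --last-value 3
In practice you can wrap this in a saved job (sqoop job --create ...) so Sqoop records the new --last-value after each run. Thanks, Aditya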
11-09-2017
02:35 AM
@Gayathri Devi, Please use the below query format (note the FROM keyword and the quoted date literal):
--query "select * from hgj where date(starttime)='2017-08-08' AND \$CONDITIONS"
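When --query is used, Sqoop also needs a --target-dir and either -m 1 or a --split-by column for parallel imports. A minimal sketch with hypothetical connection details and split column:
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username dbuser -P \
  --query "select * from hgj where date(starttime)='2017-08-08' AND \$CONDITIONS" \
  --split-by id \
  --target-dir /user/hgj_20170808
Thanks, Aditya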
11-08-2017
03:33 AM
2 Kudos
@Gayathri Devi Please check your Oozie sharelib to see whether you have placed the mysql-connector-java jar inside it:
# oozie admin -oozie http://$OOZIE_HOST:11000/oozie -shareliblist
Also check that you have copied the MySQL JDBC driver jar into "/user/oozie/share/lib/*/sqoop" and then run the "sharelibupdate" command:
# oozie admin -oozie http://$HOSTNAME:11000/oozie -sharelibupdate
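A minimal sketch of the copy step (the sharelib directory name includes a timestamp, shown here as the placeholder lib_<timestamp>):
# hdfs dfs -ls /user/oozie/share/lib //find the current lib_<timestamp> directory
# hdfs dfs -put mysql-connector-java.jar /user/oozie/share/lib/lib_<timestamp>/sqoop/
# oozie admin -oozie http://$OOZIE_HOST:11000/oozie -sharelibupdate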
11-07-2017
02:24 AM
1 Kudo
@Shu Thanks!!!
11-02-2017
02:41 AM
3 Kudos
@Gayathri Devi First you need to create a non-partitioned Hive table on the raw data, then create a partitioned table in Hive and insert from the non-partitioned table into the partitioned one. For testing I have tried an example as below. Right now my normal Hive table (i.e. not a partitioned table) has this list of records.
Normal table (without partition column):-
hive# create table text_table(id int, dt string, name string) stored as textfile location '/user/yashu/text_table';
hive# select * from text_table;
+----------------+----------------------+------------------+--+
| text_table.id | text_table.dt | text_table.name |
+----------------+----------------------+------------------+--+
| 1 | 2017-10-31 10:12:09 | foo |
| 1 | 2017-10-31 12:12:09 | bar |
| 1 | 2017-10-30 12:12:09 | foobar |
| 1 | 2017-10-30 10:12:09 | barbar |
+----------------+----------------------+------------------+--+
Then I want a daily partitioned table; for this case I need to create a new table having dt as the partition column in it.
Partition table:-
There are 2 kinds of partitions in Hive:
1.Static partitions //adding a partition statically and loading data into it; takes less time than dynamic partitions as Hive won't need to look into the data while creating partitions.
2.Dynamic partitions //creating partitions dynamically based on the column value; takes more time than static partitions if data is huge because Hive needs to look into the data while creating partitions.
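Before running the fully dynamic insert shown below, Hive usually requires dynamic partitioning to be enabled for the session. A minimal sketch of the usual settings (verify against your cluster's defaults):
hive# set hive.exec.dynamic.partition=true;
hive# set hive.exec.dynamic.partition.mode=nonstrict; //required when no static partition value is supplied in the insert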
hive# create table partition_table(
id int,
name string)
partitioned by (dt string);
1.Dynamic Partition:-
Once you have created the partition table, insert into it by selecting from the non-partitioned table:
hive# insert into partition_table partition(dt) select id,name, substring(dt,0,10) from text_table; //we need daily partitions, so I take the substring from 0-10, i.e. 2017-10-31; this will create the date partitions
INFO : Time taken to load dynamic partitions: 0.066 seconds
INFO : Loading partition {dt=2017-10-30} //creating 2017-10-30 partition
INFO : Loading partition {dt=2017-10-31} //creating 2017-10-31 partition
INFO : Time taken for adding to write entity : 0
INFO : Partition default.partition_table{dt=2017-10-30} stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16]
INFO : Partition default.partition_table{dt=2017-10-31} stats: [numFiles=1, numRows=2, totalSize=12, rawDataSize=10]
No rows affected (10.055 seconds)
We are doing dynamic partitioning in the above statement, i.e. we are creating partitions based on our data. If you want to view the partitions:
hive# show partitions partition_table; //lists all partitions that have been created in the table
+----------------+--+
| partition |
+----------------+--+
| dt=2017-10-30 |
| dt=2017-10-31 |
+----------------+--+
2 rows selected (0.064 seconds)
Drop partitions:-
hive# alter table partition_table drop partition(dt>'0') purge; //drops all the partitions (or) you can drop a specific partition by mentioning dt='2017-10-30' (drops only the 2017-10-30 partition)
INFO : Dropped the partition dt=2017-10-30
INFO : Dropped the partition dt=2017-10-31
No rows affected (0.132 seconds)
To view all partition directories:
hadoop fs -ls -R /apps/hive/warehouse/partition_table/
drwxrwxrwx - hdfs 0 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-30 //partition directory
-rwxrwxrwx 3 hdfs 18 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-30/000000_0 //file in the partition
drwxrwxrwx - hdfs 0 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-31
-rwxrwxrwx 3 hdfs 12 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-31/000000_0
To view data from one partition:
select * from partition_table where dt='2017-10-30';
+---------------------+-----------------------+---------------------+--+
| partition_table.id | partition_table.name | partition_table.dt |
+---------------------+-----------------------+---------------------+--+
| 1 | foobar | 2017-10-30 |
| 1 | barbar | 2017-10-30 |
+---------------------+-----------------------+---------------------+--+
As you can see, the dt column in the non-partitioned table has 2017-10-30 12:12:09 but the partitioned table has only 2017-10-30, because we took a substring of the dt column while loading the partitioned table.
--> If you don't want to change the source data, i.e. you want to keep the dt column intact from the non-partitioned table to the partitioned table, then drop the table above (its schema differs) and recreate the partition table with:
hive# create table partition_table(
id int,
name string,
dt string)
partitioned by (daily string); //new partition column
hive# insert into partition_table partition(daily) select id,name,dt, substring(dt,0,10) from text_table; //daily is the partition column; the select uses the dt column twice, once to load the actual dt data and once (as a substring) to derive the partition value
show partitions partition_table;
+-------------------+--+
| partition |
+-------------------+--+
| daily=2017-10-30 |
| daily=2017-10-31 |
+-------------------+--+
2 rows selected (0.066 seconds)
0: jdbc:hive2://usor7dhc01w01.use.ucdp.net:21> select * from partition_table; //the dt column data is unchanged, since the new daily column is the partition column
+---------------------+-----------------------+----------------------+------------------------+--+
| partition_table.id | partition_table.name | partition_table.dt | partition_table.daily |
+---------------------+-----------------------+----------------------+------------------------+--+
| 1 | foobar | 2017-10-30 12:12:09 | 2017-10-30 |
| 1 | barbar | 2017-10-30 10:12:09 | 2017-10-30 |
| 1 | foo | 2017-10-31 10:12:09 | 2017-10-31 |
| 1 | bar | 2017-10-31 12:12:09 | 2017-10-31 |
+---------------------+-----------------------+----------------------+------------------------+--+
**Keep in mind that the partition column needs to be the last column in your select statement; if not, Hive creates partitions based on whatever the last column is.
2.Static partition:-
We statically create the partition and load all of the data into it (back on the original id,name schema partitioned by dt):
hive# insert into partition_table partition(dt='2017-10-30') select id,name from text_table; //we mention the partition value here as dt='2017-10-30', so all the data is loaded into the 2017-10-30 partition; static partitioning assumes all the dt data really is 2017-10-30, and notice we haven't mentioned the dt column in the select statement
hive# show partitions partition_table;
+----------------+--+
| partition |
+----------------+--+
| dt=2017-10-30 |
+----------------+--+
hive# select * from partition_table; //every dt is 2017-10-30 because we loaded into a static partition
+---------------------+-----------------------+---------------------+--+
| partition_table.id | partition_table.name | partition_table.dt |
+---------------------+-----------------------+---------------------+--+
| 1 | foo | 2017-10-30 |
| 1 | bar | 2017-10-30 |
| 1 | foobar | 2017-10-30 |
| 1 | barbar | 2017-10-30 |
+---------------------+-----------------------+---------------------+--+
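The payoff of partitioning is pruning: a filter on the partition column makes Hive read only the matching directories. A quick way to check, assuming the partitioned table above:
hive# explain dependency select * from partition_table where dt='2017-10-30'; //input_partitions in the output should list only dt=2017-10-30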
You need to decide which kind of partitioning best fits your case. Hope this helps you understand partitions..!!
10-11-2017
01:08 PM
1 Kudo
Hi @Gayathri Devi, You can use Spark SQL to read the Hive table and create a DataFrame (a small sketch follows the links below). There is also a better way to get data from the HBase table: constructing the HBase RDD from scratch, which is more scalable and a better fit for the Spark Catalyst engine. You can refer to the below links on how to get data directly from HBase without using a Hive table. https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/ https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/ https://github.com/hortonworks-spark/shc
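A minimal sketch of the Hive-table route from the spark-shell (the table name is a hypothetical placeholder; assumes a Hive-enabled SparkSession):
scala> val df = spark.sql("select * from default.hive_hbase_tablename") //DataFrame backed by the Hive-on-HBase mapping
scala> df.show()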
10-10-2017
01:38 PM
2 Kudos
Hi @Gayathri Devi You don't have to mention any compression format property in the Hive create table statement. Because Hive just points to the HBase table, if the HBase table is compressed then Hive automatically picks up the compression format by default. Just create the table without any compression format property, like below:
CREATE EXTERNAL TABLE tablename(hbid string, Mv double, COUNTRY string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY") TBLPROPERTIES ("hbase.table.name"="tblname");
Example:-
I created an HBase table with snappy compression, put 3 records into it, and then scanned the table:
hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp'
ROW COLUMN+CELL
1 column=cf:name, timestamp=1507641820083, value=hcc
2 column=cf:name, timestamp=1507641848288, value=hdp
3 column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds
Then I created a Hive table, without any compression property in the statement, on top of the HBase tbl_snp table.
Create Table Statement:-
create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.columns.mapping"=":key,cf:name") TBLPROPERTIES ("hbase.table.name"="tbl_snp"); select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id | tbl_snp.name |
+-------------+---------------+--+
| 1 | hcc |
| 2 | hdp |
| 3 | hdf |
+-------------+---------------+--+
3 rows selected (0.876 seconds)
I did a select from the Hive table and got all the records that exist in the HBase table, even though the Hive table was created without any compression property.
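To confirm the compression really is set on the HBase side, you can describe the table from the hbase shell:
hbase(main)#describe 'tbl_snp' //the cf column family attributes should include COMPRESSION => 'SNAPPY'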
10-06-2017
02:34 AM
1 Kudo
@Gayathri Devi, We cannot create a Hive-HBase table in Avro format, as Hive is just a wrapper on top of the HBase table: all the data is stored in HBase, and Hive merely maps fields onto it and exposes the data in a structured manner. If you want an Avro (or) ORC format table, you can prepare a snapshot table and use it for your needs:
Create table default.avro_table stored as avro as select * from hive_hbase_tablename;
Create table default.orc_table stored as orc as select * from hive_hbase_tablename;
In this way you can create Avro (or) ORC tables from Hive-HBase tables, and you can use a where clause to copy only the required data from the hive_hbase_table.
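For example, a filtered snapshot (the dt column is a hypothetical placeholder for whatever date column your table has):
Create table default.avro_snapshot stored as avro as select * from hive_hbase_tablename where dt='2017-10-30';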