Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

What data type should I use for a timestamp when inserting a date value into Hive and Elasticsearch?


I get Twitter data and put it into HDFS using NiFi. The data arrives, but my date format is Tue Dec 20 10:04:31 +0000 2016. I create a Hive table from that Twitter data with a created_time field; when I set its data type to date, it shows a null value, but when I change the data type to string, it shows a value. I then need to transfer that table into Elasticsearch using es-hadoop, but I am confused about which data type to use for the time field.

I tried string for that time field, and it shows a value, but when I visualize that field in Kibana, it shows an error that a date type is required.

My queries are given below:

I also tried changing my created_time field to date, but it shows a null value.


 create table if not exists tweets_text_partition(
  tweet_id bigint, 
  created_unixtime bigint, 
  created_time string, 
  displayname string, 
  msg string,
  fulltext string
)
row format delimited fields terminated by "|"
location "/tmp/tweets_staging";
*******************************************************************
curl -XPUT http://sandbox.hortonworks.com:9200/twitter_text1/?pretty -d '{
  "rels": {
    "properties": {
      "tweet_id":         { "type": "bigint" },
      "created_unixtime": { "type": "bigint" },
      "created_time":     { "type": "string" },
      "displayname":      { "type": "string" },
      "msg":              { "type": "string" },
      "fulltext":         { "type": "string" }
    }
  }
}'

***********************************************************************
ADD JAR /root/elasticsearch-hadoop-5.1.1.jar;

ADD JAR /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar;
**********************************************************************************
CREATE EXTERNAL TABLE test.twitter_es1
(
  tweet_id bigint, 
  created_unixtime bigint, 
  created_time string, 
  displayname string, 
  msg string,
  fulltext string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'twitterdata/data',
'es.nodes'='sandbox.hortonworks.com:9200',
'es.index.auto.create' = 'false',
'es.mapping.id'='tweet_id');
*************************************************************
INSERT OVERWRITE TABLE test.twitter_es1
SELECT
tweet_id, 
created_unixtime, 
created_time, 
displayname, 
msg,
fulltext
FROM
test.tweets_hive1;

1 ACCEPTED SOLUTION

Guru

The most direct way is to transform the date to the correct format in NiFi. Alternatively, you could land it in a Hive table and CTAS to a new table while transforming it to the correct format. See this for the Hive timestamp format to be used in either case: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-timestamp
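
As a minimal sketch of the CTAS route, assuming the staging table tweets_text_partition from the question above (Hive's unix_timestamp(str, pattern) accepts Java SimpleDateFormat patterns, so the Twitter format maps to 'EEE MMM dd HH:mm:ss Z yyyy'):

create table if not exists tweets_text_ts as
select
  tweet_id,
  created_unixtime,
  -- parse "Tue Dec 20 10:04:31 +0000 2016" into seconds since epoch,
  -- render it as yyyy-MM-dd HH:mm:ss, and cast to a proper Hive timestamp
  cast(from_unixtime(unix_timestamp(created_time, 'EEE MMM dd HH:mm:ss Z yyyy')) as timestamp) as created_time,
  displayname,
  msg,
  fulltext
from tweets_text_partition;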

NiFi: Before putting the data into HDFS or Hive, use a ReplaceText processor. You will use a regex to find the timestamp pattern in the original Twitter JSON and replace it with the timestamp pattern needed in Hive/Kibana. This article should help you out: https://community.hortonworks.com/articles/57803/using-nifi-gettwitter-updateattributes-and-replace....
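
For illustration, a hedged ReplaceText configuration along those lines (the capture-group regex is an assumption about the incoming JSON field name; NiFi's Expression Language can reparse a captured group via toDate/format):

Replacement Strategy: Regex Replace
Search Value:        ("created_at"\s*:\s*")([^"]+)(")
Replacement Value:   $1${'$2':toDate('EEE MMM dd HH:mm:ss Z yyyy','GMT'):format('yyyy-MM-dd HH:mm:ss')}$3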

Hive alternative: Here you either use a SerDe to transform the timestamp or you use a regex. In both cases, you land the data in a Hive table, then CTAS (Create Table As Select) to a final table. This should help you out for this approach: https://community.hortonworks.com/questions/19192/how-to-transform-hive-table-using-serde.html
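
Whichever route you take, Kibana also needs the Elasticsearch mapping to declare the field as a date. A hedged sketch against the index name from the question (note the "mappings" wrapper, which the create-index API expects; the "format" assumes the transformed yyyy-MM-dd HH:mm:ss string):

curl -XPUT http://sandbox.hortonworks.com:9200/twitter_text1/?pretty -d '{
  "mappings": {
    "rels": {
      "properties": {
        "created_time": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}'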

To me, the NiFi approach is superior (unless you must store the original, untransformed date in Hadoop).

