Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

what kind of data type i am going to use as a timestamp while inserting date as a value in hive and elasticsearch

avatar
Contributor

i get the twitter data and put those data into hdfs using nifi,it gives a data,but my date formate is in Tue Dec 20 10:04:31 +0000 2016 and i create a hive table with those twitter data.when i create that table i have created_time field and i need to set datatype as date,it shows an null value then i change the datatype as string it shows an value.then i need to transfer a that table into elasticsearch using es-hadoop but i confused what datatype i am going to use as time field.

i tried string to that time filed,it shows a value but when i visualize that field in kibana,in show an error a date type is required,

my query is given below:

i also tried ->change my created_time field as date but i shows a null value

10736-screenshot-from-2016-12-23-173539.png

10734-screenshot-from-2016-12-23-172821.png

 create table if not exists tweets_text_partition(
  tweet_id bigint, 
  created_unixtime bigint, 
  created_time string, 
  displayname string, 
  msg string,
  fulltext string
)
row format delimited fields terminated by "|"
location "/tmp/tweets_staging";
*******************************************************************
curl -XPUT http://sandbox.hortonworks.com:9200/twitter_text1/?pretty -d'{"rels":{"properties":{"tweet_id":{"type":"bigint"},"created_unixtime":{"type":"bigint"},"created_time":{"type":"string"},"displayname":{"type":"string"},"msg":{"type":"string"},"fulltext":{"type":"string"}}}}'

***********************************************************************
ADD JAR /root/ elasticsearch-hadoop-5.1.1.jar;

ADD JAR /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar;
**********************************************************************************
CREATE EXTERNAL TABLE test.twitter_es1
(
  tweet_id bigint, 
  created_unixtime bigint, 
  created_time string, 
  displayname string, 
  msg string,
  fulltext string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'twitterdata/data',
'es.nodes'='sandbox.hortonworks.com:9200',
'es.index.auto.create' = 'false',
'es.mapping.id'='tweet_id');
*************************************************************
INSERT OVERWRITE TABLE test.twitter_es1
SELECT
tweet_id, 
created_unixtime, 
created_time, 
displayname, 
msg,
fulltext
FROM
test.tweets_hive1;

screenshot-from-2016-12-23-173530.png
1 ACCEPTED SOLUTION

avatar
Guru

The most direct way is to transform the date to correct format in NiFi. Alternatively, you could land it in a hive table and CTAS to a new table while transforming to correct format. See this for Hive timestamp format to be used in either case: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-timestamp

NiFi: Before putting to hdfs or hive, use a ReplaceText processor. You will use regex to find the timestamp pattern from original twitter json and replace it with the timestamp pattern needed in Hive/Kibana. This article should help you out: https://community.hortonworks.com/articles/57803/using-nifi-gettwitter-updateattributes-and-replace....

Hive alternative: Here you either use a SerDe to transform the timestamp or you use regex. In both cases, you land the data in a Hive table, then CTAS (Create Table as Select) to a final table. This should help you out for this approach: https://community.hortonworks.com/questions/19192/how-to-transform-hive-table-using-serde.html

To me, the NiFi approach is superior (unless you must store the original with untransformed date into Hadoop).

View solution in original post

1 REPLY 1

avatar
Guru

The most direct way is to transform the date to correct format in NiFi. Alternatively, you could land it in a hive table and CTAS to a new table while transforming to correct format. See this for Hive timestamp format to be used in either case: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-timestamp

NiFi: Before putting to hdfs or hive, use a ReplaceText processor. You will use regex to find the timestamp pattern from original twitter json and replace it with the timestamp pattern needed in Hive/Kibana. This article should help you out: https://community.hortonworks.com/articles/57803/using-nifi-gettwitter-updateattributes-and-replace....

Hive alternative: Here you either use a SerDe to transform the timestamp or you use regex. In both cases, you land the data in a Hive table, then CTAS (Create Table as Select) to a final table. This should help you out for this approach: https://community.hortonworks.com/questions/19192/how-to-transform-hive-table-using-serde.html

To me, the NiFi approach is superior (unless you must store the original with untransformed date into Hadoop).