Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4033 | 08-20-2018 08:26 PM |
| | 1933 | 08-15-2018 01:59 PM |
| | 2365 | 08-13-2018 02:20 PM |
| | 4095 | 07-23-2018 04:37 PM |
| | 5003 | 07-19-2018 12:52 PM |
06-15-2016
01:31 AM
1 Kudo
@Vijay Parmar If I understood you correctly, you are parsing a file --> performing some ETL --> storing into Hive. If my understanding is correct, I recommend you do this in Storm and stream into Hive using Hive streaming: ingest data from Teradata --> a bolt accesses the URL and fetches JSON --> a bolt receives that JSON and fetches another URL returning JSON --> a Hive streaming bolt persists the data to Hive. Hope that helps.

Here is a little about Hive streaming (the Hive HCatalog Streaming API): Traditionally, adding new data into Hive requires gathering a large amount of data onto HDFS and then periodically adding a new partition. This is essentially a "batch insertion", and insertion of new data into an existing partition is not permitted. The Hive Streaming API allows data to be pumped continuously into Hive. The incoming data can be continuously committed in small batches of records into an existing Hive partition or table, and once data is committed it becomes immediately visible to all Hive queries initiated subsequently. This API is intended for streaming clients such as Flume and Storm, which continuously generate data. Streaming support is built on top of ACID-based insert/update support in Hive (see Hive Transactions).

The classes and interfaces of the Hive streaming API are broadly categorized into two sets: the first provides support for connection and transaction management, while the second provides I/O support. Transactions are managed by the metastore; writes are performed directly to HDFS. Streaming to unpartitioned tables is also supported, and the API supports Kerberos authentication starting in Hive 0.14. Note on packaging: the APIs are defined in the Java package org.apache.hive.hcatalog.streaming and are part of the hive-hcatalog-streaming Maven module in Hive.
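For what it is worth, the streaming target table has to meet the Hive transactions requirements before the Hive streaming bolt can write to it. A minimal sketch, assuming a hypothetical table name and columns (the ORC storage, bucketing, and transactional properties are the parts that matter):

```sql
-- Hypothetical streaming target table; names and columns are illustrative.
-- Hive streaming requires ORC storage, bucketing, and ACID (transactional) support.
CREATE TABLE web_events (
  user_id    STRING,
  event_time STRING,
  payload    STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```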
06-14-2016
09:22 PM
@Bruce Perez How about using COALESCE? It returns the first value that is not NULL, or NULL if all values are NULL.

SELECT COALESCE(datefield1, datefield2, datefield3) AS first_date_found
FROM tblDates
WHERE primary_key = 1
06-14-2016
12:47 PM
1 Kudo
As a next step, you will need to create a table in ORC format, fill the table with your joined data using INSERT INTO ... SELECT ..., then update it using the method I have described.
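A minimal sketch of that sequence, assuming hypothetical table and column names (joined_staging holds the joined data and my_table_orc is the ORC table):

```sql
-- Hypothetical: load the joined data into the ORC table ...
INSERT INTO TABLE my_table_orc SELECT * FROM joined_staging;

-- ... then apply the update (the ORC table must be transactional for UPDATE to work).
UPDATE my_table_orc SET status = 'processed' WHERE status IS NULL;
```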
06-14-2016
04:36 AM
If your table is not in ORC format, then create another table just like the one you have today, like this:

CREATE TABLE ... STORED AS ORC

You can also convert an existing table with ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT ORC, or make ORC the default with SET hive.default.fileformat=Orc. Then insert into this table from your existing table; you can use the statement INSERT INTO TABLE tablename1.
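Putting that together, a minimal sketch assuming hypothetical table and column names (old_table is the existing non-ORC table):

```sql
-- Hypothetical: create an ORC table matching the existing table's columns ...
CREATE TABLE new_table_orc (id INT, name STRING)
STORED AS ORC;

-- ... then copy the data over from the existing (non-ORC) table.
INSERT INTO TABLE new_table_orc SELECT id, name FROM old_table;
```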
06-14-2016
04:31 AM
@Bruce Perez If your data is in ORC format, this can be done by simply performing an UPDATE statement on your table. INSERT ... VALUES, UPDATE, and DELETE SQL statements are supported in Apache Hive 0.14 and later. The INSERT ... VALUES statement enables users to write data to Apache Hive from values provided in SQL statements. The UPDATE and DELETE statements enable users to modify and delete values already written to Hive. All three statements support auto-commit, which means that each statement is a separate transaction that is automatically committed after the SQL statement is executed. More information is available here.
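A minimal sketch of the three statements, assuming a hypothetical ACID table named customers (the table must be stored as ORC, bucketed, and created with 'transactional'='true'):

```sql
-- Hypothetical examples against an ACID table; names and values are illustrative.
INSERT INTO TABLE customers VALUES (1, 'Alice'), (2, 'Bob');

UPDATE customers SET name = 'Robert' WHERE id = 2;

DELETE FROM customers WHERE id = 1;
```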
06-13-2016
07:59 PM
@mrizvi Do you mind opening a separate HCC post on your question?
06-13-2016
07:44 PM
@Todd Wilkinson I would use the ReplaceText processor (more info here). It updates the content of a FlowFile by evaluating a regular expression (regex) against it and replacing the section of the content that matches the regular expression with some alternate value. You would search for the value and replace it with output such as $1, $2, etc. You can also use ReplaceTextWithMapping, which updates the content of a FlowFile by evaluating a regular expression against it and replacing the section of the content that matches the regular expression with some alternate value provided in a mapping file.
06-10-2016
05:26 AM
@sameer lail I do want to inform you that HDFS is not a POSIX file system. Data is stored in blocks that are distributed across the DataNodes, and the NameNode has information about all the files and all the data blocks which make up each file. So you use hadoop fs to do file-level actions.
06-10-2016
05:23 AM
2 Kudos
@sameer lail Take a look at your hdfs-site.xml and look at the directory setting for dfs.data.dir; this is where your HDFS files are stored. You can also view this setting in Ambari under the HDFS tab, under Configs.
06-10-2016
05:12 AM
@Rahul Pathak My question may not be applicable due to my misunderstanding that Kafka stores logs on HDFS as well as locally. Do I understand you correctly that Kafka only stores logs to local disk?