Member since 03-24-2017 | Posts: 6 | Kudos received: 0 | Solutions: 0
06-20-2018
12:48 AM
Thanks Vinicius. It worked.
06-20-2018
12:47 AM
We are seeing this alert daily on our cluster in Ambari. Is there a way to resolve it? It appears to be triggered by the large number of queries submitted to the cluster. We have restarted all services on the head node, but the alert remains. Is there a way to clear the alert or to clean up the NameNode?
06-20-2018
12:45 AM
We keep getting this kind of alert, apparently because too many queries are submitted to the cluster. Even after restarting the NameNode, the alerts do not go away. Is there a way to clear these alerts?
06-03-2018
06:59 AM
Hi, we are copying files in JSON GZ format from our upstream system. The files follow a naming pattern for every daily slice, YYYYMMDDHH (e.g. 2018053100), and the upstream system maintains two folders, DATA and METADATA: DATA holds the actual data and METADATA holds the row count of that day's data. We need to create an external table on top of the copied data that considers only files with the *.json.gz extension and excludes files with any other extension. We don't want to copy the files to another location, since they are large. We also tried the INPUT__FILE__NAME virtual column, but it didn't work. Any suggestions for this scenario?
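A minimal sketch of one workaround, assuming the DATA files can be reached under a single LOCATION. Note that Hive's virtual column is spelled `INPUT__FILE__NAME`, with two underscores on each side of FILE. As far as I know, plain Hive DDL has no simple file-name pattern for a text table, so instead of filtering files, a view can drop every row whose source file does not end in `.json.gz`. The table name `ext_upstream_raw`, the view name `v_upstream_json`, the columns `col1`/`col2`, and the wasb path are hypothetical placeholders. Hive still opens the non-matching files (only their rows are discarded), so `ignore.malformed.json` is kept on, and if the slices live in subfolders the recursive-input settings (`hive.mapred.supports.subdirectories`, `mapred.input.dir.recursive`) shown in the post below would likely be needed as well.

```sql
-- Hypothetical table, view, column names and path; adjust to the real layout.
-- Assumes the openx JSON SerDe jar has already been added with ADD JAR.
CREATE EXTERNAL TABLE IF NOT EXISTS ext_upstream_raw (
  col1 STRING,
  col2 STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://container@account.blob.core.windows.net/upstream/';

-- INPUT__FILE__NAME holds the full path of the file each row was read from,
-- so this view keeps only rows that came from *.json.gz files.
CREATE VIEW IF NOT EXISTS v_upstream_json AS
SELECT col1, col2
FROM ext_upstream_raw
WHERE INPUT__FILE__NAME LIKE '%.json.gz';
```

Queries would then go against `v_upstream_json` instead of the raw table.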
Tags: Data Processing, Hive
05-01-2018
11:08 AM
Hi, has anyone faced an issue while fetching data from an external table? We are copying data from an upstream system into our storage (S3). As part of the copy, directories are copied along with zero-byte files. The source files are JSON, compressed with gzip (.gz). The folder hierarchy is:
DATE --> DAY=201803250 (folder) --> 1.json.gz, 2.json.gz (files)
DAY=201803250 --> empty, zero-byte file
We are trying to create an external table with the JSON SerDe:
ADD JAR wasb://jsonserde@XYZ.blob.core.windows.net/json/json-serde-1.3.9.jar;
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.tezfiles = true;
DROP TABLE IF EXISTS Ext_STG1;
CREATE EXTERNAL TABLE Ext_STG1 (Col1 STRING, Col2 STRING, Col3 STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("case.insensitive" = "true", "ignore.malformed.json" = "true")
STORED AS TEXTFILE
LOCATION 'wasb://container1@xyz.blob.core.windows.net/date/day=201803250/'
TBLPROPERTIES ('serialization.null.format' = '');
select * from Ext_STG1 limit 100;
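As a diagnostic sketch (not a fix), the `INPUT__FILE__NAME` virtual column can show which files under the table's LOCATION are actually contributing rows. Files that produce no rows do not appear in the result at all. If the zero-byte files are what breaks the read, this query will likely fail the same way as the `select *`, which itself helps narrow the problem down. The aliases `source_file` and `row_count` are just illustrative names.

```sql
-- Count rows per source file for the table defined above.
-- Only files that yield at least one row appear in the output.
SELECT INPUT__FILE__NAME AS source_file,
       COUNT(*)          AS row_count
FROM Ext_STG1
GROUP BY INPUT__FILE__NAME
ORDER BY source_file;
```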
03-24-2017
09:19 PM
Hi, I hope you're doing well. I'm looking for help with queue configuration. We have configured our Capacity Scheduler queues as per the link.

We need help configuring the cluster so that large and small queries can run at the same time. Today, whenever a user submits a large query that deals with GBs of data, the application consumes all of the free resource capacity, which blocks the smaller queries. How do other big data projects handle this scenario? We have set up two queues, but is there a way in Templeton to submit a query to a particular queue? My scenario is as follows:

We have two queues, Q1 and Q2, each with 50% of the cluster resources. We submit queries to Hive through HiveServer2 and WebHCat (Templeton). When I submit a query to HiveServer2, it uses the Q1 queue capacity through the HiveServer2 configuration. Is there a setting that makes queries submitted through WebHCat go only to the Q2 queue, or is there a WebHCat REST API parameter (for example, one passed with curl) that specifies which queue the query should be submitted to? We are seeing one big query block the others. How can we improve concurrency? Thanks in advance.
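One possible approach, sketched under assumptions: Hive takes the YARN queue from a session property, so the HiveQL text sent through WebHCat could begin with SET statements naming the queue. `tez.queue.name` applies when Hive runs on Tez and `mapreduce.job.queuename` when it runs on MapReduce; setting both covers either engine. `Q2` is the queue from the question, while the table `some_table` and column `col1` are hypothetical placeholders. WebHCat's Hive endpoint also documents a `define` parameter (`define=NAME=VALUE`) that should be able to set the same property per request. This only routes the Hive job itself; the WebHCat launcher job may need its own queue setting in webhcat-site.xml.

```sql
-- Route this session's jobs to queue Q2 (queue name from the question).
-- tez.queue.name is read by the Tez engine,
-- mapreduce.job.queuename by the MapReduce engine; setting both is harmless.
SET tez.queue.name=Q2;
SET mapreduce.job.queuename=Q2;

-- Hypothetical query; everything after the SETs in this session runs in Q2.
SELECT col1, COUNT(*) AS cnt
FROM some_table
GROUP BY col1;
```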