Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13459 | 02-20-2018 12:33 PM
 | 1526 | 02-19-2018 05:12 AM
 | 1877 | 12-28-2017 06:13 AM
 | 7177 | 09-28-2017 09:25 AM
 | 12215 | 09-25-2017 11:19 AM
06-14-2017
03:23 PM
@Guillaume Roger I'm not sure whether my understanding of your reply is correct. If you have compound keys, there is a workaround to make this possible: load the data into a staging table with the compound key fields concatenated into one column, alongside the separate fields. On the staging table you can then define that concatenated column as the primary key and partition on the other fields that make up the compound key. A sketch is below.
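A minimal sketch of that staging pattern, assuming Hive 2.1+ (for informational primary key constraints) and hypothetical table/column names (raw_orders, orders_stage, cust_id, order_no, order_date):
-- Staging table: the concatenated compound key becomes a single
-- (informational) primary key column; one component field is the partition.
CREATE TABLE orders_stage (
  pk_concat STRING,
  cust_id   STRING,
  order_no  STRING,
  PRIMARY KEY (pk_concat) DISABLE NOVALIDATE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;
-- Dynamic partitioning is needed for the INSERT below.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE orders_stage PARTITION (order_date)
SELECT concat(cust_id, '_', order_no), cust_id, order_no, order_date
FROM raw_orders;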
06-14-2017
11:45 AM
Hi @Guillaume Roger I don't think we can partition on the primary key column. To add to that: if you create partitions based on the primary key, each partition will hold exactly one record, so you end up with 'N' partitions for 'N' records. If you have 10K records, that many partitions on the primary key will be chaos (see the illustration below). Hope it helps!
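A hypothetical before/after to illustrate, with made-up table/column names (customers, id, signup_date):
-- Anti-pattern: partitioning on a unique key yields one HDFS
-- directory per row (10K rows -> 10K tiny partitions).
CREATE TABLE customers_by_id (name STRING)
PARTITIONED BY (id BIGINT);
-- Better: keep the key as a regular column and partition on a
-- low-cardinality field instead.
CREATE TABLE customers (id BIGINT, name STRING)
PARTITIONED BY (signup_date STRING);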
06-09-2017
02:49 PM
Nikkie Thomas "If I partition the data by a yyyy-mm-dd field and I receive only one file per day, I assume I will always have one file per partition irrespective of this setting?" --> It's not that simple, because it depends on the size of your input file, the block size, the number of mappers/reducers, and other variables. If your input file is smaller than the block size, then it should create only one file. But if you partition the table on a daily basis with that little data, over time it will cause performance issues, and there is not much partitioning can do about it. In that situation I would partition the table on a yearly basis, with buckets on a frequently used filter column; in your case that column could be on a daily/weekly/yearly basis. Still, each file in a bucketed folder will be small if the data size is small. A sketch is below.
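A minimal sketch of that layout, using hypothetical table/column names (server_logs, log_date, yr) and an illustrative bucket count:
-- Yearly partitions keep the partition count low; bucketing on a
-- frequently filtered column organizes data within each partition.
CREATE TABLE server_logs (
  log_ts   STRING,
  log_date STRING,  -- yyyy-mm-dd, a commonly filtered column
  message  STRING
)
PARTITIONED BY (yr INT)
CLUSTERED BY (log_date) INTO 16 BUCKETS
STORED AS ORC;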
06-09-2017
11:31 AM
1 Kudo
Hi Nikkie Thomas To control the number of files written to Hive tables, you can either set the number of mappers/reducers to 1 (when the job allows it), so the final output is always a single file, or enable the settings below to merge reducer output whose size is less than a block size.
- hive.merge.mapfiles: Merge small files at the end of a map-only job.
- hive.merge.mapredfiles: Merge small files at the end of a map-reduce job.
- hive.merge.size.per.task: Size of merged files at the end of the job.
- hive.merge.smallfiles.avgsize: When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
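For example, these can be enabled per session like so (the size values, in bytes, are illustrative):
-- Merge small output files for map-only and map-reduce jobs.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Target size for the merged files (~256 MB here).
SET hive.merge.size.per.task=256000000;
-- Kick off a merge job when the average output file is below ~16 MB.
SET hive.merge.smallfiles.avgsize=16000000;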
06-08-2017
02:15 PM
Félicien Catherin Could you please share a screenshot of the error after executing this code? CREATE TABLE FIREWALL_LOGS(
time STRING,
ip STRING,
country STRING,
status INT
)
CLUSTERED BY (time) INTO 25 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS ORC
TBLPROPERTIES("transactional"="true");
06-08-2017
01:54 PM
CREATE TABLE FIREWALL_LOGS(
time STRING,
ip STRING,
country STRING,
status INT
)
CLUSTERED BY (time) INTO 25 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS ORC
LOCATION '/tmp/server-logs'
TBLPROPERTIES("transactional"="true"); I missed the LOCATION clause in the previous answer.
06-08-2017
01:52 PM
Félicien Catherin Please use the below DDL. CREATE TABLE FIREWALL_LOGS(
time STRING,
ip STRING,
country STRING,
status INT
)
CLUSTERED BY (time) INTO 25 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS ORC
TBLPROPERTIES("transactional"="true");
06-08-2017
12:04 PM
"2) HDFS client acts as a staging/intermediate layer for DN and NM." --> Does that mean whenever I copy a file from local to HDFS, the edge node acts as a staging layer through the HDFS client installed on it, and the worker nodes play no role at that stage? Is my understanding right?
06-08-2017
11:52 AM
Hi Félicien Catherin You have missed ROW FORMAT DELIMITED. Please add the following to your DDL: ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' It should work. I hope it helps.
06-08-2017
07:48 AM
1 Kudo
I'm copying a file from a Unix server to HDFS. I believe the edge node acts as a gateway for ingesting data into HDFS. Say I have a 5 GB file that I'm copying into HDFS: where will the data be stored? I understand that it will be stored on the data nodes, but before the entire file lands on a data node, is it placed in a staging/intermediate layer? Does the edge node hold that staging layer?
Labels:
- Apache Hadoop