Member since: 09-21-2015
Posts: 28
Kudos Received: 40
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5373 | 12-02-2016 05:54 PM |
| | 1084 | 07-22-2016 04:12 PM |
| | 1730 | 04-22-2016 04:28 PM |
| | 31831 | 04-22-2016 07:58 AM |
| | 7483 | 10-08-2015 10:32 PM |
04-22-2016 04:28 PM
1 Kudo
You will need to specify the column you are clustering on, and then achieve it in multiple statements:

CREATE TABLE emp1 LIKE emp;
ALTER TABLE emp1 SET FILEFORMAT ORC;
ALTER TABLE emp1 CLUSTERED BY (empId) INTO 4 BUCKETS;
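If it helps, here is a minimal sketch of the follow-up load step, assuming the emp source table from above (the enforcement setting applies to Hive 1.x; Hive 2.x always enforces bucketing):

-- Hive 1.x needs bucketing enforcement switched on; Hive 2.x enforces it by default.
SET hive.enforce.bucketing=true;
-- Rewrite the data so it is laid out in the 4 empId buckets declared above.
INSERT OVERWRITE TABLE emp1
SELECT * FROM emp;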
04-22-2016 07:58 AM
2 Kudos
If you create a Hive table over an existing data set in HDFS, you need to tell Hive about the format of the files as they are on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format like Parquet or ORC with a CREATE TABLE ... AS SELECT statement.

CREATE EXTERNAL TABLE sourcetable (col bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
Once the data is mapped, you can convert it to other formats like Parquet:

SET parquet.compression=SNAPPY; -- this is the default actually
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
For the Hive-optimized ORC format, the syntax is slightly different:

CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy")
AS SELECT * FROM sourcetable;
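As a quick sanity check (using the table names from above), you can confirm the storage format and compare row counts after the conversion:

-- The InputFormat/OutputFormat lines in the output confirm Parquet and ORC storage.
DESCRIBE FORMATTED testsnappy_pq;
DESCRIBE FORMATTED testsnappy_orc;
-- Row counts should match the original external table.
SELECT COUNT(*) FROM sourcetable;
SELECT COUNT(*) FROM testsnappy_pq;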
12-16-2015 06:35 PM
The COUNT(DISTINCT) could be the bottleneck if it is not being parallelized. Can you share the explain plan?
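To illustrate the single-reducer issue (the table and column names below are hypothetical, not from the original question), a common workaround is to rewrite the query so the distinct values are computed in parallel first:

-- Original pattern: the global distinct count funnels all rows into one reducer.
SELECT COUNT(DISTINCT user_id) FROM events;

-- Rewritten pattern: the inner DISTINCT runs across many reducers,
-- and only the final COUNT(*) is a single-reducer step.
SELECT COUNT(*)
FROM (SELECT DISTINCT user_id FROM events) t;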
12-10-2015 05:43 PM
The work to generically create a table by reading a schema from ORC, Parquet, and Avro files is tracked in HIVE-10593.
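Until that lands, Avro is the one case where something similar already works, because the AvroSerDe can derive the columns from a schema file. A sketch, with a hypothetical schema path and table name:

-- Hypothetical example: the column list is omitted; the layout comes from the .avsc file.
CREATE EXTERNAL TABLE events_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/events_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events.avsc');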
11-20-2015 10:03 PM
A mistyped hadoop fs -rmr -skipTrash can have catastrophic consequences, which snapshots can protect against. What are the performance concerns?
11-17-2015 12:02 AM
1 Kudo
What size NiFi system would we need to read 400 MB/s from a Kafka topic and store the output in HDFS? The input is log lines, 100 B to 1 KB in length each.
Labels:
- Apache NiFi
10-30-2015 11:26 PM
2 Kudos
Pig does not support appending to an existing partition through HCatalog. What workarounds are there to perform the append and get behavior similar to Hive's INSERT INTO TABLE from Pig?
Labels:
- Apache HCatalog
- Apache Pig
10-27-2015 08:58 PM
You would assign one folder to each of the DataNode disks, closely mirroring dfs.datanode.data.dir. On a 12-disk system you would have 12 YARN local-dir locations.
10-22-2015 08:34 PM
1 Kudo
In an HA environment, you should always refer to the nameservice, not any one of the NameNodes. The syntax for the URL is hdfs://<nameservice>/ (notice that no port number is specified). The HA configuration should be defined in /etc/hadoop/conf/core-site.xml and accessible by the process. WebHDFS does not natively support NameNode HA, but you can use Knox to provide that functionality.
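Tying this back to Hive as an example (the nameservice name and path below are placeholders, not from the original question), a table location would reference the nameservice with no host or port:

-- 'mycluster' stands in for whatever nameservice is configured for the cluster;
-- the HA client resolves the active NameNode, so no host:port appears in the URL.
CREATE EXTERNAL TABLE ha_example (col BIGINT)
STORED AS TEXTFILE
LOCATION 'hdfs://mycluster/data/ha_example';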
10-10-2015 12:12 AM
1 Kudo
That was it, thanks! I used http://tweeterid.com/ to convert from username to user ID.