Member since 05-16-2016 | 785 Posts | 114 Kudos Received | 39 Solutions
03-13-2017
11:08 AM
1 Kudo
1) Since snappy is not too good at compression (disk), what would be the difference in disk space for a 1 TB table when stored as parquet only versus parquet with snappy compression?

I created three tables with different scenarios. Please take a look; it will give you some idea.

TABLE 1 - Parquet format, no compression
+-------+--------+--------+---------+
| #Rows | #Files | Size   | Format  |
+-------+--------+--------+---------+
| -1    | 4      | 3.73MB | PARQUET |
+-------+--------+--------+---------+

TABLE 2 - Text format with default compression Snappy
+-------+--------+---------+--------+
| #Rows | #Files | Size    | Format |
+-------+--------+---------+--------+
| 0     | 8      | 22.04MB | TEXT   |
+-------+--------+---------+--------+

TABLE 3 - Parquet with compression enabled as Snappy
+-------+--------+--------+---------+
| #Rows | #Files | Size   | Format  |
+-------+--------+--------+---------+
| -1    | 4      | 3.71MB | PARQUET |
+-------+--------+--------+---------+

2) Is it possible to compress a non-compressed parquet table later with snappy?

ALTER TABLE is a logical operation that only updates the table metadata in the metastore database; it does not rewrite the existing data files. However, you can run a CTAS (CREATE TABLE AS SELECT) to perform the compression, and then rename the new table if you want using alter table d1.X rename to Y;
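The CTAS-and-rename approach can be sketched as a small Python helper that only builds the statements; it does not connect to Hive or Impala. The function name and the intermediate table name with the `_snappy` suffix are hypothetical. `SET parquet.compression=SNAPPY` is the Hive-side setting and `SET COMPRESSION_CODEC=snappy` is the Impala-side one. Treat this as a sketch under those assumptions and review the generated statements before running them against a real table.

```python
def build_ctas_statements(db, source, target, engine="hive"):
    """Build the statements that copy `source` into a Snappy-compressed Parquet table."""
    if engine == "hive":
        set_codec = "SET parquet.compression=SNAPPY;"
    elif engine == "impala":
        set_codec = "SET COMPRESSION_CODEC=snappy;"
    else:
        raise ValueError(f"unknown engine: {engine}")

    tmp = f"{source}_snappy"  # hypothetical name for the intermediate table
    return [
        set_codec,
        # CTAS rewrites the data, so the new files pick up the codec set above.
        f"CREATE TABLE {db}.{tmp} STORED AS PARQUET AS SELECT * FROM {db}.{source};",
        # Renaming is a metadata-only step.
        f"ALTER TABLE {db}.{tmp} RENAME TO {db}.{target};",
    ]


for statement in build_ctas_statements("d1", "X", "Y"):
    print(statement)
```

The original uncompressed table is left untouched by these statements, so it can be compared with the new one before it is dropped.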
03-13-2017
10:39 AM
There is a typo in the configuration. Change
agent.sinks.agent-sink.channels = agent-chan
to
agent.sinks.agent-sink.channel = agent-chan
03-13-2017
10:35 AM
1 Kudo
There are a few things to take care of when dealing with a Flume configuration.

When you define sources: agent.sources = sr1
When you define sinks: agent.sinks = sink1 sink2 ...
When you define channels: agent.channels = ch1 ch2

In your configuration there is a typo. Change
agent.sinks.agent-sink.channels = agent-chan
to
agent.sinks.agent-sink.channel = agent-chan

An agent can be configured with zero or more sinks, but each sink reads events from exactly one channel. You also have to configure a channel for every sink; if you do not, the sink will be removed.
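To catch this kind of typo before starting the agent, a small check along the following lines may help. It is a minimal sketch, not part of Flume: the function names are hypothetical, the parser handles only simple `key = value` lines, and the sample text reuses the agent, sink, and channel names from the configuration in question.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_sink_channels(props, agent):
    """Report sinks that use 'channels' or lack a single declared 'channel'."""
    problems = []
    declared = set(props.get(f"{agent}.channels", "").split())
    for sink in props.get(f"{agent}.sinks", "").split():
        prefix = f"{agent}.sinks.{sink}."
        if prefix + "channels" in props:
            problems.append(f"{sink}: use 'channel' (singular), not 'channels'")
        channel = props.get(prefix + "channel")
        if channel is None:
            problems.append(f"{sink}: no channel configured")
        elif channel not in declared:
            problems.append(f"{sink}: channel '{channel}' is not in {agent}.channels")
    return problems


BROKEN = """
agent.sources = sr1
agent.sinks = agent-sink
agent.channels = agent-chan
agent.sinks.agent-sink.channels = agent-chan
"""

for problem in check_sink_channels(parse_properties(BROKEN), "agent"):
    print(problem)
```

Sources are deliberately not checked here, because a source may write to several channels and therefore does use the plural `channels` key.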
03-08-2017
06:43 PM
Indeed. To sum up, the default compression codecs are as follows:
Hive - default compression is DeflateCodec
Impala - default compression is Snappy
Thanks, mate.
03-08-2017
12:05 PM
I think it is Snappy by default. Refer to this link: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_parquet.html
Could you please correct me if I am wrong? Thanks.
03-08-2017
10:50 AM
1) If we create a table (both Hive and Impala) and just specify STORED AS PARQUET, will it be Snappy compressed by default in CDH?

Currently, the default compression for Impala tables is Snappy.

2) If not, how do I identify a Parquet table with Snappy compression and a Parquet table without Snappy compression?

describe formatted tableName

Note: you will always see the compression as NO, because the compression format is not stored in the table metadata. The best way is to run dfs -ls -R on the table location and check the files for compression.

3) Also, how do I specify Snappy compression at table level while creating a table, and also at a global level, even if nobody specified it at table level (all tables stored as Parquet should be Snappy compressed)?

CREATE TABLE external_parquet (c1 INT, c2 STRING)
STORED AS PARQUET LOCATION ' '

or, on a session basis:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Globally, i.e. in a file that is executed when you launch the Hive shell: put the settings above in the CDH location /etc/hive/conf.cloudera.hive1. If you do not find one, you can always create a .hiverc file.

Please refer to this link for more CREATE TABLE properties: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_create_table.html
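If you go the .hiverc route, the file can be maintained with a few lines of Python. This is a minimal sketch: the helper name `ensure_settings` is hypothetical, and it assumes the Hive CLI picks up a `.hiverc` file from your home directory at startup. It appends only the settings that are not already present, so it is safe to run more than once.

```python
from pathlib import Path

# The three session settings quoted in the post above.
SNAPPY_SETTINGS = [
    "SET hive.exec.compress.output=true;",
    "SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;",
    "SET mapred.output.compression.type=BLOCK;",
]


def ensure_settings(hiverc_path, settings=SNAPPY_SETTINGS):
    """Append any missing settings to the file and return the ones that were added."""
    path = Path(hiverc_path)
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [s for s in settings if s not in present]
    if missing:
        path.write_text("\n".join(existing + missing) + "\n")
    return missing


# Example call (not executed here): ensure_settings(Path.home() / ".hiverc")
```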
03-07-2017
08:42 PM
You may not have the appropriate JAR on your classpath; that is the reason it is throwing java.lang.NoClassDefFoundError. I believe you are missing httpclient-4.2.jar in your Java application classpath. When you extract the JAR you should see the class below:
org.apache.http.client.utils.URIUtils.class
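Instead of extracting the JAR by hand, you can check for the class entry directly, since a JAR is a ZIP archive. The sketch below uses only the Python standard library; the helper name `jar_contains_class` is hypothetical, and the demo builds a throwaway archive so that it runs anywhere. Point it at the real httpclient JAR on your machine to do the actual check.

```python
import os
import tempfile
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR has an entry for the fully qualified class name."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Demo on a throwaway archive that contains an empty placeholder entry.
with tempfile.TemporaryDirectory() as workdir:
    demo_jar = os.path.join(workdir, "demo.jar")
    with zipfile.ZipFile(demo_jar, "w") as jar:
        jar.writestr("org/apache/http/client/utils/URIUtils.class", b"")
    print(jar_contains_class(demo_jar, "org.apache.http.client.utils.URIUtils"))
```

If the check returns False for every JAR on your classpath, that matches the NoClassDefFoundError described above.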
03-02-2017
08:28 PM
I believe the problem might be in this configuration file. Did you change localhost to your hostname in server_host in the configuration below?

/etc/cloudera-scm-agent/config.ini

server_host=localhost
change it to
server_host= (the host where you installed CM)

then run
$ sudo service cloudera-scm-server-db start
$ sudo service cloudera-scm-server start

This should help you connect to CM via the browser.
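A quick way to see what the agent is currently pointing at is to read the setting programmatically. The sketch below assumes that `server_host` lives under a `[General]` section of config.ini (check your own file, as this is an assumption) and the function names are hypothetical. It works on an inline sample rather than the real file so that it runs anywhere.

```python
import configparser

LOCAL_NAMES = {"localhost", "127.0.0.1"}


def read_server_host(config_text):
    """Return the server_host value from agent config text (assumed [General] section)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("General", "server_host").strip()


def needs_fix(host):
    """True when the agent still points at the local machine."""
    return host.lower() in LOCAL_NAMES


SAMPLE = """
[General]
server_host=localhost
"""

host = read_server_host(SAMPLE)
print(host, "-> needs changing" if needs_fix(host) else "-> looks fine")
```

Note that localhost is only a problem when the agent runs on a different machine from CM; on a single-host install it can be correct.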
03-01-2017
09:46 PM
In mapred-site.xml:
mapreduce.map.memory.mb =
mapreduce.task.io.sort.mb =
02-24-2017
05:21 AM
Use the event deserializer. You can use BlobDeserializer if you want to parse the whole file inside one event, or LINE if you want one event per line of text input. Refer to the link: https://flume.apache.org/FlumeUserGuide.html#event-deserializers
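The difference between the two deserializers can be illustrated with a conceptual Python model. This is not Flume code and the function names are hypothetical; it only shows how the same file content maps to events in each case, and it ignores limits such as maximum line or blob length.

```python
def line_events(data):
    """Line-style behaviour: one event per line of text input."""
    return data.splitlines()


def blob_events(data):
    """Blob-style behaviour: the whole file content becomes a single event."""
    return [data]


sample = b"first line\nsecond line\nthird line\n"
print(len(line_events(sample)), "events with the line-style deserializer")
print(len(blob_events(sample)), "event with the blob-style deserializer")
```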