Member since: 09-21-2015
Posts: 28
Kudos Received: 40
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 779 | 12-02-2016 05:54 PM |
| | 261 | 07-22-2016 04:12 PM |
| | 340 | 04-22-2016 04:28 PM |
| | 14273 | 04-22-2016 07:58 AM |
| | 2179 | 10-08-2015 10:32 PM |
12-16-2016
12:56 AM
Hi Cristian, the amount of memory that YARN can allocate is controlled by the setting "Memory allocated for all YARN containers on a node" under the YARN configuration. Set this to 3 GB, and that should give enough room for Tez to run (it needs 2.5 GB if you follow the settings above).
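For reference, that Ambari label maps to the yarn.nodemanager.resource.memory-mb property in yarn-site (the exact mapping is my assumption, so verify it against your stack). A minimal sketch, value in MB:
yarn.nodemanager.resource.memory-mb=3072
Restart the NodeManagers after changing it.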
12-06-2016
04:55 PM
Since you are in a sandbox, you need to reduce the amount of memory taken by each component so that everything fits (try running with 12 GB if you can). Reduce the memory footprint as follows:
Tez container size = 1024 MB
Map join, per Map memory = 256 MB
Metastore heap = 512 MB
Client heap = 512 MB
Tez Client -> tez.am.resource.memory.mb = 512
YARN will need to fit at least one Tez AM (512 MB) and a couple of Tez containers (512 MB * 2). You can check how much memory is allocated to YARN on the YARN config page under "Memory allocated for all YARN containers on a node".
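If you prefer to set the Hive/Tez equivalents directly, these are the properties I believe sit behind those Ambari labels (treat the mapping as an assumption; the metastore and client heap sizes live in hive-env rather than in session settings):
set hive.tez.container.size=1024;                             -- Tez container size, in MB
set hive.auto.convert.join.noconditionaltask.size=268435456;  -- map-join memory, 256 MB expressed in bytes
set tez.am.resource.memory.mb=512;                            -- Tez AM size, in MB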
12-06-2016
04:41 PM
@Pooja Sahu In the source file, the original '\001' delimiter has been replaced with its printable representation "^A". One way to process the file is to convert it back to '\001':
CREATE EXTERNAL TABLE fix_raw (line string)
ROW FORMAT DELIMITED
LOCATION '/user/pooja/fix/';
CREATE TABLE fix_map (tag MAP<STRING, STRING>)
STORED AS ORC;
INSERT INTO TABLE fix_map
SELECT str_to_map( replace(line, '^A', '\001'), '\001', '=') tag from fix_raw;
-- query tag 49 (the map keys are strings, so the key must be quoted)
SELECT tag['49'] FROM fix_map;
12-02-2016
05:54 PM
2 Kudos
The Tez job has not started: both the mapper and the reducer are in the "pending" state and have not yet been launched. Once launched they would enter the "running" state. Check YARN to make sure there is enough room in your queue to fit the containers (http://sandbox:8088). There isn't much RAM in a sandbox, and it could all be taken up by a Spark instance or by the Tez instances of HiveServer2.
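A quick way to see what is holding the memory (a sketch; adjust to your setup) is to list the running YARN applications from the sandbox shell and check their resource usage in the ResourceManager UI:
yarn application -list -appStates RUNNING
Kill anything you do not need with yarn application -kill <application-id>.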
10-29-2016
01:06 AM
4 Kudos
LOCATION is never mandatory, and can be used with any combination of managed, external and partitioned tables. The following statements are all valid:
create database if not exists test;
use test;
-- no LOCATION
create table t1 (i int);
create EXTERNAL table t2(i int);
create table t3(i int) PARTITIONED by (b int);
create EXTERNAL table t4(i int) PARTITIONED by (b int);
-- with LOCATION
create table t5 (i int) LOCATION '/tmp/tables/t5';
create EXTERNAL table t6(i int) LOCATION '/tmp/tables/t6';
create table t7(i int) partitioned by (b int) LOCATION '/tmp/tables/t7';
create EXTERNAL table t8(i int) partitioned by (b int) LOCATION '/tmp/tables/t8';
show tables;
drop table t1; drop table t2; drop table t3; drop table t4;
drop table t5; drop table t6; drop table t7; drop table t8;
If LOCATION is not specified, Hive will use the value of hive.metastore.warehouse.dir in all cases. With the above example:
hdfs dfs -ls /apps/hive/warehouse/test.db
Found 4 items
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t1
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t2
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t3
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t4
Note how t2 and t4 were both external tables.
hdfs dfs -ls /tmp/tables
Found 4 items
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t5
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t6
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t7
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t8
08-16-2016
06:56 PM
In the second scenario, is it possible to copy the raw encrypted files from the first cluster to the second?
07-22-2016
04:12 PM
The Hadoop group can be changed with the following command:
/var/lib/ambari-server/resources/scripts/configs.sh \
 -u admin -p admin set localhost cluster_name cluster-env user_group new_group
This assumes it is run from the Ambari host with the default credentials. Replace cluster_name with the name of the cluster. Details here: https://cwiki.apache.org/confluence/display/AMBARI/Update+Service-Accounts+After+Install
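To confirm the change took effect, a sketch using the same script's get action:
/var/lib/ambari-server/resources/scripts/configs.sh \
 -u admin -p admin get localhost cluster_name cluster-env
Restart the affected services afterwards so the new group is picked up.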
07-21-2016
01:36 PM
We can set the "Hadoop Group" during installation in the Customize Services => Misc tab. How can it be changed post-install?
05-17-2016
11:22 PM
2 Kudos
Can you point me to instructions on how to build a Cloudbreak virtual machine image from a custom base image? This would be for an OpenStack deployment.
04-26-2016
10:56 PM
The CREATE EXTERNAL TABLE statement must match the format of the files on disk. If the files are in a self-describing format like Parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert to a new format, including a different compression algorithm, you will need to create a new table.
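A minimal sketch of that two-step approach, with hypothetical table and path names:
CREATE EXTERNAL TABLE events_raw (id bigint, payload string)
STORED AS PARQUET
LOCATION '/data/events';
-- rewrite into a new table to change format and compression
CREATE TABLE events_orc STORED AS ORC TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM events_raw;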
04-22-2016
04:28 PM
1 Kudo
You will need to specify the column you are clustering on, and do it in multiple statements:
CREATE TABLE emp1 LIKE emp;
ALTER TABLE emp1 SET FILEFORMAT ORC;
ALTER TABLE emp1 CLUSTERED BY (empId) INTO 4 BUCKETS;
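The ALTER statements only change the table metadata; to get bucketed data you still have to rewrite the rows with bucketing enforced. A sketch (hive.enforce.bucketing applies to older Hive releases, where it is not yet always on):
set hive.enforce.bucketing=true;
INSERT INTO TABLE emp1 SELECT * FROM emp;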
04-22-2016
07:58 AM
2 Kudos
If you create a Hive table over an existing data set in HDFS, you need to tell Hive about the format of the files as they are on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format like Parquet or ORC using CREATE TABLE ... AS SELECT.
CREATE EXTERNAL TABLE sourcetable (col bigint)
row format delimited
fields terminated by ","
STORED as TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
Once the data is mapped, you can convert it to other formats like Parquet:
set parquet.compression=SNAPPY; -- this is the default actually
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
For the Hive-optimized ORC format, the syntax is slightly different:
CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy")
AS SELECT * FROM sourcetable;
12-16-2015
06:35 PM
The COUNT(DISTINCT) could be the bottleneck if it is not being parallelized. Can you share the explain plan?
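For reference, when a single reducer ends up doing the distinct count, a common workaround is to rewrite it as a two-stage aggregation (a sketch with hypothetical table and column names):
-- original
SELECT COUNT(DISTINCT user_id) FROM events;
-- rewritten so the de-duplication is spread across reducers
SELECT COUNT(*) FROM (SELECT user_id FROM events GROUP BY user_id) t;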
12-10-2015
05:43 PM
The work to generically create a table by reading a schema from ORC, Parquet and Avro files is tracked in HIVE-10593.
11-20-2015
10:03 PM
A mistyped hadoop fs -rmr -skipTrash can have catastrophic consequences, which snapshots can protect against. What are the performance concerns?
11-20-2015
04:41 PM
9 Kudos
Repo Description: Automatically create, rotate, and destroy periodic HDFS snapshots. This is the utility that creates the @hdfs-auto-snap_frequent, @hdfs-auto-snap_hourly, @hdfs-auto-snap_daily, @hdfs-auto-snap_weekly, and @hdfs-auto-snap_monthly snapshots if it is installed.
Repo Info:
Github Repo URL: https://github.com/jpplayer/hdfs-auto-snapshot
Github account name: jpplayer
Repo name: hdfs-auto-snapshot
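The utility automates what you would otherwise do with the standard HDFS snapshot commands; a minimal manual sketch on a hypothetical directory:
hdfs dfsadmin -allowSnapshot /data/important
hdfs dfs -createSnapshot /data/important manual-backup
hdfs dfs -ls /data/important/.snapshot
hdfs dfs -deleteSnapshot /data/important manual-backup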
11-17-2015
12:02 AM
1 Kudo
What size NiFi system would we need to read 400 MB/s from a Kafka topic and store the output in HDFS? The input is log lines, 100 B to 1 KB in length each.
10-30-2015
11:26 PM
2 Kudos
Pig does not support appending to an existing partition through HCatalog. What workarounds are there to perform the append and get behavior similar to Hive's INSERT INTO TABLE from Pig?
10-27-2015
08:58 PM
You would assign one folder per datanode disk, closely mapping dfs.datanode.data.dir. On a 12-disk system you would have 12 YARN local-dir locations.
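A sketch of what that looks like in yarn-site, assuming hypothetical /grid/0 through /grid/11 mount points:
yarn.nodemanager.local-dirs=/grid/0/hadoop/yarn/local,/grid/1/hadoop/yarn/local,/grid/2/hadoop/yarn/local
continuing through /grid/11, with one comma-separated entry per disk.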
10-22-2015
08:34 PM
1 Kudo
In an HA environment, you should always refer to the nameservice, not any one of the namenodes. The syntax for the URL is:
hdfs://<nameservice>/
Notice that no port number is specified. The HA configuration should be defined in /etc/hadoop/conf/core-site.xml and accessible by the process. WebHDFS does not natively support Namenode HA, but you can use Knox to provide that functionality.
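A minimal sketch, assuming a nameservice named mycluster (hypothetical name):
hdfs dfs -ls hdfs://mycluster/user
In core-site.xml, fs.defaultFS=hdfs://mycluster makes the nameservice the default, so plain paths like /user resolve through it as well.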
10-10-2015
12:12 AM
1 Kudo
That was it, thanks! I used http://tweeterid.com/ to convert from username to user ID.
10-10-2015
12:06 AM
2 Kudos
I tried both the Twitter username and the @username notation, but NiFi errors out with "invalid because Must be comma separated list of user IDs".
10-08-2015
10:32 PM
6 Kudos
Row-level security can be achieved by defining views with hard-coded permissions in Ranger. An alternative available since Hive 1.2.0 is to filter dynamically based on the current user, with the current_user() function. This provides row-by-row security. One option is to define the ACLs in a permission table:
create table permission (username string, driverid string);
For example, to secure the driver(driverid, drivername) table, you could create the following permission:
insert into permission values ('jsmith', '25');
Finally, define the view by joining against it:
create view secure_driver as
select d.* from driver d
inner join permission p on d.driverid = p.driverid
where p.username = current_user();
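A quick usage check, using the hypothetical row above: when the connected user is jsmith, the view only exposes that driver's rows:
SELECT * FROM secure_driver; -- as jsmith, returns only the rows for driverid 25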
10-05-2015
05:38 PM
The following parameters control the number of mappers for splittable formats with Tez:
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
MapReduce uses the following:
set mapreduce.input.fileinputformat.split.minsize=16777216; -- 16 MB
set mapreduce.input.fileinputformat.split.maxsize=1073741824; -- 1 GB
Increase the min and max split sizes to reduce the number of mappers.
10-05-2015
05:13 PM
5 Kudos
Starting with Hive 0.14, there is a standalone jar that contains most of the necessary binaries. Until HIVE-9600 is resolved, it still requires two additional jars, so the client classpath needs:
hive-jdbc-<version>-standalone.jar
hadoop-common.jar
hadoop-auth.jar
See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/hive-jdbc-odbc-drivers.html for details.
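As a sketch of how that classpath is used from a plain JDBC client (the client class, client jar, and hostname here are hypothetical placeholders):
java -cp "hive-jdbc-<version>-standalone.jar:hadoop-common.jar:hadoop-auth.jar:myclient.jar" \
  com.example.MyJdbcClient "jdbc:hive2://hiveserver2-host:10000/default"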