Member since: 09-21-2015
Posts: 28
Kudos Received: 40
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 779 | 12-02-2016 05:54 PM |
| | 261 | 07-22-2016 04:12 PM |
| | 340 | 04-22-2016 04:28 PM |
| | 14273 | 04-22-2016 07:58 AM |
| | 2179 | 10-08-2015 10:32 PM |
12-16-2016
12:56 AM
Hi Cristian, the amount of memory that YARN can allocate is controlled by the setting "Memory allocated for all YARN containers on a node" under the YARN configuration. Set this to 3 GB, and that should give enough room for Tez to run (it needs 2.5 GB if you follow the settings above).
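For reference, that Ambari label maps to the yarn.nodemanager.resource.memory-mb property in yarn-site (the exact mapping is my assumption, so verify it against your stack). A minimal sketch, value in MB:
yarn.nodemanager.resource.memory-mb=3072
Restart the NodeManagers after changing it.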
12-06-2016
04:55 PM
Since you are in a sandbox, you need to reduce the amount of memory taken by each component so that everything fits (try running with 12 GB if you can). Reduce the memory footprint as follows:
Tez container size = 1024 MB
Map join, per Map memory = 256 MB
Metastore heap = 512 MB
Client heap = 512 MB
Tez Client -> tez.am.resource.memory.mb = 512
YARN will need to fit at least one Tez AM (512 MB) and a couple of Tez containers (512 MB * 2). You can check how much memory is allocated to YARN on the YARN config page under "Memory allocated for all YARN containers on a node".
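If you prefer to set the Hive/Tez equivalents directly, these are the properties I believe sit behind those Ambari labels (treat the mapping as an assumption; the metastore and client heap sizes live in hive-env rather than in session settings):
set hive.tez.container.size=1024;                             -- Tez container size, in MB
set hive.auto.convert.join.noconditionaltask.size=268435456;  -- map-join memory, 256 MB expressed in bytes
set tez.am.resource.memory.mb=512;                            -- Tez AM size, in MB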
12-06-2016
04:41 PM
@Pooja Sahu In the source file, the original '\001' delimiter has been replaced with its printable representation "^A". One way to process the file is to convert it back to '\001':
CREATE EXTERNAL TABLE fix_raw (line string)
ROW FORMAT DELIMITED
LOCATION '/user/pooja/fix/';
CREATE TABLE fix_map (tag MAP<STRING, STRING>)
STORED AS ORC;
INSERT INTO TABLE fix_map
SELECT str_to_map( replace(line, '^A', '\001'), '\001', '=') tag from fix_raw;
-- query tag 49 (the map keys are strings, so the key must be quoted)
SELECT tag['49'] FROM fix_map;
12-02-2016
05:54 PM
2 Kudos
The Tez job has not started: both the mapper and the reducer are in the "pending" state and have not yet been launched. Once launched they would enter the "running" state. Check YARN to make sure there is enough room in your queue to fit the containers (http://sandbox:8088). There isn't much RAM in a sandbox, and it could all be taken up by a Spark instance or by the Tez instances of HiveServer2.
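A quick way to see what is holding the memory (a sketch; adjust to your setup) is to list the running YARN applications from the sandbox shell and check their resource usage in the ResourceManager UI:
yarn application -list -appStates RUNNING
Kill anything you do not need with yarn application -kill <application-id>.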
10-29-2016
01:06 AM
4 Kudos
LOCATION is never mandatory, and can be used with any combination of managed, external and partitioned tables. The following statements are all valid:
create database if not exists test;
use test;
-- no LOCATION
create table t1 (i int);
create EXTERNAL table t2(i int);
create table t3(i int) PARTITIONED by (b int);
create EXTERNAL table t4(i int) PARTITIONED by (b int);
-- with LOCATION
create table t5 (i int) LOCATION '/tmp/tables/t5';
create EXTERNAL table t6(i int) LOCATION '/tmp/tables/t6';
create table t7(i int) partitioned by (b int) LOCATION '/tmp/tables/t7';
create EXTERNAL table t8(i int) partitioned by (b int) LOCATION '/tmp/tables/t8';
show tables;
drop table t1; drop table t2; drop table t3; drop table t4;
drop table t5; drop table t6; drop table t7; drop table t8;
If LOCATION is not specified, Hive will use the value of hive.metastore.warehouse.dir in all cases. With the above example:
hdfs dfs -ls /apps/hive/warehouse/test.db
Found 4 items
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t1
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t2
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t3
drwxrwxrwx - hive hdfs 0 2016-10-29 00:54 /apps/hive/warehouse/test.db/t4
Note how t2 and t4 were both external tables.
hdfs dfs -ls /tmp/tables
Found 4 items
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t5
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t6
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t7
drwxrwxrwx - hive hdfs 0 2016-10-29 01:00 /tmp/tables/t8
08-16-2016
06:56 PM
In the second scenario, is it possible to copy the raw encrypted files from the first cluster to the second?
07-22-2016
04:12 PM
The Hadoop group can be changed with the following command:
/var/lib/ambari-server/resources/scripts/configs.sh \
 -u admin -p admin set localhost cluster_name cluster-env user_group new_group
This assumes it is run from the Ambari host with the default credentials. Replace cluster_name with the name of the cluster. Details here: https://cwiki.apache.org/confluence/display/AMBARI/Update+Service-Accounts+After+Install
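To confirm the change took effect, a sketch using the same script's get action:
/var/lib/ambari-server/resources/scripts/configs.sh \
 -u admin -p admin get localhost cluster_name cluster-env
Restart the affected services afterwards so the new group is picked up.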
07-21-2016
01:36 PM
We can set the "Hadoop Group" during installation in the Customize Services => Misc tab. How can it be changed post-install?
05-17-2016
11:22 PM
2 Kudos
Can you point me to instructions on how to build a Cloudbreak virtual machine image from a custom base image? This would be for an OpenStack deployment.
04-26-2016
10:56 PM
The CREATE EXTERNAL TABLE statement must match the format of the files on disk. If the files are in a self-describing format like Parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert to a new format, including a different compression algorithm, you will need to create a new table.
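A minimal sketch of that two-step approach, with hypothetical table and path names:
CREATE EXTERNAL TABLE events_raw (id bigint, payload string)
STORED AS PARQUET
LOCATION '/data/events';
-- rewrite into a new table to change format and compression
CREATE TABLE events_orc STORED AS ORC TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM events_raw;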
04-22-2016
04:28 PM
1 Kudo
You will need to specify the column you are clustering on, and do it in multiple statements:
CREATE TABLE emp1 LIKE emp;
ALTER TABLE emp1 SET FILEFORMAT ORC;
ALTER TABLE emp1 CLUSTERED BY (empId) INTO 4 BUCKETS;
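The ALTER statements only change the table metadata; to get bucketed data you still have to rewrite the rows with bucketing enforced. A sketch (hive.enforce.bucketing applies to older Hive releases, where it is not yet always on):
set hive.enforce.bucketing=true;
INSERT INTO TABLE emp1 SELECT * FROM emp;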
04-22-2016
07:58 AM
2 Kudos
If you create a Hive table over an existing data set in HDFS, you need to tell Hive about the format of the files as they are on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format like Parquet or ORC using CREATE TABLE ... AS SELECT.
CREATE EXTERNAL TABLE sourcetable (col bigint)
row format delimited
fields terminated by ","
STORED as TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
Once the data is mapped, you can convert it to other formats like Parquet:
set parquet.compression=SNAPPY; -- this is the default actually
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
For the Hive-optimized ORC format, the syntax is slightly different:
CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy")
AS SELECT * FROM sourcetable;
12-16-2015
06:35 PM
The COUNT(DISTINCT) could be the bottleneck if it is not being parallelized. Can you share the explain plan?
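For reference, when a single reducer ends up doing the distinct count, a common workaround is to rewrite it as a two-stage aggregation (a sketch with hypothetical table and column names):
-- original
SELECT COUNT(DISTINCT user_id) FROM events;
-- rewritten so the de-duplication is spread across reducers
SELECT COUNT(*) FROM (SELECT user_id FROM events GROUP BY user_id) t;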
12-10-2015
05:43 PM
The work to generically create a table by reading a schema from ORC, Parquet and Avro files is tracked in HIVE-10593.
11-20-2015
10:03 PM
A mistyped hadoop fs -rmr -skipTrash can have catastrophic consequences, which snapshots can protect against. What are the performance concerns?
11-20-2015
04:41 PM
9 Kudos
Repo Description: Automatically create, rotate, and destroy periodic HDFS snapshots. This is the utility that creates the @hdfs-auto-snap_frequent, @hdfs-auto-snap_hourly, @hdfs-auto-snap_daily, @hdfs-auto-snap_weekly, and @hdfs-auto-snap_monthly snapshots if it is installed.
Repo Info:
Github Repo URL: https://github.com/jpplayer/hdfs-auto-snapshot
Github account name: jpplayer
Repo name: hdfs-auto-snapshot
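The utility automates what you would otherwise do with the standard HDFS snapshot commands; a minimal manual sketch on a hypothetical directory:
hdfs dfsadmin -allowSnapshot /data/important
hdfs dfs -createSnapshot /data/important manual-backup
hdfs dfs -ls /data/important/.snapshot
hdfs dfs -deleteSnapshot /data/important manual-backup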
11-17-2015
12:02 AM
1 Kudo
What size NiFi system would we need to read 400 MB/s from a Kafka topic and store the output in HDFS? The input is log lines, 100 B to 1 KB in length each.
10-30-2015
11:26 PM
2 Kudos
Pig does not support appending to an existing partition through HCatalog. What workarounds are there to perform the append and get behavior similar to Hive's INSERT INTO TABLE from Pig?
10-27-2015
08:58 PM
You would assign one folder per datanode disk, closely mapping dfs.datanode.data.dir. On a 12-disk system you would have 12 YARN local-dir locations.
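A sketch of what that looks like in yarn-site, assuming hypothetical /grid/0 through /grid/11 mount points:
yarn.nodemanager.local-dirs=/grid/0/hadoop/yarn/local,/grid/1/hadoop/yarn/local,/grid/2/hadoop/yarn/local
continuing through /grid/11, with one comma-separated entry per disk.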
10-22-2015
08:34 PM
1 Kudo
In an HA environment, you should always refer to the nameservice, not any one of the namenodes. The syntax for the URL is:
hdfs://<nameservice>/
Notice that no port number is specified. The HA configuration should be defined in /etc/hadoop/conf/core-site.xml and accessible by the process. WebHDFS does not natively support Namenode HA, but you can use Knox to provide that functionality.
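A minimal sketch, assuming a nameservice named mycluster (hypothetical name):
hdfs dfs -ls hdfs://mycluster/user
In core-site.xml, fs.defaultFS=hdfs://mycluster makes the nameservice the default, so plain paths like /user resolve through it as well.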
10-10-2015
12:12 AM
1 Kudo
That was it, thanks! I used http://tweeterid.com/ to convert from username to user ID.
10-10-2015
12:06 AM
2 Kudos
I tried both the Twitter username and the @username notation, but NiFi errors out with "invalid because Must be comma separated list of user IDs".
10-08-2015
10:32 PM
6 Kudos
Row-level security can be achieved by defining views with hard-coded permissions in Ranger. An alternative available since Hive 1.2.0 is to filter dynamically based on the current user, with the current_user() function. This provides row-by-row security. One option is to define the ACLs in a permission table:
create table permission (username string, driverid string);
For example, to secure the driver(driverid, drivername) table, you could create the following permission:
insert into permission values ('jsmith', '25');
Finally, define the view by joining against it:
create view secure_driver as
select d.* from driver d
inner join permission p on d.driverid = p.driverid
where p.username = current_user();
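A quick usage check, using the hypothetical row above: when the connected user is jsmith, the view only exposes that driver's rows:
SELECT * FROM secure_driver; -- as jsmith, returns only the rows for driverid 25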
10-05-2015
05:38 PM
The following parameters control the number of mappers for splittable formats with Tez:
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
MapReduce uses the following:
set mapreduce.input.fileinputformat.split.minsize=16777216; -- 16 MB
set mapreduce.input.fileinputformat.split.maxsize=1073741824; -- 1 GB
Increase the min and max split sizes to reduce the number of mappers.
10-05-2015
05:13 PM
5 Kudos
Starting with Hive 0.14, there is a standalone jar that contains most of the necessary binaries. Until HIVE-9600 is resolved, it still requires two additional jars, so the client classpath needs:
hive-jdbc-<version>-standalone.jar
hadoop-common.jar
hadoop-auth.jar
See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/hive-jdbc-odbc-drivers.html for details.
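As a sketch of how that classpath is used from a plain JDBC client (the client class, client jar, and hostname here are hypothetical placeholders):
java -cp "hive-jdbc-<version>-standalone.jar:hadoop-common.jar:hadoop-auth.jar:myclient.jar" \
  com.example.MyJdbcClient "jdbc:hive2://hiveserver2-host:10000/default"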