Member since: 06-24-2016
Posts: 111
Kudos Received: 8
Solutions: 0
04-18-2017
01:26 AM
Then, if I turn on the ACID Transactions option, do I also need to install the Hive Standalone Metastore?
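For context, these are the Hive settings I assume are involved when ACID is turned on (this is only my understanding, so please correct me if any of them are wrong or unnecessary):

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1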
03-25-2017
03:19 PM
Here's my Hadoop cluster information, on CentOS 6.7.

/etc/hosts:
10.10.1.10 cm.hdp.com cm
10.10.1.11 nn01.hdp.com nn01
10.10.1.12 nn02.hdp.com nn02
10.10.1.13 yarn.hdp.com yarn
10.10.1.14 dn01.hdp.com dn01
10.10.1.15 dn02.hdp.com dn02
10.10.1.16 dn03.hdp.com dn03

Taking the node nn01.hdp.com as an example, I'm wondering about the HOSTNAME value in /etc/sysconfig/network. The following link recommends setting HOSTNAME to the FQDN: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/edit_the_network_configuration_file.html Why should HOSTNAME be changed to the FQDN? I think the HOSTNAME value should be the short hostname rather than the FQDN, like this: HOSTNAME=nn01
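For reference, this is what I have now on nn01 versus what the doc seems to recommend (the second block is only my reading of the doc, not something I have applied yet):

/etc/sysconfig/network (what I have now):
HOSTNAME=nn01

/etc/sysconfig/network (what the doc recommends, as I read it):
HOSTNAME=nn01.hdp.com

# either way, I would expect the FQDN to resolve via /etc/hosts:
hostname -f
# expected output: nn01.hdp.com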
Labels:
- Apache Ambari
- Apache Hadoop
11-21-2016
01:29 AM
The --split-by option can be used on a text column after adding the relevant property to sqoop-site.xml in Ambari, or by adding that option on the command line, like this. I think the Oracle record count is not what determines the split file sizes, because the actual file size depends on the column count, the column types, and the size of the column values in each record. And here are my Sqoop import results, which I found interesting.

Total file size: 2.2 GB

sqoop import ... --direct --fetch-size 1000 --num-mappers 10 --split-by EMP_NO (TEXT)
Result: 0 bytes each for 3 mappers, and 1.1 GB for 1 mapper.

Re-tested with the same values except for the option below:
--split-by REEDER_ID (NUMBER)

In my opinion, Sqoop mappers only parallelize the processing without regard to the output file size of the selected Oracle records, so the file sizes are not split evenly. Also, --split-by with a NUMBER-type column is more useful than a TEXT-type column, which is not accurate for the split file sizes.
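My rough understanding (not verified against the Sqoop source) is that Sqoop first queries the minimum and maximum of the --split-by column and then cuts that range into --num-mappers pieces, which only gives even ranges for numeric columns; a text column like EMP_NO can easily produce empty or heavily skewed ranges. The quickest way I found to see the skew per mapper is to check the part files in the target directory (the path below is just an example):

hdfs dfs -du -h /user/hdfs/emp_import
# one part-m-* file per mapper; with --split-by EMP_NO (TEXT) I got 0-byte files
# next to a single ~1.1 GB file, while --split-by REEDER_ID (NUMBER) was much more even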
11-16-2016
01:12 AM
That parameter, --direct-split-size, is only for PostgreSQL, if I'm right, because I already tested it and it's not working. See my result. The total table size is 2.3 GB.

sqoop import ... --num-mappers 4 --direct-split-size 600000000
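In case it helps anyone reproduce this, I checked it roughly like this (the target directory is just an example path):

# list the import arguments; --direct-split-size is documented for direct-mode imports
sqoop help import
# compare the sizes of the generated part files against the 600000000-byte boundary
hdfs dfs -du -h /user/hdfs/my_import_dir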
11-15-2016
12:33 AM
Sqoop version 1.4.6 on HDP 2.5.0.0, Oracle 11g. The select query result size is about 2.3 GB.

sqoop import .... --num-mappers 4 --split-by STR ...

Result: I think the mappers option is not related to producing files of the same size. I want to split the output into files of about 570 MB each, but Sqoop's parameters don't seem to support that feature. Are there other options or tips for controlling the output file size?
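For what it's worth, here is my rough arithmetic (just a back-of-the-envelope sketch): if the splits were perfectly even, the mapper count alone would already give roughly the file size I want, so the real problem seems to be the uneven split ranges on the STR column rather than the number of mappers.

total select size  ≈ 2.3 GB ≈ 2355 MB
target file size   ≈ 570 MB
mappers needed     ≈ 2355 / 570 ≈ 4.1  -> --num-mappers 4 (or 5)
# but with --split-by on a text column the 4 ranges are not equal, so the part files are not ~570 MB each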
Labels:
- Apache Sqoop
11-09-2016
02:00 AM
I think the SmartSense feature is not free for general users, at least if you want to use the full set of SmartSense menu services in Ambari. Am I right?
11-09-2016
01:13 AM
Where is the Tools or SmartSense tab in the support portal? I can't see it.
10-04-2016
01:28 AM
Thanks Ashnee. I didn't notice that.
09-30-2016
01:09 PM
Here's my ambari-server web UI. It shows two stack versions (2.4.2.0 | 2.5.0.0).
I tried to upgrade HDP-2.4.2.0 to HDP-2.5.0.0, but it failed for some reasons. Then, after restarting ambari-server, I got this weird issue. Of course, I registered the correct stack version info before the upgrade. I feel the ambari-server web UI lacks a delete feature for registered stack versions. How can I delete a registered stack version? For example, with the curl REST API, or by deleting it from the Ambari DB? I hope the newest ambari-server version supports deleting stack versions in the ambari-server web UI.
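Something like the following is what I had in mind for the REST route (I have not verified this exact endpoint on my cluster; the host is my Ambari server and <id> is just a placeholder taken from the first call):

# list the repository versions registered for the HDP 2.5 stack
curl -u admin:admin -H 'X-Requested-By: ambari' -X GET http://cm.hdp.com:8080/api/v1/stacks/HDP/versions/2.5/repository_versions
# then try deleting the one that was registered for the failed upgrade
curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE http://cm.hdp.com:8080/api/v1/stacks/HDP/versions/2.5/repository_versions/<id>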
Labels:
07-15-2016
02:32 AM
1 Kudo
System: HDP 2.4.2.0, Ambari 2.2.2.0
Machines: 5 DataNode servers out of 10 servers
DataNode server disk volume info: 3 TB x 10

DataNode directories in the Ambari web UI (only Disk1 (/data1) has two datanode directories per DataNode server):
Disk Volume1: /data1/hadoop/hdfs/data, /data1/crash/hadoop/hdfs/data -> I don't know why the crash directory was added only on the Disk1 volume (/data1)
Disk Volume2: /data1/hadoop/hdfs/data
....
Disk Volume5: /data1/hadoop/hdfs/data /data1/crash/hadoop/hdfs/data

Here are my questions.

Q1. When I put large data into a certain HDFS directory (/dataset), I'm wondering about the HDFS replication policy across each server's disks. For example, with dfs replication count = 3 and a certain 10 GB file in HDFS's /dataset, how are that file's block replicas distributed across the disk volumes of the DataNodes?
Case1. Block pool (blk_..., blk_...meta) -> Datanode1 - disk1 | Datanode8 - disk2 | Datanode3 - disk6 ......
Case2. Block pool (blk_..., blk_...meta) -> Datanode1 - disk1 | Datanode2 - disk1 | Datanode7 - disk1 ...... | Datanode$ - disk1
Which one is the correct HDFS replication and distribution policy for stored data? Is Case2 possible?

Q2. How can I safely move one datanode directory (/data1/crash/hadoop/hdfs/data) to the same datanode directory on another disk volume (2~5), or to another datanode directory on disk (2~5)? Disk1 (/data1) is hitting the disk-full issue faster than the other disks, because up to double the amount of block data & meta files is stored in Disk1's /data1 directory. So I need to know how to handle the data in the datanode directory /data1/crash/hadoop/hdfs/data before removing that path in Ambari - HDFS - Configs - Settings - DataNode directories.
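For Q2, my tentative plan (just a sketch of what I'm considering, not a verified procedure) is to avoid moving block files by hand and instead let HDFS re-replicate: check block health first, remove /data1/crash/hadoop/hdfs/data from the DataNode directories config for one DataNode at a time, restart that DataNode, and wait for under-replicated blocks to recover before touching the next node.

# check block health and replication state before changing anything
hdfs fsck / | grep -E 'Under-replicated|Missing'
hdfs dfsadmin -report
# after removing the directory for one DataNode in Ambari and restarting that DataNode,
# watch the under-replicated count return to 0 before moving on to the next node
hdfs fsck / | grep 'Under-replicated'

Please tell me if that approach is wrong, or if manually moving the block and meta files (with the DataNode stopped and the directory structure preserved) is safer.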
Labels:
- Apache Hadoop