Member since: 05-11-2016
Posts: 42
Kudos Received: 2
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1094 | 02-07-2018 06:22 AM
 | 926 | 11-13-2017 08:04 AM
 | 943 | 07-20-2017 03:01 AM
01-31-2020
10:06 PM
I found the relevant comment in the source code, in the patch at https://issues.apache.org/jira/secure/attachment/12805509/HIVE-13029.4.patch:
+ private ByteBuffer preallocate(int arenaSize) {
+ if (isMapped) {
+ Preconditions.checkArgument(isDirect, "All memory mapped allocations have to be direct buffers");
+ try {
+ File rf = File.createTempFile("arena-", ".cache", cacheDir.toFile());
+ RandomAccessFile rwf = new RandomAccessFile(rf, "rw");
+ rwf.setLength(arenaSize); // truncate (TODO: posix_fallocate?)
+ ByteBuffer rwbuf = rwf.getChannel().map(MapMode.PRIVATE, 0, arenaSize);
+ // A mapping, once established, is not dependent upon the file channel that was used to
+ // create it. delete file and hold onto the map
+ rwf.close();
+ rf.delete();
+ return rwbuf;
+ } catch (IOException ioe) {
+ LlapIoImpl.LOG.warn("Failed trying to allocate memory mapped arena", ioe);
+ // fail similarly when memory allocations fail
+ throw new OutOfMemoryError("Failed trying to allocate memory mapped arena: " + ioe.getMessage());
+ }
+ }
+ return isDirect ? ByteBuffer.allocateDirect(arenaSize) : ByteBuffer.allocate(arenaSize);
+ }
So, the LLAP daemon creates a temporary file, deletes it immediately, but keeps using the memory mapping. This should explain the difference between the output of the df command and the du command. A little bit tricky, isn't it?
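If you want to confirm this on a running node, deleted-but-still-mapped arena files should show up with lsof (a minimal sketch; the grep filter and <llap-daemon-pid> are placeholders for your own LLAP daemon process):
lsof +L1 | grep -i llap                        # open files whose link count is 0, i.e. already deleted
lsof -p <llap-daemon-pid> | grep -i deleted    # deleted files still held open by a specific PID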
01-31-2020
05:04 AM
After investigation and testing, I found that the Hive LLAP daemon actually does seem to use the SSD device behind the OS path (in my case "/hadoop/hive/llap") even when there is no OS file under the directory. If you are using Linux, you can see the difference between the usage reported by the df command and the usage reported by the du command. In my case, that difference matched the cache usage shown on the LLAP daemon UI. You can also confirm that nothing has been evicted with "http://XXXX:15002/iomem", which shows something like: ORC cache summary: 0 locked, 556529 unlocked, 0 evicted, 0 being moved, 18236342272 total used space. I think this is a kind of direct access to the block device (not via the OS file system).
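A quick way to see the gap on Linux (a minimal sketch using my cache path; substitute your own mount point):
df -h /hadoop/hive/llap    # device usage includes the deleted-but-mapped cache files
du -sh /hadoop/hive/llap   # directory usage shows almost nothing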
12-09-2019
07:53 PM
I'm also having the same issue. I'm using HDP 3.1.0 and enabled it with Ambari. I did some trials changing the cache size for the LLAP daemon, the heap size for the LLAP daemon, etc., but no luck at all. I can't see any OS files under the OS path "/hadoop/hive/llap". BTW, when I run a Hive query with LLAP, I can see some usage of the LLAP daemon cache after turning on "Turn SSD cache On", with no usage of the OS path "/hadoop/hive/llap". Does it mean LLAP uses OS memory as the LLAP cache as well as SSD?
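For comparison, these are the standard Hive LLAP cache properties that "Turn SSD cache On" should map to (a hedged sketch; the path is my own value and <cache size> is a placeholder, so please check what Ambari actually set on your cluster):
hive.llap.io.allocator.mmap=true
hive.llap.io.allocator.mmap.path=/hadoop/hive/llap
hive.llap.io.memory.size=<cache size>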
05-22-2019
05:54 PM
I had the same issue, and we are using HDP 3.1.0.0-78. https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_tez.html TEZ-3894 seems to be already applied to HDP 3.1. (I've also checked the source code a little, and yes, it looks already applied.) But I still have this issue... I can avoid it by changing fs.permissions.umask-mode from "077" to "022" in an HS2 session:
0: jdbc:hive2://XXXX > set fs.permissions.umask-mode=022;
So I guess this issue may not be completely fixed by TEZ-3894 (with HDP 3.1.0.0-78)...
05-22-2019
04:27 AM
It looks like a Tez issue coming from the "fs.permissions.umask-mode" setting. https://community.hortonworks.com/questions/246302/hive-tez-vertex-failed-error-during-reduce-phase-h.html
05-21-2019
03:44 AM
I'm having the same issue with HDP 3.1 (Tez 0.9.1). I can reproduce it with:
1) Create two files - file1.csv and file2.csv
2) Add two fields to the CSV files as below
one,two
one,two
one,two
3) Create an external table
use testdb;
create external table test1(s1 string, s2 string) row format delimited fields terminated by ',' stored as textfile location '/user/usera/test1';
4) Copy one CSV file to HDFS - /user/usera/test1
hdfs dfs -put ./file1.csv /user/usera/test1/
5) select count(*) from testdb.test1;
=> works fine.
6) Copy the second CSV file to HDFS
hdfs dfs -put ./file2.csv /user/usera/test1/
7) select * from testdb.test1;
=> Can see the data in both hdfs files.
8) select count(*) from testdb.test1;
=> Get this problem. And we can see the following error in the mapper task's log:
2019-05-17 10:08:10,317 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1]. len=25, decomp=11. ExceptionMessage=Not a valid ifile header
2019-05-17 10:08:10,317 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1] from XXXXX
java.io.IOException: Not a valid ifile header
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
    at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
    at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I think it's similar to https://issues.apache.org/jira/browse/TEZ-3699. I've confirmed that patch is already applied to Tez in HDP 3.1. So I guess it's a new bug in Tez 0.9.x (I confirmed there is no problem with HDP 2.6 / Tez 0.7.0). Any idea?
04-25-2019
12:41 PM
One update. Hive seems to have an issue handling "viewfs://". For example, Hive DB creation failed with a permission error even though the permission of the target HDFS dir is 777.
HDFS permission:
hdfs dfs -ls -d viewfs://fed/user/hadoop/warehouse/hadoop_viewfs6.db
drwxrwxrwx - hadoop hadoop 0 2019-04-24 07:44 viewfs://fed/user/hadoop/warehouse/hadoop_viewfs6.db
Hive DB creation failed with that HDFS dir:
0: jdbc:hive2://XXXX > create database hadoop_viewfs6 location '/user/hadoop/warehouse/hadoop_viewfs6.db';
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hadoop] does not have [WRITE] privilege on [viewfs://fed/user/hadoop/warehouse/hadoop_viewfs6.db] (state=42000,code=40000)
It succeeded with "hdfs://":
0: jdbc:hive2://XXXX > create database hadoop_viewfs6 location 'hdfs://ns1/user/hadoop/warehouse/hadoop_viewfs6.db';
04-24-2019
07:01 AM
Now I'm trying to use ViewFs for NameNode federation with HDP 3.1. I found "ViewFs is not supported on Hive clusters." on the following page: https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/managing-and-monitoring-ambari/content/amb_configure_viewfs.html Does it mean ViewFs is not supported for the Hive service? (I can't work out what "Hive clusters" means.) Thank you for any help 🙂
Labels:
- Apache Hive
03-29-2019
09:32 AM
1 Kudo
I had the same issue and solved it by changing "yarn.nodemanager.resource.memory-mb" from 468 GB to 200 GB in Ambari (there were 11 related changes). I'm sharing this because it's really hard to find the cause by reading the error messages...
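For anyone searching later, the property in question is the one below (a minimal sketch; the value is in MB, so 200 GB corresponds to 204800, and it's best to change it through Ambari so the dependent settings are recalculated):
yarn.nodemanager.resource.memory-mb=204800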
09-18-2018
08:36 AM
Hi, I'm facing exactly the same issue with HDP 2.6.2. The HDFS client has to wait about 20 seconds when the first NameNode is powered off. (Actually, we hit this issue when the first NameNode had a kernel hang (kernel panic).) Did you find a good solution or workaround? If so, please share it. Any information will help us!
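In case it helps, the ~20 seconds looks suspiciously close to the default ipc.client.connect.timeout of 20000 ms, so one workaround we are considering (a hedged sketch only, not yet tested in production; the values are assumptions) is lowering the client-side connect settings in core-site.xml so failover to the second NameNode starts sooner:
ipc.client.connect.timeout=5000
ipc.client.connect.max.retries.on.timeouts=3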
06-01-2018
03:14 AM
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html This page also says "CORES (number of CPU cores)". So, does "CORES" mean "physical cores"? If so, 12 physical CPU cores (24 vcores with Intel HT) may be good enough for a node with 12 HDDs. I welcome any opinions from all of you 🙂
05-25-2018
05:21 AM
I know we have a best practice for the balance between the number of cores and the number of disks: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html # of containers = min(2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE) I believe this means that slave nodes where 2*CORES = 1.8*DISKS are best balanced in terms of CPUs and disks. Does anyone know whether "CORES" means the number of "physical" cores or the number of "virtual" cores (i.e. with Hyper-Threading)? If it means "physical" cores, the number of physical CPU cores should ideally be 12 with 12 disks. If it means "virtual" cores (e.g. via Intel HT), 6 physical cores would be enough with 12 disks for a best-balanced node. Also, I'm wondering whether we should enable Hyper-Threading or not to get better throughput. Any reply, comment, or suggestion will help me. Thanks!
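To make the formula concrete with some hypothetical numbers of my own: a node with 12 physical cores, 12 disks, 120 GB of RAM available to YARN, and MIN_CONTAINER_SIZE = 4 GB gives
# of containers = min(2*12, 1.8*12, 120/4) = min(24, 21.6, 30) = 21.6, i.e. about 21 containers,
so the disks are the limiting factor there, which is exactly why the physical vs. virtual core question matters.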
Labels:
- Hortonworks Data Platform (HDP)
02-07-2018
06:22 AM
I don't think so.
02-07-2018
06:19 AM
I had the same problem: "Ranger can sync users with LDAP but can't log in to the Ranger UI with the LDAP password." I finally solved it, so let me share the lessons learned and how I solved it, to help anyone facing the same problem I did.
Lessons learned:
1. We have to configure Ranger Admin to speak the LDAPS protocol if we want to use LDAPS for user authentication.
Parameters in ranger-admin-site: ranger.truststore.file, ranger.truststore.password
I had to import the self-signed CA from the LDAP team into "/etc/ranger/admin/conf/ranger-admin-keystore.jks", and set the password I specified for this import in "ranger.truststore.password".
Command example:
keytool -importcert -alias rangeradmin -noprompt -trustcacerts -file ./ca.crt -keystore /etc/ranger/admin/conf/ranger-admin-keystore.jks -storepass xasecure
ref: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/content/configure_non_ambari_ranger_ssl_self_signed_cert_admin.html
2. Syncing user info with LDAP and using LDAP for authentication are technically different settings. For example, we can use LDAP authentication for Ranger UI login even when "Enable User Sync" is disabled. In other words, we can use LDAP authentication while the "Ranger Usersync" service is not running.
3. Debug logs from "org.springframework" and "org.apache.ranger" were very useful for the troubleshooting. We can change the log level in "admin-log4j.xml":
log4j.category.org.springframework=debug,xa_log_appender
log4j.category.org.apache.ranger=debug,xa_log_appender
4. Here are the key configurations for LDAP authentication (not for user sync with LDAP):
Authentication method: LDAP
LDAP URL: ldaps://xxxxxx
User Search Filter: (uid={0})
Group Search Filter: (member=uid={0},ou=xxxxx,o=xxxxx)
ranger.ldap.user.dnpattern: uid={0},ou=xxxxx,o=xxxxx
ranger.truststore.file: /etc/ranger/admin/conf/ranger-admin-keystore.jks <= in case of LDAPS
ranger.truststore.password: xasecure <= in case of LDAPS; this is the password you set when you imported the CA into the JKS
I hope this memo helps anyone who has the same problem I faced 🙂
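One extra tip: before touching the Ranger settings, it may be worth confirming the bind DN pattern and search filter directly with ldapsearch (a hedged sketch reusing the placeholders above; "testuser" is hypothetical):
ldapsearch -H ldaps://xxxxxx -D "uid=testuser,ou=xxxxx,o=xxxxx" -W -b "ou=xxxxx,o=xxxxx" "(uid=testuser)"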
11-13-2017
08:04 AM
After upgrading our cluster from HDP 2.6.1 to HDP 2.6.2, the following WARN message is no longer output in the NameNode's log: WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent number of corrupt replicas for blk_xxxx_xxxx blockMap has 0 but corrupt replicas map has 1
So, the problem seems to have been fixed by this upgrade in our clusters.
11-13-2017
02:05 AM
Hi, can anyone reply?
11-08-2017
04:54 AM
Sorry, I should have said: "I also think that if we want to control security between clients and Hadoop clusters only with Knox (i.e. use Knox as the security "proxy" between clients and Hadoop clusters), we have to eliminate Hadoop clients on the edge node, because Knox only covers ODBC/JDBC Hadoop clients."
11-08-2017
04:51 AM
Sorry for my stupid question. Can we use Knox for legacy Hadoop clients (edge node / Hadoop CLIs) that use RPC? As the page (http://pivotalhd.docs.pivotal.io/docs/knox-gateway-administration-guide.html) explains, I think we cannot use Knox for legacy Hadoop clients... I also think that if we want to control security between clients and Hadoop clusters (i.e. use Knox as the security "proxy" between clients and Hadoop clusters), we have to eliminate Hadoop clients on the edge node, because Knox only covers ODBC/JDBC Hadoop clients. Are my understandings right?
Labels:
- Apache Knox
10-20-2017
05:23 PM
We have exactly the same problem as https://issues.apache.org/jira/browse/HDFS-11797 with HDP 2.6.1 (see https://issues.apache.org/jira/browse/HDFS-11797?focusedCommentId=16039577&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16039577):
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Inconsistent number of corrupt replicas for blk_123456789_123456 blockMap has 0 but corrupt replicas map has 1
org.apache.hadoop.ipc.Server: IPC Server handler 34 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from xxx.xxx.xxx.xxx:xxxxx Call#91 Retry#0 java.lang.ArrayIndexOutOfBoundsException
Actually, our Hive client fails because of this problem, and the hdfs fsck command also fails for the affected HDFS file.
I read a series of JIRA tickets:
https://issues.apache.org/jira/browse/HDFS-9958
https://issues.apache.org/jira/browse/HDFS-10788
https://issues.apache.org/jira/browse/HDFS-11797
https://issues.apache.org/jira/browse/HDFS-11445
https://issues.apache.org/jira/browse/HDFS-11755
The second-to-last comment on HDFS-11755 (https://issues.apache.org/jira/browse/HDFS-11755?focusedCommentId=16200946&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200946) says:
"As discussed in HDFS-11445, a regression caused by HDFS-11445 is fixed by HDFS-11755. I'd like to backport HDFS-11755 into branch-2.7 as a result."
and (https://issues.apache.org/jira/browse/HDFS-11755?focusedCommentId=16201164&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16201164):
"Filed HDFS-12641 to initiate the discussion."
and https://issues.apache.org/jira/browse/HDFS-12641 is not resolved. I'm not sure, but HDFS-12641 may be only for CDH?
I've also checked that HDFS-11445 is not included in HDP 2.6.1, but it is included in HDP 2.6.2:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_release-notes/content/patch_hadoop.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_release-notes/content/patch_hadoop.html
So, can someone confirm that our current problem with the message "blockMap has 0 but corrupt replicas map has 1" is safely fixed in HDP 2.6.2 by HDFS-11445? We actually plan to upgrade from HDP 2.6.1 to HDP 2.6.2, but I'm worried that the upgrade could introduce a new problem, since HDFS-11755 says "a regression caused by HDFS-11445 is fixed by HDFS-11755"... I've confirmed that HDFS-11755 is not included in HDP 2.6.1 or HDP 2.6.2.
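For anyone hitting the same thing, counting the warning in the NameNode log is a quick way to confirm you are affected (a minimal sketch; the log path is the usual HDP location on our nodes and may differ on yours):
grep -c "Inconsistent number of corrupt replicas" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log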
Tags:
- Hadoop Core
- HDFS
Labels:
- Apache Hadoop
10-20-2017
10:31 AM
Jay SenSharma, thank you for replying. Our NameNode manages over 300 million HDFS files and directories. So the table above says we need more than "104473m" of heap for "150-200 million". And if the number of HDFS files and directories reaches 600 million, we would need about 300 GB (100 GB x 3) of memory for the NameNode. > The JVM should not complain if the heap size is set to a larger value (until we have enough RAM available) So you mean we can set 300 GB for the Java heap size, and the NameNode or its JVM should not complain as long as the node has enough physical memory, for example 500 GB, right?
10-20-2017
10:19 AM
Joseph Niemiec, Alberto Ramon, thank you for your replies. About full GC: we use CMS GC for these NameNodes; it takes about 2 minutes in total, but the stop-the-world phases ("Initial Mark" and "Final Remark") take less than a few seconds in total. So we don't worry about the GC issue for now 🙂 Yes, we are now trying to use HDFS Federation, with nn3 and nn4 serving a second nameservice, to reduce the heap usage of the first pair of NameNodes (nn1 and nn2) for the first nameservice. (Please support HDFS Federation...)
10-18-2017
07:42 AM
Our NameNodes currently run with about a 160 GB JVM heap ("-Xmx160g -Xms160g"; the actual metadata size is about 120 GB). Can we increase it to, for example, about 300 GB or 500 GB?
I'm worried that at some point the NameNode or Java might say something like "I can not handle this much memory, sorry and good bye..."
Thanks!
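For concreteness, the change would just be the NameNode JVM options in hadoop-env.sh (a hedged sketch of our current settings scaled up, not a recommendation):
export HADOOP_NAMENODE_OPTS="-Xmx300g -Xms300g ${HADOOP_NAMENODE_OPTS}"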
Tags:
- Hadoop Core
- namenode
Labels:
- Apache Hadoop
09-08-2017
04:13 AM
Sorry, there was a copy-and-paste mistake in the question above. We currently use "-XX:CMSInitiatingOccupancyFraction=90" for the NameNode with CMS.
09-07-2017
12:17 PM
Now I am trying to change the GC from CMS to G1GC. Let's say the current situation of the NameNode with CMS is:
Physical memory size : 140 GB
-Xmx100G -Xms100G
Current actual heap usage : 70 - 80 GB (so usage is around 80%.)
-XX:InitiatingHeapOccupancyPercent : 90
The default value of "-XX:InitiatingHeapOccupancyPercent" is 45.
If I set 45% for "-XX:InitiatingHeapOccupancyPercent" on this NameNode, I think the current heap usage would always exceed the threshold... Could you advise how I should tune "-XX:InitiatingHeapOccupancyPercent" for this NameNode?
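In case concrete numbers help the discussion, this is the kind of starting point I had in mind (a hedged sketch based on general G1 guidance for large heaps, not NameNode-specific advice; the threshold sits a little above our ~80% steady-state live set, the GC log path is a placeholder, and everything would be adjusted from actual GC logs):
-XX:+UseG1GC -XX:MaxGCPauseMillis=400 -XX:InitiatingHeapOccupancyPercent=85
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop/gc-namenode.log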
Tags:
- Hadoop Core
- namenode
Labels:
- Apache Hadoop
07-20-2017
03:01 AM
I think no response means you guys do not recommend this use case of mine. I've decided to follow the install guide. Thanks!
07-18-2017
07:29 AM
Can someone answer my question? If my question is not clear, please let me know 🙂
07-13-2017
11:31 AM
Yes, it relates to my question. I'm asking about "3. On each node (specified by their fully qualified domain names), create the host and headless principals, and a keytab with each:" I think this part says we need to create a keytab file for each node (for every node running a NodeManager) and put it in an OS local directory (/etc/security/keytabs) on each node to launch the LLAP daemons. Of course, I can follow this procedure, but if possible I want to avoid putting the keytab files in an OS local directory, for administrative reasons. As you may know, when we launch HBase with Slider on YARN, we can put the keytab files required to launch the HBase components (such as the HBase Master and RegionServers) on HDFS instead of in an OS local directory. In that case, we don't need to put the keytab files in an OS local directory on each node; instead, we just put a keytab file with principals for all nodes on HDFS and configure appConfig.json to make the HBase components use the keytab file on HDFS. So I'm asking whether we can do the same to launch the LLAP daemons or not.
07-13-2017
05:25 AM
I know that when we launch an HBase cluster with Slider on YARN, we can put the keytab files on HDFS and launch the HBase components by adding the following to appConfig.json, instead of putting keytab files in the local directory /etc/security/keytabs:
"site.hbase-site.hbase.regionserver.kerberos.principal": "${USER_NAME}/_HOST@EXAMPLE",
"site.hbase-site.hbase.regionserver.keytab.file": "${AGENT_WORK_ROOT}/keytabs/${USER_NAME}.service.keytab",
"site.hbase-site.hbase.master.kerberos.principal": "${USER_NAME}/_HOST@EXAMPLE",
"site.hbase-site.hbase.master.keytab.file": "${AGENT_WORK_ROOT}/keytabs/${USER_NAME}.service.keytab",
Can we do the same thing when launching the LLAP daemons with Slider?
Tags:
- llap
Labels:
- Apache HBase
- Apache YARN
- HDFS
- Kerberos
- Security
07-11-2017
01:46 AM
@Rajkumar Singh thank you so much for your help!
07-10-2017
09:38 AM
Thank you for the quick and clear answer. I understand we have to enable Ranger for LLAP. BTW, can we enable Ranger only for LLAP (HiveServer2) as a first step? I'm asking because it's a little hard to add Ranger (plugins) to already-existing Hadoop core components such as HDFS (NameNode/DataNodes) and YARN (ResourceManager/NodeManagers). We plan to build a new server to run LLAP (Hive 2 HiveServer2 & LLAP with Slider & a new Metastore DB), so if we can enable Ranger only for the new LLAP for now, it would be much easier for us than enabling Ranger for all existing Hadoop components.