Member since
07-11-2019
102
Posts
4
Kudos Received
9
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 18196 | 12-13-2019 12:03 PM
 | 4273 | 12-09-2019 02:42 PM
 | 3126 | 11-26-2019 01:21 PM
 | 1431 | 08-27-2019 03:03 PM
 | 2728 | 08-14-2019 07:33 PM
05-13-2020
02:02 AM
Running a Hortonworks Hadoop cluster (HDP-3.1.0.0) and getting a bunch of "Failed on local exception: java.io.IOException: Too many open files" errors when running Spark jobs that up until this point have worked fine. I have seen many other questions like this where the answer is to increase the ulimit settings for open files and processes (this is also in the HDP docs), and I'll note that I believe mine are still at the system defaults. But my question is: why is this only happening now, when the Spark jobs have been running fine for months without incident and I have made no recent code changes? I don't know enough about the internals of Spark to theorize about why things could be going wrong only now (it would be odd to me if open files just built up in the course of running Spark, but that seems to be what is happening). Just as an example, even this code...

sparkSession = SparkSession.builder.appName("GET_TABLE_COUNT").getOrCreate()
sparkSession._jsc.sc().getExecutorMemoryStatus().keySet().size()

... now generates errors like...
[2020-05-12 19:04:45,810] {bash_operator.py:128} INFO - 20/05/12 19:04:45 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:46,813] {bash_operator.py:128} INFO - 20/05/12 19:04:46 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:47,816] {bash_operator.py:128} INFO - 20/05/12 19:04:47 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:48,818] {bash_operator.py:128} INFO - 20/05/12 19:04:48 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:49,820] {bash_operator.py:128} INFO - 20/05/12 19:04:49 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:50,822] {bash_operator.py:128} INFO - 20/05/12 19:04:50 INFO Client: Application report for application_1579648183118_19918 (state: ACCEPTED)
[2020-05-12 19:04:51,828] {bash_operator.py:128} INFO - 20/05/12 19:04:51 INFO Client: Application report for application_1579648183118_19918 (state: FAILED)
[2020-05-12 19:04:51,829] {bash_operator.py:128} INFO - 20/05/12 19:04:51 INFO Client:
[2020-05-12 19:04:51,829] {bash_operator.py:128} INFO - client token: N/A
[2020-05-12 19:04:51,829] {bash_operator.py:128} INFO - diagnostics: Application application_1579648183118_19918 failed 2 times due to Error launching appattempt_1579648183118_19918_000002. Got exception: java.io.IOException: DestHost:destPort hw005.co.local:45454 , LocalHost:localPort hw001.co.local/172.18.4.46:0. Failed on local exception: java.io.IOException: Too many open files
[2020-05-12 19:04:51,829] {bash_operator.py:128} INFO - at sun.reflect.GeneratedConstructorAccessor808.newInstance(Unknown Source)
[2020-05-12 19:04:51,829] {bash_operator.py:128} INFO - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

My RAM and ulimit settings on the cluster look like...

[root@HW001]# clush -ab free -h
---------------HW001---------------
total used free shared buff/cache available
Mem: 31G 9.0G 1.1G 1.7G 21G 19G
Swap: 8.5G 44K 8.5G
---------------HW002---------------
total used free shared buff/cache available
Mem: 31G 7.3G 5.6G 568M 18G 22G
Swap: 8.5G 308K 8.5G
---------------HW003---------------
total used free shared buff/cache available
Mem: 31G 6.1G 4.0G 120M 21G 24G
Swap: 8.5G 200K 8.5G
---------------HW004---------------
total used free shared buff/cache available
Mem: 31G 2.9G 2.8G 120M 25G 27G
Swap: 8.5G 28K 8.5G
---------------HW005---------------
total used free shared buff/cache available
Mem: 31G 2.9G 4.6G 120M 23G 27G
Swap: 8.5G 20K 8.5G
---------------airflowetl---------------
total used free shared buff/cache available
Mem: 46G 5.3G 13G 2.4G 28G 38G
Swap: 8.5G 124K 8.5G
[root@HW001]#
[root@HW001]#
[root@HW001]#
[root@HW001]# clush -ab ulimit -a
---------------HW[001-005] (5)---------------
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127886
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127886
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
---------------airflowetl---------------
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 192394
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 192394
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I don't know much about Hadoop administration, but just looking at the Ambari dashboard, the cluster does not seem to be overly taxed (though I could not actually check the RM web UI, since it just throws a "too many open files" error). Does anyone with more Spark/Hadoop experience know why this would be happening now?
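Side note for anyone else hitting this: a quick, hedged way to see whether some long-running daemon is slowly accumulating file descriptors (rather than the Spark job itself) is to count entries under /proc/<pid>/fd per process. Nothing here is HDP-specific; run as root on the affected node so every process is readable:

# List the processes holding the most open file descriptors on this node.
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
  count=$(ls /proc/"$pid"/fd 2>/dev/null | wc -l)
  [ "$count" -gt 0 ] && echo "$count $(ps -o comm= -p "$pid" 2>/dev/null) (pid $pid)"
done | sort -rn | head -20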
12-23-2019
04:19 PM
[The following question was moved here after originally being posted 12-23-2019 to this thread which was marked 'Solved' 06-01-2017 12:49 AM —Moderator]
@jsensharma Could you explain a little more about what exactly this
hadoop.proxyuser.root.groups
config is? Any docs describing it more?
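From the searching I have done so far (so treat this as my partial understanding, not a definitive answer): hadoop.proxyuser.root.groups appears to be one of Hadoop's proxy-user (impersonation) settings in core-site.xml; together with hadoop.proxyuser.root.hosts it controls which groups of users the root account is allowed to impersonate, and from which hosts. Purely as an illustration (the values below are hypothetical, not taken from my cluster):

<!-- core-site.xml (illustrative values only) -->
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>hadoop</value>  <!-- groups whose members root may impersonate -->
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>       <!-- hosts from which root may impersonate -->
</property>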
Labels:
- Apache Hadoop
12-13-2019
12:03 PM
@jsensharma

1. I need to use python3 and would like to continue to do so in the future, considering that python2 will stop being maintained in 2020 (I would think others would have a similar desire as well). I am currently using

export PYSPARK_PYTHON=/path/to/my/virtualenv/bin/python; spark-submit sparksubmit.test.py

as a workaround (otherwise, this may be helpful: https://stackoverflow.com/a/51508990/8236733, or using the --py-files option).

2. I don't know where that path reference is coming from, since "../venv/bin/activate" just activates a virtualenv and the "sparksubmit.test.py" code is just:

from os import environ
from pyspark.sql import SparkSession
import time
import pprint
import platform
pp = pprint.PrettyPrinter(indent=4)
sparkSession = SparkSession.builder.appName("TEST").getOrCreate()
sparkSession._jsc.sc().setLogLevel("WARN")
print(platform.python_version())
def testfunc(num: int) -> str:
return "type annotations look ok"
print(testfunc(1))
print("\n\nYou are using %d nodes in this session\n\n" % sparkSession._jsc.sc().getExecutorMemoryStatus().keySet().size())
pp.pprint(sparkSession.sparkContext._conf.getAll())

But that blank space in "/usr/hdp//hadoop/lib" is interesting to see, especially since I set export HADOOP_CONF_DIR=/etc/hadoop/conf in the terminal when trying to run the command. Furthermore, looking at my (client node) filesystem, I don't even see that path...

[airflow@airflowetl tests]$ ls -lha /usr/hdp/current/hadoop-
hadoop-client/ hadoop-httpfs
hadoop-hdfs-client/ hadoop-mapreduce-client/
hadoop-hdfs-datanode/ hadoop-mapreduce-historyserver/
hadoop-hdfs-journalnode/ hadoop-yarn-client/
hadoop-hdfs-namenode/ hadoop-yarn-nodemanager/
hadoop-hdfs-nfs3/ hadoop-yarn-registrydns/
hadoop-hdfs-portmap/ hadoop-yarn-resourcemanager/
hadoop-hdfs-secondarynamenode/ hadoop-yarn-timelinereader/
hadoop-hdfs-zkfc/ hadoop-yarn-timelineserver/
[airflow@airflowetl tests]$ ls -lha /usr/hdp/current/hadoop
ls: cannot access /usr/hdp/current/hadoop: No such file or directory

(Note: I am using HDP v3.1.0.)
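To spell out the workaround from point 1 a bit more, this is roughly the pattern I mean (a sketch only: the virtualenv path is a placeholder, it assumes that interpreter path exists wherever the Python processes actually run, and spark.pyspark.python is the standard Spark 2.x property for choosing the driver/executor python):

# Run spark-submit against the virtualenv's python3 without "activating" the venv.
export PYSPARK_PYTHON=/path/to/my/virtualenv/bin/python
export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.pyspark.python="$PYSPARK_PYTHON" \
  sparksubmit.test.py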
12-12-2019
03:35 PM
Is there a way to run spark-submit (Spark v2.3.2 from HDP 3.1.0) while in a virtualenv? I have a situation where a Python file uses python3 (and some specific libs) in a virtualenv (to isolate lib versions from the rest of the system). I would like to run this file with /bin/spark-submit, but attempting to do so I get...
[me@myserver tests]$ source ../venv/bin/activate; /bin/spark-submit sparksubmit.test.py
File "/bin/hdp-select", line 255 print "ERROR: Invalid package - " + name
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?ls: cannot access /usr/hdp//hadoop/lib: No such file or directoryException in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
at org.apache.spark.launcher.Main.main(Main.java:118)
# also tried...
(venv) [me@myserver tests]$ export HADOOP_CONF_DIR=/etc/hadoop/conf; spark-submit --master yarn --deploy-mode cluster sparksubmit.test.py
19/12/12 13:50:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 13:50:20 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
.... at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
I'm not sure what to make of this or how to proceed further, and I did not totally understand the error message even after googling it.
Does anyone with more experience have any further debugging tips or fixes?
Labels:
- Apache Spark
12-11-2019
11:30 AM
1 Kudo
From the Ranger email list, this is another bit of information that I found helpful:

----------

I've configured Ranger using the following approach to control who must be synced with AD. Only users belonging to groups inside a specific OU will be synced.

I've created the OU:
OU=ArthurAmericasGroups,OU=Security Groups,OU=Groups,OU=SHARED,OU=Brazil,DC=domain,DC=com

Create a group called R2Users inside that OU. I put all desired sync users as its members. Also, you can put other groups as its members, and you can create other groups like R2TEAM as well. Remember to update the property ranger.usersync.ldap.user.searchfilter to include more than one. I've configured Ranger to sync groups before users. Here is the configuration.

COMMON CONFIGS

Label | Property | Value
---|---|---
LDAP/AD URL | ranger.usersync.ldap.url | ldap://myacticedirectoryserver.domain.com:389
Bind User | ranger.usersync.ldap.binddn | CN=LDAP_AD_ACCOUNT,OU=Service Accounts,OU=LCB,OU=Brazil,DC=domain,DC=com
Bind User Password | ranger.usersync.ldap.ldapbindpassword | LDAP_AD_ACCOUNT user's password
Incremental Sync | ranger.usersync.ldap.deltasync | Yes
Enable LDAP STARTTLS | ranger.usersync.ldap.starttls | No

GROUP CONFIGS

Label | Property | Value
---|---|---
Enable Group Sync | ranger.usersync.group.searchenable | Yes
Group Member Attribute | ranger.usersync.group.memberattributename | member
Group Name Attribute | ranger.usersync.group.nameattribute | cn
Group Object Class | ranger.usersync.group.objectclass | group
Group Search Base | ranger.usersync.group.searchbase | OU=ArthurAmericasGroups,OU=Security Groups,OU=Groups,OU=SHARED,OU=Brazil,DC=domain,DC=com
Group Search Filter | ranger.usersync.group.searchfilter | 
Enable Group Search First | ranger.usersync.group.search.first.enabled | Yes
Sync Nested Groups | is_nested_groupsync_enabled | Yes
Group Hierarchy Levels | ranger.usersync.ldap.grouphierarchylevels | 5

USER CONFIGS

Label | Property | Value
---|---|---
Username Attribute | ranger.usersync.ldap.user.nameattribute | sAMAccountName
User Object Class | ranger.usersync.ldap.objectclass | user
User Search Base | ranger.usersync.ldap.searchbase | DC=domain,DC=com
User Search Filter | ranger.usersync.ldap.user.searchfilter | (memberOf=CN=R2Users,OU=ArthurAmericasGroups,OU=Security Groups,OU=Groups,OU=SHARED,OU=Brazil,DC=domain,DC=com)
User Search Scope | ranger.usersync.ldap.user.searchscope | Sub
User Group Name Attribute | ranger.usersync.ldap.groupnameattribute | sAMAccountName
Group User Map Sync | ranger.usersync.group.usermapsyncenabled | Yes
Enable User Search | ranger.usersync.user.searchenabled | Yes

ADVANCED

Ranger Settings

Label | Property | Value
---|---|---
Authentication method | | ACTIVE_DIRECTORY

AD Settings

Label | Property | Value
---|---|---
AD Bind Password | ranger.ldap.ad.bind.password | LDAP_AD_ACCOUNT user's password
Domain Name (Only for AD) | ranger.ldap.ad.domain | DC=domain,DC=com
AD Base DN | ranger.ldap.ad.base.dn | DC=domain,DC=com
AD Referral | ranger.ldap.ad.referral | Follow
AD User Search Filter | ranger.ldap.ad.user.search | (sAMAccountName={0})

Advanced ranger-ugsync-site

Label | Property | Value
---|---|---
ranger.usersync.ldap.referral | ranger.usersync.ldap.referral | Follow
12-09-2019
02:43 PM
Saving this for later. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/ambari-authentication-ldap-ad/content/setting_up_ldap_user_authentication.html
12-09-2019
02:42 PM
1 Kudo
It appears I was able to sync AD users after changing the bind user path to:

CN=hwldap,OU=Users,OU=HortonworksUsers,DC=ucera,DC=local

as opposed to using the "uid" entry key. I don't know why this would make a difference, but it seems to have worked. Would anyone with more AD experience have an idea why? (Note that when I look at the attributes for this entry in our AD, both the CN and UID attributes are present.)
12-06-2019
05:02 PM
Attempting to set up LDAP/AD users for Ranger (v1.2.0) following the docs (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/configuring-ranger-authe-with-unix-ldap-ad/content/configuring_ranger_authentication_with_unix_ldap_or_ad.html) and this older video (https://www.youtube.com/watch?v=2aZ9GBhCOhA), but when looking at the Users tab in the Ranger UI, I see only the original Unix users.

Looking at the usersync logs, near the tail I see:

....
06 Dec 2019 14:21:51 INFO LdapUserGroupBuilder [UnixUserSyncThread] - LdapUserGroupBuilder initialization started
06 Dec 2019 14:21:51 INFO LdapUserGroupBuilder [UnixUserSyncThread] - LdapUserGroupBuilder initialization completed with -- ldapUrl: ldap://172.18.4.42:389, ldapBindDn: UID=hwldap,OU=Users,OU=HortonworksUsers,DC=ucera,DC=local, ldapBindPassword: ***** , ldapAuthenticationMechanism: simple, searchBase: dc=hadoop,dc=apache,dc=org, userSearchBase: [dc=ucera,dc=local], userSearchScope: 2, userObjectClass: user, userSearchFilter: (memberOf=UID=hwusers,OU=groups,OU=HortonworksUsers,DC=ucera,DC=local), extendedUserSearchFilter: (&(objectclass=user)(memberOf=UID=hwusers,OU=groups,OU=HortonworksUsers,DC=ucera,DC=local)), userNameAttribute: sAMAccountName, userSearchAttributes: [sAMAccountName, memberof], userGroupNameAttributeSet: [memberof], pagedResultsEnabled: true, pagedResultsSize: 500, groupSearchEnabled: false, groupSearchBase: [dc=ucera,dc=local], groupSearchScope: 2, groupObjectClass: group, groupSearchFilter: (CN=hwusers), extendedGroupSearchFilter: (&(objectclass=group)(CN=hwusers)(|(cn={0})(cn={1}))), extendedAllGroupsSearchFilter: (&(objectclass=group)(CN=hwusers)), groupMemberAttributeName: cn, groupNameAttribute: UID=hwusers,OU=groups,OU=HortonworksUsers,DC=ucera,DC=local, groupSearchAttributes: [UID=hwusers,OU=groups,OU=HortonworksUsers,DC=ucera,DC=local, cn], groupUserMapSyncEnabled: true, groupSearchFirstEnabled: false, userSearchEnabled: false, ldapReferral: ignore
06 Dec 2019 14:21:51 INFO UserGroupSync [UnixUserSyncThread] - Begin: initial load of user/group from source==>sink
06 Dec 2019 14:21:51 INFO LdapUserGroupBuilder [UnixUserSyncThread] - LDAPUserGroupBuilder updateSink started
06 Dec 2019 14:21:51 INFO LdapUserGroupBuilder [UnixUserSyncThread] - Performing user search first
06 Dec 2019 14:21:51 ERROR LdapUserGroupBuilder [UnixUserSyncThread] - LDAPUserGroupBuilder.getUsers() failed with exception: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C09042F, comment: AcceptSecurityContext error, data 52e, v2580]; remaining name 'dc=ucera,dc=local'
06 Dec 2019 14:21:51 INFO LdapUserGroupBuilder [UnixUserSyncThread] - LDAPUserGroupBuilder.getUsers() user count: 0
06 Dec 2019 14:21:51 INFO UserGroupSync [UnixUserSyncThread] - End: initial load of user/group from source==>sink
....

So it seems like Ranger is trying to use AD, encountering an error, and falling back to Unix-based users. I did see this article (https://community.cloudera.com/t5/Community-Articles/Ranger-Ldap-Integration/ta-p/245494), but I already have the cluster nodes linked to AD via SSSD, so I would think the LDAP/AD sync should already be configured on the nodes and that Ranger should be able to use AD once the configs were entered. Any idea what is going on here? Any further debugging tips or information (I am very unfamiliar with AD/LDAP admin stuff)?
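For anyone debugging something similar: my understanding is that "error code 49 ... data 52e" from AD generally means the bind credentials were rejected, so (as a hedged suggestion) it can help to test the bind DN and search filter outside Ranger first with ldapsearch, along these lines (host, DN, and filter below are just the ones from my config, adjust as needed):

ldapsearch -x -H ldap://172.18.4.42:389 \
  -D "CN=hwldap,OU=Users,OU=HortonworksUsers,DC=ucera,DC=local" -W \
  -b "DC=ucera,DC=local" \
  "(memberOf=CN=hwusers,OU=groups,OU=HortonworksUsers,DC=ucera,DC=local)" sAMAccountName memberOf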
Labels:
- Apache Ranger
11-26-2019
01:21 PM
After just giving in and trying to manually create the hive user myself, I see:

[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)

The fact that this existing user's uid looks like this and is not in the /etc/passwd file made me think there is some existing Active Directory user (which this client node syncs with via the installed SSSD) that already has the name "hive". Checking our AD users, this turned out to be true. Temporarily stopping the SSSD service to stop the sync with AD (service sssd stop), since I'm not sure whether you can get a server to ignore AD syncs on an individual-user basis, and then rerunning the client host add in Ambari fixed the problem for me.
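On the "ignore AD syncs on an individual-user basis" point: my understanding, offered only as a hedged suggestion I have not tested on this cluster, is that sssd.conf supports a filter_users option in the [nss] section that hides specific names from SSSD lookups, which might avoid having to stop the service entirely. Roughly:

# /etc/sssd/sssd.conf (sketch, untested here)
[nss]
# keep this name from being resolved via AD so the local account can be created
filter_users = hive
# then restart sssd, e.g.: systemctl restart sssd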
11-26-2019
12:14 PM
Adding some log-printing lines near the offending final line in the error trace, i.e. File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages, I print the return code and stdout:

2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory

So what the heck? It wants hdp-select to already be there, but the Ambari add-host UI complains if I manually install that binary myself beforehand. When I do manually install it (using the same repo file as on the rest of the existing cluster nodes), all I see is...

0
Packages: accumulo-client
accumulo-gc
accumulo-master
accumulo-monitor
accumulo-tablet
accumulo-tracer
atlas-client
atlas-server
beacon
beacon-client
beacon-server
druid-broker
druid-coordinator
druid-historical
druid-middlemanager
druid-overlord
druid-router
druid-superset
falcon-client
falcon-server
flume-server
hadoop-client
hadoop-hdfs-client
hadoop-hdfs-datanode
hadoop-hdfs-journalnode
hadoop-hdfs-namenode
hadoop-hdfs-nfs3
hadoop-hdfs-portmap
hadoop-hdfs-secondarynamenode
hadoop-hdfs-zkfc
hadoop-httpfs
hadoop-mapreduce-client
hadoop-mapreduce-historyserver
hadoop-yarn-client
hadoop-yarn-nodemanager
hadoop-yarn-registrydns
hadoop-yarn-resourcemanager
hadoop-yarn-timelinereader
hadoop-yarn-timelineserver
hbase-client
hbase-master
hbase-regionserver
hive-client
hive-metastore
hive-server2
hive-server2-hive
hive-server2-hive2
hive-webhcat
hive_warehouse_connector
kafka-broker
knox-server
livy-client
livy-server
livy2-client
livy2-server
mahout-client
oozie-client
oozie-server
phoenix-client
phoenix-server
pig-client
ranger-admin
ranger-kms
ranger-tagsync
ranger-usersync
shc
slider-client
spark-atlas-connector
spark-client
spark-historyserver
spark-schema-registry
spark-thriftserver
spark2-client
spark2-historyserver
spark2-thriftserver
spark_llap
sqoop-client
sqoop-server
storm-client
storm-nimbus
storm-slider-client
storm-supervisor
superset
tez-client
zeppelin-server
zookeeper-client
zookeeper-server
Aliases:
accumulo-server
all
client
hadoop-hdfs-server
hadoop-mapreduce-server
hadoop-yarn-server
hive-server
Command failed after 1 tries
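In case it helps anyone hitting the same thing, a couple of hedged sanity checks I would run on the new host (nothing cluster-specific, just standard rpm/yum queries) to confirm where hdp-select is supposed to come from and whether the agent can see it:

# Is the hdp-select package installed, and what did it put on disk?
rpm -q hdp-select && rpm -ql hdp-select | grep bin/hdp-select
# Which repo provides it (should match the repo file copied from the other cluster nodes)?
yum provides '*/hdp-select'
# What the ambari-agent actually tries to invoke:
ls -l /usr/bin/hdp-select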