Member since: 03-11-2016
Posts: 36
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
745 | 10-06-2016 03:26 PM |
09-16-2020
03:27 PM
I believe this will fail if you stop your job today and run it again tomorrow: now will then resolve to a different day, and you will miss the data.
06-18-2018
08:17 PM
I have a sample dataset like the one below:
sample=[(201406,'c',100),(201406,'e',200),(201406,'a',300),(201407,'c',100),(201407,'d',300),(201407,'e',500)]
samplerdd=sc.parallelize(sample)
I was able to get the top N records per group with groupByKey(), as shown below:
samplerdd.map(lambda rec:((rec[0]),(rec[2],rec[1]))).groupByKey().map(lambda rec:((rec[0]),sorted(rec[1],reverse=True)[:2])).collect()
which gave me this output:
[(201406, [(300, 'a'), (200, 'e')]), (201407, [(500, 'e'), (300, 'd')])]
But how can I achieve the same result with the other by-key APIs, and which one is best for huge data in order to avoid shuffle operations? Do the other APIs guarantee global ordering?
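One possible direction, offered as a sketch rather than a definitive answer: keep only a bounded top-N list per key and merge those lists, which is the shape that aggregateByKey() expects. The helper names add_value and merge_tops are hypothetical. The example below runs in plain Python on the sample data and simulates two partitions, so it does not need a Spark cluster; the commented line at the end shows how the same two functions could be passed to aggregateByKey().

```python
import heapq

N = 2  # how many records to keep per key

def add_value(top, value):
    """Fold one (amount, name) value into a list holding at most N items."""
    return heapq.nlargest(N, top + [value])

def merge_tops(left, right):
    """Combine two partial top-N lists (e.g. from two partitions) into one."""
    return heapq.nlargest(N, left + right)

def fold_partition(pairs):
    """Apply add_value to every (key, value) pair, as a single partition would."""
    acc = {}
    for key, value in pairs:
        acc[key] = add_value(acc.get(key, []), value)
    return acc

sample = [(201406, 'c', 100), (201406, 'e', 200), (201406, 'a', 300),
          (201407, 'c', 100), (201407, 'd', 300), (201407, 'e', 500)]
pairs = [(month, (amount, name)) for month, name, amount in sample]

# Simulate two partitions, then merge their partial results per key.
part_a = fold_partition(pairs[::2])
part_b = fold_partition(pairs[1::2])
merged = dict(part_a)
for key, tops in part_b.items():
    merged[key] = merge_tops(merged.get(key, []), tops)

result = sorted(merged.items())
print(result)

# With Spark, the same functions could be used like this (not run here):
# samplerdd.map(lambda rec: (rec[0], (rec[2], rec[1]))) \
#          .aggregateByKey([], add_value, merge_tops).collect()
```

This still involves a shuffle, but each partition sends at most N values per key instead of every value, because the lists are combined before the data moves. The order of the keys returned by collect() is not guaranteed with this approach, so sort the result if a particular key order is needed.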
Labels: Apache Spark
05-12-2018
03:09 PM
@Geoffrey Shelton Okot Hi Geoffrey, any thoughts on the above to clear up my doubts?
05-08-2018
04:59 PM
We have a working setup where OpenLDAP is used for authentication, along with Ranger and Knox, with the structure below.
Sample structure of the existing LDAP:
OpenLDAP root:
dn: dc=abchadoop,dc=com,dc=za
Subtree inside openldap like below:-
Groups/Services:-
dn: ou=people,dc=abchadoop,dc=com,dc=za
dn: ou=groups,dc=abchadoop,dc=com,dc=za
dn: ou=services,dc=abchadoop,dc=com,dc=za
...
Users:-
dn: cn=ud_devtest,ou=people,dc=abchadoop,dc=com,dc=za
dn: cn=hcat,ou=services,dc=abchadoop,dc=com,dc=za
...
We then added Kerberos with OpenLDAP (on the same server) as its backend, and the picture became confusing.
After adding Kerberos, the entries in OpenLDAP look like this:
OpenLDAP root:
dn: dc=abchadoop,dc=com,dc=za
Subtree inside openldap like below:-
dn: ou=people,dc=abchadoop,dc=com,dc=za
dn: ou=groups,dc=abchadoop,dc=com,dc=za
dn: ou=services,dc=abchadoop,dc=com,dc=za
dn: cn=ud_devtest,ou=people,dc=abchadoop,dc=com,dc=za
dn: cn=hcat,ou=services,dc=abchadoop,dc=com,dc=za
dn: cn=hive_dev,ou=groups,dc=abchadoop,dc=com,dc=za
kerberos:-
dn: cn=kerberos,dc=abchadoop,dc=com,dc=za
dn: cn=ABCHDP.COM,cn=kerberos,dc=abchadoop,dc=com,dc=za
--
Hadoop kerberos principals
--
dn: krbPrincipalName=ud_dvjones@ABCHDP.COM,cn=ABCHDP.COM,cn=kerberos,dc=abchadoop,dc=com,dc=za
dn: krbPrincipalName=ud_devtest@ABCHDP.COM,cn=ABCHDP.COM,cn=kerberos,dc=abchadoop,dc=com,dc=za
Queries:
Does the Kerberos realm need to match abchadoop.com.za? In most examples, the same name (example.com) is used for both the OpenLDAP root and the Kerberos realm.
ud_devtest already existed with password1 when OpenLDAP was set up. After installing Kerberos, we created one more principal with the same name (because this user uses the CLI), but we gave it a different password (password2). Is there any way to sync the passwords?
Going forward, where should I create users, in LDAP or in Kerberos? This applies both to a dev user who will use the CLI and HS2, and to a business user who will only use HS2.
Can someone help me understand this?
Labels: Apache Knox, Apache Ranger
05-06-2018
08:11 PM
@Geoffrey Shelton Okot But I am not able to log in through the Hive CLI. I will collect the debug log tomorrow and attach it. Can you please check my attachment and see whether anything looks wrong there?
05-06-2018
07:57 PM
Yes, I am still getting the error when using the Hive CLI. I will eventually switch to beeline, but I still do not understand why it is not picking up the user, even after setting up the OpenLDAP group mappings.
05-06-2018
06:39 PM
sample-file.txt
05-06-2018
06:39 PM
@Geoffrey Shelton Okot Yes, this user can obtain a ticket and run hdfs commands. After obtaining a ticket, it can also access the tables and run queries through HiveServer2 using beeline:
ud_dvjones@ABC-server-16> beeline -u "jdbc:hive2://c1master01-nn.abc.corp:10000/;principal=hive/c1master01-nn.stc.corp@ABCHDP.COM"
I am attaching the file, which has the OpenLDAP and Kerberos entries along with the auth_to_local rules.
05-06-2018
04:55 PM
Hi @Geoffrey Shelton Okot I am using MIT Kerberos with OpenLDAP. I have not changed the auth_to_local rules, as I did the Kerberos setup for the Hadoop cluster through Ambari, and the rules are as below. I have also cross-checked with the command below, and it is translating correctly:
ud_dvjones@ABC-server-16> hadoop org.apache.hadoop.security.HadoopKerberosName ud_dvjones@ABCHDP.COM
Name: ud_dvjones@ABCHDP.COM to ud_dvjones
05-06-2018
07:07 AM
I have enabled Kerberos on our HDP 2.6 cluster. Old users can access the Hive CLI, as they exist on all nodes.
But when I created a new user on the edge node and tried to access the Hive CLI, I got the error below. I am using Kerberos with OpenLDAP as the backend because I plan to use Ranger and Knox in the future, so I created this user in LDAP and enabled the HDFS group mappings.
When I run hdfs groups, I also get the user's group information from LDAP:
>>hdfs groups
Output:
hadoop03 hadoop_base03
Do Hadoop group mappings work with a Kerberos setup, or do I need SSSD to map OS users to LDAP?
Do I need to create every user on the NameNodes for Kerberos?
ud_dvjones@ABC-server-16:/root> hive
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1525177646465_0010 failed 2 times due to AM Container for appattempt_1525177646465_0010_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: < > Then click on links to logs of each attempt.
Diagnostics: Application application_1525177646465_0010 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is ud_dvjones
main : requested yarn user is ud_dvjones
User ud_dvjones not found
Failing this attempt. Failing the application.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:560)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1525177646465_0010 failed 2 times due to AM Container for appattempt_1525177646465_0010_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: < > Then click on links to logs of each attempt.
Diagnostics: Application application_1525177646465_0010 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is ud_dvjones
main : requested yarn user is ud_dvjones
User ud_dvjones not found
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:557)
... 8 more
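The line "User ud_dvjones not found" in the diagnostics suggests that the account could not be resolved as an OS user on the node that tried to launch the container. As a rough check, and assuming Python is available on that node, the sketch below reports whether a name resolves through the local account database (which includes /etc/passwd and any configured sources such as SSSD). The function user_exists is a hypothetical helper, not part of Hadoop.

```python
import pwd

def user_exists(name):
    """Return True if `name` resolves to an OS account on this host."""
    try:
        pwd.getpwnam(name)  # raises KeyError when the account is unknown
        return True
    except KeyError:
        return False

# Example: check the user from the error message on the current host.
print("ud_dvjones", "found" if user_exists("ud_dvjones") else "NOT found")
```

Running this on each worker node would show where the account is missing. It only checks OS-level resolution and says nothing about Kerberos principals or Hadoop group mappings.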
Labels: Apache Hive