Member since: 11-05-2018
Posts: 18
Kudos Received: 0
Solutions: 0
10-14-2019
01:21 PM
I ran into the same error. Even though the dependencies are listed in sbt, the jars still have to be shipped explicitly with the --jars option in spark-submit. Why is this needed? Are there any workarounds?
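For reference, this is the kind of invocation I mean (a rough sketch; the jar paths and class name are placeholders, not from the actual job):
# What I end up doing today (jar paths and class name are placeholders):
# shipping the dependency jars explicitly even though they are declared in
# build.sbt. spark-submit does not read build.sbt, so anything not packed
# into an assembly jar has to arrive via --jars (or --packages).
spark-submit \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  --class com.example.MyJob \
  target/scala-2.11/my-job_2.11-0.1.jar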
08-22-2019
01:50 PM
Yes, that WIP links back to KUDU-1603, which I shared earlier. I guess we will have to wait it out. Thanks for your response.
08-15-2019
02:12 PM
Could you please share how the KuduContext is created in PySpark? I am aware of KUDU-1603, but I am looking for workarounds, and the Java wrapper described in KUDU-1603 is not working as intended.
08-15-2019
10:03 AM
Hi
I have been searching for some time for a command reference/API manual for PySpark-Kudu, but I have been unsuccessful so far. Does Cloudera have something that could help?
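For context, the closest I have gotten is the generic DataFrame reader rather than a dedicated Python API; what I am really after is a complete command/API reference. A rough sketch of what I mean (the package version, Kudu master address and table name are placeholders):
# Rough sketch (package version, Kudu master and table name are placeholders):
# start PySpark with the kudu-spark integration on the classpath; reads then go
# through the generic DataFrame reader, e.g.
#   spark.read.format("org.apache.kudu.spark.kudu") \
#        .option("kudu.master", "kudu-master:7051") \
#        .option("kudu.table", "impala::db.table").load()
pyspark --packages org.apache.kudu:kudu-spark2_2.11:1.7.0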
Thanks.
05-23-2019
12:58 AM
Hi
Does avro-tools support Kerberos authentication? If so, what is the syntax to make it work with a Kerberos ticket/keytab, etc.?
$ avro-tools getschema hdfs://localhost:8020/ez/qa/rpt/acc_avro/part-m-00001.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2170)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
at org.apache.avro.mapred.FsInput.<init>(FsInput.java:43)
at org.apache.avro.mapred.FsInput.<init>(FsInput.java:38)
at org.apache.avro.tool.Util.openSeekableFromFS(Util.java:110)
at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
at org.apache.avro.tool.Main.run(Main.java:85)
at org.apache.avro.tool.Main.main(Main.java:74)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:788)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2168)
... 10 more
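The only approach I can think of (untested on my side; the keytab, principal and jar location below are placeholders) is to kinit first and then run the avro-tools jar through hadoop jar, so that it picks up the cluster's Hadoop client configuration where Kerberos is enabled. Is that the expected way?
# Untested sketch (keytab, principal and jar location are placeholders):
# authenticate first, then let `hadoop jar` supply the Hadoop client config
# (core-site.xml/hdfs-site.xml), which is where Kerberos auth is turned on.
kinit -kt /path/to/user.keytab user@EXAMPLE.COM
export HADOOP_CONF_DIR=/etc/hadoop/conf
hadoop jar /path/to/avro-tools-1.8.2.jar getschema \
  hdfs://localhost:8020/ez/qa/rpt/acc_avro/part-m-00001.avro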
Thanks.
03-22-2019
09:52 PM
Any update on this? Did you manage to solve it somehow? I am looking for the exact same functionality in 2019, but I don't see any option other than sensitive data redaction (which I don't think is user/role based).
01-31-2019
11:17 AM
Hello
I have successfully set up the HDFS NFS Gateway in our cluster (CDH 5.15.x) and mounted it in Windows 7 (mapped to a drive). NFS host/server: a single node for now; we are planning to add more nodes soon.
The properties of the mount in Windows show the following parameters:
UID=-2, GID=-2
rsize=32768, wsize=32768
mount=soft, timeout=0.8
retry=3, locking=no
fileaccess=755, lang=ANSI
casesensitive=no
sec=sys
Things are working fine with respect to access, reading, writing, etc., but the performance is really poor: an 860 MB file takes 8.5 minutes. Part of it (I think) is the limited rsize/wsize, which is capped at 32 KB in Windows 7. Are there any other performance improvements that could be made to speed this up? Thanks.
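For comparison (a hedged sketch; the gateway host and mount point are placeholders), I am thinking of mounting the same gateway from a Linux client with larger rsize/wsize to confirm whether the 32 KB Windows cap is the main bottleneck:
# Hypothetical comparison mount from a Linux client (gateway host and mount
# point are placeholders); vers=3, proto=tcp and nolock follow the HDFS NFS
# Gateway documentation, while rsize/wsize are raised well above the 32 KB
# Windows 7 cap.
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync,rsize=1048576,wsize=1048576 \
  nfs-gateway-host:/ /mnt/hdfs_nfs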
12-21-2018
02:44 PM
Hello
We have HDFS encryption at rest enabled in our Kerberized cluster.
I am able to create an encryption zone and write data into it as the admin (who created the key and the zone). Other users (not in the same LDAP group as that admin) are not able to access it even with ACLs set to rwx, because they are not authorized for [DECRYPT_EEK].
Questions:
1. When a user creates an encryption zone, do other users in the same Unix group get access to [DECRYPT_EEK] by default?
2. Usually a user can see the encrypted bytes (not the actual data) if they have read permission on the file, but this does not seem to be the case with HDFS at-rest encryption: unless the user is able to decrypt the data, they are not allowed to read it at all. Is that correct?
3. Is there a way to show/display the encrypted data (without decrypting it)? If so, how?
4. Where do GENERATE_EEK and GET_METADATA fit into this concept?
The whole idea of maintaining keys in KMS/KTS and encrypting the data based on them feels more like blocking access to the data than something that lets you observe that the data is actually encrypted.
If someone could please provide some clarity, it would be greatly appreciated. Thank you.
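Regarding question 3, the closest thing I have come across (a hedged sketch; the file path is a placeholder, and I am not sure this is the intended mechanism) is the /.reserved/raw view, which seems to expose the raw encrypted bytes of files inside an encryption zone:
# Hedged sketch: /.reserved/raw shows the ciphertext of a file in an encryption
# zone without decrypting it; the path below is a placeholder and, as far as I
# understand, reading it requires the HDFS superuser.
hdfs dfs -cat /.reserved/raw/ezone/data/part-m-00001.avro | head -c 64 | od -c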
12-10-2018
09:46 AM
Thanks Tim. Would you be able to advise an easy way to capture the log messages from Impyla?
12-05-2018
12:13 PM
Also, please try with the impala-shell port (e.g. 21051). It looks like the load balancer's 25003 is used for external connections through ODBC, and 21051 for impala-shell connections.
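For example (a hedged sketch; the load-balancer hostname is a placeholder, and the -k flag assumes a Kerberized cluster):
# Hypothetical example: point impala-shell at the load balancer on the
# impala-shell port rather than the ODBC/JDBC port (hostname is a placeholder).
impala-shell -i loadbalancer.example.com:21051 -k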
12-05-2018
12:11 PM
Hi, I am trying to find documentation on which cursor/connection object methods are available in impyla. It implements DB API 2.0 (PEP 249), but some methods such as rowcount, messages and errorhandler are optional when implementing the DB API. Were any of the optional DB API extensions included in impyla? If not, is there any documentation on what is available? Also, can someone please advise on how to capture logs when using impyla?
12-05-2018
11:24 AM
Any solution for this problem?
12-04-2018
09:00 AM
Hi, I understand that I can apply the patch and it could solve the issue. My question is: doesn't Cloudera apply these patches to the tool that ships with CDH? My understanding was that even though the version number is old, Cloudera backports security fixes and other features/patches that do not break backward compatibility.
12-03-2018
09:50 AM
Tried double quotes, same error. --target-dir should work as a workaround. Does that mean the patch mentioned is not part of Sqoop1 in CDH 5.15.x?
11-30-2018
03:04 PM
Hello
I am trying to import tables from an MSSQL server, and the table names can have special characters in them (e.g. "-"), such as table-name.
Sqoop version: sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.15.0
Here is the command:
sqoop import \
--connect 'jdbc:sqlserver://TSTSQL; databaseName=dbname' \
--username sqoop --password **** \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table test-sql_table_name \
--hive-import --hive-overwrite \
--delete-target-dir \
--hive-table temp.test_sql_table_name \
--hive-drop-import-delims --fields-terminated-by '|' \
--null-string '\\N' \
--num-mappers 1
Error:
18/11/30 14:50:37 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '-'.
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '-'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1655)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:440)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:385)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7505)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2445)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:191)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:166)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:297)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:777)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1858)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1657)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
18/11/30 14:50:37 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1663)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
There is a patch available for this: https://issues.apache.org/jira/browse/SQOOP-521
But it looks like Cloudera's Sqoop1 doesn't have this patch applied? Can someone help? Thanks.
BTW, escaping the table name within [] works, but the data is not imported because of the error below:
FAILED: SemanticException Line 2:17 Invalid path ''hdfs://nameservice1/user/<userid>/[test-table_name]'': No files matching path hdfs://nameservice1/user/<userid>/%5Btest-table_name%5D
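The workaround I am considering (a rough, untested sketch; the connection string, paths and names are placeholders based on the command above) is to keep the bracket-escaped name only inside the SQL sent to SQL Server via --query, and to give an explicit, bracket-free --target-dir so the HDFS staging path has no special characters:
# Untested sketch (connection string, paths and names are placeholders):
# bracket-escape the table only in the query sent to SQL Server, and keep the
# HDFS staging path free of brackets via an explicit --target-dir.
sqoop import \
  --connect 'jdbc:sqlserver://TSTSQL; databaseName=dbname' \
  --username sqoop --password **** \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --query 'SELECT * FROM [test-sql_table_name] WHERE $CONDITIONS' \
  --target-dir /user/<userid>/test_sql_table_name \
  --delete-target-dir \
  --hive-import --hive-overwrite \
  --hive-table temp.test_sql_table_name \
  --hive-drop-import-delims --fields-terminated-by '|' \
  --null-string '\\N' \
  --num-mappers 1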
11-05-2018
11:03 AM
Hello
We have a Kerberized cluster with CDH 5.15.0, with Sentry enabled, integrated with LDAP, and using Kerberos principals that exist in or are managed by LDAP/AD.
I am trying to create personal Hive DBs for which only the owning user has access to the objects under that DB. I am facing a problem when providing/restricting access to a single user within the same LDAP group.
In the Hue user admin, I am only able to grant/restrict permissions for an LDAP group and not for an individual user.
We have 4-5 users in the same LDAP group for whom I am trying to create personal Hive DBs with their own HDFS home directory as the default location (/user/user1).
Steps:
1. Created a group called user1_group in Hue Admin Groups (for user1).
2. Selected all permissions except useradmin.access, with user1 as the only member.
3. Created a role in Hue --> Security --> Hive Tables --> Roles and selected user1_group, which has only 1 user in it.
4. Created a new Hive DB (user1db) with /user/user1 (HDFS path) as the default location.
5. Added privileges for the above role (from step 3) with db=user1db --> table=ALL.
With just these steps, user1 should be able to see the newly created DB in Hue/Hive or Impala (after a metadata refresh), but they are not able to.
So I changed the role (from step 3) to reference the LDAP group (ldap_group1) that user1 belongs to. Then user1 is able to view the DB.
6. When the user tries to create a table, they get the error below:
user=hive, access=WRITE, inode="/user/user1":user1:user1:drwxr-xr-x ...."
7. Executed the command below so that hive gets access to the inode above:
hdfs dfs -setfacl -R -m user:hive:rwx /user/user1
8. User1 is now able to create the table and perform various operations.
The problem here is that any user in the LDAP group (ldap_group1) who has permission to impersonate hive or impala is able to create/delete tables in db_user1.
How can I restrict access to the personal DBs to only that user, without others having access?
What am I doing incorrectly in the above steps?
Thanks for the input/pointers.
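For completeness, here is a hedged sketch of the equivalent grants expressed as Sentry SQL through beeline, in case that makes the setup clearer (the role, group, database and JDBC URL are placeholders; my understanding, which may be wrong, is that Sentry resolves group membership from the OS/LDAP group mapping, so a per-user group would have to exist there and not only in Hue):
# Hypothetical sketch (role, group, database and JDBC URL are placeholders):
# create a per-user role and grant it on the personal database. Sentry grants
# map to groups, so user1_only_group is assumed to exist in the OS/LDAP group
# mapping, not just as a Hue group.
beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "
CREATE ROLE user1_role;
GRANT ROLE user1_role TO GROUP user1_only_group;
GRANT ALL ON DATABASE user1db TO ROLE user1_role;
"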