Member since: 04-11-2016
Posts: 174
Kudos Received: 29
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3399 | 06-28-2017 12:24 PM
 | 2575 | 06-09-2017 07:20 AM
 | 7140 | 08-18-2016 11:39 AM
 | 5336 | 08-12-2016 09:05 AM
 | 5503 | 08-09-2016 09:24 AM
08-09-2016
02:51 PM
TABLE_CATALOG TABLE_SCHEMA TABLE_NAME
Management Administration SettingAttribute
Management Administration SettingAttributeGroup
Management Administration SettingAttributeValue
Management Administration SettingValue
Management ape DatabaseScriptLog
Management ape DatabaseLog
Management Common Language
Management Common ThirdPartyType
Management Common Country
Management Company DistributorCow
Management Company CustomerSetting
Management Company CustomerSettingAttributeValue

The above output is from the following query executed on SQL Server (note that 'dbo' is excluded):

USE Management
GO
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('dbo') ORDER BY TABLE_SCHEMA;

Environment: HDP-2.4.2.0-258 installed using Ambari 2.2.2.0, Sqoop 1.4.6.2.4.2.0-258.

Now, when I do a plain sqoop list-tables with 'database=Management' in the --connect string, I only get the tables that belong to the dbo schema. As per the Sqoop documentation for the Microsoft SQL Connector, I tried using the --schema option (its position in the command didn't seem to make any difference!), but the argument is not recognized:

sqoop list-tables --connect 'jdbc:sqlserver://IP;database=Management' --username __ --password __ --schema Administration --verbose
16/08/09 16:44:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/08/09 16:44:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Error parsing arguments for list-tables:
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: --schema
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: Administration
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: --verbose

My ultimate objective is to import these tables as HCatalog tables, but no combination of options I have tried for specifying a non-default schema works (see also the sketch after the stack trace below). For example:

sqoop import --connect 'jdbc:sqlserver://IP;database=Management' --username __ --password __ --table "Administration.SettingAttribute" --hcatalog-database Management --hcatalog-table Administration_SettingAttribute_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
16/08/09 16:51:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/08/09 16:51:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/09 16:51:54 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/09 16:51:54 INFO tool.CodeGenTool: Beginning code generation
16/08/09 16:51:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration.SettingAttribute] AS t WHERE 1=0
16/08/09 16:51:54 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'Administration.SettingAttribute'.
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'Administration.SettingAttribute'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:285)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1845)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
16/08/09 16:51:54 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
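For reference, my reading of the Sqoop 1.4.6 documentation for the Microsoft SQL Connector is that --schema is a connector-specific argument and has to come after a standalone -- separator at the very end of the command, not among the regular tool options. A sketch of that form, not verified on this cluster (the documented example is for import; whether list-tables honors it I don't know, and -P is used here per the password warning in the log):

sqoop import --connect 'jdbc:sqlserver://IP;database=Management' --username __ -P --table SettingAttribute --hcatalog-database Management --hcatalog-table Administration_SettingAttribute_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" -- --schema Administration

sqoop list-tables --connect 'jdbc:sqlserver://IP;database=Management' --username __ -P -- --schema Administration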
Labels:
- Apache HCatalog
- Apache Sqoop
08-09-2016
02:30 PM
Yeah, but what should I do about the files already in trash? expunge doesn't help.
08-09-2016
09:24 AM
1 Kudo
Yeah, I expunged, but does that mean the space reclamation will only start 360 min. (6 h) after deletion of the file?

/* EDIT added after the space was auto-reclaimed */ It seems strange, but the space had been reclaimed when I checked today; presumably the reclaim did start after 6 h 😞
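For what it's worth, a sketch of how the trash could have been bypassed or purged right away instead of waiting for the interval (-skipTrash is the standard HDFS shell flag for this; paths are the ones from my earlier post):

hdfs dfs -rm -r -skipTrash /benchmarks          # delete without moving to trash at all
hdfs dfs -rm -r -skipTrash /user/hdfs/.Trash    # or purge what is already sitting in trash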
08-09-2016
09:18 AM
HDP-2.4.2.0-258 installed using Ambari 2.2.2.0.

I executed TestDFSIO on the cluster, but it failed midway. HDFS was then left with loads of data; HDFS utilization was/is shown as 98% in Ambari. I simply deleted the benchmark directory created during TestDFSIO AND expunged:

[hdfs@l4377t root]$ hdfs dfs -ls /benchmarks
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO
[hdfs@l4377t root]$
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -ls -h /benchmarks
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -ls -h /benchmarks/TestDFSIO
Found 3 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO/io_control
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:58 /benchmarks/TestDFSIO/io_data
drwx--x--x - hdfs hdfs 0 2016-08-08 13:02 /benchmarks/TestDFSIO/io_write
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -rmr /benchmarks/
rmr: DEPRECATED: Please use 'rm -r' instead.
16/08/09 09:15:33 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://l4283t.sss.com:8020/benchmarks' to trash at: hdfs://l4283t.sss.com:8020/user/hdfs/.Trash/Current
[hdfs@l4377t root]$
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -expunge
16/08/09 09:16:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
16/08/09 09:16:13 INFO fs.TrashPolicyDefault: Created trash checkpoint: /user/hdfs/.Trash/160809091613

However, the disk and HDFS space is still not freed; below is the df -h output for the datanode directories:

/dev/vdc 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdc
/dev/vdk 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdk
/dev/vdl 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdl
/dev/vdh 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdh
/dev/vdg 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdg
/dev/vdj 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdj
/dev/vdi 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdi
/dev/vde 10G 9.0G 1.1G 90% /opt/hdfsdisks/vde
/dev/vdd 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdd
/dev/vdb 10G 8.9G 1.1G 90% /opt/hdfsdisks/vdb
/dev/vdm 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdm
/dev/vdf 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdf

EDIT (@Benjamin Leonhardi, can you check this?): Below is the output for those directories AFTER expunge. I tried expunge earlier as well as now; however, those TestDFSIO files keep surfacing in trash!

-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_28
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_26
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_25
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_24
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/Te
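For context, a couple of generic checks that could show where the space is actually held (standard HDFS shell commands; the trash path is the one printed by expunge above):

hdfs dfs -du -h /user/hdfs/.Trash    # how much space the trash checkpoints still hold
hdfs dfsadmin -report                # per-DataNode usage, to confirm whether blocks were really deleted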
Labels:
- Apache Hadoop
07-07-2016
12:50 PM
Can you check if I have understood correctly:

- Sqoop import (with HCatalog integration) to Hive.
- Use HCatalog in case someone needs to access and process the data in Pig, MR etc.

I came across the following paragraph in O'Reilly (and the same tone is reflected in several posts on the Internet):

"A drawback of ORC as of this writing is that it was designed specifically for Hive, and so is not a general-purpose storage format that can be used with non-Hive MapReduce interfaces such as Pig or Java, or other query engines such as Impala. Work is under way to address these shortcomings, though."

There will be several RDBMS schemas that will be imported onto HDFS and LATER partitioned etc. and processed. In this context, can you elaborate on 'Once the initial data import is in Hive as ORC, you can then still continue and transform this data as necessary.'? I have the following questions (a sketch follows the list below):

- Suppose the Sqoop import to Hive is done WITHOUT partitions (--hive-partition-key), i.e. all tables are Hive 'Managed' tables, and, say, this uses 800 GB of HDFS space as compared to 1 TB in the source RDBMS. Won't more space be occupied when I try to create PARTITIONED tables?
- Will it be possible for some third-party non-Java tool to read the data by relying on HCatalog?
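To make the space question concrete, this is the kind of transformation I understand by 'continue and transform this data as necessary'; a minimal sketch with made-up table/column names (sales_raw, sales_part, sale_date):

-- sales_raw(id, amount, sale_date): unpartitioned table as produced by the Sqoop --hcatalog import
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE sales_part (id INT, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE sales_part PARTITION (sale_date)
SELECT id, amount, sale_date FROM sales_raw;

-- at this point the data exists twice; space is only reclaimed if sales_raw is dropped afterwards
-- DROP TABLE sales_raw;

Is this roughly what is meant, and is dropping the raw table afterwards the expected way to avoid the duplication?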
07-07-2016
09:20 AM
1 Kudo
HDP 2.4 installed using Ambari 2.2.2.0. To my previous question, I received comprehensive feedback from the community, based on which I settled on importing data from the RDBMS to HDFS (text/Avro) and then creating Hive external tables. Then I realized that I have missed/misinterpreted something.

The ideas behind importing first to HDFS are:

1. When stored on HDFS, Hive as well as other tools (Pig, MR) and external/third-party tools can access the files and process them in their own ways.
2. Sqoop cannot directly create the EXTERNAL tables; moreover, it is required that you load the data onto the cluster first and PARTITION the tables after some period (when the DB developers are available for business knowledge).
3. A 1 TB RDBMS imported as text/Avro files onto HDFS will occupy approx. 3 TB on HDFS (given a replication factor of 3).
4. Creating a Hive EXTERNAL table is NOT going to consume much HDFS space; I created 'raw/plain' EXTERNAL tables that merely point to the imported files.

NOW the confusion begins: I need to create EXTERNAL PARTITIONED tables from these 'raw/plain' tables. The final EXTERNAL PARTITIONED tables will again occupy space, and because of point 1 above we CANNOT delete the original imported files. This will lead to more consumption of HDFS space due to duplication of data.

Are my fears justified? If yes, how shall I proceed? If not, what am I missing (say, HCatalog usage)? (A sketch of what I mean by a 'raw/plain' external table follows below.)
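For clarity, this is what I mean by a 'raw/plain' EXTERNAL table; a minimal sketch with made-up names and an illustrative HDFS path (the real location is wherever Sqoop wrote the text/Avro files):

-- metadata only: points at the imported files and consumes no extra HDFS space;
-- dropping it later removes only the table definition, not the files
CREATE EXTERNAL TABLE customers_raw (id INT, name STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/imports/management/customers';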
Labels:
- Apache Hive
- Apache Sqoop
07-07-2016
08:30 AM
@emaxwell Oozie workflow mgt. isn't required right away, but the Ambari views for Hive etc. would be. I have the following questions:

- All the views can be accessed AFTER logging in to Ambari. What approach should be taken to make these views available to end users registered in some corporate LDAP groups?
- I can see the Hive view but NOT the Pig view. Does it need to be configured manually to be visible?
- Nothing happens when I click on the Files view. Does this need to be configured separately? If this view exists, what is the use of the NN UI -> Utilities -> Browse file system?
07-05-2016
03:37 PM
HDP 2.4 installed using Ambari 2.2.2.0. I followed the steps in the documentation; however, I have hit the following error:

[root@l4377t ~]# cd /usr/lib/hue
[root@l4377t hue]#
[root@l4377t hue]#
[root@l4377t hue]# source build/env/bin/activate
(env)[root@l4377t hue]# pip install psycopg2
python2.6: error while loading shared libraries: libpython2.6.so.1.0: cannot open shared object file: No such file or directory

As mentioned in this thread, Hue was NOT supported on RHEL/CentOS 7 - is this still valid? If yes, how shall I proceed with the Hue installation? Are there any alternatives to Hue in HDP?
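For reference, a generic way to chase this kind of missing-shared-library error (standard Linux commands; the python binary path and the /usr/lib64 directory below are only guesses for illustration):

ldd build/env/bin/python2.6 | grep 'not found'      # run from /usr/lib/hue: list the unresolved libraries
find / -name 'libpython2.6.so.1.0' 2>/dev/null      # locate the library on disk, if it exists at all
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH  # then point the loader at whichever directory find returned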
Labels:
06-28-2016
03:06 PM
I got the point of HDFS POSIX permissions; however, I couldn't understand 'HDFS ACLs implemented outside of Ranger'. Does this mean that ACLs and Ranger are 'mutually exclusive'? If yes, what can ACLs do that Ranger cannot? Can you check this community thread, which suggests that if you use Ranger, you need not work with ACLs?
06-28-2016
01:25 PM
That's awful - if there are 100 users per service, that many policies need to be created per service. Am I missing something, or is there a better way to do this?