Member since: 04-11-2016
Posts: 174
Kudos Received: 29
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3399 | 06-28-2017 12:24 PM
 | 2575 | 06-09-2017 07:20 AM
 | 7140 | 08-18-2016 11:39 AM
 | 5336 | 08-12-2016 09:05 AM
 | 5503 | 08-09-2016 09:24 AM
08-09-2016
02:51 PM
TABLE_CATALOG TABLE_SCHEMA TABLE_NAME
Management Administration SettingAttribute
Management Administration SettingAttributeGroup
Management Administration SettingAttributeValue
Management Administration SettingValue
Management ape DatabaseScriptLog
Management ape DatabaseLog
Management Common Language
Management Common ThirdPartyType
Management Common Country
Management Company DistributorCow
Management Company CustomerSetting
Management Company CustomerSettingAttributeValue

The above output is from the following query executed on SQL Server (note that 'dbo' is excluded):

USE Management
GO
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('dbo') ORDER BY TABLE_SCHEMA;

Environment: HDP-2.4.2.0-258 installed using Ambari 2.2.2.0, Sqoop 1.4.6.2.4.2.0-258.

Now, when I do a plain sqoop list-tables with 'database=Management' in the --connect string, I only get the tables that belong to the dbo schema. As per the Sqoop documentation for the Microsoft SQL Connector, I tried using the --schema option (its position in the command didn't seem to make any difference!), but the argument is not recognized:

sqoop list-tables --connect 'jdbc:sqlserver://IP;database=Management' --username __ --password __ --schema Administration --verbose
16/08/09 16:44:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/08/09 16:44:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Error parsing arguments for list-tables:
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: --schema
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: Administration
16/08/09 16:44:32 ERROR tool.BaseSqoopTool: Unrecognized argument: --verbose

My ultimate objective is to import these tables as HCatalog tables, but no combination of options I have tried for specifying a non-default schema works (see also the sketch after the stack trace below). For example:

sqoop import --connect 'jdbc:sqlserver://IP;database=Management' --username __ --password __ --table "Administration.SettingAttribute" --hcatalog-database Management --hcatalog-table Administration_SettingAttribute_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
16/08/09 16:51:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/08/09 16:51:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/09 16:51:54 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/09 16:51:54 INFO tool.CodeGenTool: Beginning code generation
16/08/09 16:51:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration.SettingAttribute] AS t WHERE 1=0
16/08/09 16:51:54 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'Administration.SettingAttribute'.
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'Administration.SettingAttribute'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:404)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:285)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1845)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
16/08/09 16:51:54 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
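For reference, my reading of the Sqoop 1.4.6 documentation for the Microsoft SQL Connector is that --schema is a connector-specific argument and has to come after a standalone -- separator at the very end of the command, not among the regular tool options. A sketch of that form, not verified on this cluster (the documented example is for import; whether list-tables honors it I don't know, and -P is used here per the password warning in the log):

sqoop import --connect 'jdbc:sqlserver://IP;database=Management' --username __ -P --table SettingAttribute --hcatalog-database Management --hcatalog-table Administration_SettingAttribute_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" -- --schema Administration

sqoop list-tables --connect 'jdbc:sqlserver://IP;database=Management' --username __ -P -- --schema Administration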
Labels:
- Apache HCatalog
- Apache Sqoop
08-09-2016
02:30 PM
Yeah, but what should I do about the files already in trash? expunge doesn't help.
08-09-2016
09:24 AM
1 Kudo
Yeah, I expunged, but does that mean the space reclamation will only start 360 min. (6 h) after deletion of the file?

/* EDIT added after the space was auto-reclaimed */ It seems strange, but the space had been reclaimed when I checked today; presumably the reclaim did start after 6 h 😞
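For what it's worth, a sketch of how the trash could have been bypassed or purged right away instead of waiting for the interval (-skipTrash is the standard HDFS shell flag for this; paths are the ones from my earlier post):

hdfs dfs -rm -r -skipTrash /benchmarks          # delete without moving to trash at all
hdfs dfs -rm -r -skipTrash /user/hdfs/.Trash    # or purge what is already sitting in trash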
08-09-2016
09:18 AM
HDP-2.4.2.0-258 installed using Ambari 2.2.2.0.

I executed TestDFSIO on the cluster, but it failed midway. HDFS was then left with loads of data; HDFS utilization was/is shown as 98% in Ambari. I simply deleted the benchmark directory created during TestDFSIO AND expunged:

[hdfs@l4377t root]$ hdfs dfs -ls /benchmarks
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO
[hdfs@l4377t root]$
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -ls -h /benchmarks
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -ls -h /benchmarks/TestDFSIO
Found 3 items
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:48 /benchmarks/TestDFSIO/io_control
drwxr-xr-x - hdfs hdfs 0 2016-08-08 12:58 /benchmarks/TestDFSIO/io_data
drwx--x--x - hdfs hdfs 0 2016-08-08 13:02 /benchmarks/TestDFSIO/io_write
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -rmr /benchmarks/
rmr: DEPRECATED: Please use 'rm -r' instead.
16/08/09 09:15:33 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://l4283t.sss.com:8020/benchmarks' to trash at: hdfs://l4283t.sss.com:8020/user/hdfs/.Trash/Current
[hdfs@l4377t root]$
[hdfs@l4377t root]$
[hdfs@l4377t root]$ hdfs dfs -expunge
16/08/09 09:16:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
16/08/09 09:16:13 INFO fs.TrashPolicyDefault: Created trash checkpoint: /user/hdfs/.Trash/160809091613

However, the disk and HDFS space is still not freed; below is the df -h output for the datanode directories:

/dev/vdc 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdc
/dev/vdk 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdk
/dev/vdl 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdl
/dev/vdh 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdh
/dev/vdg 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdg
/dev/vdj 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdj
/dev/vdi 10G 8.9G 1.2G 89% /opt/hdfsdisks/vdi
/dev/vde 10G 9.0G 1.1G 90% /opt/hdfsdisks/vde
/dev/vdd 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdd
/dev/vdb 10G 8.9G 1.1G 90% /opt/hdfsdisks/vdb
/dev/vdm 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdm
/dev/vdf 10G 9.0G 1.1G 90% /opt/hdfsdisks/vdf

EDIT (@Benjamin Leonhardi, can you check this?): Below is the output for those directories AFTER expunge. I tried expunge earlier as well as now; however, those TestDFSIO files keep surfacing in trash!

-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_28
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_26
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_25
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/TestDFSIO/io_data/test_io_24
-rw------- 3 hdfs hdfs 0 2016-08-08 13:57 /user/hdfs/.Trash/160809091613/benchmarks/Te
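For context, a couple of generic checks that could show where the space is actually held (standard HDFS shell commands; the trash path is the one printed by expunge above):

hdfs dfs -du -h /user/hdfs/.Trash    # how much space the trash checkpoints still hold
hdfs dfsadmin -report                # per-DataNode usage, to confirm whether blocks were really deleted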
Labels:
- Apache Hadoop
07-07-2016
12:50 PM
Can you check if I have understood correctly:

- Sqoop import (with HCatalog integration) to Hive.
- Use HCatalog in case someone needs to access and process the data in Pig, MR etc.

I came across the following paragraph in O'Reilly (and the same tone is reflected in several posts on the Internet):

"A drawback of ORC as of this writing is that it was designed specifically for Hive, and so is not a general-purpose storage format that can be used with non-Hive MapReduce interfaces such as Pig or Java, or other query engines such as Impala. Work is under way to address these shortcomings, though."

There will be several RDBMS schemas that will be imported onto HDFS and LATER partitioned etc. and processed. In this context, can you elaborate on 'Once the initial data import is in Hive as ORC, you can then still continue and transform this data as necessary.'? I have the following questions (a sketch follows the list below):

- Suppose the Sqoop import to Hive is done WITHOUT partitions (--hive-partition-key), i.e. all tables are Hive 'Managed' tables, and, say, this uses 800 GB of HDFS space as compared to 1 TB in the source RDBMS. Won't more space be occupied when I try to create PARTITIONED tables?
- Will it be possible for some third-party non-Java tool to read the data by relying on HCatalog?
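To make the space question concrete, this is the kind of transformation I understand by 'continue and transform this data as necessary'; a minimal sketch with made-up table/column names (sales_raw, sales_part, sale_date):

-- sales_raw(id, amount, sale_date): unpartitioned table as produced by the Sqoop --hcatalog import
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE sales_part (id INT, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE sales_part PARTITION (sale_date)
SELECT id, amount, sale_date FROM sales_raw;

-- at this point the data exists twice; space is only reclaimed if sales_raw is dropped afterwards
-- DROP TABLE sales_raw;

Is this roughly what is meant, and is dropping the raw table afterwards the expected way to avoid the duplication?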
07-07-2016
09:20 AM
1 Kudo
HDP 2.4 installed using Ambari 2.2.2.0. To my previous question, I received comprehensive feedback from the community, based on which I settled on importing data from the RDBMS to HDFS (text/Avro) and then creating Hive external tables. Then I realized that I have missed/misinterpreted something.

The ideas behind importing first to HDFS are:

1. When stored on HDFS, Hive as well as other tools (Pig, MR) and external/third-party tools can access the files and process them in their own ways.
2. Sqoop cannot directly create the EXTERNAL tables; moreover, it is required that you load the data onto the cluster first and PARTITION the tables after some period (when the DB developers are available for business knowledge).
3. A 1 TB RDBMS imported as text/Avro files onto HDFS will occupy approx. 3 TB on HDFS (given a replication factor of 3).
4. Creating a Hive EXTERNAL table is NOT going to consume much HDFS space; I created 'raw/plain' EXTERNAL tables that merely point to the imported files.

NOW the confusion begins: I need to create EXTERNAL PARTITIONED tables from these 'raw/plain' tables. The final EXTERNAL PARTITIONED tables will again occupy space, and because of point 1 above we CANNOT delete the original imported files. This will lead to more consumption of HDFS space due to duplication of data.

Are my fears justified? If yes, how shall I proceed? If not, what am I missing (say, HCatalog usage)? (A sketch of what I mean by a 'raw/plain' external table follows below.)
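For clarity, this is what I mean by a 'raw/plain' EXTERNAL table; a minimal sketch with made-up names and an illustrative HDFS path (the real location is wherever Sqoop wrote the text/Avro files):

-- metadata only: points at the imported files and consumes no extra HDFS space;
-- dropping it later removes only the table definition, not the files
CREATE EXTERNAL TABLE customers_raw (id INT, name STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/imports/management/customers';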
Labels:
- Apache Hive
- Apache Sqoop
07-07-2016
08:30 AM
@emaxwell Oozie workflow mgt. isn't required right away, but the Ambari views for Hive etc. would be. I have the following questions:

- All the views can be accessed AFTER logging in to Ambari. What approach should be taken to make these views available to end users registered in some corporate LDAP groups?
- I can see the Hive view but NOT the Pig view. Does it need to be configured manually to be visible?
- Nothing happens when I click on the Files view. Does this need to be configured separately? If this view exists, what is the use of the NN UI -> Utilities -> Browse file system?
07-05-2016
03:37 PM
HDP 2.4 installed using Ambari 2.2.2.0. I followed the steps in the documentation; however, I have hit the following error:

[root@l4377t ~]# cd /usr/lib/hue
[root@l4377t hue]#
[root@l4377t hue]#
[root@l4377t hue]# source build/env/bin/activate
(env)[root@l4377t hue]# pip install psycopg2
python2.6: error while loading shared libraries: libpython2.6.so.1.0: cannot open shared object file: No such file or directory

As mentioned in this thread, Hue was NOT supported on RHEL/CentOS 7 - is this still valid? If yes, how shall I proceed with the Hue installation? Are there any alternatives to Hue in HDP?
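For reference, a generic way to chase this kind of missing-shared-library error (standard Linux commands; the python binary path and the /usr/lib64 directory below are only guesses for illustration):

ldd build/env/bin/python2.6 | grep 'not found'      # run from /usr/lib/hue: list the unresolved libraries
find / -name 'libpython2.6.so.1.0' 2>/dev/null      # locate the library on disk, if it exists at all
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH  # then point the loader at whichever directory find returned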
Labels:
06-28-2016
03:06 PM
I got the point of HDFS POSIX permissions; however, I couldn't understand 'HDFS ACLs implemented outside of Ranger'. Does this mean that ACLs and Ranger are 'mutually exclusive'? If yes, what can ACLs do that Ranger cannot? Can you check this community thread, which suggests that if you use Ranger, you need not work with ACLs?
06-28-2016
01:25 PM
That's awful - if there are 100 users per service, that many policies need to be created per service. Am I missing something, or is there a better way to do this?