Member since: 04-11-2016
Posts: 174
Kudos Received: 29
Solutions: 6

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3393 | 06-28-2017 12:24 PM |
| | 2557 | 06-09-2017 07:20 AM |
| | 7102 | 08-18-2016 11:39 AM |
| | 5275 | 08-12-2016 09:05 AM |
| | 5436 | 08-09-2016 09:24 AM |
08-16-2016
09:17 AM
Edited the original question to include the Sqoop import command (in ORC format) that I used; can you check?
08-15-2016
03:02 PM
This is the first, one-time import.
08-15-2016
02:10 PM
1 Kudo
HDP-2.4.2.0-258 installed using Ambari 2.2.2.0.

The source tables are in a SQL Server schema; below is a table with 1205028380 rows and a composite PK (DateDimensionId, DriverDimensionId, VehicleDimensionId):

| Column | Type | Allow Nulls |
|---|---|---|
| DateDimensionId | bigint | Unchecked |
| DriverDimensionId | int | Unchecked |
| VehicleDimensionId | int | Unchecked |
| Odometer | bigint | Checked |
| TotalFuel | bigint | Checked |
| TotalFuelIdle | bigint | Checked |
| TotalRuntime | bigint | Checked |
| TotalRuntimeIdle | bigint | Checked |
| TotalDistanceWithTrailer | bigint | Checked |
| TotalFuelPTO | bigint | Checked |
| TotalRuntimePTO | bigint | Checked |
| TotalTimeOverspeeding | bigint | Checked |
| TotalTimeOverreving | bigint | Checked |
| TotalNoOfHarshBrakes | bigint | Checked |
| TotalNoOfBrakeApplications | bigint | Checked |
| TotalNoOfHarshAcceleration | bigint | Checked |
| MinTimeMessage | datetime2(7) | Checked |
| MaxTimeMessage | datetime2(7) | Checked |
| TimeOutOfGreenBandDriving | bigint | Checked |
| Coasting | bigint | Checked |
| ... | ... | ... |

I used the following command; note that the format is ORC. Also, can '--num-mappers' cause any duplication?

```
sqoop import --num-mappers 8 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FMS_FleetManagementDatawarehouse_VehicleData --hcatalog-table DateVehicleDriverAggregate --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagementDatawarehouse' --username --password --table DateVehicleDriverAggregate -- --schema VehicleData
```

The Sqoop import job took a long time (5.6h) with the default 4 mappers, but the concern is that it imported 1218843487 records, more than the source! Is the composite key causing some issue, or is it something else? There were no errors in the job, but I can provide specific logs if required.
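To narrow this down on my side, I'm thinking of counting how often each composite key occurs in the imported table. This is only a rough, unverified sketch; the database and table names are taken from my command above:

```
# Count composite keys that occur more than once in the imported
# table; any rows returned would confirm the duplication.
hive -e "
SELECT DateDimensionId, DriverDimensionId, VehicleDimensionId, COUNT(*) AS cnt
FROM FMS_FleetManagementDatawarehouse_VehicleData.DateVehicleDriverAggregate
GROUP BY DateDimensionId, DriverDimensionId, VehicleDimensionId
HAVING COUNT(*) > 1
LIMIT 10;"
```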
Labels:
- Apache Sqoop
08-12-2016
09:05 AM
1 Kudo
Well, I'm unsure whether it was an authorization issue, a mere parsing problem, or both. I did the following and it worked:

1. Did an 'su hive'.
2. Executed the following command (probably the -- --schema should be the last argument; Sqoop simply ignores/breaks everything after it!):

```
sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password --table DatabaseLog -- --schema ape
```
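For anyone landing here later, the general shape that worked is below; the angle-bracket values are my placeholders. Everything after the bare -- is handed to the SQL Server connector, which is why --schema has to come last:

```
# Sqoop's own options first, then a bare "--", then the
# connector-specific options such as --schema.
sqoop import \
  --connect 'jdbc:sqlserver://<IP>;database=<db>' \
  --username <user> -P \
  --table <table> \
  --hcatalog-home /usr/hdp/current/hive-webhcat \
  --hcatalog-database <hcat-db> \
  --hcatalog-table <hcat-table> \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile" \
  -- --schema <schema>
```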
08-11-2016
07:04 AM
I checked /var/log, but no logs exist there for Sqoop, Hive-HCatalog, or Hive; how can I view the authorization issue and fix it?
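(Noting my assumption here: since the import runs as a MapReduce job on YARN, I suppose the task logs have to be pulled via YARN rather than from /var/log. A sketch, with the application id being whatever the job output printed:)

```
# Sqoop imports run as YARN applications, so their task logs live
# with YARN; the id comes from the "Submitted application" log line.
yarn logs -applicationId <application_id>
```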
08-11-2016
07:03 AM
Thanks for the inputs, things are a bit clearer now. I still have a doubt: if importing to Hive and importing to HCatalog are more or less synonymous, i.e., an import into one makes the table available in the other, why would anyone import into HCatalog at all, why not import into Hive ONLY? The only advantage I see is that an HCatalog import gives you the option to store the table in ORC; can you validate my assumption?
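To make my doubt concrete, this is the difference as I currently understand it (an unverified sketch with placeholder names, not a recommendation):

```
# Plain Hive import: Sqoop creates/loads the Hive table, stored as
# delimited text by default.
sqoop import --connect 'jdbc:sqlserver://<IP>;database=<db>' \
  --username <user> -P --table <table> \
  --hive-import --hive-table <table>

# HCatalog import: the storage stanza lets the table be created
# directly in a format like ORC in the same step.
sqoop import --connect 'jdbc:sqlserver://<IP>;database=<db>' \
  --username <user> -P --table <table> \
  --hcatalog-database default --hcatalog-table <table> \
  --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
```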
08-11-2016
07:00 AM
As I mentioned, the only place where I see the data is under my user's HDFS dir, neither in HCatalog nor in Hive; hence, can you elaborate on 'relax your permission on the data you sqooped over'? I checked /var/log, but no logs exist there for Sqoop, Hive-HCatalog, or Hive; how can I view the authorization issue and fix it?
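If 'relaxing' means something like the following, please confirm; a sketch against the path from my other post, where the part-m files came out as -rw-------:

```
# Open up the sqooped files so services other than my own user can
# read them; 755 is just an example mode.
hdfs dfs -chmod -R 755 /user/ojoqcu/SettingAttribute
```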
08-10-2016
01:23 PM
HDP-2.4.2.0-258 installed using Ambari 2.2.2.0.

I have to import several SQL Server schemas which should be accessible via Hive, Pig, MR, and any third party (in future). As suggested by the community, I decided to import into HCatalog. Sqoop provides ways to import into Hive OR HCatalog; I guess that if I import into HCatalog, the same table will be accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption).

Questions:

1. If imported to Hive directly, will the table be available to Pig and MR?
2. If imported to HCatalog, what needs to be done to access it via Hive? Is pre-creation of tables in Hive necessary? If yes, what is the advantage of importing into HCatalog over importing into Hive directly, or over importing into HDFS and then creating an external table?

Issue: I wish to achieve the following in one step. I'm unsure if it's possible, but threads like this and this suggest so:

- Import the data (from SQL Server tables)
- Store this data at a custom HDFS path, say /org/data/schema1, /org/data/schema2 and so on (is this even possible, given that Sqoop doesn't support --target-dir/--warehouse-dir with HCatalog imports?)
- Avoid 'pre-creating' or writing create statements for those tables (there are 100s of them)
- Store the tables in ORC format

I executed the following command:

```
-bash-4.2$ sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password --table SettingAttribute -- --schema Administration --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table SettingAttribute --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
```

The source table contains 109 records and those are fetched:

```
16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
16/08/10 15:02:41 INFO mapreduce.Job: map 0% reduce 0%
16/08/10 15:02:47 INFO mapreduce.Job: map 100% reduce 0%
16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=616636
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=540
HDFS: Number of bytes written=10079
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=16132
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=16132
Total vcore-seconds taken by all map tasks=16132
Total megabyte-seconds taken by all map tasks=66076672
Map-Reduce Framework
Map input records=109
Map output records=109
Input split bytes=540
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=320
CPU time spent (ms)=6340
Physical memory (bytes) snapshot=999870464
Virtual memory (bytes) snapshot=21872697344
Total committed heap usage (bytes)=943194112
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10079
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.
```

The files are created under my user:

```
hdfs dfs -ls /user/ojoqcu/SettingAttribute
Found 5 items
-rw------- 3 ojoqcu hdfs 0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
-rw------- 3 ojoqcu hdfs 8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
-rw------- 3 ojoqcu hdfs 144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
-rw------- 3 ojoqcu hdfs 1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
-rw------- 3 ojoqcu hdfs 434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003
```

I cannot see anything in HCatalog (nor in Hive):

```
-bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
WARNING: Use "yarn jar" to launch YARN applications.
16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
OK
Time taken: 2.007 seconds
```

@Ravi Mutyala I guess you have considerable experience here; I have proceeded based on the inputs you provided earlier 🙂
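In the meantime, these are the sanity checks I can run; a sketch, assuming the HDP default warehouse location (which may differ on other setups):

```
# Look for the table's directory under the default Hive warehouse
# path, and list the tables the metastore actually knows about.
hdfs dfs -ls /apps/hive/warehouse
hive -e "SHOW TABLES IN default;"
```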
Labels:
- Apache HCatalog
- Apache Hive
- Apache Sqoop
08-10-2016
11:25 AM
While I am grateful for your answer, I'm unsure whether I'm stupid or lazy (or both!) 😉 Just in case someone needs it in the future:

```
sqoop list-tables --connect 'jdbc:sqlserver://<hostname/IP>;database=<db-name>' --username <user-name> --password <password> -- --schema <schema-name>
```
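Building on that for the 100s of tables I mentioned elsewhere, a loop like the one below should cover a whole schema. This is an untested sketch: the angle-bracket values are placeholders, and I've swapped --password for --password-file so the loop doesn't prompt on every iteration:

```
# Untested sketch: import every table of one SQL Server schema as an
# ORC-backed HCatalog table, one Sqoop job per table.
JDBC='jdbc:sqlserver://<hostname/IP>;database=<db-name>'
for t in $(sqoop list-tables --connect "$JDBC" \
      --username <user-name> --password-file <hdfs-path-to-pwd-file> \
      -- --schema <schema-name>); do
  sqoop import --connect "$JDBC" \
    --username <user-name> --password-file <hdfs-path-to-pwd-file> \
    --table "$t" \
    --hcatalog-database <hcat-db> --hcatalog-table "$t" \
    --create-hcatalog-table \
    --hcatalog-storage-stanza "stored as orcfile" \
    -- --schema <schema-name>   # connector args must stay last
done
```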