Member since 
    
	
		
		
		04-11-2016
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                174
            
            
                Posts
            
        
                29
            
            
                Kudos Received
            
        
                6
            
            
                Solutions
            
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| 3960 | 06-28-2017 12:24 PM | |
| 3047 | 06-09-2017 07:20 AM | |
| 8118 | 08-18-2016 11:39 AM | |
| 6516 | 08-12-2016 09:05 AM | |
| 6511 | 08-09-2016 09:24 AM | 
			
    
	
		
		
		08-16-2016
	
		
		09:17 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Edited the original question to include the sqoop import command(in ORC format) that I have used, can you check ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-16-2016
	
		
		09:17 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Edited the original question to include the sqoop import command(in ORC format) that I have used, can you check ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-15-2016
	
		
		03:02 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 This is the first and one-time import. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-15-2016
	
		
		02:10 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 HDP-2.4.2.0-258 installed using Ambari 2.2.2.0  The source tables are in a SQL Server schema, below is a table with 1205028380 rows and a composite PK (DateDimensionId, DriverDimensionId, VehicleDimensionId) :  DateDimensionId			bigint	Unchecked
DriverDimensionId		int	Unchecked
VehicleDimensionId		int	Unchecked
Odometer			bigint	Checked
TotalFuel			bigint	Checked
TotalFuelIdle			bigint	Checked
TotalRuntime			bigint	Checked
TotalRuntimeIdle		bigint	Checked
TotalDistanceWithTrailer	bigint	Checked
TotalFuelPTO			bigint	Checked
TotalRuntimePTO			bigint	Checked
TotalTimeOverspeeding		bigint	Checked
TotalTimeOverreving		bigint	Checked
TotalNoOfHarshBrakes		bigint	Checked
TotalNoOfBrakeApplications	bigint	Checked
TotalNoOfHarshAcceleration	bigint	Checked
MinTimeMessage			datetime2(7)Checked
MaxTimeMessage			datetime2(7)Checked
TimeOutOfGreenBandDriving	bigint	Checked
Coasting			bigint	Checked
.
.
.  I used the following command, note that the format is ORC, also can '--num-mappers' cause any duplication ?  sqoop import --num-mappers 8 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FMS_FleetManagementDatawarehouse_VehicleData --hcatalog-table DateVehicleDriverAggregate --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagementDatawarehouse' --username --password --table DateVehicleDriverAggregate -- --schema VehicleData  The Sqoop import job took a long time(5.6h) with the default 4 mappers but the concern is that it imported 1218843487 records, more than the source ! Is the composite key causing some issue or is it something else ?  There were no errors in the job but in case any specific logs are required, I can provide. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
			
	
					
			
		
	
	
	
	
				
		
	
	
- Labels:
- 
						
							
		
			Apache Sqoop
			
    
	
		
		
		08-12-2016
	
		
		09:05 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 Well, I'm unsure whether it was an authorization issue or a mere parsing problem or both. I did the following and it worked :   Did an 'su hive'  Executed the following command(probably, the -- --schema should be the last arg, Sqoop simply ignores/breaks after that!)   sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username  --password  --table DatabaseLog -- --schema ape 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-11-2016
	
		
		07:04 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 I checked the var/log but none exist for Sqoop, Hive-Hcatalog and Hive, how can I view the authorization issue and fix it ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-11-2016
	
		
		07:03 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Thanks for the inputs, now I am a bit clear.  I still have a doubt - if importing to Hive and HCatalog is kinda synonymous i.e import in one makes the tables available in the other, why would anyone IMPORT to HCatalog at all, why not import to Hive ONLY ? The only advantage I see is that an HCatalog import gives you an option to store the table in ORC, can you validate my assumption ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-11-2016
	
		
		07:00 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 As I mentioned, the only place where I see the data is under my user's hdfs dir. - neither in HCatalog nor in Hive, hence, can you elaborate on 'relax your permission on the data you sqooped over' ?  I checked the var/log but none exist for Sqoop, Hive-Hcatalog and Hive, how can I view the authorization issue and fix it ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-10-2016
	
		
		01:23 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 HDP-2.4.2.0-258 installed using Ambari 2.2.2.0  I have to import several SQL Server schema which should be accessible via Hive, Pig, MR and any third party(in future). As suggested by the community, I decided to import in HCatalog.  Sqoop provides ways to import to Hive OR HCatalog, I guess if I import to HCatalog, the same table will be accessible from Hive CLI, to MR and to Pig(please evaluate my assumption).  Questions :  
 If imported to Hive directly, will the table be available to Pig, MR ?  If imported to HCatalog, what needs to be done for accessing via Hive ?  Is pre-creation of tables in Hive necessary ? If yes, what is the advantage of the importing in HCatalog, (I can import in Hive directly)/(import in HDFS and then create external table) ?   Issue :  I wish to achieve the following in one step, I'm unsure if it's possible but threads like this and this deem so :  
 Import of the data(from SQL Server tables)  Store this data at custom HDFS path say /org/data/schema1,  /org/data/schema2 and so on(is this impossible because Sqoop says it (--target-dir/--warehouse-dir)  Avoid 'pre-creating' or writing create statements for those tables(there are 100s of them)  Store the table in ORC format   I executed the following command :  -bash-4.2$ sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password  --table SettingAttribute -- --schema Administration  --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table SettingAttribute --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"  The source table contains 109 records and those are fetched :  16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
16/08/10 15:02:41 INFO mapreduce.Job:  map 0% reduce 0%
16/08/10 15:02:47 INFO mapreduce.Job:  map 100% reduce 0%
16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=616636
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=540
                HDFS: Number of bytes written=10079
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=16132
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=16132
                Total vcore-seconds taken by all map tasks=16132
                Total megabyte-seconds taken by all map tasks=66076672
        Map-Reduce Framework
                Map input records=109
                Map output records=109
                Input split bytes=540
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=320
                CPU time spent (ms)=6340
                Physical memory (bytes) snapshot=999870464
                Virtual memory (bytes) snapshot=21872697344
                Total committed heap usage (bytes)=943194112
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=10079
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.  The files are created under my user :  hdfs dfs -ls /user/ojoqcu/SettingAttribute
Found 5 items
-rw-------   3 ojoqcu hdfs          0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
-rw-------   3 ojoqcu hdfs       8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
-rw-------   3 ojoqcu hdfs        144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
-rw-------   3 ojoqcu hdfs       1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
-rw-------   3 ojoqcu hdfs        434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003  I cannot see anything in HCatalog(nor in Hive)  -bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
WARNING: Use "yarn jar" to launch YARN applications.
16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
OK
Time taken: 2.007 seconds  @Ravi Mutyala I guess you have considerable experience here, I have proceeded based on the inputs you provided earlier 🙂  
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
			
	
					
			
		
	
	
	
	
				
		
	
	
- Labels:
- 
						
							
		
			Apache HCatalog
- 
						
							
		
			Apache Hive
- 
						
							
		
			Apache Sqoop
			
    
	
		
		
		08-10-2016
	
		
		11:25 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 While I am grateful for your answer, I'm unsure whether I'm stupid or lazy(or both!) 😉  Just in case someone needs in the future :  sqoop list-tables --connect 'jdbc:sqlserver://<hostname/IP>;database=<db-name>' --username <user-name> --password <password> -- --schema <schema-name> 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		 
        












