Member since
06-28-2021
5
Posts
1
Kudos Received
0
Solutions
07-03-2021
02:39 PM
@harsh8 Happy to help with that question. The simple answer is YES below I am demonstrating with the table staff created in the previous post! Before the import [hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop When you run the sqoop import to ensure the destination directory sqoop_harsh8 doesn't already exist in HDFS $ sqoop import --connect jdbc:mysql://localhost/harsh8 --table staff --username root -P --target-dir /tmp/sqoop_harsh8 -m 1 Here I am importing the table harsh8.staff I created in the previous session. The sqoop export will create 2 files _SUCCESS and part-m-0000 in the HDFS directory as shown below. After the export the directory /tmp/sqoop_harsh8 is newly created [hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8 Check the contents of /tmp/sqoop_harsh8 [hdfs@bern ~]$ hdfs dfs -ls /tmp/sqoop_harsh8
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:04 /tmp/sqoop_harsh8/part-m-00000 The _SUCCESS is just a log file so cat the contents of part-m-00000 this is the data from our table harsh8.staff [hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management I piped the contents to a text file hr2.txt in my local tmp directory to enable me run a sqoop import with an acceptable format [hdfs@bern ~]$ hdfs dfs -cat /tmp/sqoop_harsh8/part-m-00000 > /tmp/hr2.txt Validate the hr2.txt contents [hdfs@bern ~]$ cat /tmp/hr2.txt
100,Geoffrey,manager,50000,Admin
101,Thomas,Oracle Consultant,15000,IT
102,Biden,Project Manager,28000,PM
103,Carmicheal,Bigdata developer,30000,BDS
104,Johnson,Treasurer,21000,Accounts
105,Gerald,Director,30000,Management Copied the hr2.txt to HDFS and validated the file was copied [hdfs@bern ~]$ hdfs dfs -copyFromLocal /tmp/hr2.txt /tmp Validation [hdfs@bern ~]$ hdfs dfs -ls /tmp
Found 7 items
drwxrwxr-x - druid hadoop 0 2020-07-06 02:04 /tmp/druid-indexing
drwxr-xr-x - hdfs hdfs 0 2020-07-06 01:50 /tmp/entity-file-history
drwx-wx-wx - hive hdfs 0 2020-07-06 01:59 /tmp/hive
-rw-r--r-- 3 hdfs hdfs 223 2021-07-03 22:41 /tmp/hr2.txt
-rw-r--r-- 3 hdfs hdfs 1024 2020-07-06 01:57 /tmp/ida8c04300_date570620
drwxr-xr-x - hdfs hdfs 0 2021-06-29 10:14 /tmp/sqoop
drwxr-xr-x - hdfs hdfs 0 2021-07-03 22:04 /tmp/sqoop_harsh8 Connected to MySQL and switch to harsh8 database [root@bern ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 179
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed Check the existing tables in the harsh8 database before the export MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
+------------------+
1 row in set (0.00 sec) Pre-create a table staff2 to receive the hr2.txt data MariaDB [harsh8]> CREATE TABLE staff2 ( id INT NOT NULL PRIMARY KEY, Name VARCHAR(20), Position VARCHAR(20),Salary INT,Department VARCHAR(10));
Query OK, 0 rows affected (0.57 sec)
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (0.00 sec) Load data into staff2 from a sqoop export ! sqoop]$ sqoop export --connect jdbc:mysql://localhost/harsh8 --username root --password 'w3lc0m31' --table staff2 --export-dir /tmp/hr2.txt
[hdfs@bern ~]$ sqoop export --connect jdbc:mysql://localhost/harsh8 --username root --password 'w3lc0m31' --table staff2 --export-dir /tmp/hr2.txt
...
21/07/03 22:44:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.1.4.0-315
21/07/03 22:44:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/07/03 22:44:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/07/03 22:44:37 INFO tool.CodeGenTool: Beginning code generation
21/07/03 22:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff2` AS t LIMIT 1
21/07/03 22:44:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/3.1.4.0-315/hadoop-mapreduce
21/07/03 22:45:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/a53bb813b88ab155201196658f3ee001/staff2.jar
21/07/03 22:45:48 INFO mapreduce.ExportJobBase: Beginning export of staff2
21/07/03 22:47:59 INFO client.RMProxy: Connecting to ResourceManager at bern.swiss.ch/192.168.0.139:8050
21/07/03 22:48:07 INFO client.AHSProxy: Connecting to Application History server at bern.swiss.ch/192.168.0.139:10200
21/07/03 22:48:18 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1625340048377_0003
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:05 INFO input.FileInputFormat: Total input files to process : 1
21/07/03 22:49:12 INFO mapreduce.JobSubmitter: number of splits:4
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1625340048377_0003
21/07/03 22:49:24 INFO mapreduce.JobSubmitter: Executing with tokens: []
21/07/03 22:49:26 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
21/07/03 22:49:32 INFO impl.YarnClientImpl: Submitted application application_1625340048377_0003
21/07/03 22:49:33 INFO mapreduce.Job: The url to track the job: http://bern.swiss.ch:8088/proxy/application_1625340048377_0003/
21/07/03 22:49:33 INFO mapreduce.Job: Running job: job_1625340048377_0003
21/07/03 22:52:15 INFO mapreduce.Job: Job job_1625340048377_0003 running in uber mode : false
21/07/03 22:52:15 INFO mapreduce.Job: map 0% reduce 0%
21/07/03 22:56:45 INFO mapreduce.Job: map 75% reduce 0%
21/07/03 22:58:10 INFO mapreduce.Job: map 100% reduce 0%
21/07/03 22:58:13 INFO mapreduce.Job: Job job_1625340048377_0003 completed successfully
21/07/03 22:58:14 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=971832
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1132
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=1733674
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=866837
Total vcore-milliseconds taken by all map tasks=866837
Total megabyte-milliseconds taken by all map tasks=1775282176
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=526
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1565
CPU time spent (ms)=6710
Physical memory (bytes) snapshot=661999616
Virtual memory (bytes) snapshot=12958916608
Total committed heap usage (bytes)=462422016
Peak Map Physical memory (bytes)=202506240
Peak Map Virtual memory (bytes)=3244965888
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Transferred 1.1055 KB in 630.3928 seconds (1.7957 bytes/sec)
21/07/03 22:58:14 INFO mapreduce.ExportJobBase: Exported 6 records. Switch and log onto MariaDB, switch to harsh8 database, and query for the new table staff2 [root@bern ~]# mysql -uroot -pwelcome1
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 266
Server version: 5.5.65-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use harsh8;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [harsh8]> show tables;
+------------------+
| Tables_in_harsh8 |
+------------------+
| staff |
| staff2 |
+------------------+
2 rows in set (2.64 sec)
MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:02:38 |
+---------------------+
1 row in set (0.00 sec) After the import MariaDB [harsh8]> select * from staff2;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+
6 rows in set (0.00 sec) Check the source table used in the export see the timestamp ! MariaDB [harsh8]> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2021-07-03 23:04:50 |
+---------------------+
1 row in set (0.00 sec) Comparison MariaDB [harsh8]> select * from staff;
+-----+------------+-------------------+--------+------------+
| id | Name | Position | Salary | Department |
+-----+------------+-------------------+--------+------------+
| 100 | Geoffrey | manager | 50000 | Admin |
| 101 | Thomas | Oracle Consultant | 15000 | IT |
| 102 | Biden | Project Manager | 28000 | PM |
| 103 | Carmicheal | Bigdata developer | 30000 | BDS |
| 104 | Johnson | Treasurer | 21000 | Accounts |
| 105 | Gerald | Director | 30000 | Management |
+-----+------------+-------------------+--------+------------+ You have successfully created a table from a Sqoop export! Et Voila The conversion from part-m-00000 to txt did the trick, so this proves it's doable so you question is answered 🙂 You can revalidate by following my steps Happy hadooping !
... View more
06-28-2021
10:39 AM
1 Kudo
Sure Vidya , Thank you
... View more