Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Used Sqoop to import Mysql table (Employees) onto HDP. Now, I would like to create a Hive ORC Employees Table based on it with the as is schema and data on HDP. How can I do it?

avatar
Expert Contributor

My first step was to create an ORC table on Hive with the same schema as the one imported on hdp :

CREATE TABLE IF NOT EXISTS orcemployees( emp_no int, birth_date date,
 first_name string, last_name string, gender string, hire_date date) 
STORED AS ORC;

My second step was to copy the data from employees into orcemployees:

insert into table orcemployees select * from employees;

The problem is when I execute select * from orcemployees; the schema is displayed and not the associated data.

1 ACCEPTED SOLUTION

avatar
Master Mentor
@keerthana gajarajakumar

your error says you don't have permissions for admin user on that folder. You need to

sudo -u hdfs hdfs dfs -chown -R admin:hdfs /tmp/task/employees

OR

sudo -u hdfs hdfs dfs -chmod -R 777 /tmp/task/employees

View solution in original post

23 REPLIES 23

avatar
Master Mentor

Take a look at this example @keerthana gajarajakumar Link

avatar
Expert Contributor

thanks @Artem Ervits

avatar

@keerthana gajarajakumar

You have two options:

  1. Use Sqoop to save directly as ORC file
    $ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
  2. Since you already created the Employee Table then create the ORC table first (employee_orc) and then select from your employee hive table and insert into employee_orc
    create table employee_orc stored as orc as select * from employee_hive;

avatar
Expert Contributor

@Ancil McBarnett - Thanks for your reply. Are you suggesting to do a sqoop import of the employees table onto HDP or Hive?

avatar
Expert Contributor

The reason I'm asking you is , what's the difference between doing a --hive-import of the employees table and then copying that same data into hive orc employees table? I understand that ORC helps in performance improvement but can I see the difference visually when I do a select * from employees vs select * from employees_orc?

avatar
Master Mentor

@keerthana gajarajakumar performance in general is noticeable, orc is the way to go, no questions about it.

avatar

I just realized you had a comment under your original question. I will edit it so that it is the actual question. I was suggesting you can use Sqoop to DIRECTLY create an ORC Table. Try this also:

INSERT OVERWRITE TABLE orcemployees SELECT * FROM employees;

What is your output after you have done so?

avatar
Expert Contributor
@Ancil McBarnett

-- I'm having a bit of a confusion in interpreting your suggestions. Let me know if I have interpreted your answers well.

1. In both the following cases the tables you have referred to as 'employees' and 'employee_hive' is the mysql table imported using sqoop directly onto hive (sqoop import --connect jdbc:mysql://localhost:3306/employees --username xxx--password xxx --table employees --hive-import --driver com.mysql.jdbc.Driver). Am I correct?

2. Once I'm done with importing the employees table on hive mentioned in step 1, I can create an orcemployees table.

Am I right?

avatar

@keerthana gajarajakumar

Let's see if I can make it clear. You asked how can I sqoop data into HDP/ Hive and then create an ORC table from it. I suggested the following:

  1. INSTEAD of Sqoop Data into a table and THEN creating an ORC table you can simply create the ORC table directly with Sqoop - hence the following options in the Sqoop call
    --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
  2. OR, you can use normal Sqoop without the ORC options, create a normal table as you have done and then create an OrcEmployees table.