Support Questions

Find answers, ask questions, and share your expertise

Used Sqoop to import Mysql table (Employees) onto HDP. Now, I would like to create a Hive ORC Employees Table based on it with the as is schema and data on HDP. How can I do it?

avatar
Expert Contributor

My first step was to create an ORC table on Hive with the same schema as the one imported on hdp :

CREATE TABLE IF NOT EXISTS orcemployees( emp_no int, birth_date date,
 first_name string, last_name string, gender string, hire_date date) 
STORED AS ORC;

My second step was to copy the data from employees into orcemployees:

insert into table orcemployees select * from employees;

The problem is when I execute select * from orcemployees; the schema is displayed and not the associated data.

1 ACCEPTED SOLUTION

avatar
Master Mentor
@keerthana gajarajakumar

your error says you don't have permissions for admin user on that folder. You need to

sudo -u hdfs hdfs dfs -chown -R admin:hdfs /tmp/task/employees

OR

sudo -u hdfs hdfs dfs -chmod -R 777 /tmp/task/employees

View solution in original post

23 REPLIES 23

avatar
Master Mentor

Take a look at this example @keerthana gajarajakumar Link

avatar
Expert Contributor

thanks @Artem Ervits

avatar

@keerthana gajarajakumar

You have two options:

  1. Use Sqoop to save directly as ORC file
    $ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
  2. Since you already created the Employee Table then create the ORC table first (employee_orc) and then select from your employee hive table and insert into employee_orc
    create table employee_orc stored as orc as select * from employee_hive;

avatar
Expert Contributor

@Ancil McBarnett - Thanks for your reply. Are you suggesting to do a sqoop import of the employees table onto HDP or Hive?

avatar
Expert Contributor

The reason I'm asking you is , what's the difference between doing a --hive-import of the employees table and then copying that same data into hive orc employees table? I understand that ORC helps in performance improvement but can I see the difference visually when I do a select * from employees vs select * from employees_orc?

avatar
Master Mentor

@keerthana gajarajakumar performance in general is noticeable, orc is the way to go, no questions about it.

avatar

I just realized you had a comment under your original question. I will edit it so that it is the actual question. I was suggesting you can use Sqoop to DIRECTLY create an ORC Table. Try this also:

INSERT OVERWRITE TABLE orcemployees SELECT * FROM employees;

What is your output after you have done so?

avatar
Expert Contributor
@Ancil McBarnett

-- I'm having a bit of a confusion in interpreting your suggestions. Let me know if I have interpreted your answers well.

1. In both the following cases the tables you have referred to as 'employees' and 'employee_hive' is the mysql table imported using sqoop directly onto hive (sqoop import --connect jdbc:mysql://localhost:3306/employees --username xxx--password xxx --table employees --hive-import --driver com.mysql.jdbc.Driver). Am I correct?

2. Once I'm done with importing the employees table on hive mentioned in step 1, I can create an orcemployees table.

Am I right?

avatar

@keerthana gajarajakumar

Let's see if I can make it clear. You asked how can I sqoop data into HDP/ Hive and then create an ORC table from it. I suggested the following:

  1. INSTEAD of Sqoop Data into a table and THEN creating an ORC table you can simply create the ORC table directly with Sqoop - hence the following options in the Sqoop call
    --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
  2. OR, you can use normal Sqoop without the ORC options, create a normal table as you have done and then create an OrcEmployees table.