Created 02-01-2016 10:33 PM
My first step was to create an ORC table in Hive with the same schema as the table imported into HDP:
CREATE TABLE IF NOT EXISTS orcemployees( emp_no int, birth_date date, first_name string, last_name string, gender string, hire_date date) STORED AS ORC;
My second step was to copy the data from employees into orcemployees:
insert into table orcemployees select * from employees;
The problem is that when I execute select * from orcemployees; only the column headers are displayed, with no rows of data.
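A quick way to narrow this down (a sketch, assuming both tables live in the default database) is to compare row counts on the source and target tables:

```sql
-- Sanity check: did the source table ever receive data,
-- and did the INSERT actually populate the ORC table?
SELECT COUNT(*) FROM employees;
SELECT COUNT(*) FROM orcemployees;
-- If employees returns 0, the problem is with the Sqoop import itself,
-- not with the copy into the ORC table.
```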
Created 02-02-2016 03:13 PM
Your error says the admin user doesn't have permissions on that folder. You need to run:
sudo -u hdfs hdfs dfs -chown -R admin:hdfs /tmp/task/employees
OR
sudo -u hdfs hdfs dfs -chmod -R 777 /tmp/task/employees
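Either command should make the directory accessible; a minimal way to verify the result (assuming the same /tmp/task/employees path) is:

```shell
# List the directory and check the owner/permission columns;
# after the chown you should see admin:hdfs, or rwxrwxrwx after chmod 777.
sudo -u hdfs hdfs dfs -ls -R /tmp/task/employees
```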
Created 02-01-2016 10:42 PM
Take a look at this example, @keerthana gajarajakumar: Link
Created 02-01-2016 10:50 PM
thanks @Artem Ervits
Created 02-01-2016 10:42 PM
You have two options:
$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
create table employee_orc stored as orc as select * from employee_hive;
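With either option, you can confirm that the resulting table really is stored as ORC (a sketch, using the employee_orc table name from the CTAS example above):

```sql
-- Inspect the table's storage details; the output should list
-- org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as the InputFormat.
DESCRIBE FORMATTED employee_orc;
```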
Created 02-01-2016 10:49 PM
@Ancil McBarnett - Thanks for your reply. Are you suggesting doing a Sqoop import of the employees table into HDP or into Hive?
Created 02-01-2016 10:55 PM
The reason I'm asking is: what's the difference between doing a --hive-import of the employees table and then copying that same data into a Hive ORC employees table? I understand that ORC helps improve performance, but can I see the difference visually when I do a select * from employees vs. a select * from employees_orc?
Created 02-01-2016 10:57 PM
@keerthana gajarajakumar The performance difference is noticeable in general; ORC is the way to go, no question about it.
Created 02-01-2016 10:59 PM
I just realized you had a comment under your original question. I will edit it so that it is the actual question. I was suggesting you can use Sqoop to DIRECTLY create an ORC Table. Try this also:
INSERT OVERWRITE TABLE orcemployees SELECT * FROM employees;
What is your output after you have done so?
Created 02-01-2016 11:16 PM
I'm a bit confused about how to interpret your suggestions. Let me know if I have understood your answers correctly.
1. In both of the following cases, the tables you referred to as 'employees' and 'employee_hive' are the MySQL table imported directly into Hive using Sqoop (sqoop import --connect jdbc:mysql://localhost:3306/employees --username xxx --password xxx --table employees --hive-import --driver com.mysql.jdbc.Driver). Am I correct?
2. Once I'm done importing the employees table into Hive as mentioned in step 1, I can create the orcemployees table from it.
Am I right?
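For reference, the two-step flow described above would look roughly like this (a sketch; the connection string and xxx credentials are placeholders):

```sql
-- Step 1 (run in the shell): import the MySQL table into Hive as a plain table.
--   sqoop import --connect jdbc:mysql://localhost:3306/employees \
--     --username xxx --password xxx --table employees \
--     --hive-import --driver com.mysql.jdbc.Driver
-- Step 2 (run in Hive): create the ORC copy in a single CTAS statement.
CREATE TABLE orcemployees STORED AS ORC AS SELECT * FROM employees;
```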
Created 02-01-2016 11:41 PM
Let's see if I can make it clear. You asked how to sqoop data into HDP/Hive and then create an ORC table from it. I suggested the following:
--hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"