Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2626 | 12-25-2018 10:42 PM
 | 12060 | 10-09-2018 03:52 AM
 | 4164 | 02-23-2018 11:46 PM
 | 1839 | 09-02-2017 01:49 AM
 | 2166 | 06-21-2017 12:06 AM
02-01-2016
10:04 AM
2 Kudos
Hi @Kabirdas B K, yes, that's possible: you can add remote nodes just like local nodes, and yes, you can set rack awareness. However, it's not recommended, because performance, for example when running jobs, can be very poor. Here is one such experience.
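For reference, one common way to set rack awareness is to point net.topology.script.file.name in core-site.xml at a mapping script. Below is a minimal sketch of such a script; the IP patterns and rack names are placeholders, not taken from any real cluster:
#!/bin/bash
# topology.sh - Hadoop passes a list of hosts/IPs and expects one rack path per line.
# The IP patterns and rack names below are illustrative placeholders.
while [ $# -gt 0 ]; do
  case "$1" in
    10.0.1.*)    echo "/local-dc/rack1"  ;;   # local data nodes
    192.168.5.*) echo "/remote-dc/rack1" ;;   # remote data nodes
    *)           echo "/default-rack"    ;;
  esac
  shift
done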
01-28-2016
11:40 AM
1 Kudo
Hi @Arti Wadhwani Yes, that's correct. You can find the meaning of all the fields here, in the "Configuring Ranger Admin Authentication Modes" section. You can select from LDAP, AD, and Unix. You can use an LDAP GUI or the ldapsearch command to explore your LDAP settings and select the right values. If you have already done, for example, Ambari setup-ldap, you will know how to do it.
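For example, a minimal ldapsearch sketch for checking the search base and user attributes before filling in the Ranger admin settings; the host, bind DN, and base DN below are placeholders:
# Replace the host, bind DN, and base DN with your own values.
ldapsearch -x -H ldap://ldap.example.com:389 \
  -D "cn=admin,dc=example,dc=com" -W \
  -b "ou=users,dc=example,dc=com" "(uid=*)" uid cn memberOf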
01-28-2016
02:11 AM
1 Kudo
@Mehdi TAZI AFAIK, Ozone is a key-object store like AWS S3. Keys/objects are organized into buckets, each holding a unique set of keys. Bucket data and Ozone metadata are stored in Storage Containers (SCs), which coexist with HDFS blocks on DataNodes in a separate block pool. Ozone metadata is distributed across SCs; there is no central NameNode. Buckets can be huge and are divided into partitions, which are also stored in SCs. Read and write are supported; append and update are not. The SC implementation is planned to use LevelDB or RocksDB. The Ozone architecture doc with all the details is here. So, it's not on top of HDFS; it's going to coexist with HDFS and share DataNodes with HDFS.
01-27-2016
06:51 AM
1 Kudo
Hi @Jagdish Saripella Okay, I tried to run your script on my sandbox and found that you need commas in your "STORE raw_data INTO 'hbase..." command, like this:
STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');
You also have to pre-create your table, for example from the hbase shell:
create 'test1', 'test_data'
If you keep the header, it will be loaded as well with rowkey='Custno'; most likely that's not what you want. Hint: next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that HBaseStorage was trying to add a single column using all that text in brackets without commas. With commas it correctly creates 4 columns.
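For context, here is a minimal sketch of the whole flow, assuming a comma-separated input file whose first field is the row key; the path and field names are illustrative, not from the original script:
-- Print extra diagnostics while troubleshooting
SET debug 'on';
-- Assumed input layout: custno,firstname,lastname,age,profession (no header row)
raw_data = LOAD '/user/it1/customers.csv' USING PigStorage(',')
           AS (custno:chararray, firstname:chararray, lastname:chararray,
               age:int, profession:chararray);
-- The first field (custno) becomes the HBase row key; the rest map to the listed columns
STORE raw_data INTO 'hbase://test1'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'test_data:firstname,test_data:lastname,test_data:age,test_data:profession');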
01-25-2016
02:26 PM
Hi @Ali Bajwa, thanks for chiming in. No special requirements except that KDC/LDAP run on RHEL Linux. Also, I don't mind spending more time on installing the solution, but I would like to provide the sysadmin with an easy-to-use UI to manage users and groups.
01-25-2016
02:23 PM
Hi @sivasaravanakumar k, for incremental append the check-column will be 'id', and you keep updating last-value for new appends.
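For instance, a sketch of such a run from the command line (the connection string, credentials, table name, and target dir are placeholders; the incremental flags are the part that matters):
# Placeholders: dbhost, user, mytable, target dir; adjust last-value to your latest imported id.
sqoop import --connect jdbc:mysql://dbhost:3306/test --username user -P \
  --table mytable --target-dir /user/it1/mytable -m 1 \
  --incremental append --check-column id --last-value 5000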
01-25-2016
02:17 PM
3 Kudos
Hi @sivasaravanakumar k, yes, you are right, Sqoop indeed says that "Append mode for hive imports is not yet supported". However, it can be done by doing an incremental import to HDFS and mapping your Hive table onto Sqoop's target-dir. A full example is attached; here are the highlights:
Define your Hive table as an external table:
CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/it1/sqin5';
Initially it's empty. Do the first Sqoop import, bringing in 5000 entries from the MySQL table st1 (with id's 1-5000) and setting target-dir to the location of our external table, /user/it1/sqin5:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append -check-column id
16/01/25 13:36:07 INFO tool.ImportTool: Upper bound value: 5000
16/01/25 13:36:27 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
If you check now in Hive, table h2 has 5000 entries. Now append 900 entries to the MySQL table st1, with 5100<=id<6000, and do an incremental append import, setting last-value to 5000:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append -check-column id --last-value 5000
16/01/25 13:38:23 INFO tool.ImportTool: Lower bound value: 5000
16/01/25 13:38:23 INFO tool.ImportTool: Upper bound value: 5999
16/01/25 13:38:47 INFO mapreduce.ImportJobBase: Retrieved 900 records.
If you check now, Hive table h2 has 5900 entries:
hive> select count(*) from h2;
5900
In the same way you can also handle Sqoop incremental imports into Hive based on "lastmodified" and "merge-key".
You can also create a Sqoop job, as in your other question, and let Sqoop take care of last-value; see the sketch below. You can import into Hive managed (non-external) tables by setting Sqoop's target-dir to /apps/hive/warehouse/<table-name>. [That's what Sqoop does when using "--hive-import".]
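For reference, a sketch of creating such a job in append mode, so that Sqoop stores and updates last-value in its metastore between runs; the job name incappend is made up, and the connection details mirror the example above:
# The job name "incappend" is arbitrary; connection details are the same placeholders as above.
sqoop job --create incappend -- import --connect jdbc:mysql://localhost:3306/test \
  --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 \
  --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id
sqoop job --exec incappend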
01-24-2016
01:28 AM
2 Kudos
@sivasaravanakumar k Attached is the full example, and here are the highlights. The MySQL table is defined below. For best results use a timestamp as your date/time field; if you use just "date" as in your table, you end up with low time granularity, so if you run the same job more than once a day it will import all records updated that day.
create table st1(id int, name varchar(16), ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
Populate the table with 5000 entries. Create and run a new Sqoop job writing into an HDFS directory (please adjust for HBase). I'm showing only the important output lines, see the attachment for the full output (the "driver" option is required on the sandbox, you can ignore it, and I'm using only 1 mapper because my table is small):
[it1@sandbox ~]$ sqoop job --create incjob -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --incremental lastmodified -check-column ts --target-dir sqin -m 1 --merge-key id
[it1@sandbox ~]$ sqoop job --exec incjob
16/01/24 00:27:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/01/24 00:28:09 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:28:09 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:28:09.0'
16/01/24 00:28:31 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
16/01/24 00:28:31 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:28:31 INFO tool.ImportTool: Updated data for job: incjob
The first time, all 5000 entries are imported. Note that the import tool sets the "Upper bound value" of ts to the current time when the command is executed. Now change 200 entries and run the same job again:
[it1@sandbox ~]$ sqoop job --exec incjob
16/01/24 00:35:59 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:35:59 INFO tool.ImportTool: Lower bound value: '2016-01-24 00:28:09.0'
16/01/24 00:35:59 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:35:59.0'
16/01/24 00:36:20 INFO mapreduce.ImportJobBase: Retrieved 200 records.
16/01/24 00:36:57 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:36:58 INFO tool.ImportTool: Updated data for job: incjob
Now only 200 entries are imported. The Lower bound value is the Upper bound saved from the first run, and the Upper bound value is updated to the current time, so the job is ready for the next run. That's all, happy sqooping!
01-23-2016
09:36 AM
Hi @sivasaravanakumar k If you run Sqoop from the command line, without a Sqoop job, then you have to add --last-value yourself. Try, for example, adding "--last-value 2016-01-01"; then only the few records where Date_Item is in 2016 will be imported. You can actually see this in Sqoop's output, which gives you the exact time when you ran Sqoop, so with --last-value '2016-01-23 13:48:02' nothing will be imported (if your MySQL table is unchanged). If you create a new Sqoop job like your "student_info", then Sqoop will keep that date-time for you and you can just run the job again to import updated records.
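For example, a sketch of such a one-off command-line run (the connection string, credentials, table name, and target dir are placeholders; the check-column and last-value are the ones discussed above):
# Placeholders: dbhost, user, mytable, target dir.
sqoop import --connect jdbc:mysql://dbhost:3306/test --username user -P \
  --table mytable --target-dir /user/it1/mytable -m 1 \
  --incremental lastmodified --check-column Date_Item --last-value '2016-01-01'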
01-23-2016
12:40 AM
1 Kudo
Yes, we'd like to automate kerberization and provide the customer with an easy-to-use interface to manage users afterwards. I'm in touch with and aware of the great workshops by @Ali Bajwa, but the KDC/OpenLDAP integration is not complete. I'm also aware of a great post about FreeIPA by @David Streever. And thanks for your super-express response!