Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2626 | 12-25-2018 10:42 PM
 | 12060 | 10-09-2018 03:52 AM
 | 4164 | 02-23-2018 11:46 PM
 | 1839 | 09-02-2017 01:49 AM
 | 2166 | 06-21-2017 12:06 AM
02-01-2016
10:04 AM
2 Kudos
Hi @Kabirdas B K, yes, that's possible: you can add remote nodes just like local nodes, and yes, you can set rack awareness. However, it's not recommended, because performance, for example when running jobs, can be very poor. Here is one such experience.
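For reference, one common way to set rack awareness is to point net.topology.script.file.name in core-site.xml at a mapping script. Below is a minimal sketch of such a script; the IP patterns and rack names are placeholders, not taken from any real cluster:
#!/bin/bash
# topology.sh - Hadoop passes a list of hosts/IPs and expects one rack path per line.
# The IP patterns and rack names below are illustrative placeholders.
while [ $# -gt 0 ]; do
  case "$1" in
    10.0.1.*)    echo "/local-dc/rack1"  ;;   # local data nodes
    192.168.5.*) echo "/remote-dc/rack1" ;;   # remote data nodes
    *)           echo "/default-rack"    ;;
  esac
  shift
done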
01-28-2016
11:40 AM
1 Kudo
Hi @Arti Wadhwani Yes, that's correct. You can find the meaning of all the fields here, in the "Configuring Ranger Admin Authentication Modes" section. You can select from LDAP, AD, and Unix. You can use an LDAP GUI or the ldapsearch command to explore your LDAP settings and select the right values. If you have already done, for example, Ambari setup-ldap, you will know how to do it.
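For example, a minimal ldapsearch sketch for checking the search base and user attributes before filling in the Ranger admin settings; the host, bind DN, and base DN below are placeholders:
# Replace the host, bind DN, and base DN with your own values.
ldapsearch -x -H ldap://ldap.example.com:389 \
  -D "cn=admin,dc=example,dc=com" -W \
  -b "ou=users,dc=example,dc=com" "(uid=*)" uid cn memberOf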
01-28-2016
02:11 AM
1 Kudo
@Mehdi TAZI AFAIK, Ozone is a key-object store like AWS S3. Keys/objects are organized into buckets, each holding a unique set of keys. Bucket data and Ozone metadata are stored in Storage Containers (SCs), which coexist with HDFS blocks on DataNodes in a separate block pool. Ozone metadata is distributed across SCs; there is no central NameNode. Buckets can be huge and are divided into partitions, which are also stored in SCs. Read and write are supported; append and update are not. The SC implementation is planned to use LevelDB or RocksDB. The Ozone architecture doc with all the details is here. So, it's not on top of HDFS; it's going to coexist with HDFS and share DataNodes with HDFS.
01-27-2016
06:51 AM
1 Kudo
Hi @Jagdish Saripella Okay, I tried to run your script on my sandbox and found that you need commas in your "STORE raw_data INTO 'hbase..." command, like this:
STORE raw_data INTO 'hbase://test1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_data:firstname,test_data:lastname,test_data:age,test_data:profession');
You also have to pre-create your table, for example from the hbase shell:
create 'test1', 'test_data'
If you keep the header, it will be loaded as well with rowkey='Custno'; most likely that's not what you want. Hint: next time you have trouble with Pig, switch debug mode on by running "SET debug 'on'". That's how I discovered that HBaseStorage was trying to add a single column using all that text in brackets without commas. With commas it correctly creates 4 columns.
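For context, here is a minimal sketch of the whole flow, assuming a comma-separated input file whose first field is the row key; the path and field names are illustrative, not from the original script:
-- Print extra diagnostics while troubleshooting
SET debug 'on';
-- Assumed input layout: custno,firstname,lastname,age,profession (no header row)
raw_data = LOAD '/user/it1/customers.csv' USING PigStorage(',')
           AS (custno:chararray, firstname:chararray, lastname:chararray,
               age:int, profession:chararray);
-- The first field (custno) becomes the HBase row key; the rest map to the listed columns
STORE raw_data INTO 'hbase://test1'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'test_data:firstname,test_data:lastname,test_data:age,test_data:profession');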
01-25-2016
02:26 PM
Hi @Ali Bajwa, thanks for chiming in. No special requirements except that KDC/LDAP run on RHEL Linux. Also, I don't mind spending more time on installing the solution, but I would like to provide the sysadmin with an easy-to-use UI to manage users and groups.
01-25-2016
02:23 PM
Hi @sivasaravanakumar k, for incremental append the check-column will be 'id', and you keep updating last-value for new appends.
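For instance, a sketch of such a run from the command line (the connection string, credentials, table name, and target dir are placeholders; the incremental flags are the part that matters):
# Placeholders: dbhost, user, mytable, target dir; adjust last-value to your latest imported id.
sqoop import --connect jdbc:mysql://dbhost:3306/test --username user -P \
  --table mytable --target-dir /user/it1/mytable -m 1 \
  --incremental append --check-column id --last-value 5000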
01-25-2016
02:17 PM
3 Kudos
Hi @sivasaravanakumar k, yes, you are right, Sqoop indeed says that "Append mode for hive imports is not yet supported". However, it can be done by doing an incremental import to HDFS and mapping your Hive table onto Sqoop's target-dir. A full example is attached; here are the highlights:
Define your Hive table as an external table:
CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/it1/sqin5';
Initially it's empty. Do the first Sqoop import, bringing in 5000 entries from the MySQL table st1 (with id's 1-5000) and setting target-dir to the location of our external table, /user/it1/sqin5:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append -check-column id
16/01/25 13:36:07 INFO tool.ImportTool: Upper bound value: 5000
16/01/25 13:36:27 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
If you check now in Hive, table h2 has 5000 entries. Now append 900 entries to the MySQL table st1, with 5100<=id<6000, and do an incremental append import, setting last-value to 5000:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append -check-column id --last-value 5000
16/01/25 13:38:23 INFO tool.ImportTool: Lower bound value: 5000
16/01/25 13:38:23 INFO tool.ImportTool: Upper bound value: 5999
16/01/25 13:38:47 INFO mapreduce.ImportJobBase: Retrieved 900 records.
If you check now, Hive table h2 has 5900 entries:
hive> select count(*) from h2;
5900
In the same way you can also handle Sqoop incremental imports into Hive based on "lastmodified" and "merge-key".
You can also create a Sqoop job, as in your other question, and let Sqoop take care of last-value; see the sketch below. You can import into Hive managed (non-external) tables by setting Sqoop's target-dir to /apps/hive/warehouse/<table-name>. [That's what Sqoop does when using "--hive-import".]
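For reference, a sketch of creating such a job in append mode, so that Sqoop stores and updates last-value in its metastore between runs; the job name incappend is made up, and the connection details mirror the example above:
# The job name "incappend" is arbitrary; connection details are the same placeholders as above.
sqoop job --create incappend -- import --connect jdbc:mysql://localhost:3306/test \
  --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 \
  --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id
sqoop job --exec incappend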
01-24-2016
01:28 AM
2 Kudos
@sivasaravanakumar k Attached is the full example, and here are the highlights. The MySQL table is defined below. For best results use a timestamp as your date/time field; if you use just "date" as in your table, you end up with low time granularity, so if you run the same job more than once a day it will import all records updated that day.
create table st1(id int, name varchar(16), ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
Populate the table with 5000 entries. Create and run a new Sqoop job writing into an HDFS directory (please adjust for HBase). I'm showing only the important output lines, see the attachment for the full output (the "driver" option is required on the sandbox, you can ignore it, and I'm using only 1 mapper because my table is small):
[it1@sandbox ~]$ sqoop job --create incjob -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --incremental lastmodified -check-column ts --target-dir sqin -m 1 --merge-key id
[it1@sandbox ~]$ sqoop job --exec incjob
16/01/24 00:27:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/01/24 00:28:09 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:28:09 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:28:09.0'
16/01/24 00:28:31 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
16/01/24 00:28:31 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:28:31 INFO tool.ImportTool: Updated data for job: incjob
The first time, all 5000 entries are imported. Note that the import tool sets the "Upper bound value" of ts to the current time when the command is executed. Now change 200 entries and run the same job again:
[it1@sandbox ~]$ sqoop job --exec incjob
16/01/24 00:35:59 INFO tool.ImportTool: Incremental import based on column ts
16/01/24 00:35:59 INFO tool.ImportTool: Lower bound value: '2016-01-24 00:28:09.0'
16/01/24 00:35:59 INFO tool.ImportTool: Upper bound value: '2016-01-24 00:35:59.0'
16/01/24 00:36:20 INFO mapreduce.ImportJobBase: Retrieved 200 records.
16/01/24 00:36:57 INFO tool.ImportTool: Saving incremental import state to the metastore
16/01/24 00:36:58 INFO tool.ImportTool: Updated data for job: incjob
Now only 200 entries are imported. The Lower bound value is the Upper bound saved from the first run, and the Upper bound value is updated to the current time, so the job is ready for the next run. That's all, happy sqooping!
01-23-2016
09:36 AM
Hi @sivasaravanakumar k If you run Sqoop from the command line, without a Sqoop job, then you have to add --last-value yourself. Try, for example, adding "--last-value 2016-01-01"; then only the few records where Date_Item is in 2016 will be imported. You can actually see this in Sqoop's output, which gives you the exact time when you ran Sqoop, so with --last-value '2016-01-23 13:48:02' nothing will be imported (if your MySQL table is unchanged). If you create a new Sqoop job like your "student_info", then Sqoop will keep that date-time for you and you can just run the job again to import updated records.
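For example, a sketch of such a one-off command-line run (the connection string, credentials, table name, and target dir are placeholders; the check-column and last-value are the ones discussed above):
# Placeholders: dbhost, user, mytable, target dir.
sqoop import --connect jdbc:mysql://dbhost:3306/test --username user -P \
  --table mytable --target-dir /user/it1/mytable -m 1 \
  --incremental lastmodified --check-column Date_Item --last-value '2016-01-01'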
01-23-2016
12:40 AM
1 Kudo
Yes, we'd like to automate kerberization and provide the customer with an easy-to-use interface to manage users afterwards. I'm in touch with and aware of the great workshops by @Ali Bajwa, but the KDC/OpenLDAP integration is not complete. I'm also aware of a great post about FreeIPA by @David Streever. And thanks for your super-express response!