Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 2633 | 12-25-2018 10:42 PM
 | 12092 | 10-09-2018 03:52 AM
 | 4174 | 02-23-2018 11:46 PM
 | 1859 | 09-02-2017 01:49 AM
 | 2177 | 06-21-2017 12:06 AM
03-03-2016
02:22 AM
1 Kudo
Hi @saichand varanasi, Sqoop incremental import works either in the append mode or in the lastmodified mode, but you have specified both on your command line. Check here for the command specs, and here for a lastmodified Sqoop job sample. A Sqoop job will keep track of the "last-value" for you, so there is no need to do that manually.
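For reference, a minimal sketch of each mode (the connection string, table and column names are placeholders; only one --incremental mode can be given per command):

sqoop import --connect jdbc:mysql://dbhost/mydb --table orders \
  --incremental append --check-column id --last-value 1000

sqoop import --connect jdbc:mysql://dbhost/mydb --table orders \
  --incremental lastmodified --check-column last_update --last-value "2016-03-01 00:00:00"

When these are saved as a Sqoop job (sqoop job --create ...), the job stores the new last-value after every run.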
02-29-2016
03:15 PM
5 Kudos
Hi @Cassandra Spencer, I did this recently, and it was easier than expected: dump the Ambari database (ambari) into an SQL file and use sed or your favorite editor to change all hostnames in the SQL file. Then run "ambari-server reset", drop the ambari DB, create it again, and import the edited SQL file into Postgres. Of course, take all the backups beforehand. The ambarirca database is not affected. The Hive DB has no references to hosts. The Oozie DB has references to old hosts, but you can leave them since they refer to old jobs. The Hue DB, if you use Hue, has references to old hosts and has to be updated like the ambari DB. After restarting Ambari with the new hostnames I had no issues. If you can keep the old hostnames as aliases, there is no need to change the Ambari agents; otherwise you need to change the Ambari agent properties to redirect the agents to the new Ambari server. Regarding iptables, that's unrelated to the above: you have to ensure that all required ports are open, starting with the Ambari server and Ambari agent ports and including all others. Also keep all ephemeral ports open, because ZooKeeper and some other components use them.
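A minimal sketch of the Postgres steps described above, assuming the default "ambari" database and user and placeholder hostnames (take your backups first and adjust names, credentials and paths to your setup):

sudo -u postgres pg_dump ambari > ambari.sql                                       # dump the Ambari DB
sed 's/old-host.example.com/new-host.example.com/g' ambari.sql > ambari-new.sql    # rewrite hostnames
ambari-server stop
ambari-server reset
sudo -u postgres psql -c 'DROP DATABASE ambari;'
sudo -u postgres psql -c 'CREATE DATABASE ambari OWNER ambari;'
sudo -u postgres psql ambari < ambari-new.sql                                      # import the edited dump
ambari-server start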
02-29-2016
02:16 AM
1 Kudo
Hi @jbarnett, some of your paths are wrong. You cannot simply replace "current" with "2.2.0.0-2041". Based on my directory structure in HDP 2.2.8, all of the following paths are non-existent (a way to check where the "current" symlinks actually point is sketched after the list): /usr/hdp/2.2.0.0-2041/hadoop-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-client/lib/*,
/usr/hdp/2.2.0.0-2041/hadoop-hdfs-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-hdfs-client/lib/*,
/usr/hdp/2.2.0.0-2041/hadoop-yarn-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-yarn-client/lib/*
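One way to find the real versioned directories is to check what the /usr/hdp/current symlinks resolve to (a sketch; hdp-select ships with HDP 2.2+, and the component names listed are the usual ones, adjust to your install):

ls -ld /usr/hdp/current/hadoop-client /usr/hdp/current/hadoop-hdfs-client /usr/hdp/current/hadoop-yarn-client
# the symlink targets are the directories to use in the classpath
hdp-select status | grep hadoop        # component-to-version mapping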
02-28-2016
10:18 PM
Hi @Mahesh Deshmukh, check here for one more example of incremental import with merge, including the command-line output. Note that the column in your DB table that you use as the "--merge-key" has to be unique. During an incremental import with merge, Sqoop runs two MR jobs: the first is the standard Sqoop map-only job that imports the new (updated) records, and the second is the merge MR job. During the merge job, Sqoop looks for records with the same "merge-key", like the following:
150 mary Feb-10 ... already in HDFS
150 john Feb-21 ... newly imported by the incremental import
and sets the record to the newer one from Feb-21 (assuming the first column is your "merge-key" and the third column is your "check-column").
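A minimal command sketch of such an import (the connection string, table, columns and last-value are placeholders to adjust to your setup):

sqoop import \
  --connect jdbc:mysql://dbhost/mydb --username myuser -P \
  --table customers \
  --target-dir /user/me/customers \
  --incremental lastmodified \
  --check-column last_update \
  --last-value "2016-02-01 00:00:00" \
  --merge-key id

Here "id" plays the role of the unique merge key and "last_update" is the check column; the merge MR job described above is launched automatically because --merge-key is present.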
02-27-2016
04:09 AM
1 Kudo
You can set permissions on your user folder like this:
hdfs dfs -chmod 700 /user/user1
If the owner of this folder is user1, then only he can list it:
hdfs dfs -ls /user/user1    ... works for user1 but doesn't work for other users
However, hdfs, as the HDFS super-user, can also list it.
02-25-2016
11:40 PM
1 Kudo
@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.
02-24-2016
10:52 PM
2 Kudos
Hi @bpreachuk, according to this page the bulk-load tool doesn't have such a feature, but for smaller files, up to "tens of megabytes", you can use the single-threaded psql.py tool, which can interpret the first line as the list of columns when you use the "-h in-line" option. Thinking about the bulk MR tool, it is indeed hard to implement this, because every mapper gets a chunk of the file and we'd want only one mapper to remove the very first line, so that line would have to be marked in a special way. More details about the commands here.
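A minimal usage sketch of that option (the ZooKeeper quorum, table name and CSV path are placeholders, and the exact argument order may differ per Phoenix version):

psql.py -t MY_TABLE -h in-line zk-host:2181 /path/to/data.csv
# -h in-line: treat the first line of the CSV as the list of columns
# -t MY_TABLE: the target Phoenix table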
02-23-2016
11:11 PM
Hi @Sunile Manjee, your screenshot is for the admin user; admin will always be able to see and change them all. For other users, you control their access using Ranger -> Settings -> Permissions. If you remove a user from the "Resource Based Policy" list of users, he will see a read-only list of policies, but only those in which he was given the "Delegate admin" permission (available on each policy, to the right of the basic permissions); see my screenshot. If he is in the "Resource Based Policy" list, he will be presented with a top-level menu like in your screenshot, but will be able to interact with (edit) only his "Delegate admin" policies. By the way, the above applies to HDP-2.3.4; in earlier versions it might be somewhat different. (attachment: screen-shot-2016-02-24-at-80537-am.png)
02-23-2016
10:03 AM
2 Kudos
You can try to recover some missing blocks by making sure that all your DataNodes, and all disks on them, are healthy and running. If they are and you still have missing blocks, the only way out is to delete the files with missing blocks, either one by one or all of them at once using the "fsck <path> -delete" command. Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating the missing copies to fulfill the replication factor). If after a few days it doesn't, you can trigger the recovery by running the balancer or, as mentioned in another answer, by running the "setrep" command.
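A short command sketch of the recovery steps above (the paths and the replication factor are placeholders):

hdfs fsck / -list-corruptfileblocks              # list files with missing/corrupt blocks
hdfs fsck /path/with/missing/blocks -delete      # delete the affected files
hdfs dfs -setrep -w 3 /some/path                 # re-apply the replication factor to trigger re-replication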
02-23-2016
09:31 AM
4 Kudos
Hi @marko, if I remember correctly, the Spark 1.6 TP for HDP-2.3.4 was announced one day after Spark 1.6 went GA. Therefore, you can expect a Spark 2.0 TP with about the same "delay", and its inclusion in one of the coming HDP versions soon after. Stay tuned!