Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 2633 | 12-25-2018 10:42 PM
 | 12092 | 10-09-2018 03:52 AM
 | 4174 | 02-23-2018 11:46 PM
 | 1859 | 09-02-2017 01:49 AM
 | 2177 | 06-21-2017 12:06 AM
03-03-2016
02:22 AM
1 Kudo
Hi @saichand varanasi, Sqoop incremental import works either in the append mode or in the lastmodified mode, but you have specified both on your command line. Check here for the command specs, and here for a lastmodified Sqoop job sample. A Sqoop job will keep track of the "last-value" for you, so there is no need to do that manually.
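For reference, a minimal sketch of each mode (the connection string, table and column names are placeholders; only one --incremental mode can be given per command):

sqoop import --connect jdbc:mysql://dbhost/mydb --table orders \
  --incremental append --check-column id --last-value 1000

sqoop import --connect jdbc:mysql://dbhost/mydb --table orders \
  --incremental lastmodified --check-column last_update --last-value "2016-03-01 00:00:00"

When these are saved as a Sqoop job (sqoop job --create ...), the job stores the new last-value after every run.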
02-29-2016
03:15 PM
5 Kudos
Hi @Cassandra Spencer, I did this recently, and it was easier than expected: dump the Ambari database (ambari) into an SQL file and use sed or your favorite editor to change all hostnames in the SQL file. Then run "ambari-server reset", drop the ambari DB, create it again, and import the edited SQL file into Postgres. Of course, take all the backups beforehand. The ambarirca database is not affected. The Hive DB has no references to hosts. The Oozie DB has references to old hosts, but you can leave them since they refer to old jobs. The Hue DB, if you use Hue, has references to old hosts and has to be updated like the ambari DB. After restarting Ambari with the new hostnames I had no issues. If you can keep the old hostnames as aliases, there is no need to change the Ambari agents; otherwise you need to change the Ambari agent properties to redirect the agents to the new Ambari server. Regarding iptables, that's unrelated to the above: you have to ensure that all required ports are open, starting with the Ambari server and Ambari agent ports and including all others. Also keep all ephemeral ports open, because ZooKeeper and some other components use them.
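A minimal sketch of the Postgres steps described above, assuming the default "ambari" database and user and placeholder hostnames (take your backups first and adjust names, credentials and paths to your setup):

sudo -u postgres pg_dump ambari > ambari.sql                                       # dump the Ambari DB
sed 's/old-host.example.com/new-host.example.com/g' ambari.sql > ambari-new.sql    # rewrite hostnames
ambari-server stop
ambari-server reset
sudo -u postgres psql -c 'DROP DATABASE ambari;'
sudo -u postgres psql -c 'CREATE DATABASE ambari OWNER ambari;'
sudo -u postgres psql ambari < ambari-new.sql                                      # import the edited dump
ambari-server start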
02-29-2016
02:16 AM
1 Kudo
Hi @jbarnett, some of your paths are wrong. You cannot simply replace "current" with "2.2.0.0-2041". Based on my directory structure in HDP 2.2.8, all of the following paths are non-existent (a way to check where the "current" symlinks actually point is sketched after the list): /usr/hdp/2.2.0.0-2041/hadoop-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-client/lib/*,
/usr/hdp/2.2.0.0-2041/hadoop-hdfs-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-hdfs-client/lib/*,
/usr/hdp/2.2.0.0-2041/hadoop-yarn-client/*,
/usr/hdp/2.2.0.0-2041/hadoop-yarn-client/lib/*
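One way to find the real versioned directories is to check what the /usr/hdp/current symlinks resolve to (a sketch; hdp-select ships with HDP 2.2+, and the component names listed are the usual ones, adjust to your install):

ls -ld /usr/hdp/current/hadoop-client /usr/hdp/current/hadoop-hdfs-client /usr/hdp/current/hadoop-yarn-client
# the symlink targets are the directories to use in the classpath
hdp-select status | grep hadoop        # component-to-version mapping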
02-28-2016
10:18 PM
Hi @Mahesh Deshmukh, check here for one more example of incremental import with merge, including the command-line output. Note that the column in your DB table that you use as the "--merge-key" has to be unique. During an incremental import with merge, Sqoop runs two MR jobs: the first is the standard Sqoop map-only job that imports the new (updated) records, and the second is the merge MR job. During the merge job, Sqoop looks for records with the same "merge-key", like the following:
150 mary Feb-10 ... already in HDFS
150 john Feb-21 ... newly imported by the incremental import
and sets the record to the newer one from Feb-21 (assuming the first column is your "merge-key" and the third column is your "check-column").
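A minimal command sketch of such an import (the connection string, table, columns and last-value are placeholders to adjust to your setup):

sqoop import \
  --connect jdbc:mysql://dbhost/mydb --username myuser -P \
  --table customers \
  --target-dir /user/me/customers \
  --incremental lastmodified \
  --check-column last_update \
  --last-value "2016-02-01 00:00:00" \
  --merge-key id

Here "id" plays the role of the unique merge key and "last_update" is the check column; the merge MR job described above is launched automatically because --merge-key is present.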
02-27-2016
04:09 AM
1 Kudo
You can set permissions on your user folder like this:
hdfs dfs -chmod 700 /user/user1
If the owner of this folder is user1, then only he can list it:
hdfs dfs -ls /user/user1    ... works for user1 but doesn't work for other users
However, hdfs, as the HDFS super-user, can also list it.
02-25-2016
11:40 PM
1 Kudo
@Kit Menke check this for an interactive Java Knox DSL shell, where you can test your approach and later compile the parts you need.
02-24-2016
10:52 PM
2 Kudos
Hi @bpreachuk, according to this page the bulk-load tool doesn't have such a feature, but for smaller files, up to "tens of megabytes", you can use the single-threaded psql.py tool, which can interpret the first line as the list of columns when you use the "-h in-line" option. Thinking about the bulk MR tool, it is indeed hard to implement this, because every mapper gets a chunk of the file and we'd want only one mapper to remove the very first line, so that line would have to be marked in a special way. More details about the commands here.
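A minimal usage sketch of that option (the ZooKeeper quorum, table name and CSV path are placeholders, and the exact argument order may differ per Phoenix version):

psql.py -t MY_TABLE -h in-line zk-host:2181 /path/to/data.csv
# -h in-line: treat the first line of the CSV as the list of columns
# -t MY_TABLE: the target Phoenix table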
02-23-2016
11:11 PM
Hi @Sunile Manjee, your screenshot is for the admin user; admin will always be able to see and change them all. For other users, you control their access using Ranger -> Settings -> Permissions. If you remove a user from the "Resource Based Policy" list of users, he will see a read-only list of policies, but only those in which he was given the "Delegate admin" permission (available on each policy, to the right of the basic permissions); see my screenshot. If he is in the "Resource Based Policy" list, he will be presented with a top-level menu like in your screenshot, but will be able to interact with (edit) only his "Delegate admin" policies. By the way, the above applies to HDP-2.3.4; in earlier versions it might be somewhat different. (attachment: screen-shot-2016-02-24-at-80537-am.png)
02-23-2016
10:03 AM
2 Kudos
You can try to recover some missing blocks by making sure that all your DataNodes, and all disks on them, are healthy and running. If they are and you still have missing blocks, the only way out is to delete the files with missing blocks, either one by one or all of them at once using the "fsck <path> -delete" command. Regarding under-replicated blocks, HDFS is supposed to recover them automatically (by creating the missing copies to fulfill the replication factor). If after a few days it doesn't, you can trigger the recovery by running the balancer or, as mentioned in another answer, by running the "setrep" command.
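A short command sketch of the recovery steps above (the paths and the replication factor are placeholders):

hdfs fsck / -list-corruptfileblocks              # list files with missing/corrupt blocks
hdfs fsck /path/with/missing/blocks -delete      # delete the affected files
hdfs dfs -setrep -w 3 /some/path                 # re-apply the replication factor to trigger re-replication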
02-23-2016
09:31 AM
4 Kudos
Hi @marko, if I remember correctly, the Spark 1.6 TP for HDP-2.3.4 was announced one day after Spark 1.6 went GA. Therefore, you can expect a Spark 2.0 TP with about the same "delay", and its inclusion in one of the coming HDP versions soon after. Stay tuned!