Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3173 | 12-25-2018 10:42 PM |
| | 14198 | 10-09-2018 03:52 AM |
| | 4764 | 02-23-2018 11:46 PM |
| | 2481 | 09-02-2017 01:49 AM |
| | 2914 | 06-21-2017 12:06 AM |
06-22-2016
11:47 AM
Hi @Laurence Da Luz, thanks for the correction! I'll edit my answer.
06-22-2016
10:56 AM
2 Kudos
- Tez and Slider: also client-only, so HA is not applicable
- Phoenix: depends on HBase and ZooKeeper
- Accumulo: multiple Accumulo masters can be run; one of them will be active, the rest are backups
- Storm: multiple Nimbus instances are supported, with automatic failover
- Falcon: HA is available, but the failover is a manual process, details here
- Atlas: a backup instance can be run, but the failover is manual (like Falcon); automated failover is expected in version 0.7
- Sqoop: a metastore backup is usually enough
- Flume: you can run Flume agents behind a load balancer, more details here
- ZooKeeper: inherently HA if you run 3 or more instances; furthermore, ensure ZK stores its data on RAID-10 disks (see the quick check sketched below)
- Knox: multiple instances can be configured behind an LB, more for load balancing but also for HA
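A quick way to confirm that a ZooKeeper ensemble really is in quorum is the `stat` four-letter command. A minimal sketch, assuming hypothetical hosts zk1-zk3 and the default client port 2181:

```bash
# Ask each ZooKeeper server for its role; a healthy HA ensemble shows one "Mode: leader"
# and the remaining servers as "Mode: follower"
for host in zk1.example.com zk2.example.com zk3.example.com; do
  echo -n "$host: "
  echo stat | nc "$host" 2181 | grep Mode
done
```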
06-22-2016
10:24 AM
2 Kudos
Falcon mirroring is better than distcp because Hive metadata and data files are mirrored together; with distcp, you would have to mirror Hive metadata separately. A caveat is that all Hive databases and tables that exist before Falcon mirroring starts have to be mirrored separately, for example using Hive's EXPORT/IMPORT TABLE. Also, it's better to mirror whole databases rather than individual tables: if you mirror a whole database, then any newly created table on the source cluster will automatically be mirrored (created) on the DR cluster. However, some operations, like ACID DELETE and UPDATE, are not supported. The mechanism is also more involved: Falcon schedules Oozie jobs which do the mirroring, and it's better to run those jobs on the DR cluster to spare resources on the source cluster. And finally, if your clusters use NameNode HA, you will have to configure HDFS on the DR cluster to be aware of the NN HA setup on the source cluster (this also holds for the distcp approach). You can find more details in the Data Governance Guide, sections 2 and 4.
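For the pre-existing tables mentioned above, here is a minimal sketch of the EXPORT/IMPORT route, with hypothetical database/table names, staging path, and NameNode addresses:

```bash
# On the source cluster: export the existing table's metadata and data to a staging directory
hive -e "EXPORT TABLE sales.orders TO '/apps/hive/staging/orders_export';"

# Copy the exported directory to the DR cluster (addresses are placeholders)
hadoop distcp hdfs://source-nn:8020/apps/hive/staging/orders_export \
              hdfs://dr-nn:8020/apps/hive/staging/orders_export

# On the DR cluster: recreate the table from the exported copy
hive -e "IMPORT TABLE sales.orders FROM '/apps/hive/staging/orders_export';"
```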
06-22-2016
07:54 AM
In /etc/yum.repos.d, remove all .repo files pointing to the Internet and copy in only the .repo files from other servers that are already using your local repo. For HDP nodes, you initially need only 2 .repo files: one for the OS, and ambari.repo. When Ambari adds a new node to the cluster, it will copy HDP.repo and HDP-UTILS.repo there. Also, have you set your repository URLs in Ambari -> Admin -> Stack and Versions -> Versions -> Manage Versions -> [click on your current version]?
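A minimal sketch of the cleanup on a new node, assuming hypothetical .repo file names and an existing node `repo-host` that already uses the local repository:

```bash
cd /etc/yum.repos.d

# Move aside the .repo files that point to the Internet (names vary by OS)
mkdir -p internet-repos && mv CentOS-*.repo internet-repos/

# Copy the OS and Ambari repo files from a node already using the local repo
scp repo-host:/etc/yum.repos.d/{os-local.repo,ambari.repo} .

# Refresh yum metadata and verify that only the local repos are listed
yum clean all && yum repolist
```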
06-22-2016
07:28 AM
Glad to hear it worked! Regarding "last-value", it's best to create a Sqoop job, for example sqoop job --create myjob -- import --connect jdbc:teradata ... and then just execute "sqoop job --exec myjob" every time. Sqoop will remember the last-value in its internal storage. For a "lastmodified" example see the answer to this post.
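A fuller sketch of the job approach, with hypothetical connection string, table, and column names (the exact JDBC URL and driver options depend on the Teradata connector you use):

```bash
# Create a saved job; Sqoop stores --last-value in its metastore and updates it after each run
sqoop job --create myjob -- import \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username etl_user -P \
  --table ORDERS \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0 \
  --target-dir /data/orders

sqoop job --list          # confirm the job was created
sqoop job --exec myjob    # run it; repeat this command for each incremental load
```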
06-22-2016
05:22 AM
There appears to be a bug in Sqoop for "lastmodified" imports from Teradata. Regarding the "append" import, can you try using an INT column as the "check-column", along the lines of the sketch below?
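A minimal one-off sketch, with hypothetical connection string, table, and column names (ID standing in for an INT key column):

```bash
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username etl_user -P \
  --table MY_TABLE \
  --incremental append \
  --check-column ID \
  --last-value 0 \
  --target-dir /data/my_table
```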
06-21-2016
09:51 PM
5 Kudos
You can use hdp-select:
hdp-select versions  # will show all installed HDP versions, possibly more than one
hdp-select status  # will show the version of each installed package
06-21-2016
07:34 AM
Compression and encryption are 2 different things: in Step 1 you compress the file (or not; it's optional but recommended unless you have reasons not to compress it), and in Step 2 you encrypt the file produced by Step 1.
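A minimal sketch of that order of operations, assuming a hypothetical file name and using gzip plus GnuPG symmetric encryption (your tooling may differ):

```bash
# Step 1 (optional but recommended): compress first, while the data is still compressible
gzip data.csv                                      # produces data.csv.gz

# Step 2: encrypt the compressed file
gpg --symmetric --cipher-algo AES256 data.csv.gz   # produces data.csv.gz.gpg
```

Compressing in this order matters because encrypted output looks random and compresses poorly afterwards.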
06-21-2016
07:17 AM
First download the har's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then locate Stats/2016_06_20.txt in _index and note its data-N file, its offset within that data file, and its length. Suppose it's in data-0 with offset=125000 and file-length=8200; then you can access http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op=OPEN&offset=125000&length=8200 Check this nicely written blog for a full example and a PHP script which can automate the process.
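A sketch of the same steps with curl, assuming the Knox URL from above and placeholder credentials:

```bash
HAR=/org/project/archived/data/hive/warehouse/test.har
KNOX=http://hostname:8443/knox/nm1/webhdfs/v1

# 1. Download the archive index
curl -u user:password "${KNOX}${HAR}/_index?op=OPEN" -o _index

# 2. Find the entry for the file; the line shows which data-N part holds it, plus offset and length
grep '2016_06_20.txt' _index

# 3. Read just that byte range from the data part (values taken from the example above)
curl -u user:password "${KNOX}${HAR}/data-0?op=OPEN&offset=125000&length=8200" -o 2016_06_20.txt
```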
06-21-2016
06:15 AM
1 Kudo
1. Stop ambari-server and all ambari-agents.
2. Back up the Ambari DB.
3. Shut down the Ambari DB and perform maintenance.
4. Start the Ambari DB.
5. Start ambari-server and all ambari-agents, and make sure everything works fine on the Ambari dashboard and other pages; if something is wrong, stop Ambari and restore the database backup.

You can use the cluster as usual during the above operations, but obviously you cannot use Ambari to manage the cluster, and all Ambari alerts will be suppressed.
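A command-level sketch of steps 1, 2, and 5, assuming the default embedded PostgreSQL setup with database and user both named ambari (adjust for your own DB):

```bash
ambari-server stop          # on the Ambari server host
ambari-agent stop           # on every cluster host

# Step 2: back up the Ambari database before maintenance
pg_dump -U ambari ambari > ambari_backup_$(date +%F).sql

# ... stop the DB, perform maintenance, start the DB ...

ambari-server start         # on the Ambari server host
ambari-agent start          # on every cluster host
```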