Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3173 | 12-25-2018 10:42 PM |
| | 14198 | 10-09-2018 03:52 AM |
| | 4764 | 02-23-2018 11:46 PM |
| | 2481 | 09-02-2017 01:49 AM |
| | 2914 | 06-21-2017 12:06 AM |
06-22-2016
11:47 AM
Hi @Laurence Da Luz, thanks for the correction! I'll edit my answer.
06-22-2016
10:56 AM
2 Kudos
- Tez and Slider: also client-only, so HA is not applicable
- Phoenix: depends on HBase and ZooKeeper
- Accumulo: multiple Accumulo masters can be run; one of them will be active, the rest are backups
- Storm: multiple Nimbus instances are supported, with automatic failover
- Falcon: HA is available, but the failover is a manual process, details here
- Atlas: a backup instance can be run, but the failover is manual (like Falcon); automated failover is expected in version 0.7
- Sqoop: a metastore backup is usually enough
- Flume: you can run Flume agents behind a load balancer, more details here
- ZooKeeper: inherently HA if you run 3 or more instances; furthermore, ensure ZK stores its data on RAID-10 disks (see the quick check sketched below)
- Knox: multiple instances can be configured behind an LB, more for load balancing but also for HA
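A quick way to confirm that a ZooKeeper ensemble really is in quorum is the `stat` four-letter command. A minimal sketch, assuming hypothetical hosts zk1-zk3 and the default client port 2181:

```bash
# Ask each ZooKeeper server for its role; a healthy HA ensemble shows one "Mode: leader"
# and the remaining servers as "Mode: follower"
for host in zk1.example.com zk2.example.com zk3.example.com; do
  echo -n "$host: "
  echo stat | nc "$host" 2181 | grep Mode
done
```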
06-22-2016
10:24 AM
2 Kudos
Falcon mirroring is better than distcp because Hive metadata and data files are mirrored together; with distcp, you would have to mirror Hive metadata separately. A caveat is that all Hive databases and tables that exist before Falcon mirroring starts have to be mirrored separately, for example using Hive's EXPORT/IMPORT TABLE. Also, it's better to mirror whole databases rather than individual tables: if you mirror a whole database, then any newly created table on the source cluster will automatically be mirrored (created) on the DR cluster. However, some operations, like ACID DELETE and UPDATE, are not supported. The mechanism is also more involved: Falcon schedules Oozie jobs which do the mirroring, and it's better to run those jobs on the DR cluster to spare resources on the source cluster. And finally, if your clusters use NameNode HA, you will have to configure HDFS on the DR cluster to be aware of the NN HA setup on the source cluster (this also holds for the distcp approach). You can find more details in the Data Governance Guide, sections 2 and 4.
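For the pre-existing tables mentioned above, here is a minimal sketch of the EXPORT/IMPORT route, with hypothetical database/table names, staging path, and NameNode addresses:

```bash
# On the source cluster: export the existing table's metadata and data to a staging directory
hive -e "EXPORT TABLE sales.orders TO '/apps/hive/staging/orders_export';"

# Copy the exported directory to the DR cluster (addresses are placeholders)
hadoop distcp hdfs://source-nn:8020/apps/hive/staging/orders_export \
              hdfs://dr-nn:8020/apps/hive/staging/orders_export

# On the DR cluster: recreate the table from the exported copy
hive -e "IMPORT TABLE sales.orders FROM '/apps/hive/staging/orders_export';"
```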
06-22-2016
07:54 AM
In /etc/yum.repos.d, remove all .repo files pointing to the Internet and copy in only the .repo files from other servers that are already using your local repo. For HDP nodes, you initially need only 2 .repo files: one for the OS, and ambari.repo. When Ambari adds a new node to the cluster, it will copy HDP.repo and HDP-UTILS.repo there. Also, have you set your repository URLs in Ambari -> Admin -> Stack and Versions -> Versions -> Manage Versions -> [click on your current version]?
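A minimal sketch of the cleanup on a new node, assuming hypothetical .repo file names and an existing node `repo-host` that already uses the local repository:

```bash
cd /etc/yum.repos.d

# Move aside the .repo files that point to the Internet (names vary by OS)
mkdir -p internet-repos && mv CentOS-*.repo internet-repos/

# Copy the OS and Ambari repo files from a node already using the local repo
scp repo-host:/etc/yum.repos.d/{os-local.repo,ambari.repo} .

# Refresh yum metadata and verify that only the local repos are listed
yum clean all && yum repolist
```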
06-22-2016
07:28 AM
Glad to hear it worked! Regarding "last-value", it's best to create a Sqoop job, for example sqoop job --create myjob -- import --connect jdbc:teradata ... and then just execute "sqoop job --exec myjob" every time. Sqoop will remember the last-value in its internal storage. For a "lastmodified" example see the answer to this post.
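A fuller sketch of the job approach, with hypothetical connection string, table, and column names (the exact JDBC URL and driver options depend on the Teradata connector you use):

```bash
# Create a saved job; Sqoop stores --last-value in its metastore and updates it after each run
sqoop job --create myjob -- import \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username etl_user -P \
  --table ORDERS \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0 \
  --target-dir /data/orders

sqoop job --list          # confirm the job was created
sqoop job --exec myjob    # run it; repeat this command for each incremental load
```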
06-22-2016
05:22 AM
There appears to be a bug in Sqoop for "lastmodified" imports from Teradata. Regarding the "append" import, can you try using an INT column as the "check-column", along the lines of the sketch below?
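A minimal one-off sketch, with hypothetical connection string, table, and column names (ID standing in for an INT key column):

```bash
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username etl_user -P \
  --table MY_TABLE \
  --incremental append \
  --check-column ID \
  --last-value 0 \
  --target-dir /data/my_table
```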
06-21-2016
09:51 PM
5 Kudos
You can use hdp-select:
hdp-select versions  # will show all installed HDP versions, possibly more than one
hdp-select status  # will show the version of each installed package
06-21-2016
07:34 AM
Compression and encryption are 2 different things: in Step 1 you compress the file (or not; it's optional but recommended unless you have reasons not to compress it), and in Step 2 you encrypt the file produced by Step 1.
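A minimal sketch of that order of operations, assuming a hypothetical file name and using gzip plus GnuPG symmetric encryption (your tooling may differ):

```bash
# Step 1 (optional but recommended): compress first, while the data is still compressible
gzip data.csv                                      # produces data.csv.gz

# Step 2: encrypt the compressed file
gpg --symmetric --cipher-algo AES256 data.csv.gz   # produces data.csv.gz.gpg
```

Compressing in this order matters because encrypted output looks random and compresses poorly afterwards.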
06-21-2016
07:17 AM
First download the har's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then locate Stats/2016_06_20.txt in _index and note its data-N file, its offset within that data file, and its length. Suppose it's in data-0 with offset=125000 and file-length=8200; then you can access http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op=OPEN&offset=125000&length=8200 Check this nicely written blog for a full example and a PHP script which can automate the process.
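A sketch of the same steps with curl, assuming the Knox URL from above and placeholder credentials:

```bash
HAR=/org/project/archived/data/hive/warehouse/test.har
KNOX=http://hostname:8443/knox/nm1/webhdfs/v1

# 1. Download the archive index
curl -u user:password "${KNOX}${HAR}/_index?op=OPEN" -o _index

# 2. Find the entry for the file; the line shows which data-N part holds it, plus offset and length
grep '2016_06_20.txt' _index

# 3. Read just that byte range from the data part (values taken from the example above)
curl -u user:password "${KNOX}${HAR}/data-0?op=OPEN&offset=125000&length=8200" -o 2016_06_20.txt
```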
06-21-2016
06:15 AM
1 Kudo
1. Stop ambari-server and all ambari-agents.
2. Back up the Ambari DB.
3. Shut down the Ambari DB and perform maintenance.
4. Start the Ambari DB.
5. Start ambari-server and all ambari-agents, and make sure everything works fine on the Ambari dashboard and other pages; if something is wrong, stop Ambari and restore the database backup.

You can use the cluster as usual during the above operations, but obviously you cannot use Ambari to manage the cluster, and all Ambari alerts will be suppressed.
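A command-level sketch of steps 1, 2, and 5, assuming the default embedded PostgreSQL setup with database and user both named ambari (adjust for your own DB):

```bash
ambari-server stop          # on the Ambari server host
ambari-agent stop           # on every cluster host

# Step 2: back up the Ambari database before maintenance
pg_dump -U ambari ambari > ambari_backup_$(date +%F).sql

# ... stop the DB, perform maintenance, start the DB ...

ambari-server start         # on the Ambari server host
ambari-agent start          # on every cluster host
```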