Member since
09-24-2015
816
Posts
488
Kudos Received
189
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2663 | 12-25-2018 10:42 PM |
 | 12195 | 10-09-2018 03:52 AM |
 | 4200 | 02-23-2018 11:46 PM |
 | 1888 | 09-02-2017 01:49 AM |
 | 2208 | 06-21-2017 12:06 AM |
02-07-2017
10:16 PM
Hi @Ed R, there is no such Hive syntax without a UDF, but some such UDFs are available, and you can use them as-is or modify them if needed, for example "to_map" described here. This also assumes that your query is actually "select id1, id2, to_map(col1_key, col2_val) as result from &lt;table&gt; order by id1, id2", where &lt;table&gt; can be either a single table or a union, as you mentioned, and to_map is actually a UDAF, or user-defined aggregate function. In this case you may want to decide how to handle cases where the same key points to different values. And finally, there is nothing mysterious about Hive UDFs or UDAFs: you just implement some interfaces or override some methods, and you have these examples to speed you up. As you go deeper into Hive, you will sooner or later realize that UDFs are a must.
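To make this concrete, here is a hedged sketch of how such a query might be run, assuming a hypothetical jar path, UDAF class name, and table/column names (none of which are verified values from the question). Note that since to_map is an aggregate function, the rows being collapsed into each map need a GROUP BY:

```shell
# Hypothetical sketch: register a to_map UDAF and aggregate key/value
# pairs into a map per (id1, id2). Jar path and class name are assumptions.
hive -e "
ADD JAR /tmp/hive-udfs.jar;
CREATE TEMPORARY FUNCTION to_map AS 'com.example.udf.ToMap';
SELECT id1, id2, to_map(col1_key, col2_val) AS result
FROM my_table          -- or a UNION ALL subquery in place of my_table
GROUP BY id1, id2
ORDER BY id1, id2;
"
```

If the same key appears twice within a group, the UDAF implementation decides whether the first or last value wins, which is exactly the duplicate-key choice mentioned above.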
02-07-2017
10:04 PM
No, not each job, but you must be careful how you "address Oozie", as I said. If you run Oozie commands from the command line you need to use the ozlb address as your Oozie server, or export OOZIE_URL=ozlb. And if you use Falcon, then when you define a logical cluster you need to use ozlb as your Oozie server address.
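For the command-line case, the setup might look like this; "ozlb" is the load balancer FQDN from this thread, and port 11000 is assumed to be the default Oozie port:

```shell
# Point the Oozie CLI at the load balancer, not an individual server.
export OOZIE_URL=http://ozlb:11000/oozie
oozie jobs -len 10      # now goes through the LB

# Or pass the address per command instead of exporting it:
oozie admin -oozie http://ozlb:11000/oozie -status
```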
02-07-2017
09:53 PM
As a workaround, you can create an external table on /user/test1/csvfolder, and then insert records from this table into your internal table using "INSERT INTO TABLE tbl SELECT * FROM tbl_ext;". Note, however, that creating an external table requires write permission on your csvfolder, although the files there will be left intact.
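A hedged sketch of the workaround; the column definitions and the comma delimiter are assumptions, since the actual CSV schema is not shown in the question:

```shell
# Hypothetical sketch: adjust columns and delimiter to match the CSV files.
hive -e "
CREATE EXTERNAL TABLE tbl_ext (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/test1/csvfolder';

INSERT INTO TABLE tbl SELECT * FROM tbl_ext;
"
```

Dropping tbl_ext afterwards removes only the table metadata; because it is external, the files in /user/test1/csvfolder stay in place.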
02-07-2017
10:22 AM
First of all, you should not use "localhost" anywhere in your settings; use FQDNs. Let's call your Oozie HA servers oz1 and oz2, and your LB ozlb (these are all FQDNs). If you keep addressing Oozie by, say, oz1, then you lose Oozie functionality if oz1 goes down. That's the main reason to do HA, and the main reason to set OOZIE_BASE_URL=ozlb: the LB can detect a failed Oozie server and redirect the traffic to the other one. Of course, the LB also does load balancing while both oz1 and oz2 are healthy, distributing jobs as they come. HTH.
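The server-side setting might look like this; this is a sketch assuming the ozlb FQDN from this thread and the default Oozie port 11000:

```shell
# In oozie-env (e.g. via Ambari -> Oozie -> Configs), point the base URL
# at the load balancer FQDN rather than at oz1 or oz2 directly:
export OOZIE_BASE_URL=http://ozlb:11000/oozie
```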
02-06-2017
11:20 PM
1 Kudo
Hi @jean rivera, yes, Oozie servers "share" the scheduling by connecting to the same Oozie database. Scheduled jobs are executed only once; the server that runs a job is selected randomly, and coordination is done using ZooKeeper distributed locks. Regarding oozie_base_url, set it to the URL of your load balancer. More details about the Oozie active-active architecture here. And finally, in order to use Oozie HA you must replace the default Derby database shipped with Oozie with a "real" one that supports concurrent queries, like MySQL or PostgreSQL.
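For the database part, the relevant oozie-site.xml properties look roughly like this for a MySQL backend; the host, database, and user names below are placeholders, not values from this thread:

```shell
# Hedged sketch of the JDBC settings in oozie-site.xml (placeholders):
#   oozie.service.JPAService.jdbc.driver   = com.mysql.jdbc.Driver
#   oozie.service.JPAService.jdbc.url      = jdbc:mysql://dbhost:3306/oozie
#   oozie.service.JPAService.jdbc.username = oozie
#   oozie.service.JPAService.jdbc.password = <your-password>

# The MySQL JDBC driver jar must also be available to the Oozie server,
# e.g. (HDP-style path, an assumption):
cp mysql-connector-java.jar /usr/hdp/current/oozie-server/libext/
```

Both Oozie servers then point at this same database, which is what lets them share the job state.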
02-06-2017
08:13 PM
1 Kudo
You can safely ignore this. Your app first tried to contact rm1 and found it to be in standby mode: "WARN ipc.Client: Failed to connect to server: str20/10.5.168.121:8032". After that it failed over to rm2: "INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2". If rm1 is active, there will be no such message.
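You can confirm which ResourceManager is active with the rmadmin CLI; rm1 and rm2 here are the RM IDs defined in yarn.resourcemanager.ha.rm-ids:

```shell
# Query the HA state of each ResourceManager; in the scenario above,
# rm1 would report "standby" and rm2 "active".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```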
02-06-2017
06:24 AM
Hi Leo, it seems that this error can happen because of some miscommunication between the RM, an app's AM, and one or more NMs. One scenario is described in YARN-3535, in particular in this post. How to fix it depends on what you are doing. If it happens after an HDP upgrade, you can try to clear the affected NM recovery directories and restart YARN. If it happens on a particular app, then something might be wrong with the app, like some missing configs, etc.
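A hedged sketch of the recovery-directory cleanup; the actual directory is whatever yarn.nodemanager.recovery.dir points to on each affected NodeManager, and the path and daemon script location below are assumptions for an HDP-style install:

```shell
# On each affected NodeManager, stop the NM first (via Ambari, or):
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh stop nodemanager"

# Clear the NM recovery state (check yarn.nodemanager.recovery.dir for
# the real path; this one is a placeholder):
rm -rf /var/lib/hadoop-yarn/nodemanager/recovery-state/*

# ...then restart YARN, e.g. from Ambari.
```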
02-01-2017
10:21 PM
As the Jira says, stmk is rarely used, but the ZooKeeper version currently shipped with HDP-2.x is 3.4.6, so yes, 3.4.7 or higher, which fixes the bug, should be added soon.
01-29-2017
01:32 AM
You also need your OS local repo, at least the binaries on the so-called DVD-1. ambari-server requires postgres, which comes from the OS repo; there is no way to avoid this with yum. You can configure ambari-server to use another database during the "ambari-server setup" step. You can install ambari-server alone, without postgres, using "rpm -ivh --nodeps ambari-server***.rpm", but it's not recommended because there are other dependencies to check, like python, and even if all that works you will still need the OS local repo to install HDP.
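The two paths described above, as a sketch; the rpm filename is a placeholder for whatever version you downloaded:

```shell
# Recommended: normal install from the repo, then pick a non-embedded
# database when prompted during setup ("advanced database configuration").
yum install ambari-server
ambari-server setup

# Not recommended: force the install without dependency checks.
rpm -ivh --nodeps ambari-server-x.y.z.rpm   # placeholder filename
```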
12-19-2016
01:03 PM
2 Kudos
The validate option works only on tables in HDFS, not on those in Hive or HBase. It works with both the "import" and "export" Sqoop commands. The default validation class is org.apache.sqoop.validation.RowCountValidator, which compares the number of rows in the source and destination tables. You can customize it by providing your own validation class, which must implement the org.apache.sqoop.validation.Validator interface. Check also the response to the same question asked before, and the Sqoop documentation, chapter 11.
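For example, an import with validation enabled might look like this; the connection string, table, and target directory are placeholders:

```shell
# Import with row-count validation; --validator is optional since
# RowCountValidator is the default.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --target-dir /user/me/mytable \
  --validate \
  --validator org.apache.sqoop.validation.RowCountValidator
```

If the source and target row counts differ, the job fails with a validation error instead of silently succeeding.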