Member since
09-24-2015
816
Posts
488
Kudos Received
189
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2663 | 12-25-2018 10:42 PM |
 | 12195 | 10-09-2018 03:52 AM |
 | 4200 | 02-23-2018 11:46 PM |
 | 1888 | 09-02-2017 01:49 AM |
 | 2208 | 06-21-2017 12:06 AM |
02-07-2017
10:16 PM
Hi @Ed R, there is no such Hive syntax without a UDF, but some such UDFs are available, and you can use them as-is or modify them if needed, for example "to_map" described here. This also assumes that your query is actually "select id1, id2, to_map(col1_key, col2_val) as result from &lt;table&gt; order by id1, id2", where &lt;table&gt; can be either a single table or a union, as you mentioned, and to_map is actually a UDAF, or user-defined aggregate function. In this case you may want to decide how to handle cases where the same key points to different values. And finally, there is nothing mysterious about Hive UDFs or UDAFs: you just implement some interfaces or override some methods, and you have these examples to speed you up. As you go deeper into Hive, you will sooner or later realize that UDFs are a must.
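To make this concrete, here is a hedged sketch of how such a query might be run, assuming a hypothetical jar path, UDAF class name, and table/column names (none of which are verified values from the question). Note that since to_map is an aggregate function, the rows being collapsed into each map need a GROUP BY:

```shell
# Hypothetical sketch: register a to_map UDAF and aggregate key/value
# pairs into a map per (id1, id2). Jar path and class name are assumptions.
hive -e "
ADD JAR /tmp/hive-udfs.jar;
CREATE TEMPORARY FUNCTION to_map AS 'com.example.udf.ToMap';
SELECT id1, id2, to_map(col1_key, col2_val) AS result
FROM my_table          -- or a UNION ALL subquery in place of my_table
GROUP BY id1, id2
ORDER BY id1, id2;
"
```

If the same key appears twice within a group, the UDAF implementation decides whether the first or last value wins, which is exactly the duplicate-key choice mentioned above.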
02-07-2017
10:04 PM
No, not each job, but you must be careful how you "address Oozie", as I said. If you run Oozie commands from the command line you need to use the ozlb address as your Oozie server, or export OOZIE_URL=ozlb. And if you use Falcon, then when you define a logical cluster you need to use ozlb as your Oozie server address.
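For the command-line case, the setup might look like this; "ozlb" is the load balancer FQDN from this thread, and port 11000 is assumed to be the default Oozie port:

```shell
# Point the Oozie CLI at the load balancer, not an individual server.
export OOZIE_URL=http://ozlb:11000/oozie
oozie jobs -len 10      # now goes through the LB

# Or pass the address per command instead of exporting it:
oozie admin -oozie http://ozlb:11000/oozie -status
```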
02-07-2017
09:53 PM
As a workaround, you can create an external table on /user/test1/csvfolder, and then insert records from this table into your internal table using "INSERT INTO TABLE tbl SELECT * FROM tbl_ext;". Note, however, that creating an external table requires write permission on your csvfolder, although the files there will be left intact.
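A hedged sketch of the workaround; the column definitions and the comma delimiter are assumptions, since the actual CSV schema is not shown in the question:

```shell
# Hypothetical sketch: adjust columns and delimiter to match the CSV files.
hive -e "
CREATE EXTERNAL TABLE tbl_ext (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/test1/csvfolder';

INSERT INTO TABLE tbl SELECT * FROM tbl_ext;
"
```

Dropping tbl_ext afterwards removes only the table metadata; because it is external, the files in /user/test1/csvfolder stay in place.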
02-07-2017
10:22 AM
First of all, you should not use "localhost" anywhere in your settings; use FQDNs. Let's call your Oozie HA servers oz1 and oz2, and your LB ozlb (these are all FQDNs). If you keep addressing Oozie by, say, oz1, then you lose Oozie functionality if oz1 goes down. That's the main reason to do HA, and the main reason to set OOZIE_BASE_URL=ozlb: the LB can detect a failed Oozie server and redirect the traffic to the other one. Of course, the LB also does load balancing while both oz1 and oz2 are healthy, distributing jobs as they come. HTH.
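The server-side setting might look like this; this is a sketch assuming the ozlb FQDN from this thread and the default Oozie port 11000:

```shell
# In oozie-env (e.g. via Ambari -> Oozie -> Configs), point the base URL
# at the load balancer FQDN rather than at oz1 or oz2 directly:
export OOZIE_BASE_URL=http://ozlb:11000/oozie
```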
02-06-2017
11:20 PM
1 Kudo
Hi @jean rivera, yes, Oozie servers "share" the scheduling by connecting to the same Oozie database. Scheduled jobs are executed only once; the server that runs a job is selected randomly, and coordination is done using ZooKeeper distributed locks. Regarding oozie_base_url, set it to the URL of your load balancer. More details about the Oozie active-active architecture here. And finally, in order to use Oozie HA you must replace the default Derby database shipped with Oozie with a "real" one that supports concurrent queries, like MySQL or PostgreSQL.
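For the database part, the relevant oozie-site.xml properties look roughly like this for a MySQL backend; the host, database, and user names below are placeholders, not values from this thread:

```shell
# Hedged sketch of the JDBC settings in oozie-site.xml (placeholders):
#   oozie.service.JPAService.jdbc.driver   = com.mysql.jdbc.Driver
#   oozie.service.JPAService.jdbc.url      = jdbc:mysql://dbhost:3306/oozie
#   oozie.service.JPAService.jdbc.username = oozie
#   oozie.service.JPAService.jdbc.password = <your-password>

# The MySQL JDBC driver jar must also be available to the Oozie server,
# e.g. (HDP-style path, an assumption):
cp mysql-connector-java.jar /usr/hdp/current/oozie-server/libext/
```

Both Oozie servers then point at this same database, which is what lets them share the job state.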
02-06-2017
08:13 PM
1 Kudo
You can safely ignore this. Your app first tried to contact rm1 and found it to be in standby mode: "WARN ipc.Client: Failed to connect to server: str20/10.5.168.121:8032". After that it failed over to rm2: "INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2". If rm1 is active, there will be no such message.
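You can confirm which ResourceManager is active with the rmadmin CLI; rm1 and rm2 here are the RM IDs defined in yarn.resourcemanager.ha.rm-ids:

```shell
# Query the HA state of each ResourceManager; in the scenario above,
# rm1 would report "standby" and rm2 "active".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```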
02-06-2017
06:24 AM
Hi Leo, it seems that this error can happen because of some miscommunication between the RM, an app's AM, and one or more NMs. One scenario is described in YARN-3535, in particular in this post. How to fix it depends on what you are doing. If it happens after an HDP upgrade, you can try to clear the affected NM recovery directories and restart YARN. If it happens on a particular app, then something might be wrong with the app, like some missing configs, etc.
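A hedged sketch of the recovery-directory cleanup; the actual directory is whatever yarn.nodemanager.recovery.dir points to on each affected NodeManager, and the path and daemon script location below are assumptions for an HDP-style install:

```shell
# On each affected NodeManager, stop the NM first (via Ambari, or):
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh stop nodemanager"

# Clear the NM recovery state (check yarn.nodemanager.recovery.dir for
# the real path; this one is a placeholder):
rm -rf /var/lib/hadoop-yarn/nodemanager/recovery-state/*

# ...then restart YARN, e.g. from Ambari.
```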
02-01-2017
10:21 PM
As the Jira says, stmk is rarely used, but the ZooKeeper version currently shipped with HDP-2.x is 3.4.6, so yes, 3.4.7 or higher, which fixes the bug, should be added soon.
01-29-2017
01:32 AM
You also need your OS local repo, at least the binaries on the so-called DVD-1. ambari-server requires postgres, which comes from the OS repo; there is no way to avoid this with yum. You can configure ambari-server to use another database during the "ambari-server setup" step. You can install ambari-server alone, without postgres, using "rpm -ivh --nodeps ambari-server***.rpm", but it's not recommended because there are other dependencies to check, like python, and even if all that works you will still need the OS local repo to install HDP.
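The two paths described above, as a sketch; the rpm filename is a placeholder for whatever version you downloaded:

```shell
# Recommended: normal install from the repo, then pick a non-embedded
# database when prompted during setup ("advanced database configuration").
yum install ambari-server
ambari-server setup

# Not recommended: force the install without dependency checks.
rpm -ivh --nodeps ambari-server-x.y.z.rpm   # placeholder filename
```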
12-19-2016
01:03 PM
2 Kudos
The validate option works only on tables in HDFS, not on those in Hive or HBase. It works with both the "import" and "export" Sqoop commands. The default validation class is org.apache.sqoop.validation.RowCountValidator, which compares the number of rows in the source and destination tables. You can customize it by providing your own validation class, which must implement the org.apache.sqoop.validation.Validator interface. Check also the response to the same question asked before, and the Sqoop documentation, chapter 11.
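For example, an import with validation enabled might look like this; the connection string, table, and target directory are placeholders:

```shell
# Import with row-count validation; --validator is optional since
# RowCountValidator is the default.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --target-dir /user/me/mytable \
  --validate \
  --validator org.apache.sqoop.validation.RowCountValidator
```

If the source and target row counts differ, the job fails with a validation error instead of silently succeeding.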