Member since
09-25-2015
230
Posts
276
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 25012 | 07-05-2016 01:19 PM |
|  | 8404 | 04-01-2016 02:16 PM |
|  | 2101 | 02-17-2016 11:54 AM |
|  | 5639 | 02-17-2016 11:50 AM |
|  | 12626 | 02-16-2016 02:08 AM |
01-23-2016
01:08 AM
1 Kudo
@Ali Bajwa Doesn't Active Directory provide this in a fully integrated and automated way?
01-20-2016
12:07 AM
1 Kudo
@Balachandran Karnati collect_list uses an ArrayList, so values are kept in the order they were added. To control that order, use a SORT BY clause in a subquery; don't use ORDER BY, as it forces the query to run non-distributed (through a single reducer). Find a simple example below: drop table if exists collect;
create external table collect(
key string,
order_field int,
value double
)
row format delimited fields terminated by ','
stored as textfile
;
[root@sandbox ~]# cat collect1.txt
a,1,1.0
a,3,3.0
a,2,2.0
b,1,1.0
b,2,2.0
b,3,3.0
[root@sandbox ~]# cat collect2.txt
a,1,1.1
a,3,3.1
a,2,2.1
b,1,1.1
b,2,2.1
b,3,3.1
[root@sandbox ~]# hadoop fs -put collect* /apps/hive/warehouse/collect
drop table IF EXISTS collect_sorted;
create table collect_sorted as
select key, collect_list(value) as value_list
from
(select * from collect sort by key, order_field, value desc) x
group by key
;
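As a quick sanity check (just a sketch, assuming the value_list alias above and that the SORT BY order is in fact preserved into collect_list):
-- For key 'a', the list should come back ordered by order_field ascending,
-- with the higher value first inside each order_field, e.g. [1.1,1.0,2.1,2.0,3.1,3.0]
select key, value_list
from collect_sorted;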
01-07-2016
05:13 PM
1 Kudo
Hortonworks does not support Sqoop2 at the moment. The Sqoop version supported in the latest HDP (2.3.4) is:
Apache Sqoop 1.4.6
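You can double-check which version is installed on a node with the plain Sqoop CLI (nothing HDP-specific assumed here):
# Prints the installed Sqoop client version
sqoop version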
01-07-2016
04:35 PM
1 Kudo
See this screenshot of Sqoop in Ambari: it's a client-only component. There is no Sqoop metastore service; by default Sqoop uses an embedded Derby database, but if you want you can use an external MySQL or PostgreSQL database for the Sqoop metastore, and then you can configure that database in HA mode.
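For reference, saved Sqoop jobs can be pointed at a shared metastore via --meta-connect; the host, port and database name below are just placeholders, not values from your cluster:
# Hypothetical example: list saved jobs stored in a shared Sqoop metastore
sqoop job --list --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop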
01-07-2016
11:35 AM
1 Kudo
Sqoop is a client only, so you can have Sqoop installed on multiple nodes behind an IP load balancer. I don't know about the Lilly indexer (part of the HDP Search Connector). Documentation is here: https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Jobs.html#_hbase-indexer, but I'm not sure if it has HA out-of-the-box with SolrCloud.
01-07-2016
10:52 AM
2 Kudos
@Mehdi TAZI There is a High Availability section in: http://docs.hortonworks.com (choose your version). For the latest HDP version (2.3.4), see this: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...
01-06-2016
05:33 PM
Awesome!!!
12-29-2015
03:44 PM
2 Kudos
Is it possible to run a Flume agent outside the Hadoop network using the Knox gateway + WebHDFS? I found this JIRA (https://issues.apache.org/jira/browse/FLUME-2701), but it's not resolved yet. A workaround I found would be to mount HDFS via NFS on the remote Flume-agent node and point a File Roll Sink at that NFS directory, but it doesn't seem to be a good approach.
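To make that workaround concrete, here is a rough sketch (assuming the HDFS NFS Gateway is running; nfs-gateway-host and /hdfs are placeholders):
# Mount HDFS over NFSv3 on the remote Flume node
mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway-host:/ /hdfs
# A Flume File Roll Sink on that node could then write under /hdfs/...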
Labels:
- Apache Flume
- Apache Hadoop
- Apache Knox
12-28-2015
08:16 PM
1 Kudo
@Peter Lasne The tutorial will be fixed ASAP. @zblanco
12-28-2015
07:54 PM
1 Kudo
@Sooraj Antony I believe the problem is that most of the records have the same value for EMP_TYPE (i.e. skewed data). That causes all records with the same EMP_TYPE value to be sent to the same reducer, so the last reducer takes a long time to finish. Assuming you have a small number of distinct EMP_TYPE values (and the aggregated result fits in memory), try the solution below: set hive.ignore.mapjoin.hint=false;
SELECT /*+ MAPJOIN(A) */ STG.EMP_TYPE,DEPT,COUNT(DISTINCT EMP_ID) AS COUNT, A.TOTAL_COUNT
FROM STAGE_SOURCE STG
LEFT OUTER JOIN
(SELECT EMP_TYPE,COUNT(DISTINCT EMP_ID) AS TOTAL_COUNT FROM STAGE_SOURCE GROUP BY EMP_TYPE) A
ON STG.EMP_TYPE = A.EMP_TYPE
GROUP BY STG.EMP_TYPE,DEPT,A.TOTAL_COUNT;
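To confirm the skew before applying the rewrite, a quick distribution check against the same table can help (just a sketch):
-- A heavily dominant EMP_TYPE value confirms the skew
SELECT EMP_TYPE, COUNT(*) AS ROW_COUNT
FROM STAGE_SOURCE
GROUP BY EMP_TYPE
ORDER BY ROW_COUNT DESC;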