Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3558 | 05-03-2017 05:13 PM |
| | 2934 | 05-02-2017 08:38 AM |
| | 3183 | 05-02-2017 08:13 AM |
| | 3146 | 04-10-2017 10:51 PM |
| | 1622 | 03-28-2017 02:27 AM |
09-07-2016
04:08 PM
@Mohan V This is not efficient, but it does what you're asking:
grunt> fs -cat text
1 a
2 b
3 c
grunt> data = load 'text' using PigStorage(' ') AS (id:long, letter:chararray);
grunt> A = FILTER data by letter == 'a';
grunt> B = FILTER data by letter == 'b';
grunt> C = FILTER data by letter == 'c';
grunt> STORE A into 'hbase://a' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
2016-09-07 16:04:29,421 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 698875904 to monitor. collectionUsageThreshold = 489213120, usageThreshold = 489213120
...
grunt> STORE B into 'hbase://b' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
...
grunt> STORE C into 'hbase://c' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
Now in the HBase shell, assuming the tables were created with:
create 'a', 'cf'
create 'b', 'cf'
create 'c', 'cf'
hbase(main):001:0> scan 'a'
ROW COLUMN+CELL
1 column=cf:letter, timestamp=1473264279802, value=a
1 row(s) in 0.2610 seconds
hbase(main):002:0> scan 'b'
ROW COLUMN+CELL
2 column=cf:letter, timestamp=1473264324881, value=b
1 row(s) in 0.0160 seconds
hbase(main):003:0> scan 'c'
ROW COLUMN+CELL
3 column=cf:letter, timestamp=1473264429688, value=c
1 row(s) in 0.0140 seconds
09-07-2016
01:44 PM
5 Kudos
You would need to assign an alias to each row and specify a separate STORE command per row.
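A minimal sketch of that pattern, assuming a space-delimited input file 'text' with an id and a letter per row (the file name, aliases, and output paths are illustrative; the fuller session above shows the HBase variant):
# one FILTER alias and one STORE per target output
cat > split_store.pig <<'EOF'
data = LOAD 'text' USING PigStorage(' ') AS (id:long, letter:chararray);
A = FILTER data BY letter == 'a';
B = FILTER data BY letter == 'b';
STORE A INTO 'a.out';
STORE B INTO 'b.out';
EOF
pig split_store.pig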
09-07-2016
12:21 PM
Can you manually execute the command on the failing host?
sudo yum install -y ambari-metrics-collector
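If the manual install fails, a quick sanity check on that host could look like this (the repo file name below is the usual default and may differ in your setup):
# confirm the Ambari repo is configured and enabled
yum repolist enabled | grep -i ambari
cat /etc/yum.repos.d/ambari.repo
# see which package version yum would resolve
yum info ambari-metrics-collector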
09-06-2016
10:35 PM
What features are you looking for in Flume 1.6? We backported a few things from 1.6 into 1.5.2, like the Kafka channel and some security enhancements; these will appear in HDP 2.5 even though the Flume version does not change from HDP 2.4.
09-06-2016
04:28 PM
The metrics in https://issues.apache.org/jira/browse/HDFS-3170 satisfy the use case.
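For example, those per-DataNode numbers can be pulled from the DataNode's JMX servlet; the hostname and the default web port 50075 below are assumptions for your environment:
# dump DataNode activity metrics (write/flush/fsync op counts and timings) as JSON
curl -s 'http://datanode1.example.com:50075/jmx?qry=Hadoop:service=DataNode,name=DataNodeActivity*'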
09-06-2016
04:13 PM
1 Kudo
There are a number of metrics available to monitor DataNode performance. What specifically should I look at for DataNode write performance?
Labels:
- Apache Hadoop
09-06-2016
03:33 PM
Pinging @Sriharsha Chintalapani @Andrew Grande @jwitt
09-06-2016
03:31 PM
1 Kudo
I'd like to get a poll of the pros and cons of Kafka vs. NiFi for multi-datacenter replication, in terms of ease of use, tooling, tuning, security, etc.
Labels:
- Labels:
-
Apache Kafka
-
Apache NiFi
08-22-2016
06:22 PM
6 Kudos
WARNING: this is a workaround and not a certified solution from Hortonworks. In certain scenarios, customers are required to run clients and services on different OS versions and flavors. I will only cover the clients for Pig and Hive; this procedure can certainly be applied to services as well, but that's more involved. It is advised that you contact Hortonworks support if you go down this path.

Setup scenario:
- 3-node Ubuntu 12.04 cluster with HDP 2.4.2 and Ambari 2.2.2.0
- 1 CentOS 6 node with no Ambari

On the CentOS 6 node, download and install Java, preferably the same version as on the other nodes. I followed this document to install Oracle JDK 8, as that's what is running on my Ubuntu cluster: https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora

On the CentOS 6 node, download the HDP repo:
wget http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.4.2.0/hdp.repo
cp hdp.repo /etc/yum.repos.d
yum install hadoop pig hive-hcatalog hive-webhcat tez
Copy /etc/hadoop/conf, /etc/pig/conf, /etc/tez/conf and /etc/hive/conf from the cluster to your new node, into the same directories as on the other nodes:
scp -r /etc/hadoop root@192.168.56.111:
scp -r /etc/hive root@192.168.56.111:
scp -r /etc/pig root@192.168.56.111:
scp -r /etc/tez root@192.168.56.111:
Move the conf dirs to their designated directories:
cp -r hadoop/conf /etc/hadoop/
cp -r hive/conf /etc/hive/
cp -r pig/conf /etc/pig/
cp -r tez/conf /etc/tez/
Now you should be able to access HDFS, Pig, Hive and Tez from the new node. You can run validation of your environment based on the manual install guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-1a378094-a4fb-4348-bd9e-2eebf68c2e1e.1.html
[root@centos6 ~]# cat test.txt
foo
bar
foo
bar
foo
[root@centos6 ~]# hdfs dfs -put test.txt /tmp/input/
[root@centos6 ~]# hadoop jar /usr/hdp/current/tez-client/tez-examples-*.jar orderedwordcount /tmp/input/test.txt /tmp/out
[root@centos6 ~]# hdfs dfs -ls /tmp/out
Found 2 items
-rw-r--r-- 3 root hdfs 0 2016-08-22 14:11 /tmp/out/_SUCCESS
-rw-r--r-- 3 root hdfs 12 2016-08-22 14:11 /tmp/out/part-v002-o000-r-00000
[root@centos6 ~]# hdfs dfs -cat /tmp/out/part-v002-o000-r-00000
bar 2
foo 3
You can do the same with Pig, using http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/validate_the_installation_pig.html
[root@centos6 ~]# hdfs dfs -put /etc/passwd .
[root@centos6 ~]# pig -x tez
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> store B into 'id.out';
grunt> fs -cat id.out/part-v000-o000-r-00000
root
bin
daemon
adm
lp
sync
shutdown
And for Hive:
[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> create table test ( name string ) ;
No rows affected (0.242 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> insert into table test values('artem');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into table test values('artem')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1471887368465_0006)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.test from hdfs://hacluster/apps/hive/warehouse/test/.hive-staging_hive_2016-08-22_18-18-58_629_3703254848398593955-1/-ext-10000
INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
No rows affected (10.012 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.088 seconds)
Now from the cluster node:
root@u1201:~# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000:
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.083 seconds)
For Tez to work with Hive, execute the following command on the client machine (inside your Beeline session): set hive.execution.engine=tez;
[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> set hive.execution.engine=tez;
No rows affected (0.041 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.107 seconds)
If you were to install other clients, you'd follow the same HDP manual install/upgrade guides. Installing services would be a bit more involved, but doable.

Conclusion: this is certainly not a recommended approach, but sometimes it's a necessary evil. The same should work with Apache releases that are not from HDP; I was certainly able to run Bigtop packages against HDP.
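As a rough illustration of extending this to another client (the package name, node IP, and conf path below are assumptions; check the manual install guide for your HDP version):
# on the CentOS 6 node: install the client bits from the HDP repo
yum install sqoop
# pull the matching config from a cluster node (IP is illustrative)
scp -r root@192.168.56.111:/etc/sqoop/conf/* /etc/sqoop/conf/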
08-22-2016
04:13 PM
I checked our repo, and the HDP 2.3.6.0 directory is not listed in S3.
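A quick way to check whether a given version directory is published (the URL pattern mirrors the 2.4.2.0 repo referenced elsewhere on this page; the exact layout is an assumption):
# a 404 status here would confirm the 2.3.6.0 directory is not published
curl -s -o /dev/null -w '%{http_code}\n' http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0/hdp.repo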
Labels:
- Hortonworks Data Platform (HDP)