Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3558 | 05-03-2017 05:13 PM |
| | 2934 | 05-02-2017 08:38 AM |
| | 3183 | 05-02-2017 08:13 AM |
| | 3146 | 04-10-2017 10:51 PM |
| | 1622 | 03-28-2017 02:27 AM |
09-07-2016
04:08 PM
@Mohan V This is not efficient, but it does what you're asking:
grunt> fs -cat text
1 a
2 b
3 c
grunt> data = load 'text' using PigStorage(' ') AS (id:long, letter:chararray);
grunt> A = FILTER data by letter == 'a';
grunt> B = FILTER data by letter == 'b';
grunt> C = FILTER data by letter == 'c';
grunt> STORE A into 'hbase://a' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
2016-09-07 16:04:29,421 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 698875904 to monitor. collectionUsageThreshold = 489213120, usageThreshold = 489213120
...
grunt> STORE B into 'hbase://b' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
...
grunt> STORE C into 'hbase://c' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:letter');
Now in the HBase shell, assuming the tables were created with:
create 'a', 'cf'
create 'b', 'cf'
create 'c', 'cf'
hbase(main):001:0> scan 'a'
ROW COLUMN+CELL
1 column=cf:letter, timestamp=1473264279802, value=a
1 row(s) in 0.2610 seconds
hbase(main):002:0> scan 'b'
ROW COLUMN+CELL
2 column=cf:letter, timestamp=1473264324881, value=b
1 row(s) in 0.0160 seconds
hbase(main):003:0> scan 'c'
ROW COLUMN+CELL
3 column=cf:letter, timestamp=1473264429688, value=c
1 row(s) in 0.0140 seconds
09-07-2016
01:44 PM
5 Kudos
You would need to assign an alias to each row and specify a separate STORE command per row.
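A minimal sketch of that pattern, assuming a space-delimited input file 'text' with an id and a letter per row (the file name, aliases, and output paths are illustrative; the fuller session above shows the HBase variant):
# one FILTER alias and one STORE per target output
cat > split_store.pig <<'EOF'
data = LOAD 'text' USING PigStorage(' ') AS (id:long, letter:chararray);
A = FILTER data BY letter == 'a';
B = FILTER data BY letter == 'b';
STORE A INTO 'a.out';
STORE B INTO 'b.out';
EOF
pig split_store.pig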
09-07-2016
12:21 PM
Can you manually execute the command on the failing host?
sudo yum install -y ambari-metrics-collector
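If the manual install fails, a quick sanity check on that host could look like this (the repo file name below is the usual default and may differ in your setup):
# confirm the Ambari repo is configured and enabled
yum repolist enabled | grep -i ambari
cat /etc/yum.repos.d/ambari.repo
# see which package version yum would resolve
yum info ambari-metrics-collector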
09-06-2016
10:35 PM
What features are you looking for in Flume 1.6? We backported a few things from 1.6 into 1.5.2, like the Kafka channel and some security enhancements; these will appear in HDP 2.5 even though the Flume version does not change from HDP 2.4.
09-06-2016
04:28 PM
The metrics in https://issues.apache.org/jira/browse/HDFS-3170 satisfy the use case.
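For example, those per-DataNode numbers can be pulled from the DataNode's JMX servlet; the hostname and the default web port 50075 below are assumptions for your environment:
# dump DataNode activity metrics (write/flush/fsync op counts and timings) as JSON
curl -s 'http://datanode1.example.com:50075/jmx?qry=Hadoop:service=DataNode,name=DataNodeActivity*'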
09-06-2016
04:13 PM
1 Kudo
There are a number of metrics available to monitor DataNode performance. What specifically should I look at for DataNode write performance?
Labels:
- Apache Hadoop
09-06-2016
03:33 PM
Pinging @Sriharsha Chintalapani @Andrew Grande @jwitt
09-06-2016
03:31 PM
1 Kudo
I'd like to get a poll of the pros and cons of Kafka vs. NiFi for multi-datacenter replication, in terms of ease of use, tooling, tuning, security, etc.
Labels:
- Labels:
-
Apache Kafka
-
Apache NiFi
08-22-2016
06:22 PM
6 Kudos
WARNING: this is a workaround and not a certified solution from Hortonworks. In certain scenarios, customers are required to run clients and services on different OS versions and flavors. I will only cover the clients for Pig and Hive; this procedure can certainly be applied to services as well, but that's more involved. It is advised that you contact Hortonworks support if you go down this path.

Setup scenario:
- 3-node Ubuntu 12.04 cluster with HDP 2.4.2 and Ambari 2.2.2.0
- 1 CentOS 6 node with no Ambari

On the CentOS 6 node, download and install Java, preferably the same version as on the other nodes. I followed this document to install Oracle JDK 8, as that's what is running on my Ubuntu cluster: https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora

On the CentOS 6 node, download the HDP repo:
wget http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.4.2.0/hdp.repo
cp hdp.repo /etc/yum.repos.d
yum install hadoop pig hive-hcatalog hive-webhcat tez
Copy /etc/hadoop/conf, /etc/pig/conf, /etc/tez/conf and /etc/hive/conf from the cluster to your new node, into the same directories as on the other nodes:
scp -r /etc/hadoop root@192.168.56.111:
scp -r /etc/hive root@192.168.56.111:
scp -r /etc/pig root@192.168.56.111:
scp -r /etc/tez root@192.168.56.111:
Move the conf dirs to their designated directories:
cp -r hadoop/conf /etc/hadoop/
cp -r hive/conf /etc/hive/
cp -r pig/conf /etc/pig/
cp -r tez/conf /etc/tez/
Now you should be able to access HDFS, Pig, Hive and Tez from the new node. You can run validation of your environment based on the manual install guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-1a378094-a4fb-4348-bd9e-2eebf68c2e1e.1.html
[root@centos6 ~]# cat test.txt
foo
bar
foo
bar
foo
[root@centos6 ~]# hdfs dfs -put test.txt /tmp/input/
[root@centos6 ~]# hadoop jar /usr/hdp/current/tez-client/tez-examples-*.jar orderedwordcount /tmp/input/test.txt /tmp/out
[root@centos6 ~]# hdfs dfs -ls /tmp/out
Found 2 items
-rw-r--r-- 3 root hdfs 0 2016-08-22 14:11 /tmp/out/_SUCCESS
-rw-r--r-- 3 root hdfs 12 2016-08-22 14:11 /tmp/out/part-v002-o000-r-00000
[root@centos6 ~]# hdfs dfs -cat /tmp/out/part-v002-o000-r-00000
bar 2
foo 3
You can do the same with Pig, using http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/validate_the_installation_pig.html
[root@centos6 ~]# hdfs dfs -put /etc/passwd .
[root@centos6 ~]# pig -x tez
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> store B into 'id.out';
grunt> fs -cat id.out/part-v000-o000-r-00000
root
bin
daemon
adm
lp
sync
shutdown
And for Hive:
[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> create table test ( name string ) ;
No rows affected (0.242 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> insert into table test values('artem');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into table test values('artem')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1471887368465_0006)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.test from hdfs://hacluster/apps/hive/warehouse/test/.hive-staging_hive_2016-08-22_18-18-58_629_3703254848398593955-1/-ext-10000
INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
No rows affected (10.012 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.088 seconds)
Now from the cluster node:
root@u1201:~# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000:
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.083 seconds)
For Tez to work with Hive, execute the following command on the client machine (inside your Beeline session): set hive.execution.engine=tez;
[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> set hive.execution.engine=tez;
No rows affected (0.041 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.107 seconds)
If you were to install other clients, you'd follow the same HDP manual install/upgrade guides. Installing services would be a bit more involved, but doable.

Conclusion: this is certainly not a recommended approach, but sometimes it's a necessary evil. The same should work with Apache releases that are not from HDP; I was certainly able to run Bigtop packages against HDP.
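As a rough illustration of extending this to another client (the package name, node IP, and conf path below are assumptions; check the manual install guide for your HDP version):
# on the CentOS 6 node: install the client bits from the HDP repo
yum install sqoop
# pull the matching config from a cluster node (IP is illustrative)
scp -r root@192.168.56.111:/etc/sqoop/conf/* /etc/sqoop/conf/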
08-22-2016
04:13 PM
I checked our repo, and the HDP 2.3.6.0 directory is not listed in S3.
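A quick way to check whether a given version directory is published (the URL pattern mirrors the 2.4.2.0 repo referenced elsewhere on this page; the exact layout is an assumption):
# a 404 status here would confirm the 2.3.6.0 directory is not published
curl -s -o /dev/null -w '%{http_code}\n' http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0/hdp.repo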
Labels:
- Hortonworks Data Platform (HDP)