01-06-2017
04:51 PM
@Saumil Mayani that's a great suggestion; it was on my follow-up list, since when you provision a new user and/or a new view, permissions need to be added. Will follow up with that suggestion soon.
12-16-2016
02:27 PM
10 Kudos
For the DevOps crowd, you can take it a step further and automate provisioning of the Ambari server itself. For that, consider community contributions such as https://supermarket.chef.io/cookbooks/ambari if you use Chef, for example. For an exhaustive tour of the REST API, consult the docs: https://github.com/apache/ambari/blob/trunk/ambari-views/docs/index.md

This recipe assumes an unsecured HDP cluster with NameNode HA. Tested on Ambari 2.4.2.

# list all available views for the current version of Ambari
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/

# get Files View only
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/FILES

# get all versions of Files View for the current Ambari release
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/FILES/versions

# get a specific version of the FILES view
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/FILES/versions/1.0.0

# create an instance of the FILES view
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X POST http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE

# delete an instance of the FILES view
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X DELETE http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE

# get a specific instance of the FILES view
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE

# create a Files view instance with properties
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X POST http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE \
--data '{
"ViewInstanceInfo" : {
"description" : "Files API",
"label" : "Files View",
"properties" : {
"webhdfs.client.failover.proxy.provider" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"webhdfs.ha.namenode.http-address.nn1" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.http-address.nn2" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.https-address.nn1" : "u1201.ambari.apache.org:50470",
"webhdfs.ha.namenode.https-address.nn2" : "u1202.ambari.apache.org:50470",
"webhdfs.ha.namenode.rpc-address.nn1" : "u1201.ambari.apache.org:8020",
"webhdfs.ha.namenode.rpc-address.nn2" : "u1202.ambari.apache.org:8020",
"webhdfs.ha.namenodes.list" : "nn1,nn2",
"webhdfs.nameservices" : "hacluster",
"webhdfs.url" : "webhdfs://hacluster"
}
}
}'

# create/update a Files view instance (new or existing) with new properties
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X PUT http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE \
--data '{
"ViewInstanceInfo" : {
"description" : "Files API",
"label" : "Files View",
"properties" : {
"webhdfs.client.failover.proxy.provider" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"webhdfs.ha.namenode.http-address.nn1" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.http-address.nn2" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.https-address.nn1" : "u1201.ambari.apache.org:50470",
"webhdfs.ha.namenode.https-address.nn2" : "u1202.ambari.apache.org:50470",
"webhdfs.ha.namenode.rpc-address.nn1" : "u1201.ambari.apache.org:8020",
"webhdfs.ha.namenode.rpc-address.nn2" : "u1202.ambari.apache.org:8020",
"webhdfs.ha.namenodes.list" : "nn1,nn2",
"webhdfs.nameservices" : "hacluster",
"webhdfs.url" : "webhdfs://hacluster"
}
}
}'

# create an instance of the Hive view
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X POST http://localhost:8080/api/v1/views/HIVE/versions/1.0.0/instances/HIVE_NEW_INSTANCE \
--data '{
"ViewInstanceInfo" : {
"description" : "Hive View",
"label" : "Hive View",
"properties" : {
"webhdfs.client.failover.proxy.provider" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"webhdfs.ha.namenode.http-address.nn1" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.http-address.nn2" : "u1201.ambari.apache.org:50070",
"webhdfs.ha.namenode.https-address.nn1" : "u1201.ambari.apache.org:50470",
"webhdfs.ha.namenode.https-address.nn2" : "u1202.ambari.apache.org:50470",
"webhdfs.ha.namenode.rpc-address.nn1" : "u1201.ambari.apache.org:8020",
"webhdfs.ha.namenode.rpc-address.nn2" : "u1202.ambari.apache.org:8020",
"webhdfs.ha.namenodes.list" : "nn1,nn2",
"webhdfs.nameservices" : "hacluster",
"webhdfs.url" : "webhdfs://hacluster",
"hive.host" : "u1203.ambari.apache.org",
"hive.http.path" : "cliservice",
"hive.http.port" : "10001",
"hive.metastore.warehouse.dir" : "/apps/hive/warehouse",
"hive.port" : "10000",
"hive.transport.mode" : "binary",
"yarn.ats.url" : "http://u1202.ambari.apache.org:8188",
"yarn.resourcemanager.url" : "u1202.ambari.apache.org:8088"
}
}
}'

# interact with a FILES view instance
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET 'http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE/resources/files/fileops/listdir?path=%2F'

# once you create an instance, you can see its current properties
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/views/FILES/versions/1.0.0/

# output of previous command
{
"href" : "http://localhost:8080/api/v1/views/FILES/versions/1.0.0/",
"ViewVersionInfo" : {
"archive" : "/var/lib/ambari-server/resources/views/work/FILES{1.0.0}",
"build_number" : "161",
"cluster_configurable" : true,
"description" : null,
"label" : "Files",
"masker_class" : null,
"max_ambari_version" : null,
"min_ambari_version" : "2.0.*",
"parameters" : [
{
"name" : "webhdfs.url",
"description" : "Enter the WebHDFS FileSystem URI. Typically this is the dfs.namenode.http-address\n
property in the hdfs-site.xml configuration. URL must be accessible from Ambari Server.",
"label" : "WebHDFS FileSystem URI",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "core-site/fs.defaultFS",
"required" : true,
"masked" : false
},
{
"name" : "webhdfs.nameservices",
"description" : "Comma-separated list of nameservices. Value of hdfs-site/dfs.nameservices property",
"label" : "Logical name of the NameNode cluster",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "hdfs-site/dfs.nameservices",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.ha.namenodes.list",
"description" : "Comma-separated list of namenodes for a given nameservice.\n Value of hdfs
-site/dfs.ha.namenodes.[nameservice] property",
"label" : "List of NameNodes",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.ha.namenode.rpc-address.nn1",
"description" : "RPC address for first name node.\n Value of hdfs-site/dfs.namenode.rpc-add
ress.[nameservice].[namenode1] property",
"label" : "First NameNode RPC Address",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.ha.namenode.rpc-address.nn2",
"description" : "RPC address for second name node.\n Value of hdfs-site/dfs.namenode.rpc-ad
dress.[nameservice].[namenode2] property",
"label" : "Second NameNode RPC Address",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.ha.namenode.http-address.nn1",
"description" : "WebHDFS address for first name node.\n Value of hdfs-site/dfs.namenode.htt
p-address.[nameservice].[namenode1] property",
"label" : "First NameNode HTTP (WebHDFS) Address",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.ha.namenode.http-address.nn2",
"description" : "WebHDFS address for second name node.\n Value of hdfs-site/dfs.namenode.ht
tp-address.[nameservice].[namenode2] property",
"label" : "Second NameNode HTTP (WebHDFS) Address",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.client.failover.proxy.provider",
"description" : "The Java class that HDFS clients use to contact the Active NameNode\n Valu
e of hdfs-site/dfs.client.failover.proxy.provider.[nameservice] property",
"label" : "Failover Proxy Provider",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "fake",
"required" : false,
"masked" : false
},
{
"name" : "hdfs.auth_to_local",
"description" : "Auth to Local Configuration",
"label" : "Auth To Local",
"placeholder" : null,
"defaultValue" : null,
"clusterConfig" : "core-site/hadoop.security.auth_to_local",
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.username",
"description" : "doAs for proxy user for HDFS. By default, uses the currently logged-in Ambari user.",
"label" : "WebHDFS Username",
"placeholder" : null,
"defaultValue" : "${username}",
"clusterConfig" : null,
"required" : false,
"masked" : false
},
{
"name" : "webhdfs.auth",
"description" : "Semicolon-separated authentication configs.",
"label" : "WebHDFS Authorization",
"placeholder" : "auth=SIMPLE",
"defaultValue" : null,
"clusterConfig" : null,
"required" : false,
"masked" : false
}
],
"status" : "DEPLOYED",
"status_detail" : "Deployed /var/lib/ambari-server/resources/views/work/FILES{1.0.0}.",
"system" : false,
"version" : "1.0.0",
"view_name" : "FILES"
},
"permissions" : [
{
"href" : "http://localhost:8080/api/v1/views/FILES/versions/1.0.0/permissions/4",
"PermissionInfo" : {
"permission_id" : 4,
"version" : "1.0.0",
"view_name" : "FILES"
}
}
],
"instances" : [
{
"href" : "http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/Files",
"ViewInstanceInfo" : {
"instance_name" : "Files",
"version" : "1.0.0",
"view_name" : "FILES"
}
}
]
}
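Every per-instance endpoint above shares the same URL shape, so when scripting many instances it can help to build the URL once. The helper below is my own sketch, not part of the Ambari API; only the URL layout comes from the calls above.

```shell
# Hypothetical helper: builds the Ambari view-instance endpoint used by the
# curl calls above. Arguments: base URL, view name, view version, instance name.
ambari_view_instance_url() {
  printf '%s/api/v1/views/%s/versions/%s/instances/%s' "$1" "$2" "$3" "$4"
}

# Example: the FILES instance endpoint from this article
ambari_view_instance_url http://localhost:8080 FILES 1.0.0 FILES_NEW_INSTANCE
# -> http://localhost:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE
```

You would then pass the result to curl, e.g. curl --user admin:admin -H 'X-Requested-By: ambari' -X GET "$(ambari_view_instance_url http://localhost:8080 FILES 1.0.0 FILES_NEW_INSTANCE)".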
11-20-2016
11:11 PM
@Satish Bomma it would make an awesome follow-up to this article. I am not sure whether Ranger authorization can be applied to tables driven by HBaseStorageHandlers; native Atlas/Ranger integration is coming in the next release.
09-15-2016
02:57 AM
4 Kudos
As I was reading over the Pig mailing lists, I found a plugin that turns the Eclipse IDE into a Pig editor. There are alternatives like the Ambari Pig view and Hue, but if you're comfortable using an IDE, you will feel right at home with this plugin. Setup is fairly straightforward, and it supports Pig from 0.11 all the way to 0.16, which means you can even use it with the latest HDP 2.5 release! The best part: I didn't need to start a Sandbox, download a binary distribution of Pig, or mess with any config files; once the plugin is installed, you can start scripting as if you were working on a cluster. It is still a work in progress, but the feature list is impressive nonetheless. It supports the following features as of 9/13/16:
supports Apache Pig Latin 0.11 - 0.16
syntax highlighting
open declaration (F3) - for macros, defines, and UDFs
auto complete (ctrl+space) - defines, relations, reserved words, built-in functions
toggle comment (ctrl+shift+c) - to comment/uncomment a block using --
find references (ctrl+shift+g) - preliminary implementation, to find usages of macros
hover information (tooltips) for macro definitions, UDFs and built-in functions (javadocs) and some keywords
preferences page for colors, Pig version, auto complete behavior and more

I will explore some of these features in this article. The one limitation I've found is that it is only tested with older versions of Eclipse, the latest being Luna, so I am running this demo with Luna to avoid any unforeseen issues.

First, go to Help > Install New Software, give the plugin a name, and enter the URL http://github.com/eyala/pig-eclipse-update-site/raw/master

Select the Pig-Eclipse plugin and hit Next; Eclipse will fetch the bits. Accept the EULA and hit Finish.

At this point we can create a new project in typical Eclipse fashion and start writing Pig scripts. Saving scripts with the .pig extension identifies them automatically; you can change this behavior in the preferences section. Code completion is invoked with ctrl+space. You can toggle comments with ctrl+shift+c, and you can change preferences for theme and colors as well as behavior based on Pig version.

This is, of course, not meant to be an exhaustive deep dive into Pig-Eclipse, just something to get you started. Overall, it's an interesting project that I'll be spending more time with; the convenience, ease of use, and independence from the Sandbox are the benefits of this plugin over the alternatives. I thank the author for his contribution. Here's the project GitHub page.
08-22-2016
06:22 PM
6 Kudos
WARNING: this is a workaround, not a certified solution from Hortonworks. In certain scenarios, customers are required to run clients and services on different OS versions and flavors. I will only cover the Pig and Hive clients; this procedure can certainly be applied to services as well, but that is more involved. It is advised that you contact Hortonworks support if you go down this path.

Setup scenario: a 3-node Ubuntu 12.04 cluster with HDP 2.4.2 and Ambari 2.2.2.0, plus one CentOS 6 node with no Ambari.

On the CentOS 6 node, download and install Java, preferably the same version as on the other nodes. I followed this document to install Oracle JDK 8, as that's what is running on my Ubuntu cluster: https://www.digitalocean.com/community/tutorials/how-to-install-java-on-centos-and-fedora

On the CentOS 6 node, download the HDP repo and install the client packages:

wget http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.4.2.0/hdp.repo
cp hdp.repo /etc/yum.repos.d
yum install hadoop pig hive-hcatalog hive-webhcat tez

Copy /etc/hadoop/conf, /etc/pig/conf, /etc/tez/conf and /etc/hive/conf from the cluster to your new node, into the same directories as on the other nodes:

scp -r /etc/hadoop root@192.168.56.111:
scp -r /etc/hive root@192.168.56.111:
scp -r /etc/pig root@192.168.56.111:
scp -r /etc/tez root@192.168.56.111:
Move the conf dirs to their designated directories:

cp -r hadoop/conf /etc/hadoop/
cp -r hive/conf /etc/hive/
cp -r pig/conf /etc/pig/
cp -r tez/conf /etc/tez/
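The four copy steps can be collapsed into a loop. The sketch below rehearses the per-component copy with temp directories so it is safe to run anywhere; on the real client you would scp from a cluster node and copy into /etc as shown above.

```shell
# Rehearse the per-component conf copy using temp dirs (assumption: the
# <component>/conf layout produced by the scp commands above).
src=$(mktemp -d)
dst=$(mktemp -d)
for comp in hadoop hive pig tez; do
  mkdir -p "$src/$comp/conf" "$dst/$comp"
  echo "component=$comp" > "$src/$comp/conf/example-site.properties"
  # on the real node this would be: cp -r $comp/conf /etc/$comp/
  cp -r "$src/$comp/conf" "$dst/$comp/"
done
cat "$dst/tez/conf/example-site.properties"
# -> component=tez
```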
Now you should be able to access HDFS, Pig, Hive and Tez from the new node. You can validate your environment based on the manual install guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-1a378094-a4fb-4348-bd9e-2eebf68c2e1e.1.html

[root@centos6 ~]# cat test.txt
foo
bar
foo
bar
foo
[root@centos6 ~]# hdfs dfs -put test.txt /tmp/input/
[root@centos6 ~]# hadoop jar /usr/hdp/current/tez-client/tez-examples-*.jar orderedwordcount /tmp/input/test.txt /tmp/out
[root@centos6 ~]# hdfs dfs -ls /tmp/out
Found 2 items
-rw-r--r-- 3 root hdfs 0 2016-08-22 14:11 /tmp/out/_SUCCESS
-rw-r--r-- 3 root hdfs 12 2016-08-22 14:11 /tmp/out/part-v002-o000-r-00000
[root@centos6 ~]# hdfs dfs -cat /tmp/out/part-v002-o000-r-00000
bar 2
foo 3
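As a quick sanity check of that result, the same counts can be reproduced locally with standard tools; this just mirrors what orderedwordcount computed on the cluster, no Hadoop involved.

```shell
# Recompute the word counts from the same 5-line input with sort/uniq.
printf 'foo\nbar\nfoo\nbar\nfoo\n' > /tmp/test.txt
sort /tmp/test.txt | uniq -c | awk '{print $2"\t"$1}'
# -> bar	2
#    foo	3
```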
You can do the same with Pig using http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/validate_the_installation_pig.html

[root@centos6 ~]# hdfs dfs -put /etc/passwd .
[root@centos6 ~]# pig -x tez
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> store B into 'id.out';
grunt> fs -cat id.out/part-v000-o000-r-00000
root
bin
daemon
adm
lp
sync
shutdown
And for Hive:

[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> create table test ( name string ) ;
No rows affected (0.242 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> insert into table test values('artem');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into table test values('artem')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1471887368465_0006)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.test from hdfs://hacluster/apps/hive/warehouse/test/.hive-staging_hive_2016-08-22_18-18-58_629_3703254848398593955-1/-ext-10000
INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
No rows affected (10.012 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.088 seconds)
Now from a cluster node:

root@u1201:~# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000:
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> !tables
+------------+--------------+-------------+-------------+----------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------+--------------+-------------+-------------+----------+--+
| | default | test | TABLE | NULL |
+------------+--------------+-------------+-------------+----------+--+
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.083 seconds)
For Tez to work with Hive, execute the following command on the client machine: set hive.execution.engine=tez;

[root@centos6 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
beeline> !connect jdbc:hive2://u1203.ambari.apache.org:10000
Connecting to jdbc:hive2://u1203.ambari.apache.org:10000
Enter username for jdbc:hive2://u1203.ambari.apache.org:10000: root
Enter password for jdbc:hive2://u1203.ambari.apache.org:10000:
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://u1203.ambari.apache.org:10000> set hive.execution.engine=tez;
No rows affected (0.041 seconds)
0: jdbc:hive2://u1203.ambari.apache.org:10000> select * from test;
+------------+--+
| test.name |
+------------+--+
| artem |
+------------+--+
1 row selected (0.107 seconds)
If you were to install other clients, you'd follow the same HDP manual install/upgrade guides. Installing services would be a bit more involved, but doable. Conclusion: this is certainly not a recommended approach, but sometimes it's a necessary evil. The same should work with Apache releases outside of HDP; I was certainly able to run Bigtop packages against HDP.
08-08-2016
04:54 PM
2 Kudos
UPDATE: I'm happy to report that my patch for PIG-4931 was accepted and merged to trunk.

I was browsing through the Apache Pig JIRAs and stumbled on https://issues.apache.org/jira/browse/PIG-4931, which asks for documentation of Pig's "IN" operator. It turns out Pig has had an IN operator since the days of 0.12, but no one had a chance to document it yet. The associated JIRA is https://issues.apache.org/jira/browse/PIG-3269. In this short article I will go over the IN operator; until I'm able to submit a patch to close out the ticket, this should serve as its documentation.

The IN operator in Pig works like in SQL: you provide a list of values and it returns just the rows that match one of them. It is a lot more concise than, for example:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY
(i == 1) OR
(i == 22) OR
(i == 333) OR
(i == 4444) OR
(i == 55555);

You can rewrite the same statement as:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);

The best thing about it is that it accepts more than just integers: you can pass float, double, BigDecimal, BigInteger, bytearray and String values. Let's review each one in detail.

grunt> fs -cat data;
1,Christine,Romero,Female
2,Sara,Hansen,Female
3,Albert,Rogers,Male
4,Kimberly,Morrison,Female
5,Eugene,Baker,Male
6,Ann,Alexander,Female
7,Kathleen,Reed,Female
8,Todd,Scott,Male
9,Sharon,Mccoy,Female
10,Evelyn,Rice,Female

Passing an integer to the IN clause:

A = load 'data' using PigStorage(',') AS (id:int, first:chararray, last:chararray, gender:chararray);
X = FILTER A BY id IN (4, 6);
dump X;
(4,Kimberly,Morrison,Female)
(6,Ann,Alexander,Female)

Passing a String:

A = load 'data' using PigStorage(',') AS (id:chararray, first:chararray, last:chararray, gender:chararray);
X = FILTER A BY id IN ('2', '4', '8');
dump X;
(2,Sara,Hansen,Female)
(4,Kimberly,Morrison,Female)
(8,Todd,Scott,Male)

Passing a bytearray:

A = load 'data' using PigStorage(',') AS (id:bytearray, first:chararray, last:chararray, gender:chararray);
X = FILTER A BY id IN ('1', '9');
dump X;
(1,Christine,Romero,Female)
(9,Sharon,Mccoy,Female)

Passing a BigInteger and using the NOT operator, thereby negating the list of values in the IN clause:

A = load 'data' using PigStorage(',') AS (id:biginteger, first:chararray, last:chararray, gender:chararray);
X = FILTER A BY NOT id IN (1, 3, 5, 7, 9);
dump X;
(2,Sara,Hansen,Female)
(4,Kimberly,Morrison,Female)
(6,Ann,Alexander,Female)
(8,Todd,Scott,Male)
(10,Evelyn,Rice,Female)

Now, I understand that most cool kids these days are using Spark, but I strongly believe Pig has a place in any Big Data stack, and its livelihood depends on comprehensive and complete documentation. Happy learning!
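If you want to double-check a filter outside of Pig, the IN semantics are easy to mimic over the raw CSV with awk. This is just a local sanity check (awk standing in for FILTER ... IN), not part of the Pig session above.

```shell
# Recreate part of the sample data and mimic: X = FILTER A BY id IN (4, 6);
cat > /tmp/data <<'EOF'
1,Christine,Romero,Female
2,Sara,Hansen,Female
3,Albert,Rogers,Male
4,Kimberly,Morrison,Female
5,Eugene,Baker,Male
6,Ann,Alexander,Female
EOF
awk -F, '$1 == 4 || $1 == 6' /tmp/data
# -> 4,Kimberly,Morrison,Female
#    6,Ann,Alexander,Female
```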
07-29-2016
06:33 PM
Thanks for fixing; it's not a typo, it's the code formatting in HCC. @Kuldeep Kulkarni
07-16-2016
02:08 AM
5 Kudos
Update: apparently when you initiate a support-case resolution capture, say for the HBase service, it will pull HDFS NameNode logs in addition to the HBase logs. You may be faced with the same issue and may have to apply the approach below to overcome timeouts. In SmartSense 1.3.0 this will no longer be an issue; until then, this is a way to avoid capture timeouts.

First, let's discuss the difference between a capture for analysis and one for support case resolution. Analysis bundles do not collect service logs. For support cases, you're going to fetch configuration and logs; then, depending on how much anonymization you want to apply, large log files will take a long time to collect. This is especially prominent with HDFS NameNode logs: they tend to be big, and that is exactly the scenario we're trying to address.

First, increase the agent timeout threshold in Ambari. In my case it was 30 minutes; feel free to raise it up to 2 hours on the Ambari SmartSense Operations page.

Then we're going to exclude everything but the hadoop-hdfs-namenode-*.log logs. That leaves the .out, .out.* and .log.* files out of the collection. On the HST server host (where HST is analogous to SmartSense), go to the /var/lib/smartsense/hst-agent/resources/scripts directory. Notice we're accessing the hst-agent directory, not hst-server: the collection scripts exist on agent hosts, not on the HST server. Edit the hdfs-scripts.xml file and go to line 100; it may be 10 lines off, give or take, depending on which version of SmartSense you're running. On 1.2.2, it is line 100. Change the following lines:

if [ `hostname -f` == "${MASTER}" ] && [ `echo "${SLAVES}" | grep -o ',' | wc -l` -gt 1 ] ; then
find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
else
for file in `find $LOG 2>/dev/null -type f -mtime -2 -iname '*' ;
find $LOG 2>/dev/null -type f -mtime -2 -iname '*' ; `
to:

if [ `hostname -f` == "${MASTER}" ] && [ `echo "${SLAVES}" | grep -o ',' | wc -l` -gt 1 ] ; then
# find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' -exec cp '{}' ${outputdir} \;
else
for file in `find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' ;
find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' ;
It is hard to see the difference, so here is what we changed: comment out the first find command; in the second find command, replace '*' with '*.log'; then repeat the same substitution in the for loop and again in the last find command. In short, for every occurrence of '*', substitute '*.log'. As the last step, restart the SmartSense service and agents to propagate the changes to every agent. We only care about the NameNode hosts, but depending on services and host components, I don't see why we couldn't restart all of them. One other thing I'd like to point out: that same directory, /var/lib/smartsense/hst-agent/resources/scripts, contains scripts for other services, so you can apply the same steps to any other service. Granted, this is a pretty corner use case, but when you're investigating a high-severity issue and have no means of uploading logs besides going at it the hard way, this may be a good approach.

Finally, let's verify the approach. Go to the SmartSense view and initiate a capture. When the capture is complete, go to the SmartSense server node and navigate to the local storage directory. There you will find your latest bundle; uncompress it and cd into the new directory. In that directory there will be another compressed file; uncompress that as well. Finally, cd into that directory and then into the services directory. At this point you will see various services; we care about HDFS. Go inside it and finally into the logs directory. There you will find your *.log files.

I want to highlight that this is a hack; use it at your own risk. At the very least, notify your support engineer of the approach. I'd like to thank @Paul Codding and @sheetal for showing me the inner workings of SmartSense. Your feedback is welcome.
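To convince yourself the edited pattern behaves as intended before touching the real script, you can rehearse the find change against a throwaway directory. The file names below are made up to mirror the NameNode log naming; the find flags are the ones from the script.

```shell
# Rehearse -iname '*.log': only the .log file should be copied, leaving
# the .out and rotated .log.1 files behind.
LOG=$(mktemp -d)
outputdir=$(mktemp -d)
touch "$LOG/hadoop-hdfs-namenode-host1.log" \
      "$LOG/hadoop-hdfs-namenode-host1.out" \
      "$LOG/hadoop-hdfs-namenode-host1.log.1"
find "$LOG" 2>/dev/null -type f -mtime -2 -iname '*.log' -exec cp '{}' "$outputdir" \;
ls "$outputdir"
# -> hadoop-hdfs-namenode-host1.log
```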
05-09-2016
11:18 AM
Local user on OS
05-07-2016
09:47 PM
You can create your own root principal:

su admin
bash-4.1$ kadmin
Couldn't open log file /var/log/kadmind.log: Permission denied
Authenticating as principal admin/admin@HWX.COM with password.
Password for admin/admin@HWX.COM:
kadmin: add_principal root
Enter password for principal "root@HWX.COM":
Re-enter password for principal "root@HWX.COM":
kadmin: exit
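The same session can be scripted non-interactively with kadmin's -q option. The admin principal and the HWX.COM realm below come from the session above; the helper function is only an illustration for assembling the command, since actually running it requires a reachable KDC.

```shell
# Hypothetical sketch: assemble the non-interactive equivalent of the
# interactive session above. Arguments: realm, principal to add.
build_kadmin_cmd() {
  printf 'kadmin -p admin/admin@%s -q "add_principal %s@%s"' "$1" "$2" "$1"
}

build_kadmin_cmd HWX.COM root
# -> kadmin -p admin/admin@HWX.COM -q "add_principal root@HWX.COM"
```

kadmin will still prompt for the admin password and the new principal's password when the command runs against a live KDC.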