Member since: 10-01-2015
Posts: 3933
Kudos Received: 1148
Solutions: 374
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1128 | 05-03-2017 05:13 PM
 | 978 | 05-02-2017 08:38 AM
 | 1056 | 05-02-2017 08:13 AM
 | 1321 | 04-20-2017 12:28 AM
 | 1249 | 04-10-2017 10:51 PM
09-07-2018
02:37 PM
Tez actually ships with a Pig loader for mining Tez logs; you can find the details at https://github.com/apache/tez/tree/master/tez-tools/tez-tfile-parser. Here's a sample:

set pig.splitCombination false;
set tez.grouping.min-size 52428800;
set tez.grouping.max-size 52428800;
/* Register all tez jars. Replace $TEZ_HOME, $TEZ_TFILE_DIR with absolute path */
register '$TEZ_HOME/*.jar';
register '$TEZ_TFILE_DIR/tfile-parser-1.0-SNAPSHOT.jar';
raw = load '/app-logs/root/logs/application_1411511669099_0769/*' using org.apache.tez.tools.TFileLoader() as (machine:chararray, key:chararray, line:chararray);
filterByLine = FILTER raw BY (key MATCHES '.*container_1411511669099_0769_01_000001.*')
AND (line MATCHES '.*Shuffle.*');
dump filterByLine;
09-07-2018
01:47 PM
4 Kudos
This is a short how-to on leveraging Zeppelin and Solr's native SQL capabilities to query the Ranger audit logs in real time. The ability to query Ranger audits has existed for quite a while, and there are multiple articles demonstrating how to apply a Hive external table on top of Ranger audits stored in HDFS. This article demonstrates how to leverage Zeppelin and Solr SQL to query Solr in real time, without the additional step of creating an external table on top of the HDFS audit logs.

The first thing you need is access to your Solr instance. I'm using the default instance packaged with Ambari Infra. The Solr admin UI is available at http://{ambari-infra-ip}:8886. In the UI, you can issue arbitrary queries using standard Solr syntax. I am new to Solr and found the query syntax cumbersome, so I decided to leverage Solr SQL instead, available as of Solr 6. HDP 3.0 ships with Solr 7.3.

The next step is to set up a Zeppelin interpreter for Solr via JDBC. Steps for doing that are available on the Solr website; I'm going to summarize the minimum required configuration for HDP 3. Feel free to copy and modify the properties below:

default.driver : org.apache.solr.client.solrj.io.sql.DriverImpl
default.url : jdbc:solr://{ambari-infra-ip}:2181/infra-solr?collection=ranger_audits
default.user : solr

In the artifacts section, add the following entry:

org.apache.solr:solr-solrj:7.3.1

Be mindful of the port of the ZooKeeper quorum for the Ranger Solr collection. I found that information by browsing the ZK CLI shell:

/usr/hdp/current/zookeeper-client/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 4] ls /infra-solr/collections/ranger_audits

Once you enter that information into the Zeppelin interpreter, you can use the %solr prefix to browse Ranger audits with SQL; just add a new note with the Solr interpreter selected. Notice I am using the standard fields of the Ranger audit schema; you can find an older version of the schema at the following link. I say older because in HDP 3 Ranger supports multiple clusters, and additional fields identifying the separate clusters are available, but I digress. A query along the lines of the sketch below will show all current events where access was denied. This is really convenient because you don't need to apply a schema and the data is available in real time. You can build powerful reporting capabilities on top of what is available in the Ranger Admin UI (in case your question was why even do this when that info is already available via Ranger). Finally, once you press execute, the data is shown below the paragraph. You can now add plotting libraries and the built-in Zeppelin charting capabilities to build very powerful dashboards!
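As a rough sketch, a Solr SQL paragraph in Zeppelin could look like the following; the field names (evtTime, reqUser, repo, resource, access, result) and the convention that result = 0 means access was denied are assumptions based on the standard Ranger audit schema, so verify them against your collection:

%solr
SELECT evtTime, reqUser, repo, resource, access, result
FROM ranger_audits
WHERE result = '0'
ORDER BY evtTime DESC
LIMIT 25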
Tags: Governance & Lifecycle, How-To/Tutorial, Ranger, solr, solrcloud, zeppelin
07-19-2018
02:30 PM
@Charles Hedrick @Michal Vince You should be able to access the support matrix with your HWX support portal subscription. Otherwise, the Ambari ASF page has the necessary info: https://ambari.apache.org/
05-22-2018
03:02 PM
@Sandeep Kumar is it possible that you're running into https://issues.apache.org/jira/browse/AMBARI-22518? It's always worth checking the release notes for the latest fixes: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.0/bk_ambari-release-notes/content/ambari_relnotes-2.6.1.0-fixed-issues.html. Ambari 2.6.2 is out, so it may be worth upgrading to the latest.
05-22-2018
11:48 AM
What functionality are you looking for in Hue? If you're seeking assistance with Oozie workflows, the Ambari Workflow Manager can do that for you. Hive, Pig, YARN, and rex come with their own views. Unless you need Solr and HBase, there's no reason to run Hue with HDP.
05-22-2018
11:43 AM
MultiWAL is a community feature and is not supported by HDP. If you're not on HDP, you can try the feature, but your mileage will vary. Have you considered placing the WAL on SSD?
01-25-2018
04:38 PM
@Sai Geetha M N please read our latest docs: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/ch_introduction-spark.html and https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/ch08s05.html. It's been out for a while now.
01-25-2018
04:33 PM
SHC has been released and has been tested at scale. What issues are you having, @Sai Geetha M N?
01-08-2018
03:11 PM
The Phoenix Kafka plugin is not released in HDP; please refer to https://issues.apache.org/jira/browse/PHOENIX-3214 for details on running the plugin.
11-06-2017
03:10 PM
Yes, it looks like you figured it out; the class comes from sharelib/oozie (https://github.com/apache/oozie/blob/master/sharelib/oozie/src/main/java/org/apache/oozie/action/hadoop/OozieLauncherOutputCommitter.java) and you were probably missing the oozie directory within the sharelib.
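If anyone else hits this, a quick way to check is to list the sharelib and confirm the oozie directory exists in HDFS, then run a sharelib update after fixing it (a sketch; the Oozie URL and the lib_<timestamp> directory are placeholders):

oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist
hdfs dfs -ls /user/oozie/share/lib/lib_<timestamp>/oozie
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate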
08-29-2017
08:08 PM
@Amit Panda take a look at the Oozie feature called datasets; it should help you define retention on your data: https://oozie.apache.org/docs/4.3.0/CoordinatorFunctionalSpec.html#a5._Dataset
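A minimal dataset definition from the coordinator spec looks roughly like this (the name, path, frequency, and done-flag are illustrative); your retention logic can then address instances of the dataset by date:

<datasets>
  <dataset name="raw-logs" frequency="${coord:days(1)}" initial-instance="2017-01-01T00:00Z" timezone="UTC">
    <uri-template>${nameNode}/data/raw-logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    <done-flag>_SUCCESS</done-flag>
  </dataset>
</datasets>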
08-29-2017
04:37 PM
@bigdata.neophyte Unfortunately there is no documented procedure to migrate from CDH. In these cases it's best to engage with your local Hortonworks account rep and professional services. We've done a few of these migrations and can provide you with a runbook to do the migration.
08-29-2017
04:33 PM
@Joel Carver please review my tutorial for the caveats of setting up a Sqoop action. For example, starting with HDP 2.4 or 2.5 (I forget which), you need tez-site.xml in your workflow's lib directory. https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
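For reference, getting tez-site.xml into the workflow's lib directory is just a copy into HDFS (the workflow path here is a placeholder):

hdfs dfs -mkdir -p /user/centos/sqoop-wf/lib
hdfs dfs -put /etc/tez/conf/tez-site.xml /user/centos/sqoop-wf/lib/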
08-29-2017
04:28 PM
@Rahul Unnikrishnan please provide master logs and region server logs
08-29-2017
03:10 PM
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html

I get a lot of questions about doing distcp and figured I'd write yet another article in the WFM series. There's a common assumption that the FS action should be able to do a copy within a cluster; unfortunately, it's not obvious that you can leverage the distcp action to do a copy within a cluster instead. The reason the FS action is missing copy functionality is that copy is not meant to be distributed and will DoS your Oozie server until the action completes. What you need to do is use the distcp action: it's meant for distributed operations and, being decoupled from the Oozie launcher, it will complete without DoS-ing the server. The functionality is the same, even if the naming convention is a bit off.

We're going to start by adding a new workflow and naming it distcp-wf. Next we add a distcp node to the flow. I prefer to name nodes something other than the default, so I'll name it distcp_example and hit the gear button to configure it. In the distcp arguments field, I'm going to use Oozie XML variable replacement to add the full HDFS paths of the source and target, which happen to be in the same cluster. They could just as well be two separate clusters.

If you're familiar with how Oozie and MapReduce work, you'll quickly realize that this workflow will only run once and fail the second time around. The reason is that my destination never changes, and if the output already exists, the next run will fail. For that, we're going to add a prepare step to delete the destination file/directory: copy the second argument to the clipboard, paste it into advanced properties, and change the mkdir drop-down to delete.

We're almost ready to submit our workflow; I first have to create an HDFS directory (distcp-wf) that will contain my distcp workflow, plus a file I'd like copied.

hdfs dfs -mkdir distcp-wf
hdfs dfs -touchz file
hdfs dfs -ls
Found 4 items
drwx------   - centos hdfs          0 2017-08-29 14:35 .Trash
drwx------   - centos hdfs          0 2017-08-29 14:33 .staging
drwxr-xr-x   - centos hdfs          0 2017-08-29 14:35 distcp-wf
-rw-r--r--   3 centos hdfs         10 2017-08-29 01:26 file

Now I'm ready to save and submit my workflow. Enter the HDFS path of the workflow directory you just created; notice the job properties have the fully expanded nameNode and resourceManager addresses, which is what is used for variable substitution. I'm going to submit the job and use filtering in the dashboard on the name of the workflow.

Now let's switch back to the distcp action, as I'd like to demonstrate a few other things about distcp that you can leverage. If you refer to the distcp user guide, you'll notice there are many arguments we didn't cover, like -append, -update, etc. What if you would like to use them in your distcp? WFM has you covered: eagle-eyed users will have seen the tooltip the first time we configured the distcp action node, showing that you can pass such arguments in the same field as the source and destination. So in addition to the two path arguments, I'm going to add -update and -skipcrccheck in front of the existing ones. My workflow XML should now look roughly like the sketch at the end of this post. When I execute with the new arguments, everything should still be green.

On a side note, our documentation team has done a phenomenal job adding resources to our WFM section; I encourage everyone interested in WFM to review them.

The caveat with distcp is that in some cases you cannot distcp via Oozie from a secure to an insecure cluster and vice versa. There are parameters you can specify to make it work in some cases, but overall it is not supported in heterogeneous clusters. Other issues crop up when you distcp between HA-enabled clusters: you have to specify the nameservices for both clusters. Please leverage HCC to find resources on how to get that working. Hope this was useful!
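For reference, here is a sketch of the workflow XML with the prepare delete and the extra distcp arguments; the exact XML WFM generates may differ slightly, and the source and destination paths are illustrative, based on the example above:

<workflow-app name="distcp-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="distcp_example"/>
  <action name="distcp_example">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/centos/distcp-wf/file"/>
      </prepare>
      <arg>-update</arg>
      <arg>-skipcrccheck</arg>
      <arg>${nameNode}/user/centos/file</arg>
      <arg>${nameNode}/user/centos/distcp-wf/file</arg>
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>distcp failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>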
Tags: ambari-view, ambari-views, Governance & Lifecycle, How-To/Tutorial, Oozie, workflow-manager
08-25-2017
05:07 PM
2 Kudos
Here's a much cleaner working example, tested with HDP 2.6:

wget http://central.maven.org/maven2/org/apache/parquet/parquet-pig-bundle/1.8.1/parquet-pig-bundle-1.8.1.jar
hdfs dfs -put parquet-pig-bundle-1.8.1.jar .
pig -x tez

REGISTER hdfs://dlm3ha/user/centos/parquet-pig-bundle-1.8.1.jar;
-- words is a CSV file with five fields
data = load 'words' using PigStorage(',') as (f1:chararray,f2:chararray,f3:chararray,f4:chararray,f5:chararray);
store data into 'hdfs://dlm3ha/user/centos/output' using org.apache.parquet.pig.ParquetStorer;
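To verify the output, the same bundle includes a loader; a quick read-back using the path from the store above would be:

data_back = load 'hdfs://dlm3ha/user/centos/output' using org.apache.parquet.pig.ParquetLoader();
dump data_back;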
07-20-2017
05:28 PM
@Stephanie Shen it is certainly possible, but it'd be an edge node not managed by the Ambari agent, as Ambari will try to overwrite your configurations. What you can do is keep multiple configuration directories and point the HDFS client at each of them, which you can achieve by exporting the HADOOP_CONF_DIR environment variable (see the sketch below). I would also like to warn you that it's easy to make mistakes in such a setup: it's easy to mix up configurations and run commands against an environment you didn't intend to. It's best to spin up a few lightweight nodes that you use for the separate environments, or start using the Ambari HDFS Files view to interact with HDFS; that way you can have a separate browser tab for each environment and the complications with global variables go away. We can discuss this further offline.
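A rough sketch of the multiple-configuration-directory approach (the directory names are made up; each directory holds the client configs downloaded from the respective cluster):

# point the HDFS client at cluster A
export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-a
hdfs dfs -ls /

# switch the same shell to cluster B
export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-b
hdfs dfs -ls /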
06-30-2017
08:05 PM
@riyer I'd avoid going against HBase directly with Hive. Generating a snapshot is so trivial (see below) that you should consider going that route first; on average, going against a snapshot should be about 2.5x better than going against HBase directly.
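For reference, taking the snapshot is a one-liner in the HBase shell (table and snapshot names are placeholders):

hbase shell
snapshot 'my_table', 'my_table_snap'
list_snapshots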
05-03-2017
11:48 PM
1 Kudo
Please review the Ranger Kafka plugin FAQ for current limitations: https://cwiki.apache.org/confluence/display/RANGER/Kafka+Plugin
05-03-2017
11:32 PM
I answered this question in the comments on your other thread: you need to change the version, since the build number is different. https://community.hortonworks.com/questions/100408/how-to-compile-ambari25-from-soucecode.html#comment-100416. Please close the other question as well.

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.7.1.2.3.4.0-3485</version>
<type>jar</type>
</dependency>
05-03-2017
11:26 PM
Please review our security guide; Kafka has its own section: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_security/content/kafka_policy.html
05-03-2017
11:22 PM
1 Kudo
We have Sandbox 1.3 and 2.1; we no longer host 2.0. It is in the same place as the new Sandbox, under the Hortonworks Sandbox Archive tab: https://hortonworks.com/downloads/#sandbox
05-03-2017
05:13 PM
1 Kudo
@ed day I'm guessing your ambari-server setup is borked. Please reinstall the Ambari server and agent. When you remove the packages, make sure to clean up /var/lib/ambari-server, /usr/lib/ambari-server, /usr/lib/python2.6/site-packages/ambari*, etc., roughly as sketched below.
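A rough outline of the cleanup and reinstall on a CentOS/RHEL node (adjust the package manager and paths for your OS and Ambari version):

ambari-server stop
ambari-agent stop
yum remove -y ambari-server ambari-agent
rm -rf /var/lib/ambari-server /var/lib/ambari-agent
rm -rf /usr/lib/ambari-server
rm -rf /usr/lib/python2.6/site-packages/ambari*
yum install -y ambari-server ambari-agent
ambari-server setup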
05-03-2017
05:10 PM
@HENI MAHER please open this as a new question and describe your problem in full.
05-02-2017
06:58 PM
@frank chen try the following, and change the rest of the dependencies to the same build version:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.7.1.2.3.4.0-3485</version>
<type>jar</type>
</dependency>
05-02-2017
10:17 AM
1 Kudo
You can find these dependencies in the HDP Maven repos; please look at the following: https://community.hortonworks.com/questions/74655/where-can-i-find-hdp-maven-repos.html#answer-74645
05-02-2017
08:43 AM
Your script location is wrong; start grunt in the directory containing the script, or provide the full path to the script in the exec command.
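For example, with a placeholder path:

grunt> exec /full/path/to/myscript.pig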
05-02-2017
08:38 AM
1 Kudo
You can create the principal and keytab manually (see the outline below); follow this guide, sorry it's from Pivotal but the steps should be interchangeable: http://pivotalhd-210.docs.pivotal.io/doc/2100/webhelp/topics/ConfiguringSecureFlume.html. Which version of Ambari are you using? This seems to be an old issue: https://issues.apache.org/jira/browse/AMBARI-13324
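A rough outline with an MIT KDC (the principal, realm, and keytab path are placeholders; adjust for your Flume host):

kadmin -p admin/admin@EXAMPLE.COM
addprinc -randkey flume/host1.example.com@EXAMPLE.COM
xst -k /etc/security/keytabs/flume.service.keytab flume/host1.example.com@EXAMPLE.COM
quit
chown flume:hadoop /etc/security/keytabs/flume.service.keytab
chmod 400 /etc/security/keytabs/flume.service.keytab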
05-02-2017
08:13 AM
You can use the flatten operator to remove the bag and thus the extra characters (http://pig.apache.org/docs/r0.16.0/basic.html#flatten). So before you finish generating the file with Pig, call the flatten operator and then load the result into a Hive table:

grunt> cat empty.bag
{} 1
grunt> A = LOAD 'empty.bag' AS (b : bag{}, i : int);
grunt> B = FOREACH A GENERATE flatten(b), i;
grunt> DUMP B;
grunt>
04-20-2017
12:28 AM
I released your response; we are dealing with a lot of spam as of late, and our methods are currently a bit restrictive.