Member since: 09-24-2015
22 Posts
31 Kudos Received
6 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 305 | 05-31-2017 02:18 PM |
| | 399 | 06-09-2016 08:19 PM |
| | 617 | 02-03-2016 08:37 PM |
| | 5243 | 02-03-2016 08:26 PM |
| | 483 | 12-09-2015 06:54 PM |
05-31-2017
10:01 PM
I submitted a fix to this README. You can view progress here: https://github.com/apache/metron/pull/601 and https://issues.apache.org/jira/browse/METRON-977. @HS, the Metron community is always looking for contributors and future committers and is extremely helpful (imho) in getting users involved. I see you've been active on the Metron boards here on HCC, and we would be happy to get you open source community credit should you also choose to submit Jiras/PRs in the future. Best, Mike.
05-31-2017
02:18 PM
1 Kudo
Hi guys. Yes, it would appear that the doc example is outdated. "index" and "batchSize" belong in the indexing config. Here is a sample for bro from the current source:
cat metron-platform/metron-enrichment/src/main/config/zookeeper/enrichments/bro.json
{
  "enrichment" : {
    "fieldMap": {
      "geo": ["ip_dst_addr", "ip_src_addr"],
      "host": ["host"]
    }
  },
  "threatIntel": {
    "fieldMap": {
      "hbaseThreatIntel": ["ip_src_addr", "ip_dst_addr"]
    },
    "fieldToTypeMap": {
      "ip_src_addr" : ["malicious_ip"],
      "ip_dst_addr" : ["malicious_ip"]
    }
  }
}
cat metron-platform/metron-indexing/src/main/config/zookeeper/indexing/bro.json
{
  "hdfs" : {
    "index": "bro",
    "batchSize": 5,
    "enabled" : true
  },
  "elasticsearch" : {
    "index": "bro",
    "batchSize": 5,
    "enabled" : true
  },
  "solr" : {
    "index": "bro",
    "batchSize": 5,
    "enabled" : false
  }
}
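If you edit these files on disk, keep in mind the topologies read them from ZooKeeper, so they still need to be pushed. A minimal sketch using Metron's zk_load_configs.sh, as I recall it from current master (the $METRON_HOME path and $ZOOKEEPER quorum variable here are illustrative assumptions):
# push the local zookeeper config directory up to the running cluster (paths/variables assumed)
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $ZOOKEEPER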
03-10-2017
12:51 AM
Thanks for checking this out!
03-08-2017
08:35 PM
2 Kudos
For this tutorial we will be using Ubuntu 14.04.5. This setup can further be leveraged with Apache Metron (Incubating). Additional installation instructions for Metron core will be provided in another article.
Install Elasticsearch
First we'll install Elasticsearch 2.4. You'll need the following prerequisites:
wget
apt-transport-https
Java
You can install them by logging into your ES node and executing the following:
sudo apt-get update
sudo apt-get install -y wget apt-transport-https
# If using oracle jdk 8
sudo apt-get install -y software-properties-common
sudo apt-add-repository -y ppa:webupd8team/java
sudo apt-get update
echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" | sudo
debconf-set-selections
sudo apt-get install -y oracle-java8-installer
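A quick sanity check that the JDK landed (exact output will vary by patch level):
java -version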
Now let's install Elasticsearch. Run the following commands on the node where you want to install ES.
# Get the Elasticsearch packages
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
# Add the Elasticsearch packages to apt
echo "deb https://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
# Install Elasticsearch
sudo apt-get update && sudo apt-get install elasticsearch
# Configure for automatic start on system boot for System V systems
sudo update-rc.d elasticsearch defaults 95 10
# Start Elasticsearch
sudo service elasticsearch start
If you're running this in Docker, you'll also want to run the following before starting the es service:
# Setup networking
echo 'network.host: 0.0.0.0' >> /etc/elasticsearch/elasticsearch.yml
Check that Elasticsearch is running. Go to http://$ELASTICSEARCH_HOST:9200 and verify you see something like the following:
{
  "name" : "Saturnyne",
  "cluster_name" : "metron",
  "cluster_uuid" : "F-m2WjlDSAu_0TTCqXki1w",
  "version" : {
    "number" : "2.4.4",
    "build_hash" : "fcbb46dfd45562a9cf00c604b30849a6dec6b017",
    "build_timestamp" : "2017-01-03T11:33:16Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
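You can also do the same check from a shell on the ES node; a small sketch, assuming the $ELASTICSEARCH_HOST variable from above is set:
# should return the same cluster/version JSON as the browser check
curl -XGET "http://$ELASTICSEARCH_HOST:9200/"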
Install Kibana
Now we'll install Kibana 4.5.3 on Ubuntu 14.04.5. First you should have the following prerequisites:
wget
You can install them by logging into your Kibana node and executing the following:
sudo apt-get update
sudo apt-get install -y wget
Now let's install Kibana. Run the following commands on the node where you want to install Kibana.
# Get the Elasticsearch/Kibana packages
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
# Add the Kibana packages to apt
echo "deb http://packages.elastic.co/kibana/4.5/debian stable main" | sudo tee -a /etc/apt/sources.list
# Install Kibana
sudo apt-get update && sudo apt-get install kibana
# Configure for automatic start on system boot
sudo update-rc.d kibana defaults 95 10
# Configure Kibana for Elasticsearch host:port
# Note: set the host and port accordingly to point to your Elasticsearch host from the installation above.
sed -ri "s;^(\#\s*)?(elasticsearch\.url:).*;\2 'http://elasticsearch:9200';" /opt/kibana/config/kibana.yml
# Start Kibana
export PATH=/opt/kibana/bin:$PATH
kibana
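As a quick sanity check before moving on (run from another shell/session; the $KIBANA_HOST variable is assumed, as above):
# confirm the elasticsearch.url line was rewritten by the sed above
grep '^elasticsearch.url' /opt/kibana/config/kibana.yml
# confirm Kibana answers on its default port
curl -sI "http://$KIBANA_HOST:5601"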
That should be it. Now you should be able to go to http://$KIBANA_HOST:5601 and see the Kibana dashboard.
Extras
Setting up Docker
If you're looking to get a quick demo environment running, you can follow these steps to run this example in Docker. For this part we'll be using Docker for Mac on Mac OSX 10.12.
Set up Docker for Mac - https://docs.docker.com/docker-for-mac/
Add the Docker files that set up the images for Elasticsearch and Kibana. See the attached ubuntu-dockerfile.tar.gz tarball.
Untar the bundle. You should have 2 directories: ubuntu-elasticsearch and ubuntu-kibana
Build the Docker images (you'll want this so you can reuse them). Replace $YOUR_TAG_NAME with whatever you like. Don't forget the period '.' at the end of the line.
docker build -f ubuntu-elasticsearch/Dockerfile -t $YOUR_TAG_NAME/ubuntu-elasticsearch .
docker build -f ubuntu-kibana/Dockerfile -t $YOUR_TAG_NAME/ubuntu-kibana .
Run the containers. The container names will be "es" and "kibana."
docker run -p 9200:9200 -P --name es -dit $YOUR_TAG_NAME/ubuntu-elasticsearch
docker run -p 5601:5601 -P --name kibana -dit $YOUR_TAG_NAME/ubuntu-kibana
Note: if you need to re-run for any reason (a failed startup, for instance), kill and remove the containers, e.g.
docker kill es
docker rm es
Now log in to the ES container and follow the ES install steps from above.
docker exec -it es /bin/bash
logout
Log in to the Kibana container and follow the Kibana install steps from above.
docker exec -it kibana /bin/bash
logout
You should now have two running Docker containers that you can connect to from your localhost:
http://localhost:9200
http://localhost:5601
Note: There are currently limitations with Docker for Mac networking. Alternatively, you could use docker-machine for a more robust example.
References
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-repositories.html
https://www.elastic.co/guide/en/kibana/4.5/setup-repositories.html
https://docs.docker.com/docker-for-mac/
https://docs.docker.com/machine/overview/
Tags: CyberSecurity, ElasticSearch, How-ToTutorial, Kibana, Metron, ubuntu
06-17-2016
05:59 PM
@li zhen There is open work being done for this as well 🙂
06-09-2016
08:19 PM
Hi @li zhen - Thanks for the interest in Metron! Please bear with us - we recently upgraded Kibana and that search functionality is not currently exposed in the new UI as a result. There is a CLI tool in the works that will expose this as a workaround. Note: if you're not working from the latest master branch, it may be that you just need to fill in all the fields for the search to work.
Update 6/17/16 - There is a pending pull request for the CLI tool. You can track it here: https://issues.apache.org/jira/browse/METRON-235
Update 6/22/16 - This work is now complete and committed in master. See the Jira above as well as the following docs:
https://cwiki.apache.org/confluence/display/METRON/PCAP+CLI+Tool
https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-pcap-backend/README.md
04-15-2016
07:21 PM
@kumar rm I believe this is a syntax error masked by the "load" error. You're using "max_runs" as both a column alias and a relation name:
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs
Change one of them and you should be good to go.
02-19-2016
01:21 AM
7 Kudos
I also found that if you clean up the Hadoop directories on the filesystem, you might need to force Ambari to re-install the packages by removing hdp-select. This works when you're going through "Install, Start, Test." Retry the failures after running this on each affected node:
yum -y erase hdp-select
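If several nodes are affected, a small loop can save a few round trips; just a sketch, assuming passwordless SSH and a hypothetical nodes.txt host list:
# remove hdp-select on every affected node so Ambari re-installs the stack packages
while read h; do
  ssh "$h" "sudo yum -y erase hdp-select"
done < nodes.txt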
02-09-2016
08:28 PM
1 Kudo
This is a hint for the user and is not currently used or validated by Falcon in any capacity, afaik. Be aware that, unlike process and feed entities, cluster definitions cannot be updated. This means that when you upgrade HDP, your cluster interface versions will be out of date.
02-03-2016
08:37 PM
5 Kudos
You're missing hdfs-site.xml in the config, which is where the NN HA details are found. The config requires both hdfs-site and core-site, i.e., set "Hadoop Configuration Resources" similar to the following:
/etc/hadoop/2.3.4.0-3485/0/hdfs-site.xml,/etc/hadoop/2.3.4.0-3485/0/core-site.xml
Reference: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/index.html
"A file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration."
02-03-2016
08:26 PM
1 Kudo
It's a bug in Hive - you can disable hive.auto.convert.join or set the memory at a global level via HADOOP_HEAPSIZE, but neither solves the question of setting the local task memory on a per-job basis.
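For a single run you can at least toggle the conversion off from the CLI; a sketch (the .hql file name is just a placeholder):
# disable the map-side join optimization for this job only
hive --hiveconf hive.auto.convert.join=false -f my_query.hql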
01-28-2016
06:47 PM
2 Kudos
I would cross-check the following:
process validity start/end dates
input start/end dates
feed validity start/end dates
input path pattern
timezone
If you want data to be picked up for a particular process instance, the feed must be valid (read this as: the feed is expected to be populated) during that time, and the data must be in a directory that matches the expected pattern. Look at your Oozie coordinator actions for details on what HDFS paths are being waited for.
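For example, something like the following shows a coordinator action's missing dependencies (the Oozie URL and coordinator action ID below are placeholders):
# inspect a coordinator action; the output lists the input paths it is still waiting on
oozie job -oozie http://oozie-host:11000/oozie -info 0000123-151027000000000-oozie-oozi-C@1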
12-09-2015
06:54 PM
2 Kudos
There are a couple of considerations that need to be taken into account when using NN HA with Falcon and Oozie. In all cases, you need to use the Namenode service ID when referring to the Namenode in the cluster xml. This value can be found in hdfs-site.xml in the property dfs.ha.namenodes.[nameservice ID]. For multi-cluster installs, you need to set up all cluster Namenode HA nameservice ID details in all clusters. For example, if you have two clusters, hdfs-site.xml for both cluster one and cluster two will have 2 nameservice IDs. Likewise, for three clusters, all three clusters would have three nameservice IDs. A two-cluster implementation would look similar to the following:
<property>
  <name>dfs.ha.namenodes.hacluster1</name>
  <value>c1nn1,c1nn2</value>
</property>
<property>
  <name>dfs.ha.namenodes.hacluster2</name>
  <value>c2nn1,c2nn2</value>
</property>
Now, when you set up Falcon, provide both cluster definitions on both clusters.
11-26-2015
01:09 AM
For those upvoting this answer: this is the correct answer for increasing memory for mapper YARN containers, but it will not work in cases where Hive optimizes by creating a local task. What happens is that Hive first generates a hash table of values for the map-side join on a local node, then uploads it to HDFS for distribution to all mappers that need the fast lookup table. It's the local task that is the problem here, and the only way to fix this is to bail on the map-side join optimization or change HADOOP_HEAPSIZE at a global level through Ambari. Not elegant, but it is a workaround.
11-26-2015
01:03 AM
@Guilherme Braccialli, that doesn't increase memory allocation for the local task. It's a percentage threshold before the job is automatically killed. It's already at 90% by default, so at this point the only option is to increase the local mem allocation. I tested the "HADOOP_HEAPSIZE" option from Ambari, and it works, but it's global.
11-25-2015
06:23 PM
Doesn't seem to work. Did the following:
$ export HADOOP_OPTS="-Xmx1024m"
$ hive -f test.hql > results.txt
...
Starting to launch local task to process map join;maximum memory = 511180800 (= 0.5111808 GB)
...
11-24-2015
10:27 PM
2 Kudos
Is there a way in HDP >= v2.2.4 to increase the local task memory? I'm aware of disabling/limiting map-only join sizes, but we want to increase, not limit it. Depending on the environment, the memory allocation will shift, but it appears to be entirely at Yarn and Hive's discretion:
"Starting to launch local task to process map join;maximum memory = 255328256" => ~0.25 GB
I've looked at/tried:
hive.mapred.local.mem
hive.mapjoin.localtask.max.memory.usage - this is simply a percentage of the local heap. I want to increase, not limit, the mem.
mapreduce.map.memory.mb - only effective for non-local tasks
I found documentation suggesting 'export HADOOP_HEAPSIZE="2048"' to change from the default, but this applied to the nodemanager. Any way to configure this on a per-job basis?
EDIT: To avoid duplication, the info I'm referencing comes from here: https://support.pivotal.io/hc/en-us/articles/207750748-Unable-to-increase-hive-child-process-max-heap-when-attempting-hash-join
Sounds like a per-job solution is not currently available with this bug.
10-28-2015
12:28 AM
1 Kudo
Ok, so it's a 1-to-1 mapping of the DistCP functionality that we currently choose to expose (I added the features for maxMaps and mapBandwidth 🙂 ). Incidentally, in HDP 2.3 the Falcon UI does not have a way to include mirror job parameters. You can do it with the traditional feed definitions.
10-28-2015
12:15 AM
3 Kudos
First, get rid of the hashtag in your path ("#startday"), assuming that's not a typo. The folder name examples you're referring to are actually showing sample token replacement patterns. For example, this:
<location type="data" path="/user/falcon/retentiondata/startday=${YEAR}-${MONTH}-${DAY}"/>
will resolve to something like this for a daily feed that begins on 10/27:
/user/falcon/retentiondata/startday=2015-10-27
The next day's "instance" (using Falcon terms) would resolve to:
/user/falcon/retentiondata/startday=2015-10-28
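To sanity-check that your data actually lands where Falcon expects it, you can list one resolved instance directory (the date here is just the example above):
hdfs dfs -ls /user/falcon/retentiondata/startday=2015-10-27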
10-27-2015
04:58 PM
1 Kudo
Do we have a detailed technical write-up on Falcon mirroring? It uses distcp under the hood, and I can only assume it uses the -update option, but are there any exceptions to how precisely it follows the distcp docs/functionality? I'm mostly concerned with partially-completed jobs that might have tmp files hanging around when the copy kicks off. I have a use case where the user would like to use mirroring to replicate 1..n feeds within a directory instead of setting up fine-grained feed replication, e.g.
mirror job 1 =
- /data/cust/cust1
  - /feed-1
  - /feed-n
mirror job 2 =
- /data/cust/cust2
  - /feed-1
  - /feed-n
Any info is appreciated.
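(For reference, my working assumption of what each mirror run boils down to under the hood is roughly the following; hosts and paths are placeholders, not a confirmed implementation detail:)
hadoop distcp -update hdfs://source-nn:8020/data/cust/cust1 hdfs://backup-nn:8020/data/cust/cust1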
10-27-2015
03:01 PM
@Anderw Ahn, @Balu I have an additional question/point related to Mayank's question about cluster layout. I understand DR as definitely requiring Oozie to be configured in both locations, because distcp will run on the destination cluster and Hive replication will run on the source cluster. Isn't it also valid that a minimal Falcon install could be achieved by *only* setting up Falcon on the primary/source cluster? In this way, you define 2 clusters (primary, backup) and then simply schedule feeds and processes to run on the appropriate cluster. Falcon can schedule the job to run on Oozie either locally or remotely. Please confirm.
TL;DR - a single Falcon install can control 2 clusters but requires Oozie installed on both clusters.
09-24-2015
04:58 PM
3 Kudos
DB – MySQL worked great for an install at a large customer. There is some work to swap out the default after Ambari has already been configured. See the following KB article for more details: "Moving Oozie to MySQL with Ambari".
I haven't set up HA for Oozie, but I believe @dstreever@hortonworks.com was recently working on this. You'll need Zookeeper for HA. We had over 1000 various bundles/coordinators/workflows running without any noticeable performance impact using default mem settings.