
Apache Metron Indexing data per location in Elasticsearch (aka dynamic index name)

Hi team. I need to store data from two different locations (e.g. US and UK) coming from the same sensor (squid), but send it to two different indices in Elasticsearch/Solr/HDFS so that the data is stored separately.

Since the data we are sending contains a field called location, I expected to be able to define a dynamic index name for Solr, Elasticsearch and HDFS using something like squid-${location}.

At runtime, depending on the value in the location field, Metron would then write into a separate index. Note that I don't really want one sensor per location, because we want to reuse as much of the parsing logic as possible.

This is quite a standard approach in other systems such as Logstash, where you can define an index name like squid-%{location}-%{country} so that indexing is dynamic.

Is this possible?
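To make the requested behaviour concrete, here is a minimal Python sketch of Logstash-style index naming, where each `%{field}` reference in a template is replaced by that field's value from the message. It only illustrates the idea and is not Logstash's or Metron's implementation; the function name `render_index_name` and the sample message are hypothetical.

```python
import re

# Matches Logstash-style field references such as %{location}.
_FIELD_REF = re.compile(r"%\{([^}]+)\}")


def render_index_name(template: str, message: dict) -> str:
    """Replace each %{field} in template with str(message[field]).

    A missing field raises KeyError instead of producing a half-filled name.
    """
    name = _FIELD_REF.sub(lambda m: str(message[m.group(1)]), template)
    # Elasticsearch index names must be lowercase.
    return name.lower()


msg = {"location": "UK", "country": "GB", "url": "http://example.com"}
print(render_index_name("squid-%{location}-%{country}", msg))  # squid-uk-gb
```

Raising `KeyError` on a missing field is a design choice for this sketch: it surfaces bad input early rather than writing into an unexpected index.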

Re: Apache Metron Indexing data per location in Elasticsearch (aka dynamic index name)

Setting the source.type field manually seems to do the job. For your case, for example:

"fieldTransformations": [
	{
		"input": [],
		"output": [
			"source.type"
		],
		"transformation": "STELLAR",
		"config": {
			"source.type": "JOIN(['squid',location,country],'-')"
		}
	}
]
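As I read this reply, the Stellar transformation rewrites `source.type`, and each distinct value is then treated downstream as its own sensor name (and so gets its own index). The Python sketch below only models that effect: `derive_source_type` mimics `JOIN(['squid',location,country],'-')` for messages where both fields are present, and `route` groups messages by the resulting value. Both function names are hypothetical, and Stellar's handling of missing or null fields is not modelled.

```python
from collections import defaultdict


def derive_source_type(message: dict, prefix: str = "squid", sep: str = "-") -> dict:
    """Return a copy of message with source.type set to prefix-location-country.

    Models the effect of JOIN(['squid',location,country],'-');
    assumes both fields are present.
    """
    enriched = dict(message)
    enriched["source.type"] = sep.join(
        [prefix, str(message["location"]), str(message["country"])]
    )
    return enriched


def route(messages: list) -> dict:
    """Group messages by source.type, i.e. by the sensor name used downstream."""
    buckets = defaultdict(list)
    for msg in messages:
        enriched = derive_source_type(msg)
        buckets[enriched["source.type"]].append(enriched)
    return dict(buckets)


batch = [
    {"location": "uk", "country": "gb", "url": "http://a.example"},
    {"location": "us", "country": "us", "url": "http://b.example"},
    {"location": "uk", "country": "gb", "url": "http://c.example"},
]
for sensor, msgs in sorted(route(batch).items()):
    print(sensor, len(msgs))
# squid-uk-gb 2
# squid-us-us 1
```

If location or country values can contain uppercase letters, they may need lowercasing before they end up in an Elasticsearch index name, since Elasticsearch requires lowercase index names.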

Re: Apache Metron Indexing data per location in Elasticsearch (aka dynamic index name)

Hi @Bob Van Haute,

That seems to work. However, the Elasticsearch and HDFS indices are created with the default configuration (https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.6.1/runbook/content/default_configuration.html), because the new sensor names no longer match the squid indexing configuration below, so the writers fall back to the defaults:

{
	"hdfs": {
		"batchSize": 10,
		"enabled": true,
		"index": "squid"
	},
	"elasticsearch": {
		"batchSize": 10,
		"enabled": true,
		"index": "squid"
	},
	"solr": {
		"batchSize": 1,
		"enabled": false,
		"index": "squid"
	}
}
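The fallback described in this reply can be modelled as an exact-match lookup on the sensor name. In this Python sketch, `writer_config` (a hypothetical name) returns the squid settings only for the sensor named exactly `squid`; any derived name such as `squid-uk-gb` falls through to a default. The default values used here (index equal to the sensor name, `batchSize` 1, `enabled` true) are an assumption about what the linked runbook describes, so check them against your release.

```python
# Indexing config for the "squid" sensor, as posted above.
INDEXING_CONFIGS = {
    "squid": {
        "hdfs": {"batchSize": 10, "enabled": True, "index": "squid"},
        "elasticsearch": {"batchSize": 10, "enabled": True, "index": "squid"},
        "solr": {"batchSize": 1, "enabled": False, "index": "squid"},
    }
}


def writer_config(sensor: str, writer: str, configs: dict = INDEXING_CONFIGS):
    """Return (config, used_default) for one writer of one sensor.

    Lookup is by exact sensor name only, so "squid-uk-gb" does not
    inherit the "squid" settings.
    """
    sensor_cfg = configs.get(sensor, {})
    if writer in sensor_cfg:
        return sensor_cfg[writer], False
    # Assumed defaults; verify against the runbook for your release.
    return {"index": sensor, "batchSize": 1, "enabled": True}, True


print(writer_config("squid", "hdfs"))
print(writer_config("squid-uk-gb", "hdfs"))
```

The second call is the situation the Storm warning in this reply reports: the writer runs, but with generic settings instead of the tuned squid ones.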

Storm also logs a warning because the default (and potentially unoptimized) settings are used.

My question here: is it possible for the settings above to match a wildcard such as *squid*?

java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor squid.london
	at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:234)
	at org.apache.storm.daemon.executor$fn__10193$tuple_action_fn__10195.invoke(executor.clj:730)
	at org.apache.storm.daemon.executor$mk_task_receiver$fn__10114.invoke(executor.clj:462)
	at org.apache.storm.disruptor$clojure_handler$reify__4137.onEvent(disruptor.clj:40)
	at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472)
	at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451)
	at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
	at org.apache.storm.daemon.executor$fn__10193$fn__10206$fn__10259.invoke(executor.clj:849)
	at org.apache.storm.util$async_loop$fn__1221.invoke(util.clj:484)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.lang.Thread.run(Thread.java:745)
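This thread does not establish whether indexing configurations can be keyed by a wildcard. If they cannot, one workaround is to generate one exact-match configuration per location from the squid template, so that no derived sensor name falls back to the defaults. The Python sketch below assumes the set of locations is known up front and that sensor names follow a `squid<sep><location>` pattern; `per_location_configs` is a hypothetical helper, and how the resulting JSON is loaded into Metron (for example one `<sensor>.json` document per sensor) depends on your deployment.

```python
import copy
import json


def per_location_configs(base: dict, prefix: str, locations: list, sep: str = "-") -> dict:
    """Clone base once per location, pointing every writer at its own index.

    Returns {sensor_name: config}; sensor_name must equal the source.type
    value the parser produces for that location.
    """
    configs = {}
    for location in locations:
        sensor = f"{prefix}{sep}{location}"
        cfg = copy.deepcopy(base)  # keep the template untouched
        for writer_cfg in cfg.values():
            writer_cfg["index"] = sensor
        configs[sensor] = cfg
    return configs


squid_template = {
    "hdfs": {"batchSize": 10, "enabled": True, "index": "squid"},
    "elasticsearch": {"batchSize": 10, "enabled": True, "index": "squid"},
    "solr": {"batchSize": 1, "enabled": False, "index": "squid"},
}

for sensor, cfg in per_location_configs(squid_template, "squid", ["uk", "us"]).items():
    # Hypothetical layout: one JSON document per sensor name.
    print(f"--- {sensor}.json")
    print(json.dumps(cfg, indent=2))
```

Deep-copying the template keeps the tuned settings (batch sizes, enabled flags) identical across locations while only the index names differ; with `sep="."` the same helper yields names like `squid.london`, matching the sensor in the warning above.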
