Member since 07-11-2016
25 Posts
1 Kudos Received
0 Solutions
04-20-2017
06:53 AM
@Wynner Thank you for the answer. I need help with one more thing: do you have any documents or references on best practices for NiFi data flow development?
04-19-2017
01:39 PM
@Wynner
We want to leave the file where it is on the source server. I am running NiFi in a cluster.
04-19-2017
10:36 AM
When getting files from an FTP server, we can use ListSFTP followed by FetchSFTP instead of the GetSFTP processor. What is the advantage of using ListSFTP + FetchSFTP over GetSFTP?
Labels: Apache NiFi, Cloudera DataFlow (CDF)
03-09-2017
10:30 AM
@Michael Young
The _all field is not disabled, and we are getting the following response for the query.
Query:
GET /movies/_search?pretty
{
"size": 10,
"_source": false,
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*drama*" } } }
Query Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0 },
"hits": {
"total": 4,
"max_score": 1,
"hits": [ {
"_index": "movies",
"_type": "movie_intrnl",
"_id": "AVoYRhQexAEXKBamIeYy",
"_score": 1 }, {
"_index": "movies",
"_type": "movie_shows",
"_id": "AVoYRuxxxAEXKBamIeY2",
"_score": 1 }, {
"_index": "movies",
"_type": "movie_shows",
"_id": "AVoYRuxxxAEXKBamIeY4",
"_score": 1 }, {
"_index": "movies",
"_type": "movie_intrnl",
"_id": "AVoYRhQexAEXKBamIeYw",
"_score": 1 } ] } }
The high-level intent is to identify the fields and values in the index that match the search, i.e. the presence of the keyword anywhere in the document, which is why the _all field is used.
03-07-2017
10:40 AM
@Michael Young We are using the default analyzer and tokenizer. The _settings endpoint for the index does not show which analyzer is being used. We are using the default mappings for the fields and have not added any new templates. The mappings for the movies index are below:
{
"movies": {
"mappings": {
"movie_shows": {
"properties": {
"date": {
"type": "date" },
"genres": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } },
"id": {
"type": "long" },
"theatre": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } },
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } } } },
"movie_intrnl": {
"properties": {
"director": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } },
"genres": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } },
"id": {
"type": "long" },
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256 } } },
"year": {
"type": "long" } } } } } }
03-05-2017
11:03 PM
We are using Elasticsearch 5.0.0. Please let us know if there is a regex or any other way to perform a case-insensitive search. The data in the movies index is in the attachment. Below is the aggregation query we use to find the fields matching the search string “*drama*” in the movies index:
GET /movies/_search?pretty
{
"size": 0,
"_source": false,
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*drama*" } },
"aggs": {
"distinct_tables_1": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_1": {
"terms": {
"field": "genres.keyword",
"include" : ".*drama.*" } } } },
"distinct_tables_2": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_2": {
"terms": {
"field": "director.keyword",
"include" : ".*drama.*" } } } },
"distinct_tables_3": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_3": {
"terms": {
"field": "theatre.keyword",
"include" : ".*drama.*" } } } } } }
We get the following response:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0 },
"hits": {
"total": 4,
"max_score": 0,
"hits": [] },
"aggregations": {
"distinct_tables_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } } ] },
"distinct_tables_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } } ] },
"distinct_tables_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } } ] } } }
It can be seen from the response that there are no matching
column values in the response, even though there are documents matching the search string “drama”. The regex in the aggregation's include appears to be case-sensitive, so no values are returned.
As a workaround, we used the alternate query below. It matches only the partial word .*rama.* instead of Drama, so a proper case-insensitive search would be better.
GET /movies/_search?pretty
{
"size": 0,
"_source": false,
"query": {
"query_string": {
"analyze_wildcard": true,
"query": "*drama*" } },
"aggs": {
"distinct_tables_1": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_1": {
"terms": {
"field": "genres.keyword",
"include" : ".*rama.*" } } } },
"distinct_tables_2": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_2": {
"terms": {
"field": "director.keyword",
"include" : ".*rama.*" } } } },
"distinct_tables_3": {
"terms": {
"field": "_type" },
"aggs": {
"distinct_col_3": {
"terms": {
"field": "theatre.keyword",
"include" : ".*rama.*" } } } } } }
Response for the query given above:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0 },
"hits": {
"total": 4,
"max_score": 0,
"hits": [] },
"aggregations": {
"distinct_tables_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "BiographyDrama",
"doc_count": 1 }, {
"key": "Drama",
"doc_count": 1 } ] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "BiographyDrama",
"doc_count": 1 }, {
"key": "Drama",
"doc_count": 1 } ] } } ] },
"distinct_tables_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "Drama1",
"doc_count": 1 }, {
"key": "Drama4",
"doc_count": 1 } ] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } } ] },
"distinct_tables_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "movie_intrnl",
"doc_count": 2,
"distinct_col_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [] } }, {
"key": "movie_shows",
"doc_count": 2,
"distinct_col_3": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [ {
"key": "Drama4",
"doc_count": 1 } ] } } ] } } }
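One possible way around the case-sensitive include regex described above is to expand each letter of the keyword into a character class, so that "drama" becomes .*[dD][rR][aA][mM][aA].* and the first letter no longer has to be dropped. The following is only a sketch in Python, not a confirmed Elasticsearch feature or the recommended fix: the helper names case_insensitive_include and terms_agg are made up for illustration, it assumes the include regex accepts character classes and is matched against the whole term, and Python's re module is used only as a stand-in to sanity-check the pattern locally. "Comedy" is a hypothetical non-matching value; the other terms come from the response above.

```python
import json
import re


def case_insensitive_include(keyword: str) -> str:
    """Build a 'contains keyword, any case' regex using character classes.

    Hypothetical helper: each cased letter becomes [xX]; any other
    non-alphanumeric character is backslash-escaped.
    """
    parts = []
    for ch in keyword:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        elif ch.isalnum():
            parts.append(ch)
        else:
            parts.append("\\" + ch)
    return ".*" + "".join(parts) + ".*"


def terms_agg(field: str, keyword: str) -> dict:
    """Terms aggregation body with the generated include pattern (hypothetical helper)."""
    return {"terms": {"field": field, "include": case_insensitive_include(keyword)}}


pattern = case_insensitive_include("drama")
print(pattern)  # .*[dD][rR][aA][mM][aA].*

# Local sanity check only; Python's re is not the engine Elasticsearch uses.
for term in ["Drama", "BiographyDrama", "Drama4", "Comedy"]:
    print(term, bool(re.fullmatch(pattern, term)))

# Body that could replace one of the "distinct_col_N" aggregations in the query.
print(json.dumps(terms_agg("genres.keyword", "drama"), indent=2))
```

The generated pattern would go in place of ".*rama.*" in each of the three terms aggregations; whether it behaves the same on a given Elasticsearch version would need to be verified against the cluster.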
11-22-2016
10:32 AM
Hello all, my requirement is to store multiple images along with an identifier column in a Hive table. Is there any way to store multiple images in Hive tables?
Labels: Apache Hive
09-22-2016
01:03 PM
1 Kudo
Hello team, I have a question about Apache NiFi and Kafka. Both are messaging systems. Can NiFi be replaced with Kafka, or vice versa? And what are the advantages of NiFi over Kafka?
Labels: Apache Kafka, Apache NiFi
08-16-2016
08:25 AM
@mclark Thanks for the response; it is appreciated. Do I also need to configure something on the back end, i.e. in nifi.properties or any other file on the cluster or node? I am facing the attached error.
08-10-2016
02:11 PM
Thanks @mclark.
I am attaching a template of a flow that extracts earthquake data from a US government site, but I am getting duplicate data as output.
eqdataus.xml