About yogesh_sharma05

yogesh_sharma05 · ‎04-20-2017

@Wynner Thank you for the answer. I need help with one more thing: do you have any documents or references on best practices for NiFi data flow development?

yogesh_sharma05 · ‎04-19-2017

@Wynner We want the file to stay where it is. I am running NiFi in a cluster.

yogesh_sharma05 · ‎04-19-2017

We are getting files from FTP, where we can use ListSFTP followed by FetchSFTP instead of the GetSFTP processor. What would be the advantage of ListSFTP + FetchSFTP over GetSFTP?

yogesh_sharma05 · ‎03-09-2017

@Michael Young The _all field is not disabled, and we are getting the following response for the query.

Query:

GET /movies/_search?pretty
{
  "size": 10,
  "_source": false,
  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*drama*"
    }
  }
}

Query Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      { "_index": "movies", "_type": "movie_intrnl", "_id": "AVoYRhQexAEXKBamIeYy", "_score": 1 },
      { "_index": "movies", "_type": "movie_shows", "_id": "AVoYRuxxxAEXKBamIeY2", "_score": 1 },
      { "_index": "movies", "_type": "movie_shows", "_id": "AVoYRuxxxAEXKBamIeY4", "_score": 1 },
      { "_index": "movies", "_type": "movie_intrnl", "_id": "AVoYRhQexAEXKBamIeYw", "_score": 1 }
    ]
  }
}

The high-level intent is to identify the fields and values in the index that match the search, that is, to detect the keyword anywhere in the document, which is why the _all field is used.

yogesh_sharma05 · ‎03-07-2017

@Michael Young We are using the default analyzer and tokenizer. The _settings endpoint for index does not provide the analyzer that is being used. We are using default mappings for fields and we have not added any new templates. Please find the mappings used for the index movies below:

{
  "movies": {
    "mappings": {
      "movie_shows": {
        "properties": {
          "date": { "type": "date" },
          "genres": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "id": { "type": "long" },
          "theatre": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
        }
      },
      "movie_intrnl": {
        "properties": {
          "director": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "genres": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "id": { "type": "long" },
          "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "year": { "type": "long" }
        }
      }
    }
  }
}

yogesh_sharma05 · ‎03-05-2017

We are using ElasticSearch 5.0.0. Please let us know if there is a regex or any other way to perform a case-insensitive search. The data in the movies index is in the attachment.

This is the aggregation query we use to find fields matching the search string "*drama*" in the movies index:

GET /movies/_search?pretty
{
  "size": 0,
  "_source": false,
  "query": {
    "query_string": { "analyze_wildcard": true, "query": "*drama*" }
  },
  "aggs": {
    "distinct_tables_1": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_1": { "terms": { "field": "genres.keyword", "include": ".*drama.*" } }
      }
    },
    "distinct_tables_2": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_2": { "terms": { "field": "director.keyword", "include": ".*drama.*" } }
      }
    },
    "distinct_tables_3": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_3": { "terms": { "field": "theatre.keyword", "include": ".*drama.*" } }
      }
    }
  }
}

We get the following response:

{
  "took": 10,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "distinct_tables_1": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_1": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_1": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        }
      ]
    },
    "distinct_tables_2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_2": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_2": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        }
      ]
    },
    "distinct_tables_3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_3": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_3": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        }
      ]
    }
  }
}

The response contains no matching column values, even though there are documents matching the search string "drama". The regex match in aggregations appears to be case sensitive, so no values are returned.

We used the alternate query below to find words matching Drama regardless of case. However, it relies on the partial word .*rama.* instead of Drama, and a true case-insensitive search would be better.

GET /movies/_search?pretty
{
  "size": 0,
  "_source": false,
  "query": {
    "query_string": { "analyze_wildcard": true, "query": "*drama*" }
  },
  "aggs": {
    "distinct_tables_1": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_1": { "terms": { "field": "genres.keyword", "include": ".*rama.*" } }
      }
    },
    "distinct_tables_2": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_2": { "terms": { "field": "director.keyword", "include": ".*rama.*" } }
      }
    },
    "distinct_tables_3": {
      "terms": { "field": "_type" },
      "aggs": {
        "distinct_col_3": { "terms": { "field": "theatre.keyword", "include": ".*rama.*" } }
      }
    }
  }
}

Response for the query given above:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "distinct_tables_1": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "BiographyDrama", "doc_count": 1 },
              { "key": "Drama", "doc_count": 1 }
            ]
          }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "BiographyDrama", "doc_count": 1 },
              { "key": "Drama", "doc_count": 1 }
            ]
          }
        }
      ]
    },
    "distinct_tables_2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_2": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "Drama1", "doc_count": 1 },
              { "key": "Drama4", "doc_count": 1 }
            ]
          }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_2": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        }
      ]
    },
    "distinct_tables_3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "movie_intrnl",
          "doc_count": 2,
          "distinct_col_3": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] }
        },
        {
          "key": "movie_shows",
          "doc_count": 2,
          "distinct_col_3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "Drama4", "doc_count": 1 }
            ]
          }
        }
      ]
    }
  }
}
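
One possible workaround for the case-sensitive include pattern, offered as a sketch and not as an answer confirmed in this thread: to my knowledge the regular expression syntax accepted by the terms aggregation in ElasticSearch 5.x has no case-insensitivity flag, so each letter of the search word can be expanded into a two-case character class, turning drama into [dD][rR][aA][mM][aA]. The helper names below (case_insensitive_regex, contains_pattern, terms_agg) are hypothetical, the reserved-character list is an assumption based on the regexp syntax documentation, and the sketch assumes an ASCII search word.

```python
import json

# Characters that Lucene-style regular expressions treat as operators.
# Assumption: this list follows the Elasticsearch regexp syntax documentation.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_regex(word: str) -> str:
    """Expand each ASCII letter into a two-case class, e.g. 'd' -> '[dD]'."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch in RESERVED:
            # Escape operators so they are matched literally.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def contains_pattern(word: str) -> str:
    """Wrap the expanded word in .* so it can match anywhere in the term."""
    return ".*" + case_insensitive_regex(word) + ".*"


def terms_agg(field: str, word: str) -> dict:
    """Build a terms aggregation body like the ones used in the queries above."""
    return {"terms": {"field": field, "include": contains_pattern(word)}}


if __name__ == "__main__":
    # Print the aggregation fragment for the genres.keyword field.
    print(json.dumps(terms_agg("genres.keyword", "drama"), indent=2))
```

The generated pattern would replace the hand-written ".*rama.*" value of "include" in each distinct_col_N aggregation; whether it behaves as hoped on 5.0.0 should be checked against the actual index.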

yogesh_sharma05 · ‎11-22-2016

Hello all, my requirement is to store multiple images, along with an identifier column, in a Hive table. Is there any way to store multiple images in Hive tables?

yogesh_sharma05 · ‎09-22-2016

Hello Team, I have a question about Apache NiFi and Kafka. Both are messaging systems. Can someone tell me whether we can replace NiFi with Kafka or vice versa? And what are the advantages of NiFi over Kafka?

yogesh_sharma05 · ‎08-16-2016

@mclark Thanks for the response; it is appreciated. Do I need to configure something on the back end as well, i.e. in nifi.properties or any other file on the cluster or node? I am facing the attached error.

yogesh_sharma05 · ‎08-10-2016

Thanks @mclark. I am attaching a template of a flow that extracts earthquake data from a US government site, but I am getting duplicate data as output. Attached: eqdataus.xml

Last Visited	‎05-24-2017 01:30 PM

Member Since	‎07-11-2016 06:49 AM
Posts	25
Kudos received	1

Cloudera Community

Re: ListSFTP+FetchFTP vs GETSFTP ??

Re: ListSFTP+FetchFTP vs GETSFTP ??

ListSFTP+FetchFTP vs GETSFTP ??

Re: ElasticSearch query to perform case-insensitiv...

Re: ElasticSearch query to perform case-insensitiv...

ElasticSearch query to perform case-insensitive se...

Is there any way to store multiple images in Hive ...

Apache Nifi and Kafka

Re: How to get only unique data from flow files

Re: How to get only unique data from flow files