Member since
06-09-2016
185
Posts
22
Kudos Received
5
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2182 | 04-21-2017 07:57 AM |
| | 1358 | 04-18-2017 07:07 AM |
| | 3234 | 02-27-2017 05:41 AM |
| | 905 | 12-09-2016 11:05 AM |
| | 1270 | 11-24-2016 11:20 AM |
02-13-2017
06:57 AM
Hi,

I have a file that can be used for lookups during a data flow. How do I read the file and put it in the DistributedCache?

Thanks,
Avijeet
Labels: Apache NiFi
02-03-2017
08:43 AM
Thanks @Tibor Kiss

What is the usual industry practice for writing streaming data to both HDFS and another real-time store such as HBase or Cassandra?

Should we write to HDFS from the stream-processing layer (Storm, Spark Streaming), or should we write it separately using a dedicated consumer (Kafka) or sink (Flume)?

For some reason, writing to HDFS from the stream-processing layer doesn't sound right to me.

Thanks,
Avijeet
02-03-2017
05:18 AM
Hi All,

I understand that Solr creates an index and makes searches faster. However, I have a fundamental question: does Solr store the data as well as the index?

For example, if I have a table with 100 columns and I want an index on a few of them:

- Will Solr store all the table data so that it can show the full row on a search match? OR
- Can the full file stay in HDFS/HBase, with Solr looking it up to show the full row?

So can there be an approach where the data is in HDFS and the primary/secondary indexes are in Solr, and a search can find the full data in HDFS? And not only find it, but also update or delete it?

Thanks,
Avijeet
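To make the second option concrete, here is a minimal in-memory sketch of an index that holds only row keys while the full rows live in a separate store. It is plain Python, not the Solr API: `primary_store`, `build_index`, and `search` are hypothetical names, and the dictionary merely stands in for rows kept in HDFS/HBase.

```python
from collections import defaultdict

# Hypothetical stand-in for full rows kept in the primary store (HDFS/HBase).
primary_store = {
    "r1": {"name": "alice", "city": "pune", "plan": "gold"},
    "r2": {"name": "bob", "city": "delhi", "plan": "basic"},
    "r3": {"name": "carol", "city": "pune", "plan": "basic"},
}


def build_index(store, fields):
    """Map (field, value) to the set of row keys; no row data is copied."""
    index = defaultdict(set)
    for row_key, row in store.items():
        for field in fields:
            index[(field, row[field])].add(row_key)
    return index


def search(index, store, field, value):
    """Find matching row keys in the index, then fetch full rows from the store."""
    row_keys = sorted(index.get((field, value), ()))
    return [store[key] for key in row_keys]


# Index only the "city" column; the other columns exist only in the store.
index = build_index(primary_store, fields=["city"])
matches = search(index, primary_store, "city", "pune")
```

The point of the sketch is the two-step read: the index answers "which rows match", and the primary store supplies the full row. Updates and deletes would have to touch both structures, which is the part this sketch leaves out.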
Labels: Apache Solr
02-03-2017
04:36 AM
That's great, @Tibor Kiss. I am trying to run a Spark Streaming job. How do I tell it to run in standalone cluster mode?
02-02-2017
11:58 AM
Thanks @Tibor Kiss. I am looking for more information about distributed mode. Is there a name for the cluster managers used by Storm or Spark Streaming?
02-02-2017
10:42 AM
Hi All,

Most batch-processing frameworks (MapReduce, Spark) support a local mode and a distributed mode (standalone, YARN, Mesos) of deployment and execution.

What about stream-processing frameworks such as Storm and Spark Streaming?

- Do they manage the distributed mode on their own?
- Is it even realistic to expect them to work on YARN?
- How do we monitor a distributed Spark Streaming job?
- Do we need to specify the master as yarn to make it distributed?

Thanks,
Avijeet
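For the Spark side, the deployment modes listed above map onto the `--master` and `--deploy-mode` options of `spark-submit`. The sketch below only assembles the command line for each mode; `spark_submit_args`, the host name `master-host`, and the script name `stream_job.py` are hypothetical placeholders, not values from this thread.

```python
def spark_submit_args(app, master, deploy_mode=None):
    """Build a spark-submit argument list for the given master URL."""
    args = ["spark-submit", "--master", master]
    if deploy_mode is not None:
        args += ["--deploy-mode", deploy_mode]
    args.append(app)
    return args


# Local mode: everything runs in one JVM with two worker threads.
local_cmd = spark_submit_args("stream_job.py", "local[2]")

# Standalone cluster manager: the master URL uses the spark:// scheme.
standalone_cmd = spark_submit_args("stream_job.py", "spark://master-host:7077")

# YARN: the master is simply "yarn"; deploy mode selects where the driver runs.
yarn_cmd = spark_submit_args("stream_job.py", "yarn", deploy_mode="cluster")

print(" ".join(yarn_cmd))
```

A Spark Streaming job is submitted the same way as a batch job; only the master URL decides whether it runs locally or on a cluster manager.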
Labels: Apache Spark
01-27-2017
10:00 AM
Hi,

I have been seeing stream-processing use cases where HDFS is shown as a streaming-ingest target alongside HBase, Cassandra, etc.

Weren't HDFS writes supposed to be only for big files (64 MB/128 MB and up)? In Flume this is achieved with the hdfs.rollSize configuration: Flume manages the buffer until it becomes big, then writes/flushes it out.

How is this handled when writing from Spark Streaming or Storm?

Thanks,
Avijeet
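The size-based rolling described above can be sketched in a few lines. This is plain Python and not Flume, Storm, or Spark code: `RollingBuffer`, `roll_size_bytes`, and `sink` are hypothetical names, and the tiny 10-byte threshold is chosen only to keep the example readable.

```python
class RollingBuffer:
    """Accumulate small records and hand them to the sink as one large batch."""

    def __init__(self, roll_size_bytes, sink):
        self.roll_size_bytes = roll_size_bytes
        self.sink = sink  # any callable that accepts a single bytes object
        self._chunks = []
        self._size = 0

    def append(self, record):
        """Buffer one record; flush once the buffered size reaches the threshold."""
        self._chunks.append(record)
        self._size += len(record)
        if self._size >= self.roll_size_bytes:
            self.flush()

    def flush(self):
        """Write everything buffered so far as a single batch, then reset."""
        if not self._chunks:
            return
        self.sink(b"".join(self._chunks))
        self._chunks = []
        self._size = 0


# Usage: a list stands in for the files that would be written to HDFS.
written = []
buf = RollingBuffer(roll_size_bytes=10, sink=written.append)
for event in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    buf.append(event)
buf.flush()  # flush the remainder at shutdown
```

A real pipeline would usually also roll on a time interval so that a slow stream does not hold data in the buffer indefinitely.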
01-27-2017
09:08 AM
Thanks @mqureshi, that answers my question. However, a number of components, such as Atlas and Falcon, have started using HBase as a metadata store. How should we view these use cases?
01-25-2017
05:51 AM
Hi All,

Hive has been established as an analytics engine (SQL query processing) for large file-based data.

How do the newer Hive features, such as ACID, streaming, and updates, fit into Hive's overall positioning? Is the idea to create an all-in-one DB on Hive?

Thanks,
Avijeet
Labels: Apache Hive
01-23-2017
06:28 AM
Thanks @mqureshi

Can you please confirm: for a cluster deployed without a VPC, is there any way to secure Hadoop with all these ports open?

I am thinking of Knox as one way. Is there anything else that can be done quickly? Also, will Knox work without LDAP/AD?

Regards,
Avijeet