Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4067 | 08-20-2018 08:26 PM |
| | 1963 | 08-15-2018 01:59 PM |
| | 2390 | 08-13-2018 02:20 PM |
| | 4139 | 07-23-2018 04:37 PM |
| | 5046 | 07-19-2018 12:52 PM |
12-12-2016 04:51 PM
I ran into this timeout before. I found I had Ranger enabled on Kafka, and the API would time out. Can you verify that Ranger is disabled for the Kafka topic, or grant the correct permissions to the atlas user?
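If it is the permissions, the policy can also be created through the Ranger public REST API (v2) rather than the Admin UI. A minimal sketch in Python follows; the host, credentials, Ranger service name ("kafka"), and topic name ("ATLAS_HOOK") are all assumptions for illustration, so adjust them to your cluster:

```python
# Hedged sketch: create a Ranger policy granting the atlas user access
# to the Kafka topic Atlas uses. Host, credentials, service name, and
# topic name below are placeholders, not values from your cluster.
import requests

policy = {
    "service": "kafka",  # name of the Kafka repo in Ranger (assumption)
    "name": "atlas-hook-access",
    "resources": {"topic": {"values": ["ATLAS_HOOK"]}},
    "policyItems": [{
        "users": ["atlas"],
        "accesses": [
            {"type": "publish", "isAllowed": True},
            {"type": "consume", "isAllowed": True},
        ],
    }],
}

resp = requests.post(
    "http://ranger-host:6080/service/public/v2/api/policy",  # placeholder host
    json=policy,
    auth=("admin", "admin"),  # placeholder credentials
)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```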
12-12-2016 04:46 PM
You can use the HBase Lily Indexer, which will automatically index data from HBase into Solr. You can also use Apache NiFi: ingest once and fork the flow to Solr, HBase, HDFS, and Hive. It is a highly flexible implementation model.
12-12-2016 04:29 PM
1 Kudo
@Avijeet Dash Solr competes for resources much like HBase does. It does not run on YARN unless you run it in Slider. Controlling these resources from a CPU and RAM perspective becomes challenging. Even if you could isolate RAM and CPU, as Terry mentioned, you still have I/O contention. Based on my implementation experience, I do not recommend running Solr on HDFS. Have it use fast local disk (i.e., SSD) and let it run.
12-11-2016 02:44 AM
I highly recommend you use Apache NiFi instead of Flume for most, if not all, data movement into and out of Hadoop. For your use case, there is a prebuilt NiFi template to push tweets to Hive and Solr (for searching and trending): https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html If you want further analysis (e.g., most used words), this can be done several ways:
1. Real time: use Spark Streaming with NiFi and micro-batch your counts (see the sketch below).
2. Batch: run Hive SQL from NiFi.
3. Batch: call a Hive script from NiFi to calculate the analysis every x interval.
4. Batch: set up an Oozie job to calculate the analysis every interval (it may be kicked off from NiFi as well).
So you have options. Hope that helps.
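For option 1, here is a minimal Spark Streaming sketch of the micro-batched word count. The socket source, host, and port are placeholders; in practice you would feed this from NiFi, e.g. via Kafka or a site-to-site output port:

```python
# Hedged sketch: micro-batch word counts over a stream of tweet text.
# The socket source is a stand-in for whatever transport NiFi uses.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="TweetWordCount")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # placeholder source
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word.lower(), 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's counts to the driver log

ssc.start()
ssc.awaitTermination()
```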
12-09-2016 08:13 PM
2 Kudos
Can you show us the policy where you grant the hive user update permission on the table?
12-09-2016 05:52 PM
2 Kudos
@ANSARI FAHEEM AHMED Can you identify which components you need help tuning? Hive, Pig, MapReduce jobs, etc.?
Hive performance tuning:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_performance_tuning/bk_performance_tuning-20150721.pdf
http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/
Network:
https://community.hortonworks.com/articles/8563/typical-hdp-cluster-network-configuration-best-pra.html
10 ways to get the most out of your cluster:
http://www.slideshare.net/Hadoop_Summit/radia-srinivas-june261120amroom210c
12-08-2016 03:22 PM
1 Kudo
You can install Anaconda, but you will need to set the correct Python path during job execution:
PYSPARK_PYTHON=/opt/anaconda/bin/python spark-submit spark-job.py
12-08-2016 03:15 PM
3 Kudos
@Sampat Budankayala Can you add more details on what you are asking? I assume you might be asking how to move data from SharePoint to HDFS. I would use Apache NiFi and pull data via the SharePoint REST API (https://msdn.microsoft.com/en-us/library/office/jj860569.aspx); a rough sketch of the pull is below. If it is a bulk load, then a Java client may be best. This processor does not exist yet but should be very easy to build: https://github.com/OfficeDev/Office-365-SDK-for-Java
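As a rough illustration of the REST pull (not the NiFi processor itself), here is a Python sketch. The site URL, list name, credentials, and the NTLM auth scheme are all assumptions; your SharePoint deployment may use a different auth mechanism:

```python
# Hedged sketch: pull items from a SharePoint list via the REST API and
# land them locally for ingest into HDFS. All identifiers below are
# placeholders, not real values.
import json
import requests
from requests_ntlm import HttpNtlmAuth  # pip install requests_ntlm

SITE = "https://sharepoint.example.com/sites/mysite"  # hypothetical site
LIST = "Documents"                                    # hypothetical list

resp = requests.get(
    SITE + "/_api/web/lists/getbytitle('" + LIST + "')/items",
    auth=HttpNtlmAuth("DOMAIN\\user", "password"),    # placeholder credentials
    headers={"Accept": "application/json;odata=verbose"},
)
resp.raise_for_status()

with open("sharepoint_items.json", "w") as f:
    json.dump(resp.json()["d"]["results"], f)
# Then land it in HDFS, e.g.: hdfs dfs -put sharepoint_items.json /landing/sharepoint/
```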
12-07-2016 04:52 AM
@Sami Ahmad Can you verify that you have run the Ranger LDAP sync?
12-06-2016 03:25 PM
@hello hadoop I would check here to determine if the option is available: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-upgrade/content/upgrading_hdp_stack.html From the docs above, the option is available on 2.3. Also, please check the prerequisites prior to upgrading: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-upgrade/content/upgrading_HDP_prerequisites.html