Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4044 | 08-20-2018 08:26 PM |
| | 1943 | 08-15-2018 01:59 PM |
| | 2372 | 08-13-2018 02:20 PM |
| | 4104 | 07-23-2018 04:37 PM |
| | 5010 | 07-19-2018 12:52 PM |
07-12-2016
04:01 AM
1 Kudo
@pankaj chaturvedi
Inside your Pig script, add this: set exectype=tez;
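Equivalently, the execution engine can be selected when launching Pig; a minimal sketch (the script name is a placeholder):

```sh
# Run the script on the Tez execution engine instead of MapReduce
pig -x tez myscript.pig
```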
07-11-2016
05:56 PM
@mark doutre It is difficult to provide recommendations without knowing the use case. One thing that stands out to me with ASR is that the source data-generating app will have to adhere to a schema, which ends up being just another maintenance issue on the app side. I would rather have the data flow into NiFi and have it fork based on the type of data feed coming in.
07-11-2016
05:18 PM
With the Hive CLI I am able to execute an init script when I launch Hive. For example, my .hiverc file has add jar statements, and when I launch Hive it automatically executes all statements in the script. How can I do this with Beeline?
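For reference, Beeline accepts an init file with the -i flag, which runs every statement in the file before the session starts; a minimal sketch, where the JDBC URL and file path are placeholders:

```sh
# Execute the statements in init.sql (e.g., your ADD JAR lines) at startup;
# replace the host and file path with your own.
beeline -u jdbc:hive2://hs2-host:10000/default -i /home/user/init.sql
```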
Labels:
- Apache Hive
07-11-2016
04:54 AM
@Benjamin Leonhardi Did this work as expected? If we simply select one of the RMs in Falcon, will it fail over automatically to the secondary? I am trying to understand the impact if Falcon is pointing to an RM and that RM goes down.
07-11-2016
04:44 AM
Client, ApplicationMaster and NodeManager on RM failover

When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the "new" Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override the logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting the value of yarn.client.failover-proxy-provider to the class name.
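For illustration, a minimal yarn-site.xml fragment for this setup; the rm-ids and hostnames below are placeholders:

```xml
<!-- Enable RM HA and list all RMs so clients can round-robin between them -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<!-- Default failover logic; point this at your own RMFailoverProxyProvider to override -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
```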
07-11-2016
04:17 AM
1 Kudo
@Ahmad Debbas I have done this using Storm to parse emails/PDFs with Tika as documents land on HDFS. You can use the Storm HDFS spout (info here). Once the data is parsed, use another bolt to sink it into Solr. Pretty straightforward solution; a rough sketch of such a parsing bolt is below. NiFi is definitely a consideration as well: you would need to build a NiFi Tika processor, so that each event runs through the processor, the text is parsed, and the result flows into Solr. That could work too.
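A rough sketch of the parsing bolt, assuming the upstream spout emits a file path in a tuple field named "path" (the field name, class name, and downstream wiring are all assumptions, not the exact topology used):

```java
import java.io.File;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.tika.Tika;

// Hypothetical bolt: extract plain text from a document with Tika and
// emit it for a downstream Solr indexing bolt.
public class TikaParseBolt extends BaseBasicBolt {

    private transient Tika tika;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        if (tika == null) {
            tika = new Tika(); // lazy init: Tika is not serializable
        }
        String path = tuple.getStringByField("path"); // field name is an assumption
        try {
            String text = tika.parseToString(new File(path));
            collector.emit(new Values(path, text));
        } catch (Exception e) {
            // a real topology would route this to an error stream instead
            throw new RuntimeException("Tika parse failed for " + path, e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("path", "text"));
    }
}
```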
07-11-2016
03:50 AM
1 Kudo
@SANTOSH DASH You can process data in Hadoop using many different services. If your data has a schema, you can start by processing it with Hive (full tutorial here). My preference is to do ELT logic with Pig (full tutorial here); there are many ways to skin a cat, and a minimal Pig sketch follows. The full list of tutorials is here.
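A minimal sketch of that kind of Pig ELT step; the input path and schema are invented for illustration:

```pig
-- Load delimited data with a declared schema; path and fields are placeholders
raw = LOAD '/data/events' USING PigStorage(',')
      AS (id:int, name:chararray, amount:double);

-- Simple transform: keep only rows above a threshold
big = FILTER raw BY amount > 100.0;

STORE big INTO '/data/events_big' USING PigStorage(',');
```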
07-11-2016
03:42 AM
@Kiran Jilla Are there any differences between pre-prod and prod? DB versions, services, Kerberos, etc.?
07-09-2016
04:26 AM
Duplicate question; answered here: https://community.hortonworks.com/questions/44208/hdfs-heterogeneous-storage-using-aws-s3-as-storage.html