About bigdata_superno

bigdata_superno · ‎05-29-2018

Thanks @Aditya Sirna for the swift response.

bigdata_superno · ‎05-29-2018

Hi, We are having a HDP 2.5.3 production deployment. However, we are planning to have a separate HDP-Search installation for an independent SolrCloud setup, with no relation to any of the Hadoop components. However, the intention is to have SolrCloud cluster to be setup using Ambari (I know its easy to setup SolrCloud independently, but its an org wide practice to use Ambari where possible). As this is expected to be a purely Solr cluster, wondering whether I can remove all the basic services like HDFS, MapReduce, YARN etc while using Ambari to setup Solr? Is this possible at all or should we be using blueprints to customise our Ambari / Solr setup altogether? Thanks

bigdata_superno · ‎06-07-2017

Hi, I am very new to Nifi and HDF and hence finding it tough to understand the USP of Nifi with respect to other data transport mechanisms. So any help would be grateful. Is NiFi’s primary interaction only through UI? How different is Nifi from Kafka or any enterprise ESB apart from the visual data flow aspect? Especially when comparing with Kafka, what is common between them and where does they differ? My understanding of the Nifi features with respect to Kafka Visual Command and Control - Not available in Kafka Data lineage - Something that can be done with Apache Atlas for Kafka? Data prioritisation - I presume this can be controlled with a combination of topics and consumers and consumer groups in Kafka Back pressure - As Kafka can retain data, consumers can always replay the data and catch-up Control Latency vs Throughput - Similar to back pressure and prioritisation, this can be controlled with consumers and topics with data retention Security - Kafka also has got security implementation Scaling - Build a Kafka cluster

bigdata_superno · ‎11-02-2016

Further to my earlier question (https://community.hortonworks.com/questions/64438/hive-beeline-e-in-shell-script.html#comment-64493) Wondering how to use the commands beeline -e and beeline -f in shell scripts (bash shell). When I tried running beeline -e command directly on the bash, it says connection not available. So I presume we need to run beeline -u command or a combination of beeline;!connect commands together. But once we execute either of these commands, we will be in beeline shell rather than bash shell and hence beeline -e command is not needed anymore. So wondering what is the purpose of beeline -e command and how to use it without invoking beeline -u command earlier. I am sure my understanding is wrong somewhere and hence would request to please correct me.

bigdata_superno · ‎11-01-2016

Thanks @Neeraj Sabharwal Just wondering if there is any standard approach to this. Without connecting how does someone use hive -e command. i.e. it looks like hive -u and hive -e commands are mutually exclusive to me.

bigdata_superno · ‎11-01-2016

Hi, Our cluster is secured using Kerberos. Now I need to run hive queries in a shell script which would be scheduled to run periodically. In my shell script, I was thinking to use the below commands in sequence >beeline -u"jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal" >beeline -e"SHOW DATABASES" But then I realised that once I run the beeline -u command, it would take me to the beeline shell instead of being in the bash shell. So wondering how to get this sorted out. I need to use beeline -e command, but need to connect to the cluster first using kerberos principal. Any ideas whats the best way to handle this? FYI, we are not using Oozie, but shell script with crontab scheduling. Thanks

bigdata_superno · ‎10-31-2016

Thanks @Kuldeep Kulkarni

bigdata_superno · ‎10-31-2016

@Kuldeep Kulkarni Thanks for your response. Yes, cluster is integrated to AD and ranger-usersync is enabled. My question is around whether its needed to allow the app-usr to be able to login to master nodes and edge nodes vs just visible from these nodes. For security reasons, we wanted to disallow application users from logging into master nodes and data nodes.

bigdata_superno · ‎10-31-2016

Hi, I have a fundamental query on how permissions work in hadoop. We are setting up a cluster with master nodes, data nodes and edge nodes. Edge nodes are the ones exposed to outside world and all hadoop clients are installed on these machines. External applications stage their data on edge nodes first and then load them into hadoop. We are implementing security to our clusters and thinking to have data ownership and permissions defined through Ranger policies to the app-usr for both HDFS and Hive data. So if a application user app-usr is only given login access to edge nodes (through Active Directory groups), will the user be able to own any data in hadoop? For example, can I have a HDFS directory or Hive table that is owned by app-usr though the user is not available on the master nodes or data nodes but only on edge nodes. Will this allow me to configure Ranger policies for that user? Or should the user be able to login to all the nodes in the cluster? Looking for ideas on the best strategy around this. Thanks

bigdata_superno · ‎10-25-2016

Hi, Wondering how to retrieve the job id for a job that is submitted through a crontab scheduled to run at regular intervals. For example, if I run a distcp job in my script as below hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path How to know the YARN job ID so that I can query the status of the job in my script for completion and then take appropriate action. PS: For various reasons, we are not using Oozie and hence need to do this in script and schedule using crontab.

Online	Offline
Last Visited	‎10-31-2018 10:41 AM

Member Since	‎02-09-2016 10:04 PM
Last Visited	‎10-31-2018 10:41 AM
Posts	40
Kudos received	14

Cloudera Community

Re: Minimum Services required for Ambari based dep...

Minimum Services required for Ambari based deploym...

Nifi vs Kafka and ESB

Query on Hive Commands / Beeline

Re: Hive beeline -e in shell script

Hive beeline -e in shell script

Re: Permissions to the data stored in Hadoop

Re: Permissions to the data stored in Hadoop

Permissions to the data stored in Hadoop

Job ID for a scheduled job