Member since: 02-09-2016
Posts: 40
Kudos Received: 14
Solutions: 0
05-29-2018
02:16 PM
Thanks @Aditya Sirna for the swift response.
05-29-2018
01:17 PM
Hi, We have an HDP 2.5.3 production deployment. We are now planning a separate HDP-Search installation for an independent SolrCloud setup, with no relation to any of the Hadoop components. The intention, however, is to set up the SolrCloud cluster using Ambari (I know it's easy to set up SolrCloud independently, but it's an org-wide practice to use Ambari where possible). As this is expected to be a purely Solr cluster, can I drop all the basic services like HDFS, MapReduce and YARN while using Ambari to set up Solr? Is this possible at all, or should we be using blueprints to customise our Ambari / Solr setup altogether? Thanks
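For reference, a rough sketch of the blueprint route I am considering, assuming the HDP Search management pack exposes a SOLR_SERVER component (the exact service and component names would need to be checked against the mpack, and host names and credentials below are placeholders):

# Sketch only: register a Solr-only blueprint with Ambari via its REST API.
# The SOLR_SERVER component name, ambari-host and admin credentials are assumptions.
cat > solr-blueprint.json <<'EOF'
{
  "Blueprints": { "stack_name": "HDP", "stack_version": "2.5" },
  "host_groups": [
    {
      "name": "solr_nodes",
      "cardinality": "3",
      "components": [
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "SOLR_SERVER" }
      ]
    }
  ]
}
EOF
curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d @solr-blueprint.json http://ambari-host:8080/api/v1/blueprints/solr-only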
Labels:
- Apache Ambari
- Apache Solr
06-07-2017
03:57 PM
Hi, I am very new to NiFi and HDF and hence finding it tough to understand the USP of NiFi with respect to other data transport mechanisms, so any help would be appreciated. Is NiFi's primary interaction only through the UI? How different is NiFi from Kafka or any enterprise ESB, apart from the visual data flow aspect? Especially when comparing with Kafka, what is common between them and where do they differ? My understanding of the NiFi features with respect to Kafka:
- Visual Command and Control - Not available in Kafka
- Data lineage - Something that can be done with Apache Atlas for Kafka?
- Data prioritisation - I presume this can be controlled with a combination of topics, consumers and consumer groups in Kafka
- Back pressure - As Kafka can retain data, consumers can always replay the data and catch up
- Control Latency vs Throughput - Similar to back pressure and prioritisation, this can be controlled with consumers and topics with data retention
- Security - Kafka also has a security implementation
- Scaling - Build a Kafka cluster
11-02-2016
12:36 AM
Further to my earlier question (https://community.hortonworks.com/questions/64438/hive-beeline-e-in-shell-script.html#comment-64493), I am wondering how to use the beeline -e and beeline -f commands in shell scripts (bash shell). When I tried running a beeline -e command directly from bash, it said the connection was not available. So I presume we need to run a beeline -u command, or a combination of beeline and !connect, first. But once we execute either of those, we are in the beeline shell rather than the bash shell, and hence the beeline -e command is not needed anymore. So what is the purpose of the beeline -e command, and how do I use it without invoking beeline -u earlier? I am sure my understanding is wrong somewhere, so please correct me.
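A minimal sketch of the single-invocation form, assuming a HiveServer2 at hs2-host:10000 (the host, database and file names below are placeholders):

# Sketch only: -u supplies the JDBC connection and -e / -f run the SQL in the
# same invocation, so control never drops into the interactive beeline shell.
# hs2-host, mydb and queries.hql are placeholders.
beeline -u "jdbc:hive2://hs2-host:10000/mydb" -e "SHOW DATABASES"
beeline -u "jdbc:hive2://hs2-host:10000/mydb" -f queries.hql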
Labels:
- Apache Hive
11-01-2016
11:37 PM
Thanks @Neeraj Sabharwal. Just wondering if there is any standard approach to this. Without connecting first, how does someone use the -e command? i.e., it looks like the beeline -u and beeline -e options are mutually exclusive to me.
11-01-2016
04:43 PM
2 Kudos
Hi, Our cluster is secured using Kerberos. I need to run Hive queries in a shell script which will be scheduled to run periodically. In my shell script, I was thinking of using the commands below in sequence:
beeline -u "jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal"
beeline -e "SHOW DATABASES"
But then I realised that once I run the beeline -u command, it takes me into the beeline shell instead of staying in the bash shell. So I am wondering how to get this sorted out. I need to use the beeline -e command, but I need to connect to the cluster first using the Kerberos principal. Any ideas on the best way to handle this? FYI, we are not using Oozie, but a shell script with crontab scheduling. Thanks
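A minimal sketch of the single-invocation form I am considering, assuming the ticket is obtained with kinit first (the keytab path and principal below are hypothetical placeholders):

# Sketch only: obtain a Kerberos ticket, then pass both the JDBC URL and the
# query to one beeline invocation so control returns to bash afterwards.
# The keytab path and app-usr@EXAMPLE.COM principal are placeholders.
kinit -kt /etc/security/keytabs/app-usr.keytab app-usr@EXAMPLE.COM
beeline -u "jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal" -e "SHOW DATABASES"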
Labels:
- Apache Hive
10-31-2016
05:07 PM
Thanks @Kuldeep Kulkarni
... View more
10-31-2016
03:22 PM
@Kuldeep Kulkarni Thanks for your response. Yes, the cluster is integrated with AD and ranger-usersync is enabled. My question is whether the app-usr needs to be able to log in to the master nodes and edge nodes, or whether it is enough for the user to just be visible from these nodes. For security reasons, we want to disallow application users from logging into the master nodes and data nodes.
10-31-2016
02:34 PM
Hi, I have a fundamental query on how permissions work in Hadoop. We are setting up a cluster with master nodes, data nodes and edge nodes. The edge nodes are the ones exposed to the outside world, and all Hadoop clients are installed on these machines. External applications stage their data on the edge nodes first and then load it into Hadoop. We are implementing security on our clusters and are thinking of defining data ownership and permissions through Ranger policies for the app-usr user, for both HDFS and Hive data. So if an application user app-usr is only given login access to the edge nodes (through Active Directory groups), will the user be able to own any data in Hadoop? For example, can I have an HDFS directory or Hive table that is owned by app-usr even though the user exists only on the edge nodes and not on the master nodes or data nodes? Will this allow me to configure Ranger policies for that user? Or should the user be able to log in to all the nodes in the cluster? Looking for ideas on the best strategy around this. Thanks
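To make the question concrete, this is the kind of thing I would like to work when run from an edge node, with app-usr existing as an OS account only there (the directory and group below are just example placeholders):

# Example only: create a staging directory as the hdfs superuser and hand
# ownership to app-usr. /data/app-staging and app-grp are placeholders.
sudo -u hdfs hdfs dfs -mkdir -p /data/app-staging
sudo -u hdfs hdfs dfs -chown app-usr:app-grp /data/app-staging
hdfs dfs -ls /data    # expect app-usr to show as the owner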
Labels:
- Apache Hadoop
- Apache Ranger
10-25-2016
11:15 AM
Hi,
I am wondering how to retrieve the job ID for a job submitted from a crontab script scheduled to run at regular intervals. For example, if I run a distcp job in my script as below
hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path
How do I find the YARN job ID so that I can query the status of the job in my script, check for completion, and then take appropriate action?
PS: For various reasons, we are not using Oozie and hence need to do this in script and schedule using crontab.
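A rough sketch of the direction I am considering, assuming the distcp client output can be parsed for the application ID (the log file name and grep pattern are assumptions, since the exact log format can vary by version):

# Sketch only: capture distcp client output, pull out the YARN application ID,
# then query its status. distcp.log and the grep pattern are assumptions.
hadoop distcp hdfs://nn1:8020/src_path hdfs://nn2:8020/dst_path > distcp.log 2>&1
app_id=$(grep -o 'application_[0-9]*_[0-9]*' distcp.log | head -1)
yarn application -status "$app_id"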
Labels:
- Apache Hadoop
- Apache YARN