Member since: 07-30-2019
Posts: 181
Kudos Received: 205
Solutions: 51
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2412 | 10-19-2017 09:11 PM
 | 821 | 12-27-2016 06:46 PM
 | 557 | 09-01-2016 08:08 PM
 | 609 | 08-29-2016 04:40 PM
 | 1143 | 08-24-2016 02:26 PM
08-15-2016
02:18 PM
1 Kudo
@mkataria As of HDP 2.4, Zeppelin is only in Tech Preview and does not support Kerberos. Integration with Kerberos for Zeppelin will be available in the upcoming HDP 2.5 release due out very soon.
08-15-2016
01:43 PM
@Jason Hue HDFS only allows one write or append to a file at a time. Allowing concurrent writes would mean that the order of bytes in the file would be nondeterministic. Even if the two appends write to different blocks, how would you determine which block comes first in the file? It's a difficult problem that has not been solved in HDFS.
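As a rough illustration (the path below is just an example), a second append against the same file typically fails while another writer holds the lease:

```
# Kick off two appends to the same HDFS file at once. One succeeds; the other
# is typically rejected (e.g. with AlreadyBeingCreatedException) because HDFS
# enforces a single-writer lease per file.
echo "first"  | hdfs dfs -appendToFile - /tmp/append-demo.txt &
echo "second" | hdfs dfs -appendToFile - /tmp/append-demo.txt &
wait
```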
08-11-2016
08:51 PM
1 Kudo
@Sunile Manjee One way to accomplish this would be to change the permissions on the hive executable to remove read and execute access for group and other: chmod go-rx /usr/hdp/current/hive-client/bin/hive
08-11-2016
08:17 PM
@mohamed sabri marnaoui Are you running the Ambari agent as a non-root user? If so, make sure that your sudoers file is correct per this documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_sudoer_configuration.html
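For reference, a trimmed, illustrative sketch of what those sudoers entries look like for a non-root agent user (assumed here to be named "ambari"; the authoritative list is in the linked Security Guide):

```
# Illustrative only -- check the linked documentation for the exact entries.
# Commands the agent must run as root (package and stack tools):
ambari ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum, /usr/bin/rpm, /usr/bin/hdp-select
# Typical sudo defaults so the agent's non-interactive commands work:
Defaults exempt_group = ambari
Defaults !env_reset, env_delete -= PATH
Defaults: ambari !requiretty
```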
08-11-2016
08:11 PM
An HDFS rebalance should optimize how blocks are distributed across your cluster. Is there a particular reason why you want to manually determine where the replicas are stored?
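If automatic placement is what you're after, the balancer is usually enough; a minimal invocation (the threshold value is just an example):

```
# -threshold is the allowed spread (in percent) between each DataNode's
# utilization and the cluster average.
hdfs balancer -threshold 10
```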
08-11-2016
07:44 PM
1 Kudo
@Sunile Manjee As @SBandaru states, you will need to make sure that proper group membership is maintained for the non-standard users. If you specify the users at cluster creation time, Ambari will take care of this for you. If you create them after the fact, then you will need to verify group membership. You may also need to modify the auth_to_local filters if the non-standard users are in AD/LDAP and you need to map them to local users. Another thing to consider is if you run the Ambari agent as non-root. There are a number of sudo rules that need to be put in place for the ambari user that allow execution of commands as the various service accounts for purposes of starting/stopping the services, installing packages, etc. You'll need to modify the customizable users sudo entry to suit your environment.
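As a quick sanity check after creating the accounts by hand (account names below are examples), verify the group membership that Ambari would normally have set up, e.g. membership in the hadoop group:

```
# Substitute your non-standard account names.
id hdfs        # expect hadoop in the group list
id ambari-qa   # the smoke-test user should also be in hadoop
```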
08-11-2016
01:01 PM
@jovan karamacoski If you want to manually force the blocks to replicate to fix under-replication, you can use the method outlined in this article.
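One common way to trigger re-replication (not necessarily the method in the linked article; the path and factors below are examples) is to temporarily raise the replication factor and then set it back:

```
# -w waits until replication completes before returning.
hdfs dfs -setrep -w 4 /data/myfile
hdfs dfs -setrep -w 3 /data/myfile
```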
08-10-2016
09:32 PM
1 Kudo
@Saikrishna Tarapareddy The attribute to correlate on needs to be present in the flowfile for the Merge processor to use it. If you are using FetchFile to get the file, you can add an attribute in that processor based on the filename or a substring of the filename. It will then be present in the flowfile for subsequent processors to use.
07-19-2016
06:28 PM
1 Kudo
@Alvin Jin You should be able to connect to HANA via a JDBC connection using a DBCPConnectionPool controller service, then use the ExecuteSQL processor to submit queries.
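A rough sketch of the DBCPConnectionPool settings for HANA (the URL, port, and driver jar location are assumptions, and property names may differ slightly by NiFi version; check your HANA instance and driver):

```
Database Connection URL     : jdbc:sap://hana-host:30015
Database Driver Class Name  : com.sap.db.jdbc.Driver
Database Driver Location(s) : /path/to/ngdbc.jar
Database User / Password    : <credentials for the HANA schema>
```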
07-14-2016
07:43 PM
@Kaliyug Antagonist You will typically need to do some configuration on the views to make them work properly. In a secured cluster, you have to specify all of the parameters for connecting to the particular service instead of using the "Local Cluster" configuration drop down. The Ambari Views Documentation contains instructions for configuring all of the various views.
07-14-2016
07:29 PM
2 Kudos
@Satya KONDAPALLI Fundamentally, Spark is a data processing engine while NiFi is a data movement tool. Spark is intended for doing complex computations on large amounts of data, combining data sets, applying analytical models, etc. Spark Streaming provides micro-batch processing to bring that processing closer to real time. NiFi is intended to collect data and move it to where it will be processed, applying limited modifications or computations as the data flows to its final destination.
07-08-2016
04:02 PM
2 Kudos
@Steen Manniche You are correct that Solr/Ranger is only capable of collection-level security at this time. This Solr Wiki page describes a couple of options for adding document-level security to Solr (e.g. ManifoldCF).
07-05-2016
05:37 PM
2 Kudos
@Kaliyug Antagonist Hue requires Python 2.6 while RHEL/CentOS 7 uses Python 2.7, which is why Hue is not supported on RHEL/CentOS 7. There are alternatives in Ambari for most of the functionality provided by Hue. Ambari includes views for Hive access, HDFS file management, YARN queue management, Pig scripting, and Tez job management. The only Hue functionality not covered by Ambari Views is Oozie workflow creation, and that is coming soon in Ambari. Please see the Ambari Views documentation for more information. If you need to use Hue for Oozie workflow management, you can install Hue on a RHEL/CentOS 6 node and configure it to point to your HDP cluster.
07-01-2016
07:16 PM
@khaja pasha shake The truststore needs to exist on the node where you are running the Falcon commands (e.g. the Falcon server node). You can create the truststore with the keytool command and import the certificate into it on that node. Then point Falcon at that truststore location on the Falcon server node.
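A minimal sketch of creating the truststore on the Falcon server node (the alias, file paths, and password are examples):

```
# Import the endpoint's certificate; keytool creates the JKS file if it
# doesn't exist yet.
keytool -import -alias falcon-endpoint \
        -file /tmp/endpoint.crt \
        -keystore /etc/falcon/conf/truststore.jks \
        -storepass changeit
```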
06-28-2016
05:18 PM
@Kaliyug Antagonist HDFS has the ability to use ACLs (here's a link). If you don't have Ranger, then you can use ACLs to provide finer-grained authorization than you can with POSIX permissions. However, Ranger offers more flexibility and gives you a single place to manage authorization for all of the components (not just HDFS). So, if you're using Ranger, you don't really need to use HDFS ACLs.
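For example, a named-user ACL (the user and path below are illustrative) grants access beyond what the POSIX bits allow:

```
# Grant an extra user read/execute on a directory without touching its
# owner/group permissions, then confirm the ACL took effect.
hdfs dfs -setfacl -m user:analyst1:r-x /data/project
hdfs dfs -getfacl /data/project
```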
06-28-2016
03:45 PM
@khaja pasha shake In the configuration for the Hive view, you can add the SSL parameters to the authorization section. Here is a screenshot that should help.
06-28-2016
03:05 PM
4 Kudos
@Timothy Spann The best way to accomplish this is with the GetTwitter processor and the MergeContent processor. GetTwitter connects to a Twitter dev account and pulls tweets (you can even filter them). Then you can use MergeContent to collect the tweets into manageable pieces based on number of records, size of file, or a timeout value.
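As a rough starting point (the values are examples to tune for your tweet volume), the MergeContent batching properties look like:

```
Merge Strategy            : Bin-Packing Algorithm
Minimum Number of Entries : 1000
Maximum Number of Entries : 10000
Max Bin Age               : 5 min
```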
06-28-2016
03:02 PM
1 Kudo
@khaja pasha shake To use SSL for Hiveserver2, you will need to first enable SSL on Hiveserver2. Assuming you've already done that, your JDBC connection string will need to look something like: jdbc:hive2://server.name:10000/mydb;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=MyBadPass1
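For example, with beeline (host, database, truststore path, password, and user name are placeholders):

```
beeline -u "jdbc:hive2://server.name:10000/mydb;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=MyBadPass1" -n myuser
```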
06-28-2016
02:49 PM
1 Kudo
@Kaliyug Antagonist HDFS security is multi-tiered:
1. Ranger authorization policies are checked first
2. HDFS ACLs implemented outside of Ranger
3. HDFS POSIX permissions (e.g. rwxr-xr-x)
So, what you can do for user home directories is to set the POSIX permissions to 700 and make sure the ownership is <username>:hdfs. This will ensure that only the user has access to his/her home directory. You don't need to create a Ranger policy to allow the access for this. You can do the same for the /tmp directory (set permissions to 777). There are some best practices for securing HDFS with Ranger.
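A minimal sketch of those commands (the user name is an example):

```
# Lock down a user's home directory to the owner only.
hdfs dfs -chown alice:hdfs /user/alice
hdfs dfs -chmod 700 /user/alice
# Keep /tmp world-writable as described above (1777, with the sticky bit,
# is also common so users can't remove each other's files).
hdfs dfs -chmod 777 /tmp
```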
06-24-2016
05:35 PM
@Benjamin R The permissions of the keytabs are not all 440. Only some of them are (hdfs, hbase, ambari-qa, accumulo, and spnego). Those keytabs are used by services other than the owning account for testing connectivity and other functions on startup. All of the other keytabs are only readable by the service account that owns the file. The group that owns the keytabs is the hadoop group, which should be reserved for service accounts. This will ensure the security of your cluster.
06-24-2016
05:12 PM
@David Whoitmore Yes, you can install an alternative version of Python. You will need to install it in a non-system location (leave 2.6 in place and put 2.7 in a new home). Many of the HDP components rely on Python and require v2.6 in the standard place on RedHat 6 in order to work properly.
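A rough sketch of installing 2.7 alongside the system 2.6 (the prefix and version are examples):

```
# From an unpacked Python 2.7 source tree: build into /opt so /usr/bin/python
# (2.6 on RHEL 6) is left untouched.
./configure --prefix=/opt/python27
make
make altinstall
# Call it explicitly (e.g. /opt/python27/bin/python2.7) rather than replacing
# the system python that the HDP scripts depend on.
```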
06-18-2016
05:28 PM
@Bhanu Pittampally The components that are supported by Ranger have plugins that the components use to verify authorization. For example, if Presto wants to read from HDFS, it will contact the NameNode. The NameNode will use the HDFS Ranger plugin to check the authorization of the presto user for the files being accessed.
06-17-2016
07:06 PM
@Bhanu Pittampally At present, Presto is not supported with Ranger. You can control the HDFS and Hive services via Ranger to enable Presto to access those resources, but there won't be any control for Presto security mechanisms. Currently, the supported components in Ranger (0.5) are HDFS, YARN, Hive, HBase, Kafka, Storm, Knox, and Solr.
06-15-2016
03:53 PM
@Matjaz Skerjanec An HTTP 403 error indicates that the user does not have permission to access something on the server (403 = "Forbidden"). This would be returned by the Isilon node in response to your fsck request. Isilon uses its Integrity Check mechanism to ensure filesystem integrity. Remember that HDFS is just an interface that Isilon provides to the OneFS filesystem. Filesystem integrity checks are handled internally by OneFS, and commands like "hdfs fsck" become unnecessary.
06-13-2016
06:02 PM
2 Kudos
@Teddy Brewski This can be done!
1. Install the Knox server on multiple hosts (Hosts -> hostname -> Add Service -> Knox Gateway).
2. Create a config group for Knox and assign nodes to each config group (Knox -> Configs -> Manage Config Groups).
3. Modify the Advanced Topology for each config group (accessed with the drop-down at the top of the Configs page) to change the AD configuration as appropriate; see the topology fragment below.
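For step 3, the per-group AD settings live in the ShiroProvider section of the topology; an illustrative fragment (the host, port, and DN template are placeholders for your AD):

```
<!-- Illustrative only; parameter names follow Knox's ShiroProvider. -->
<provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <enabled>true</enabled>
  <param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>cn={0},ou=users,dc=example,dc=com</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://ad1.example.com:389</value>
  </param>
</provider>
```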
06-09-2016
10:33 PM
2 Kudos
@Timothy Spann This document details the best practices for Isilon data storage in a Hadoop environment. It's not specific to HDP 2.4 or OneFS 8.0.0.1, but most of the information is still relevant.
06-09-2016
02:14 PM
2 Kudos
@Johnny Fugers There are several Spark tutorials that use the Sandbox available on the Hortonworks website. You may be interested in the Interacting with Data on HDP Using Apache Zeppelin and Apache Spark tutorial. We also offer online training via Hortonworks University focused on Data Science and Spark.
06-09-2016
02:08 PM
1 Kudo
@Johnny Fugers Have you considered using NiFi to load the data? You can read from many different sources, merge the content into large enough portions to optimize HDFS usage, and write the data directly into HDFS.
06-08-2016
03:09 PM
2 Kudos
@Pardeep Gorla Absolutely! You can use an MIT KDC to provide Kerberos authentication. There are a couple of ways to do this. FreeIPA is a good tool that combines LDAP and KDC management for RedHat (CentOS) systems. This will give you the ability to also manage user sync for Ambari and Ranger with the OpenLDAP managed by FreeIPA. You will need to use the "Manually Manage Kerberos Principals" option when enabling Kerberos on the cluster for now. FreeIPA integration is on the roadmap for Ambari, but is not available yet as of Ambari 2.2.2.
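With the "Manually Manage Kerberos Principals" option, you create the principals and keytabs yourself; an illustrative MIT KDC example (the principal, host, realm, and keytab path are placeholders; FreeIPA has ipa/ipa-getkeytab equivalents):

```
# Create a service principal with a random key and export its keytab.
kadmin.local -q "addprinc -randkey nn/master1.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/master1.example.com@EXAMPLE.COM"
```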
06-08-2016
02:55 PM
3 Kudos
@mkataria The article you referenced does contain some good information about security exploits for the Microsoft Windows Active Directory KDC. Some of them require you to obtain certain keys or privileges in order to compromise security, and some of them require access to the domain controller. The article is also a bit dated, and investigating some of the bugs mentioned shows that Microsoft has patched several of these holes. Other attacks can be defended against by understanding the attack and eliminating the access required to exploit it.

As with any computer system, the key is securing the systems themselves. Keep users off of systems that they shouldn't have access to. If there's a memory exploit on a server, don't let users log in to that server. If getting access to a file would compromise security, don't allow access to that file.

The implications of being able to arbitrarily generate Kerberos tickets are the same in a Hadoop environment as in any network. If a user can obtain a ticket to use HDFS, for example, that user may be able to access data that s/he shouldn't. This is why security is such an important and complex topic: ensuring that the various systems are secure individually AND together is key to ensuring the security of your information.

To address the specific issues mentioned in this article, and to ensure the utmost security of your systems, I would recommend contacting Microsoft, determining which issues are applicable to your particular O/S version, and working with Microsoft on the best way to secure the domain controller against these attacks.