Member since: 10-06-2015
Posts: 42
Kudos Received: 23
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 496 | 11-07-2016 10:11 PM |
 | 196 | 04-19-2016 05:32 PM |
 | 382 | 04-11-2016 06:57 PM |
 | 433 | 11-04-2015 03:06 PM |
03-02-2017
03:05 PM
The hadoop.security.auth_to_local property under Advanced core-site.xml in the HDFS service can be modified to remove spaces from user names. The translation is a modification of the default rule set, which contains a line that strips the domain name:
RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
To translate a user name containing a space, such as "John Doe@MYDOMAIN.COM", to "John_Doe", the following rules can be used:
RULE:[1:$1](.* .*)s/ /_/g/L
RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
The first rule replaces the space with an underscore while retaining MYDOMAIN.COM, so the second rule is still respected. The result is that user names containing spaces are successfully mapped to names with underscores.
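As a minimal sketch, the combined property in core-site.xml could look like the following (MYDOMAIN.COM is the example realm from the text; adapt the rules to your own realm and keep any other rules you already rely on):
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1](.* .*)s/ /_/g/L
    RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
    DEFAULT
  </value>
</property>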
- Find more articles tagged with:
- HDFS
- How-ToTutorial
- Kerberos
- Security
03-01-2017
06:42 PM
3 Kudos
When synchronizing users from AD or LDAP to Ranger using usersync, it is possible that user names contain spaces. These spaces in user names and group names may be replaced with underscores by SSSD (example SSSD config section: [sssd] override_space = _). Ranger needs to synchronize users in the same way to avoid issues in access permission setup. To enable Ranger to perform this conversion, the following two properties can be added under "Ranger --> Configs --> Advanced --> Custom ranger-ugsync-site": ranger.usersync.mapping.groupname.regex and ranger.usersync.mapping.username.regex. Both properties can be set to s/ /_/g. After setting these properties, restart the Ranger service using Ambari and then restart Ranger Usersync. Usersync will then add new user names with underscores in them. Older user names and group names containing spaces will need manual cleanup. Note: this feature is available only from Ranger 0.5.1 onwards. For more details, please refer to https://issues.apache.org/jira/browse/RANGER-684
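A minimal sketch of the two Ranger usersync properties alongside the SSSD setting mentioned above (the property names and the regex value are taken from the text; the sssd.conf path is the usual default and is an assumption):
# Ambari: Ranger --> Configs --> Advanced --> Custom ranger-ugsync-site
ranger.usersync.mapping.username.regex=s/ /_/g
ranger.usersync.mapping.groupname.regex=s/ /_/g

# /etc/sssd/sssd.conf (matching substitution on the OS side)
[sssd]
override_space = _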
- Find more articles tagged with:
- How-ToTutorial
- LDAP
- Ranger
- ranger-usersync
- Security
12-15-2016
03:41 PM
2 Kudos
1. WASB --> https://blogs.msdn.microsoft.com/cindygross/2015/02/04/understanding-wasb-and-hadoop-storage-in-azure/ WASB is a storage model that allows data to be stored as blobs within storage accounts/containers in the Azure cloud.
2. DASH --> http://sequenceiq.com/cloudbreak-deployer/latest/azure_pre_prov/ This link describes a few scale-out related limits of WASB and proposes DASH as the solution. DASH is not supported as a storage option, and there are scalability limitations on the number of storage accounts. To quote: "When WASB is used as a Hadoop filesystem the files are full-value blobs in a storage account. It means better performance compared to the data disks and the WASB filesystem can be configured very easily but Azure storage accounts have their own limitations as well. There is a space limitation for TB per storage account (500 TB) as well but the real bottleneck is the total request rate that is only 20000 IOPS where Azure will start to throw errors when trying to do an I/O operation. To bypass those limits Microsoft created a small service called DASH. DASH itself is a service that imitates the API of the Azure Blob Storage API and it can be deployed as a Microsoft Azure Cloud Service. Because its API is the same as the standard blob storage API it can be used almost in the same way as the default WASB filesystem from a Hadoop deployment. DASH works by sharding the storage access across multiple storage accounts. It can be configured to distribute storage account load to at most 15 scaleout storage accounts. It needs one more namespace storage account where it keeps track of where the data is stored. When configuring a WASB filesystem with Hadoop, the only required config entries are the ones where the access details are described. To access a storage account Azure generates an access key that is displayed on the Azure portal or can be queried through the API while the account name is the name of the storage account itself. A DASH service has a similar account name and key, those can be configured in the configuration file while deploying the cloud service."
3. Cloudbreak's allocation model using multiple storage accounts and local HDFS (high-performance / scale-out option). When allocating HDFS on Azure, Cloudbreak can leverage multiple storage accounts and spread data across them. This allows data to be sharded across several storage accounts and helps overcome the per-account IOPS limits. This option can scale up to 200 storage accounts, whereas DASH is limited to 15 scale-out storage accounts. The disk selection can support both Premium and Standard storage, based on the VM type. DS13 or DS14 VMs are economical for most general-purpose use cases and can support 16 x 1 TB Standard storage disks. (Screenshot attached: storageaccounts.png)
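For reference, a minimal sketch of the WASB access configuration in core-site.xml that point 2 alludes to (the storage account name, container name, and key below are hypothetical placeholders):
<property>
  <name>fs.azure.account.key.mystorageaccount.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_ACCESS_KEY</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>wasb://mycontainer@mystorageaccount.blob.core.windows.net</value>
</property>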
- Find more articles tagged with:
- azure
- Cloud & Operations
- Cloudbreak
- How-ToTutorial
- storagetypes
- wasb
12-15-2016
03:11 PM
Prior to version 1.6.2-rc.8 of Cloudbreak, provisioning of HDP clusters on Azure was by public IP only. This article describes provisioning a cluster based on private IPs. While creating a network within Cloudbreak, there are options to: 1. Create a new network and a new subnet, or 2. Use an existing subnet in an existing virtual network. Under option 2, "Use an existing subnet in an existing virtual network", two options have been added: a) Don't create public IPs, and b) Don't create new firewall rules. Upon selecting these options (and using a private IP for "export PUBLIC_IP=xx.xx.xx.xx" in the Cloudbreak Profile file), Cloudbreak can install clusters using private IPs only in the Azure cloud. A screenshot of this feature is attached: networkoptions.png
- Find more articles tagged with:
- azure
- Cloud & Operations
- Cloudbreak
- How-ToTutorial
12-13-2016
11:02 PM
1. Install the MySQL data directories on a non-root partition (not /var/lib).
2. Create a dedicated, least-privileged account for the MySQL daemon.
3. Disable the MySQL command history; the command history may contain passwords that are viewable by other users.
4. Disable interactive login.
5. Disable login from nodes other than those used by the Hive services.
6. Grant only the hive user permissions on the Hive metadata database within MySQL.
7. During installation, do not specify passwords on the command line.
8. Ensure the MySQL data directories have appropriate permissions and ownership.
9. Ensure only DBA administrators have full database access.
10. Ensure that database logging is enabled for error logs and that log files are maintained on a non-system partition.
11. Ensure that old_passwords is not set to 1.
12. Ensure that secure_auth is set to ON (see the my.cnf sketch after this list).
13. Consider whether your component can work with the MySQL "connect using SSL" option.
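A minimal my.cnf sketch illustrating items 11 and 12 (the data directory path is a hypothetical example of item 1; adjust everything to your environment and MySQL version):
[mysqld]
# item 1: data directory on a non-root partition (hypothetical path)
datadir=/data/mysql
# item 11: do not fall back to the old, weaker password hashing
old_passwords=0
# item 12: reject clients that use the old password format
secure_auth=ON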
12-12-2016
09:17 PM
There are a few unsupported options, such as https://community.hortonworks.com/repos/39432/nifi-atlas-lineage-reporter.html and https://community.hortonworks.com/repos/66014/nifi-atlas-bridge.html. I have not tested these myself, but I am interested in any summaries of experience with these bridges.
12-12-2016
09:10 PM
Is this a Kerberized cluster? If so, are you using Firefox, IE, or Chrome?
12-12-2016
09:03 PM
+ https://community.hortonworks.com/questions/11369/is-there-a-way-to-export-ranger-policies-from-clus.html
12-12-2016
09:02 PM
This is a prior discussion on this topic; I am interested in any new approaches, in case this thread leads to one. https://community.hortonworks.com/questions/37036/how-to-replicate-policies-across-cluster.html
11-16-2016
02:37 PM
After following the PolyBase configuration from the Microsoft documentation on a Kerberized cluster, we still get a message that PolyBase is looking for simple authentication. Please suggest any additional configuration that may be needed. The Microsoft documentation is at https://msdn.microsoft.com/en-ca/library/mt712797.aspx
11-07-2016
10:14 PM
Hi, sometimes there is a delay in picking up group membership; running the hdfs dfsadmin -refreshUserToGroupsMappings command can help. Please also confirm that the two users exist on the server running the NameNode.
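A quick way to check (the user name is a placeholder; the dfsadmin command needs HDFS superuser privileges):
# refresh the user-to-group mappings cached by the NameNode
hdfs dfsadmin -refreshUserToGroupsMappings
# verify that the NameNode host resolves the user and its groups
id someuser
hdfs groups someuser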
11-07-2016
10:11 PM
1 Kudo
Hi, the Hive principal is not a headless principal; it is dedicated to the HiveServer2 server. So the principal name always points to the HiveServer2 host, which in your case is qwang-hdp2. If you are able to log in using beeline -u "jdbc:hive2://qwang-hdp2:10000/default;principal=hive/qwang-hdp2@REALM.NAME", then you are good.
11-07-2016
10:06 PM
NiFi can be downloaded as a standalone version (HDF 2.0) or installed with the help of the NiFi management pack added to Ambari. Within the sandbox, this version of NiFi should work fine. (Note: you may need to remove the LZO codec from a copy of core-site.xml to allow writing to HDFS with HDF 2.0.)
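A minimal sketch of that workaround, assuming the sandbox's core-site.xml lists the LZO classes in io.compression.codecs (your codec list may differ): make a copy of core-site.xml, point the PutHDFS processor's "Hadoop Configuration Resources" at the copy, and drop the LZO entries from the property so NiFi does not try to load a codec jar it does not have.
<!-- in the copied core-site.xml used by NiFi only -->
<property>
  <name>io.compression.codecs</name>
  <!-- com.hadoop.compression.lzo.* entries removed from the original value -->
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>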
11-07-2016
10:03 PM
Hi, the Falcon UI is secured by enabling authorization in startup.properties. https://falcon.apache.org/Security.html
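A minimal sketch of the startup.properties entries involved (the property names follow the Falcon security documentation linked above; verify them against your Falcon version):
# enable authorization checks for the Falcon API/UI
*.falcon.security.authorization.enabled=true
# authentication type (simple or kerberos)
*.falcon.authentication.type=kerberos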
11-07-2016
09:43 PM
Please also check this link if the schema needs to be inferred in the flow: https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html
11-07-2016
09:41 PM
Hi, the ConvertCSVToAvro processor has a Record Schema property. Please try setting this with the double option (the outgoing schema option).
07-07-2016
01:45 PM
The primary design choice to make here is whether we need CPU scheduling (DRF) or not. In clusters with varying CPU capacities, timeout settings may need to be increased because of throughput differences; network socket timeouts do occur in heterogeneous clusters. Another aspect is to ensure that each node has enough memory headroom left after the YARN allocation to prevent CPU hangs on less capable nodes (typically 80% for YARN and 20% for the OS, etc.). Since one of the nodes has only 12 GB of RAM, you may also want to closely monitor the memory usage of processes that YARN is not aware of, especially the Ambari agent and Ambari Metrics, and check whether it grows over time.
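If CPU scheduling is needed, DRF is typically enabled in the Capacity Scheduler by switching the resource calculator. A minimal sketch of the capacity-scheduler.xml entry (this assumes you are on the Capacity Scheduler; apply the change through Ambari rather than editing files by hand):
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>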
07-05-2016
06:29 PM
1 Kudo
Hi, please check whether the sqoop list-tables option works against the same connection details. If it does not work, then there is a connectivity issue from the Sqoop client machine to the database server. The query failing here is not the query Sqoop uses to extract data, but the query used to collect metadata from the remote database; Sqoop then uses this information to create the Hadoop writers. So this appears to be a connectivity issue, and it can be confirmed by running the list-tables variant of Sqoop first.
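For example (the JDBC URL, database, and user below are placeholders):
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/sourcedb \
  --username dbuser -P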
05-17-2016
08:42 PM
After the Ranger plugin is enabled, HiveServer2 fails to start with an error indicating ClassNotFoundException: org.apache.hadoop.hive.thrift.ZooKeeperTokenStore. Switching to DBTokenStore etc. does not resolve the issue. This seems to be an Ambari hadoop-env or hive-env issue. Attempts to add HIVE_AUX_JARS etc. pointing to hive-shims-common.jar failed to pick up the correct class. If there is a known resolution, please let me know.
Labels:
04-20-2016
07:28 PM
1 Kudo
Please run ANALYZE TABLE ... COMPUTE STATISTICS on the table to confirm whether this changes. Also check the value of hive.stats.autogather to confirm whether it is set to true.
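For example (the table name is a placeholder):
-- recompute table-level and column-level statistics
ANALYZE TABLE mytable COMPUTE STATISTICS;
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS;
-- check whether statistics are gathered automatically
SET hive.stats.autogather;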
04-19-2016
05:32 PM
1 Kudo
The NiFi documentation seems to indicate transfer rates of around 50 MB/s to 100 MB/s: https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.1.0/bk_Overview/content/performance-expectations-and-characteristics-of-nifi.html NiFi is useful if there are several source databases from which data needs to be extracted very frequently, as it helps with monitoring and workflow maintenance. If some of the data needs to be routed to different tables based on a column's value, for instance, NiFi is a good choice, as Sqoop does not support this by default. If data needs to be moved to multiple destinations, NiFi is also a good choice - for example, landing data in HDFS while moving part of the data to Kafka/Storm or Spark. NiFi can schedule these flows easily, while with Sqoop the scheduling has to be set up via crontab, Control-M, etc. Sqoop can use mappers in Hadoop for fault tolerance and parallelism and may achieve better rates. If deduplication and similar processing is to be performed, NiFi is the choice for smaller data sizes. For large table loads, Sqoop is a good choice.
04-18-2016
09:24 PM
Please check using FQDN in the Knox gateway URL
04-18-2016
09:22 PM
Please check the name of the Oozie server node. It appears that there is a ".out" at the end of the server name. Is this correct?
04-18-2016
05:24 PM
Though it is appealing to attempt a direct conversion of code from Oracle to Hive, you may want to check whether it is better to rewrite the code specifically for Hive. Much of the performance and tuning opportunity is lost when attempting a direct conversion. Having said that, there are also some independent solutions, such as http://www.ispirer.com/
04-18-2016
03:59 PM
In the beeline command, please check that the Hive principal name is set correctly and matches the cluster settings. Also ensure that the Kerberos ticket is still valid. !connect jdbc:hive2://sandbox.hortonworks.com:10000/default;principal=hive/_HOST@REALM.COM
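To verify the ticket (the user and realm are placeholders):
# list the current Kerberos ticket cache; re-obtain a ticket if it has expired
klist
kinit someuser@REALM.COM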
04-18-2016
03:56 PM
1 Kudo
Please check if setting the Sqoop --hive-home parameter explicitly helps.
04-18-2016
03:51 PM
1 Kudo
https://issues.apache.org/jira/browse/HBASE-7544 More details are at https://hbase.apache.org/book.html#hbase.encryption.server
04-13-2016
05:46 PM
2 Kudos
Abstract: Sharing a common issue in Ambari upgrades and its solution for the Hortonworks stack.
Issue: Hive handles hive.aux.jars.path differently after upgrading to Ambari 2.2 (this may impact prior versions as well).
Symptom: Errors related to missing hive-hbase-handler.jar and hbase-common.jar, as they can no longer be found by HiveServer2.
Resolution: Edit hive-env.sh from the Advanced hive-env / hive-env template section of the Ambari configs to add the HBase jars to HIVE_AUX_JARS_PATH. The output of the hbase mapredcp command can be added to the HIVE_AUX_JARS_PATH variable, or saved into a file and sourced within the hive-env.sh template. The output of the hbase mapredcp command is as follows (on an HDP 2.3.2 system): /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/netty-all-4.0.23.Final.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/metrics-core-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/guava-12.0.1.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/zookeeper/zookeeper-3.4.6.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
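A minimal sketch of the hive-env template addition (the exact form in which HIVE_AUX_JARS_PATH is already set varies between HDP versions, so treat this as an assumption and verify in your environment):
# Advanced hive-env / hive-env template in Ambari: append the colon-separated
# classpath reported by `hbase mapredcp` to Hive's auxiliary jars path
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:$(hbase mapredcp)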
04-11-2016
06:57 PM
1 Kudo
https://community.hortonworks.com/questions/12663/hdp-install-issues-about-hdp-select.html Please check if this issue is a duplicate of the above link
04-11-2016
04:01 PM
1 Kudo
Hi, if the tables are large (say, multiple terabytes), then managing the table ingest through Sqoop and partitioned Hive tables is the best option from a performance standpoint. Although there are CDC tools such as Oracle GoldenGate, which write to HBase and handle frequent updates in near real time, the number of regions per region server in HBase grows very rapidly when there are large tables. The maximum throughput achievable by the near-real-time CDC replication processes is only around 10,000 transactions per second. In the case of a CDC failure lasting a few days, the accumulated record changes need to be applied and the system needs to catch up. Please check the four-step incremental update strategy for Hive for large table updates, as documented in the following link; this process merges existing table data with the new/changed data from the sources. http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
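As a sketch of the Sqoop side of such an incremental load (connection details, table, and column names are placeholders; the merge/compaction steps of the four-step strategy then follow in Hive):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username dbuser -P \
  --table SALES \
  --incremental lastmodified \
  --check-column LAST_UPDATED \
  --last-value "2016-04-01 00:00:00" \
  --target-dir /data/staging/sales_delta \
  -m 4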