09-10-2016
02:03 PM
Vagrant provides a VM that is run by the provider of your choice, for instance, VirtualBox or VMware. The network configuration of your VM determines whether you can connect to the outside network. In your case, you would typically use one of two configurations:

In a bridged network configuration, the VM has full access to the outside network and can see any machine out there. It also means that your VM is visible as its very own network device from the outside. While this is very convenient, it may be a security issue, and corporate networks may ban you from adding non-approved network devices.

In a NAT configuration, traffic is routed through the host machine. In short, this means the VM can see the outside network, but the outside network cannot see the VM. You can, however, expose some of the VM's services using port forwarding.

If you want to "bake" your data sets into your Vagrant boxes, this can all be scripted. To always get the most recent version of the data set, you might want to create a Vagrant box, based on a plain sandbox, that goes out to the production system and fetches its data when it is spun up for the first time. Because the Vagrant box acts as a client using standard APIs, generally speaking I believe you would not have to change your production systems. To give you a precise answer, though, I would need to know your case in more detail.
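To make the two options concrete, here is a minimal Vagrantfile sketch; the box name, the bridge device, the forwarded port, and the fetch-data.sh script are all assumptions for illustration, not part of your setup:

```
# A minimal sketch: writes a Vagrantfile showing both networking modes.
# "hdp-sandbox", "eth0", port 8080, and fetch-data.sh are hypothetical.
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "hdp-sandbox"   # hypothetical base box

  # Option 1: bridged ("public") network -- the VM appears as its own device
  # config.vm.network "public_network", bridge: "eth0"

  # Option 2: NAT (the VirtualBox default) with selected ports forwarded,
  # e.g. exposing a web UI to the host
  config.vm.network "forwarded_port", guest: 8080, host: 8080

  # Fetch data from the production system during provisioning
  config.vm.provision "shell", inline: "/vagrant/fetch-data.sh"
end
EOF
```

Shell provisioners only run on the first `vagrant up` by default, which is what makes the fetch-on-first-boot approach work.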
09-09-2016
09:20 AM
1 Kudo
1) What could be the root cause? I think it's just the wrong ldapsearch filter; it should be:

ldapsearch -h unix-ldap.company.com -p 389 -x -b "dc=company,dc=se" "(&(cn=devdatalakeadm)(memberUid=ojoqcu))"

cn=devdatalakeadm,ou=Group,dc=company,dc=se is actually the full DN, and you cannot search on it, as it's not an attribute.

2) Your problem is still the userDnTemplate; that's why you're still getting the LDAP authentication exception:

ldapRealm.userDnTemplate = uid={0},cn=devdatalakeadm,ou=Group,dc=company,dc=se

Why are you trying to search for the user inside the cn=devdatalakeadm subtree? That's not how users and groups are represented in LDAP (unless you did something very specific). Users and groups are normally kept in separate trees, and membership, in your case, is decided only by the memberUid attribute. But if memberUid is ojoqcu, it doesn't mean uid=ojoqcu,cn=devdatalakeadm,ou=Group,dc=company,dc=se actually exists; the ojoqcu user could be in a separate tree/OU, like uid=ojoqcu,ou=User,dc=company,dc=se. To further help you find the correct userDnTemplate, I'd need an ldapsearch output for a user, just like the one you showed for groups.
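In the meantime, a sketch of the lookup I mean (the ou=User location is an assumption about your directory layout, not something I can see from here):

```
# Look up the user entry itself to learn its real DN;
# search the whole base since we don't know which ou the user sits in
ldapsearch -h unix-ldap.company.com -p 389 -x \
  -b "dc=company,dc=se" "(uid=ojoqcu)" dn

# If the DN comes back as uid=ojoqcu,ou=User,dc=company,dc=se,
# the matching template would be:
# ldapRealm.userDnTemplate = uid={0},ou=User,dc=company,dc=se
```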
09-06-2016
05:14 AM
without compression: [numFiles=8, numRows=6547431, totalSize=66551787, rawDataSize=3154024078]
with zlib: [numFiles=8, numRows=6547431, totalSize=44046849, rawDataSize=3154024078]

As you can see, the totalSize is smaller with zlib.
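For anyone who wants to reproduce this kind of comparison, a sketch (the table names are placeholders, not the ones from this thread):

```
# Create a zlib-compressed ORC copy of a table and inspect its stats;
# "my_table" and "my_table_zlib" are hypothetical names
hive -e "
  CREATE TABLE my_table_zlib STORED AS ORC
    TBLPROPERTIES ('orc.compress'='ZLIB')
  AS SELECT * FROM my_table;

  -- numFiles, totalSize, and rawDataSize appear under Table Parameters
  DESCRIBE FORMATTED my_table_zlib;
"
```

If the size figures don't show up, running ANALYZE TABLE my_table_zlib COMPUTE STATISTICS first should populate them.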
08-18-2016
11:39 AM
I have either discovered something strange or I lack an understanding of how Sqoop works:

- The Sqoop documentation says that in case of a composite PK, the --split-by column should be specified during sqoop import; however, I proceeded without doing so, and Sqoop then picked one int column belonging to the PK.
- Only in the case of a few tables (all of them having at least 1.2 billion rows) did I face this mismatch issue.
- I then used --split-by for those tables and also added --validate, and then I got the same no. of rows imported (the shape of the command is sketched below).
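For reference, the shape of the command (the connection string, table, and column names are placeholders, not the real ones):

```
# --split-by names one integer column of the composite PK;
# --validate compares source and target row counts after the import
sqoop import \
  --connect 'jdbc:sqlserver://<IP>;database=MyDb' \
  --username myuser -P \
  --table BigTable \
  --split-by RecordId \
  --validate
```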
08-12-2016
09:05 AM
1 Kudo
Well, I'm unsure whether it was an authorization issue, a mere parsing problem, or both. I did the following and it worked:

1. Did an 'su hive'.
2. Executed the following command (probably the -- --schema should be the last argument; Sqoop simply ignores/breaks after that!):

sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password --table DatabaseLog -- --schema ape
08-09-2016
04:10 PM
You can always remove the files in .Trash as you would any other directory/file:

hdfs dfs -rm -r -skipTrash /user/hdfs/.Trash/*
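If you only want to clear out old trash rather than delete everything at once, there is also the built-in expunge command, which removes checkpoints older than the configured fs.trash.interval:

```
# Deletes trash checkpoints older than fs.trash.interval
hdfs dfs -expunge
```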
07-07-2016
04:24 PM
I am not sure what they mean by ORC not being a general-purpose format. Anyway, in this case you are still going through HCatalog (there are HCatalog APIs for MR and Pig). When I said you can transform this data as necessary, I mean things like creating new partitions, buckets, sorting, Bloom filters, and even redesigning tables for better access. There will be data duplication with any data transform if you want to keep the raw data as well.
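A sketch of the kind of transform I mean, in Hive DDL (every table and column name here is hypothetical):

```
# Rewrite raw data into a partitioned, bucketed ORC table with a Bloom filter;
# "events_raw", "events_orc", and the columns are placeholders
hive -e "
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;

  CREATE TABLE events_orc (user_id BIGINT, payload STRING)
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.bloom.filter.columns'='user_id');

  -- Rewriting the raw data into the new layout duplicates it, as noted above
  INSERT OVERWRITE TABLE events_orc PARTITION (event_date)
    SELECT user_id, payload, event_date FROM events_raw;
"
```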
07-14-2016
07:43 PM
@Kaliyug Antagonist You will typically need to do some configuration on the views to make them work properly. In a secured cluster, you have to specify all of the parameters for connecting to the particular service instead of using the "Local Cluster" configuration drop-down. The Ambari Views documentation contains instructions for configuring all of the various views.
07-26-2016
08:01 AM
@Kaliyug Antagonist We've found another neat solution to this, using a resource path of the form "/user/${id}". Credit to Naveed Hussain, who found it after we moaned a lot about the alternatives. Screenshot attached (ranger-home-directory-policy.png).
06-23-2016
04:30 PM
1. If you need home directories for each of the users, then you need to create the home directories. Ownership can be changed from the CLI, or you can set it using Ranger (though I think changing it from the CLI is better than creating a new policy in Ranger for these things); a minimal sketch is shown after this list.

2. I am talking about principals here, not service users (like hdfs, hive, yarn) coming from AD (using SSSD or some other such tool). So, with your setup, local users are created on each node, but they still need to authenticate with your KDC. Ambari can create the principals for you in the OU once you give the credentials to Ambari.

3. It's not mandatory to have /user/<username> for each user. We have cases where BI users who use ODBC/JDBC don't even have login access to the nodes, so they don't need /user/<username>. Even users that do log in don't need /user/<username> and could use something like /data/<group>/... to read from and write to HDFS.
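The sketch for item 1 (the username "alice" is a placeholder):

```
# Create the HDFS home directory and hand ownership to the user;
# run as the hdfs superuser, and replace "alice" with the real username
sudo -u hdfs hdfs dfs -mkdir -p /user/alice
sudo -u hdfs hdfs dfs -chown alice:hdfs /user/alice
```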