03:41 AM
I ran into the same issue, but it was fixed automatically after restarting my data node server (rebooting the physical Linux server).
05:17 PM
Yes. That is the issue. This has been resolved based on that, but I haven't updated the question with this answer. Thanks for answering this.
08:47 PM
In theory, the hadoop-auth provider in Knox could be used with KnoxSSO to accept the Kerberos ticket. This assumes that the Kerberos ticket is presented to Knox via the SPNEGO challenge from hadoop-auth, and that the result is a ticket that is for Knox and from the same realm as Knox is configured for, or from a trusted realm. There are a good number of maybes in there, and it is certainly not something that has been tested. I would be interested in hearing the results. Again, this has not been tested and is not a supported use case for HDP.
07:21 PM
1 Kudo
Hi @Timothy Spann, there is a bug on the Ambari side: it is not generating hiveserver2-site.xml. As a result, changes made in the Advanced Hiveserver2 site section in Ambari are not reflected (we make changes in hiveserver2-site.xml for Ranger). If you disable authorization in the general settings as mentioned above, you will be able to run the Hive CLI, but Ranger policies will not work as expected. This issue has been raised and will be resolved in an upcoming release of the Sandbox. For now, you can use Hive, but without any Ranger policies.
03:13 PM
Thanks Ravi
05:32 PM
1 Kudo
Hi @Johnny Fuger. When you have a set of files in an existing directory structure and you are not able to move the files around, there is still a way to create a partitioned Hive table: you can define the partitions manually (explicitly). It is important to note that you are controlling each partition. You create the table, then add each partition manually via an ALTER TABLE command.
Here is an example with three days' worth of files in three different directories: the first directory has 1 file (10 records total), the second has 2 files (20 records total), and the third has 3 files (30 records total):
hadoop fs -mkdir -p /user/test/data/2016-07-01
hadoop fs -mkdir -p /user/test/data/2016-07-02
hadoop fs -mkdir -p /user/test/data/2016-07-03
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-01
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-02/poc_data_file2.txt
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-02
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03/poc_data_file2.txt
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03/poc_data_file3.txt
[root@sandbox hdfs]# hadoop fs -ls -R /user/test/data
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:30 /user/test/data/2016-07-01
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:30 /user/test/data/2016-07-01/poc_data_file.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-02
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-02/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:31 /user/test/data/2016-07-02/poc_data_file2.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-03
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file2.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file3.txt
Now create an external table with a partition clause. Note the rowcount is zero initially since we have not defined any partitions yet.
create external table file_data_partitioned (id int, textval string, amount double)
partitioned by (dateval string)
LOCATION '/user/test/data';
select count(*) from file_data_partitioned;
hive> select count(*) from file_data_partitioned;
0
Now manually define the 3 partitions on the data using ALTER TABLE commands. You need to specify the correct location for each partition. These partitions could be anywhere in HDFS.
-----------------------------------------------
-- Add partitions manually
alter table file_data_partitioned add partition (dateval = '2016-07-01')
location '/user/test/data/2016-07-01';
alter table file_data_partitioned add partition (dateval = '2016-07-02')
location '/user/test/data/2016-07-02';
alter table file_data_partitioned add partition (dateval = '2016-07-03')
location '/user/test/data/2016-07-03';
-- Run statistics
analyze table file_data_partitioned compute statistics;
Now we can see and query the data in each partition.
hive> select dateval, count(*)
> from file_data_partitioned
> group by dateval;
2016-07-01 10
2016-07-02 20
2016-07-03 30
Important note though - if you choose this method of manual partitioning, you should always do it the same way each time you add data to the table. Otherwise you will get different directory structures in HDFS for the same table, and the data will be spread out across the cluster, which can get messy. Here's an example of what happens when you use an INSERT INTO command to create data for partition 2016-07-31:
insert into file_data_partitioned partition (dateval = '2016-07-31')
select id, textval, amount
from file_data_partitioned
where dateval = '2016-07-01';
[root@sandbox hdfs]# hadoop fs -ls -R /user/test/data
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:30 /user/test/data/2016-07-01
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:30 /user/test/data/2016-07-01/poc_data_file.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-02
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-02/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:31 /user/test/data/2016-07-02/poc_data_file2.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-03
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file2.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file3.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-05 16:53 /user/test/data/dateval=2016-07-31
-rwxr-xr-x 1 hdfs hdfs 182 2016-07-05 16:53 /user/test/data/dateval=2016-07-31/000000_0
Note the new directory created for 2016-07-31: it has a different structure, the default one that Hive uses when Hive controls partitioning (.../dateval=2016-07-31/...). I hope this helps.
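When there are more than a handful of directories, the ALTER TABLE statements shown earlier can be generated instead of typed by hand. The following is only a minimal sketch, assuming each partition lives in a directory named after its dateval value, as in the example. The function name gen_add_partitions is hypothetical, and the printed statements are meant to be reviewed and then run in Hive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one ALTER TABLE ... ADD PARTITION statement
# per partition value. Assumes the partition column is named dateval and
# that each partition's files sit in <base_dir>/<value>.
gen_add_partitions() {
  local table="$1" base_dir="$2"
  shift 2
  local val
  for val in "$@"; do
    printf "alter table %s add if not exists partition (dateval = '%s') location '%s/%s';\n" \
      "$table" "$val" "$base_dir" "$val"
  done
}

# Example: the three directories used in this post.
gen_add_partitions file_data_partitioned /user/test/data 2016-07-01 2016-07-02 2016-07-03
```

The IF NOT EXISTS clause makes the generated script safe to re-run after new directories are added.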
07:02 PM
1 Kudo
This will depend on how the forests are set up in AD, but generally you should be able to query the top-level domain using the global catalog port (3268 for LDAP or 3269 for LDAPS, instead of the traditional 389). Using the GC port will allow you to follow continuation referrals (referrals that send you from ldap:// to ldap://). In this case you should be able to use "ldap://EXAMPLE.COM:3268" with a base of "DC=EXAMPLE,DC=COM", which should allow you to return users and groups from all subdomains.
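As a rough illustration of the URL and base DN described above, here is a small sketch. The helper name gc_url and the bind account in the commented-out query are hypothetical. The ldapsearch call assumes the OpenLDAP client tools are installed and that simple bind is allowed in your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the global catalog URL for a domain.
# Port 3268 is the plain LDAP global catalog port, 3269 is the LDAPS one.
gc_url() {
  local domain="$1" use_tls="${2:-no}"
  if [ "$use_tls" = "yes" ]; then
    printf 'ldaps://%s:3269\n' "$domain"
  else
    printf 'ldap://%s:3268\n' "$domain"
  fi
}

# Print the URL used in this post.
gc_url EXAMPLE.COM

# Example query against the global catalog (needs a reachable AD server,
# so it is left commented out; binduser@EXAMPLE.COM is a placeholder):
# ldapsearch -x -H "$(gc_url EXAMPLE.COM)" -b "DC=EXAMPLE,DC=COM" \
#   -D "binduser@EXAMPLE.COM" -W "(objectClass=user)" sAMAccountName
```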
04:30 PM
1. If you need home directories for each of the users, then you need to create them. Ownership can be changed from the CLI, or you can set it using Ranger (though I think changing it from the CLI is better than creating a new profile in Ranger for these things).
2. I am talking about principals here, not service users (like hdfs, hive, yarn) coming from AD (using SSSD or some other such tool). So, with your setup, local users are created on each node, but they still need to authenticate with your KDC. Ambari can create them for you in the OU once you give the credentials to Ambari.
3. It's not mandatory to have /user/<username> for each user. We have cases where BI users who use ODBC/JDBC don't even have login access to the nodes and don't need /user/<username>. Even users who log in don't need /user/<username> and could use something like /data/<group>/... to read from and write to HDFS.
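For point 1, creating home directories from the CLI might look like the sketch below. It only prints the commands (a dry run) so they can be reviewed first. The function name home_dir_cmds, the user names, and the group are all hypothetical, and the printed commands would normally be run as the HDFS superuser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the HDFS commands that create
# /user/<username> for each user and hand ownership to that user.
home_dir_cmds() {
  local group="$1"
  shift
  local user
  for user in "$@"; do
    printf 'hdfs dfs -mkdir -p /user/%s\n' "$user"
    printf 'hdfs dfs -chown %s:%s /user/%s\n' "$user" "$group" "$user"
  done
}

# Example with placeholder users and group.
home_dir_cmds hdfs alice bob
```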
07:20 AM
After changing the hostname in the Ambari configuration file, restart the Ambari server.
09:28 AM
If my Hive table is an external table located on HDFS, could this solution work? Thanks.