Member since: 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80
My Accepted Solutions
Views | Posted
---|---
2081 | 06-21-2017 03:53 PM
3160 | 03-14-2017 01:24 PM
1988 | 01-25-2017 03:36 PM
3166 | 12-20-2016 06:19 PM
1589 | 12-14-2016 05:24 PM
07-25-2017
03:41 AM
I ran into the same issue, but it was automatically fixed after restarting my DataNode server (rebooting the physical Linux server).
09-13-2016
05:17 PM
Yes, that is the issue. It has been resolved based on that, but I hadn't updated the question with the answer. Thanks for answering it.
07-11-2016
08:47 PM
In theory, the hadoop-auth provider in Knox could be used with KnoxSSO in order to accept the Kerberos ticket. This assumes that the Kerberos ticket would be presented to Knox via the SPNEGO challenge from hadoop-auth, and that the resulting ticket would be for Knox and from the same realm, or a realm trusted by the one Knox is configured for. There are a good number of maybes in there, and it is certainly not something that has been tested. I would be interested in hearing the results. Again, this has not been tested and is not a supported use case for HDP.
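As a rough, untested illustration of the flow described above (the gateway hostname, topology name, and application URL are assumptions, not from the original post), a client that already holds a Kerberos ticket could try answering the SPNEGO challenge at the KnoxSSO WebSSO endpoint with curl:
# Hypothetical sketch only: assumes a KnoxSSO topology named "knoxsso" on a gateway at knox.example.com
# and a curl build with GSS-API support. 'kinit' obtains the ticket; '--negotiate -u :' answers SPNEGO.
kinit user@EXAMPLE.COM
curl -k --negotiate -u : -c cookies.txt "https://knox.example.com:8443/gateway/knoxsso/api/v1/websso?originalUrl=https://apps.example.com/"
Whether the resulting cookie is accepted downstream depends on the realm trust described above, which is exactly the untested part.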
07-08-2016
07:21 PM
1 Kudo
Hi @Timothy Spann, there is a bug from the Ambari perspective: it is not generating hiveserver2-site.xml, so any changes made in the Advanced hiveserver2-site section in Ambari are not reflected (we make changes in hiveserver2-site.xml for Ranger). If you disable authorization from the general settings as mentioned above, you will be able to run the Hive CLI, but Ranger policies will not work as expected. This issue has been raised and should be resolved in upcoming Sandbox releases. For now, you can use Hive, but without any Ranger policies.
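For reference, a minimal sketch of the workaround described above; hive.security.authorization.enabled is the standard Hive property behind the "disable authorization" setting, but treat the exact invocation as an assumption for your Sandbox version:
# Override authorization for a single Hive CLI session instead of changing it cluster-wide.
# Ranger policies are not enforced while this override is in effect.
hive --hiveconf hive.security.authorization.enabled=false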
07-06-2016
03:13 PM
Thanks Ravi
07-05-2016
05:32 PM
1 Kudo
Hi @Johnny Fuger. When you have a set of files in an existing directory structure and you are not able to move the files around, there is still a way to create a partitioned Hive table: you define the partitions manually (explicitly). It is important to note that you are controlling each partition yourself. You create the table, then add each partition via an ALTER TABLE command. Here is an example where there are 3 days' worth of files in three different directories: the first directory has 1 file (10 records total), the second has 2 files (20 records total), and the third has 3 files (30 records total):
hadoop fs -mkdir -p /user/test/data/2016-07-01
hadoop fs -mkdir -p /user/test/data/2016-07-02
hadoop fs -mkdir -p /user/test/data/2016-07-03
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-01
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-02/poc_data_file2.txt
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-02
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03/poc_data_file2.txt
hadoop fs -put /tmp/poc_data_file.txt /user/test/data/2016-07-03/poc_data_file3.txt
[root@sandbox hdfs]# hadoop fs -ls -R /user/test/data
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:30 /user/test/data/2016-07-01
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:30 /user/test/data/2016-07-01/poc_data_file.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-02
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-02/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:31 /user/test/data/2016-07-02/poc_data_file2.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-03
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file2.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file3.txt
Now create an external table with a partition clause. Note that the row count is initially zero, since we have not defined any partitions yet.
create external table file_data_partitioned (id int, textval string, amount double)
partitioned by (dateval string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/user/test/data';
hive> select count(*) from file_data_partitioned;
OK
0
Now manually define the 3 partitions on the data using ALTER TABLE commands. You need to specify the correct location for each partition; these partitions could be anywhere in HDFS.
-----------------------------------------------
-- Add partitions manually
-----------------------------------------------
alter table file_data_partitioned add partition (dateval = '2016-07-01')
location '/user/test/data/2016-07-01';
alter table file_data_partitioned add partition (dateval = '2016-07-02')
location '/user/test/data/2016-07-02';
alter table file_data_partitioned add partition (dateval = '2016-07-03')
location '/user/test/data/2016-07-03';
---------------------------------------
-- Run statistics
---------------------------------------
analyze table file_data_partitioned compute statistics;
Now we can see and query the data in each partition.
hive> select dateval, count(*)
> from file_data_partitioned
> group by dateval;
OK
2016-07-01 10
2016-07-02 20
2016-07-03 30
Important note, though: if you choose this method of manual partitioning, you should always do it the same way each time you add data to the table. Otherwise you will get different directory structures in HDFS for the same table, with data spread out across the cluster, which can get messy. Here is an example of this when you use an INSERT INTO command to create data for partition 2016-07-31:
insert into file_data_partitioned partition (dateval = '2016-07-31')
select id, textval, amount
from file_data_partitioned
where dateval = '2016-07-01';
[root@sandbox hdfs]# hadoop fs -ls -R /user/test/data
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:30 /user/test/data/2016-07-01
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:30 /user/test/data/2016-07-01/poc_data_file.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-02
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-02/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:31 /user/test/data/2016-07-02/poc_data_file2.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-01 22:32 /user/test/data/2016-07-03
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file2.txt
-rw-r--r-- 1 hdfs hdfs 1024 2016-07-01 22:32 /user/test/data/2016-07-03/poc_data_file3.txt
drwxr-xr-x - hdfs hdfs 0 2016-07-05 16:53 /user/test/data/dateval=2016-07-31
-rwxr-xr-x 1 hdfs hdfs 182 2016-07-05 16:53 /user/test/data/dateval=2016-07-31/000000_0
Note the new directory created for 2016-07-31 and see that it has a different structure: the default layout that Hive uses when Hive controls the partitioning ( ... /dateval=2016-07-31/ ... ). I hope this helps.
07-01-2016
07:02 PM
1 Kudo
This will depend on how the forests are set up in AD, but generally you should be able to query the top-level domain using the global catalog port (generally 3268 or 3269 instead of the traditional 389). Using the GC port allows you to follow continuation referrals (referrals that send you from ldap://example.com to ldap://na.example.com). In this case you should be able to use "ldap://EXAMPLE.COM:3268" with a base of "DC=EXAMPLE,DC=COM", which should allow you to return users and groups from all subdomains.
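To illustrate (a sketch only; the bind account and search filter are made-up placeholders), you could verify this with ldapsearch against the global catalog port before wiring it into your configuration:
# Search the forest root over the global catalog port; entries from child domains
# (e.g. NA.EXAMPLE.COM) come back directly instead of as continuation referrals.
ldapsearch -H ldap://EXAMPLE.COM:3268 \
  -D "svc_hadoop@EXAMPLE.COM" -W \
  -b "DC=EXAMPLE,DC=COM" \
  "(&(objectClass=user)(sAMAccountName=jdoe))" dn sAMAccountName memberOf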
06-23-2016
04:30 PM
1. If you need home directories for each of the users, then you need to create the home directories. Ownership can be changed from the CLI, or you can set it using Ranger (though I think changing it from the CLI is better than creating a new Ranger policy for these things); see the sketch after this list.
2. I am talking about principals here, not service users (like hdfs, hive, yarn) coming from AD (using SSSD or some other such tool). So, with your setup, local users are created on each node, but they still need to authenticate with your KDC. Ambari can create the principals for you in the OU once you give Ambari the credentials.
3. It is not mandatory to have /user/<username> for each user. We have cases where BI users who use ODBC/JDBC, and do not even have login access to the nodes, do not need /user/<username>. Even users that do log in do not need /user/<username> and could use something like /data/<group>/... to read/write to HDFS.
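A minimal sketch of point 1 (the username and group are placeholders):
# Run as the hdfs superuser: create an HDFS home directory for a user and hand over ownership.
hadoop fs -mkdir -p /user/jdoe
hadoop fs -chown jdoe:hdfs /user/jdoe
hadoop fs -chmod 750 /user/jdoe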
01-15-2019
07:20 AM
After changing the hostname in the Ambari configuration file, restart the Ambari server.
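For example (assuming the standard ambari-server service script is on the path):
# Restart Ambari Server so it picks up the updated hostname.
ambari-server restart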
05-27-2017
09:28 AM
If my Hive table is an external table located on HDFS, could this solution work? Thanks.