Member since: 08-17-2017
Posts: 36
Kudos Received: 1
Solutions: 0
08-15-2018
02:36 AM
Hi All, please provide some input. HDFS disk usage is high, so I am considering changing the replication factor for a particular existing directory in HDFS from 3 to 1. I know this can be done for the data that is already present using the setrep command: hdfs dfs -setrep -R -w 1 /directory
Questions:
1. I don't want to change the replication factor in core-site.xml, so running the above command is fine for existing data, but for new data that comes into those directories, is there any way to keep the replication value persistent? I know there is a JIRA open (https://issues.apache.org/jira/browse/HDFS-199), but is there a workaround?
2. Are there any implications of doing this apart from potential data loss (in our case the data is backed up in DR)?
3. Will changing the replication factor cause under-replication of blocks or any other block issue?
4. Any other recommendations apart from the above.
Thanks
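A minimal sketch of the approach above, assuming /directory, the jar, and the class name are placeholders: setrep handles the data that already exists, while new files can be written with replication 1 by overriding dfs.replication on the writing client or job (the cluster-wide default normally lives in hdfs-site.xml, so nothing in core-site.xml needs to change), because HDFS applies the replication factor at file-creation time rather than per directory.

# Existing data: reduce replication and wait for the change to complete
hdfs dfs -setrep -R -w 1 /directory

# New data: override the default replication just for this client command
hadoop fs -D dfs.replication=1 -put localfile /directory/

# New data written by a MapReduce job (only if the job uses ToolRunner/GenericOptionsParser)
hadoop jar myjob.jar MyJobClass -D dfs.replication=1 ...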
Labels: Apache Hadoop
07-19-2018
02:23 PM
@sseth I have downloaded the latest jar: https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar Tried creating the external table and it fails with the following error: FAILED: HiveAccessControlException Permission denied: user [abcd] does not have [READ] privilege on [gs://hdp-opt1/forhive/languages] (state=42000,code=40000) I have enabled the Hive plugin and set the permission to 777 in core-site.xml. Were there any changes made to the jar? I also see that a few properties have changed in this link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/gcp-cluster-config.html Is it mandatory to use the JSON key if my VM instance already has the required permissions to talk to GCS?
07-11-2018
07:41 PM
@Jonathan Sneep I have added the Active Directory certificate to the Ambari truststore, but cluster creation still fails with the error below: Ambari operation start failed: [component:'INSTALL_START', requestID: '1', context: 'Logical Request: Provision Cluster 'cbd-master1' FAILED: Failed to create the account for HTTP/cbdmaster1-m-0-20180711174907.c.ibxev-edl2-61a040fc.internal@CLUSTER.COM', status: 'ABORTED'] In a manual installation we check whether the KDC/Kerberos credentials are working by testing the connection. How can we do the same in Cloudbreak?
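A rough way to reproduce the manual "test KDC connection" check by hand against the same AD endpoints Cloudbreak is using (the admin principal below is a placeholder; the realm, LDAPS URL, and container DN are taken from the details posted on 07-10-2018):

# Can the Kerberos admin account get a ticket from the KDC?
kinit admin_user@CLUSTER.COM

# Can the same account bind over the LDAPS URL Ambari uses to create the HTTP/... accounts?
ldapsearch -H ldaps://35.237.19.153:636 -D "admin_user@cluster.com" -W \
    -b "ou=hdpusers,dc=cluster,dc=com" "(objectClass=user)" -LLL dn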
07-10-2018
03:56 PM
@mmolnar @Jonathan Sneep I followed your recommendations, thanks for the input. Without Kerberos authentication the cluster setup succeeds, but enabling Kerberos security makes the setup fail with the following error: Ambari operation start failed: [component:'INSTALL_START', requestID: '1', context: 'Logical Request: Provision Cluster 'cbd-master' FAILED: unable to find valid certification path to requested target', status: 'ABORTED'] Please find my Kerberos details below; could it be failing because of any of these parameters?
Kerberos Type: Existing KDC with Active Directory
Kerberos URL: 35.237.19.153
Kerberos Admin URL: 35.237.19.153
Kerberos Realm: CLUSTER.COM
Kerberos AD LDAP URL: ldaps://35.237.19.153:636
Kerberos AD Container DN: ou=hdpusers,dc=cluster,dc=com
Use TCP connection: Yes
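"unable to find valid certification path to requested target" is the generic Java error for an untrusted TLS certificate, so one thing worth checking is whether the certificate served on ldaps://35.237.19.153:636 is in the truststore Ambari actually uses. A hedged sketch (truststore path, alias, and password are placeholders; the real location depends on how Ambari/Cloudbreak was configured):

# Import the AD/LDAPS certificate into the truststore Ambari uses
keytool -importcert -alias ad-ldaps -file /tmp/ad-cert.pem \
    -keystore /etc/ambari-server/conf/ambari-truststore.jks -storepass changeit

# Confirm the entry is present
keytool -list -keystore /etc/ambari-server/conf/ambari-truststore.jks -storepass changeit | grep ad-ldaps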
07-09-2018
11:56 PM
@mmolnar I am using the GCP platform and trying to configure the Data Lake service with Kerberos enabled. The external authentication test succeeded, but when creating a cluster with Kerberos enabled and the required KDC credentials added, it fails with the attached error. Below are my Kerberos details: KDC host: hdp-ad.cluster.com (35.237.19.153). I am not sure why kerberization is failing. ldap-error.txt
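Since the attached ldap-error.txt points at the LDAP side, one quick check is to look at the certificate chain the AD host actually presents on port 636 (host name taken from the KDC details above; the CA file path is a placeholder); a sketch:

# Show the certificate chain presented by the AD LDAPS endpoint
openssl s_client -connect hdp-ad.cluster.com:636 -showcerts </dev/null

# Verify it against the CA certificate you intend to trust
openssl s_client -connect hdp-ad.cluster.com:636 -CAfile /tmp/ad-ca.pem </dev/null | grep "Verify return code"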
07-06-2018
04:13 AM
@Jonathan Sneep Thanks for the input. I tried the cluster setup but it failed with the following error: Cluster installation failed to complete, please check the Ambari UI for more details. You can try to reinstall the cluster with a different blueprint or fix the failures in Ambari and sync the cluster with Cloudbreak later. I was able to fix the errors in Ambari. I have two questions I hope you can answer:
1. How do I sync Ambari with Cloudbreak?
2. After the cluster creation failure, the Cloudbreak UI doesn't load at all.
07-05-2018
03:07 AM
Hi All, please help. Cluster creation in Cloudbreak via Data Lake is failing, but I am not sure what the issue is. I have attached the log file. I think it's something to do with my LDAP configuration, but the authentication configuration test connection was successful. It might be an issue with the user and group configuration.
Labels: Hortonworks Cloudbreak
07-03-2018
04:54 PM
@pooja kamle, were you able to set up a single Ranger instance for ephemeral clusters?
06-08-2018
01:33 PM
Hi @sseth The issue is resolved after adding the following property in core-site.xml: fs.gs.reported.permissions=777 Normal users can now access Hive and create external tables pointing to a GCS location.
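For reference, the same setting in core-site.xml property form (777 is the value that worked here; since the GCS connector reports this permission for all objects, a more restrictive value may be preferable where HDFS-style permission checks matter):

<property>
  <name>fs.gs.reported.permissions</name>
  <value>777</value>
</property>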
05-31-2018
01:54 PM
@Geoffrey Shelton Okot I was able to create a Hive external table pointing to GCS as my storage, but it only works as the hive superuser, not as a normal Hive user. In other words, hdpuser1 cannot create a Hive table (it fails with the above error), but after su - hive it works. I am not sure how to rectify this.
05-29-2018
08:19 PM
@Geoffrey Shelton Okot I did try, but it still fails: CommandException: hdpuser1:WRITE is not a valid ACL change hdpuser1 is not a valid scope type The GCS bucket has Storage Admin rights granted to the service account, and hadoop fs -ls gs://bucket/ works fine.
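For what it's worth, GCS ACL/IAM grants go to Google identities (users, groups, service accounts) rather than to local Hadoop/AD usernames such as hdpuser1, which is why gsutil rejects that scope. A sketch of granting the bucket role to the cluster's service account instead (the service-account email is a placeholder):

# Grant the service account used by the GCS connector object-admin rights on the bucket
gsutil iam ch serviceAccount:my-hdp-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://bucket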
05-29-2018
01:05 PM
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1) hdpuser1 is an AD user. Using the same user I can run $ hdfs dfs -ls gs://bucket/, but when I try to create an external table through Beeline it fails.
05-28-2018
08:25 PM
Hi All, In the cloud world, can I have multiple HDP VM instances use a single Ranger DB, where the Ranger DB sits outside the HDP VM instances and I migrate my existing Ranger policies to the MySQL DB?
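A hedged sketch of what pointing Ranger Admin at an external MySQL database involves; the property names below are from Ranger's install.properties (in an Ambari-managed cluster the equivalent fields are set in the Ranger service configuration), and the host and credentials are placeholders:

# install.properties (Ranger Admin)
DB_FLAVOR=MYSQL
db_host=external-mysql.example.com:3306
db_name=ranger
db_user=rangeradmin
db_password=<password>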
Labels: Apache Ranger
05-28-2018
02:44 PM
Hi Neeraj, We are trying to test the GCS connector with HDP 2.6.5 (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/authentication-gcp.html) with GCS as my storage. When trying to create a Hive external table it fails with the following error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://hive_metastore/":hive:hive:drwx------) (state=08S01,code=1)
Syntax: CREATE EXTERNAL TABLE test1256(name string, id int) LOCATION 'gs://hive_metastore/';
05-28-2018
02:27 PM
Hi, Thanks a lot for the info, but I am still facing the same issue. I did create the user in AD and have a valid ticket; hdfs commands do work when accessing GCS, but I cannot create an external Hive table.
05-25-2018
01:09 PM
1 Kudo
I have installed an HDP 2.6.5 cluster in GCP using VM instances, used the GCS connector, and pointed HDFS to use a gs bucket. I added the below two entries in core-site.xml:
google.cloud.auth.service.account.json.keyfile=<Path-to-the-JSON-file>
fs.gs.working.dir=/
hadoop fs -ls on the gs:// bucket works fine, but when I create a Hive table:
CREATE EXTERNAL TABLE test1256(name string, id int) LOCATION 'gs://bucket/';
I get the following error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hdpuser1, path="gs://bucket/":hive:hive:drwx------) (state=08S01,code=1)
Apart from the changes to core-site.xml, are there any changes to be made in hive-site.xml as well? https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/authentication-gcp.html
Labels: Apache Hadoop
05-24-2018
01:33 PM
Hi Vipin, Thanks for the feedback. Can we build a custom plugin (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207) for GCS? Ranger does have a plugin for WASB in Azure; will there be one for GCS in the future as well?
05-22-2018
05:01 PM
Hi All, please advise. I have installed HDP 2.6.5 in GCP and used the GCS connector to point my HDFS to GCS. In my current on-prem setup, Ranger controls access to HDFS and Hive via policies. In the cloud, do we have a Ranger plugin that can talk to GCS instead of HDFS and control access on it? Similarly, my Hive Metastore is placed outside the HDP cluster; do we have a Ranger plugin to talk to Cloud SQL? Thanks
Labels: Apache Ranger
05-16-2018
05:21 PM
Per the Apache documentation, NOT NULL constraints are only supported from Hive 3.0 onward (https://issues.apache.org/jira/browse/HIVE-16575).
-bash-4.2$ hive --version
Hive 1.2.1000.2.6.4.0-91
Since our Hive version is 1.2, please help confirm whether NOT NULL is supported for specific columns in a Hive table, and if it is supported, how to implement it in the table DDL. Thanks!
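For reference, a sketch of the Hive 3.x syntax introduced by HIVE-16575, which is not available in Hive 1.2 (the table and column names are examples; the DISABLE NOVALIDATE form records the constraint as metadata only rather than enforcing it):

-- Hive 3.x only
CREATE TABLE customers (
  id   INT    NOT NULL,
  name STRING NOT NULL DISABLE NOVALIDATE
);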
Labels: Apache Hive
11-07-2017
08:58 PM
Hi All, I want to know which parameters need to be changed when a cluster has nodes with different RAM characteristics. E.g., we have 143 datanodes with 256 GB RAM and are planning to upgrade a few datanodes to 512 GB RAM; how can I make sure YARN utilizes this change? Currently YARN is allocated 168 GB. I researched and found this: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.2.0/bk_ambari-operations/content/using_host_config_groups.html which says to add config groups. Is there any detailed document for this describing which settings need changes?
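A hedged sketch of the kind of per-group override the host config groups approach involves: put only the upgraded nodes into a new Ambari config group and raise the NodeManager memory there, leaving the default group at its current value (the 400 GB figure is an illustrative assumption about how much of the 512 GB you would hand to YARN, not a recommendation):

# Default config group (256 GB nodes) - current value
yarn.nodemanager.resource.memory-mb=172032    # ~168 GB

# Config group for the upgraded 512 GB nodes (example value only)
yarn.nodemanager.resource.memory-mb=409600    # ~400 GB, leaving headroom for the OS and other services

# yarn.scheduler.maximum-allocation-mb may also need review if single containers should be allowed to grow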
Labels: Apache Hadoop
10-01-2017
03:06 PM
Hi, thanks for the reply, but I don't see a snapshot for the below-mentioned user id:
-bash-4.2$ hdfs dfs -du -h /user/alin/
1.3 T   /user/alin/.Trash
40.5 M  /user/alin/.hiveJars
-bash-4.2$ hdfs dfs -du -h /user/alin/.snapshot
du: `/user/alin/.snapshot': No such file or directory
-bash-4.2$ hdfs dfs -ls /user/alin/.snapshot
ls: `/user/alin/.snapshot': No such file or directory
I have extracted the fsimage and am using ELK to see the storage used by HDFS, but the space consumption of certain directories doesn't match the actual size on the cluster. E.g., /user/alin is consuming 1 TB of storage according to the hdfs dfs -du command, but the fsimage shows the same user consuming 40.5 M of storage, and when performing hdfs dfs -ls on /user/alin I don't see files adding up to 1 TB. As checked with my operations team, they say it is because of HDFS snapshots saved on the cluster, since the blocks are still allocated, but I don't see any Hortonworks doc mentioning that. If that's the case, how do I calculate the actual storage used by HDFS? Does the fsimage give accurate data, and how do I make sure the fsimage also gets snapshot details?
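On the fsimage question, a hedged sketch: the offline image viewer can dump the image as XML, which on recent Hadoop 2.x releases includes snapshot-related sections, so you can check whether the image you feed into ELK contains snapshot data at all (file names below are placeholders):

# Dump the fsimage to XML and look for snapshot-related sections
hdfs oiv -p XML -i fsimage_0000000000012345678 -o fsimage.xml
grep -c "SnapshotSection\|SnapshotDiffSection" fsimage.xml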
09-28-2017
03:00 PM
Hi All, need some help. When the hdfs dfs -du -s -h / command is executed I see the below user has 12 TB in .Trash, but when listing the files it is empty. Please find the output below:
-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash
-bash-4.2$ hadoop fs -du -s /user/mzhou1/.Trash
12248499878795  /user/mzhou1/.Trash
-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash
-bash-4.2$
It's not only this user; other directories are also showing a similar issue. Is it because of snapshots stored in HDFS, where the blocks are still allocated? Please provide inputs.
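A way to check the snapshot theory directly: list which directories are snapshottable and whether /user/mzhou1 (or an ancestor) still has snapshots referencing the trashed blocks (listing all snapshottable directories typically requires HDFS superuser privileges):

# List snapshottable directories on the cluster
hdfs lsSnapshottableDir

# If the directory (or a parent) is snapshottable, inspect its snapshots
hdfs dfs -ls /user/mzhou1/.snapshot
hdfs dfs -du -s -h /user/mzhou1/.snapshot/*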
Labels: Apache Hadoop
09-26-2017
06:29 PM
Hi Klien, Thanks for the info. Can you let me know how that is calculated? I have been reading forums but couldn't understand it. E.g., I have application A with 4576 GB of data; I was able to calculate the space on HDFS, which is 1804 GB, and it requires 1.12 data nodes, as the data nodes we have are 23 TB each. I want to know how this impacts the existing setup, i.e., as you mentioned above, how will DFS used, Remaining, and non-DFS used change?
09-25-2017
08:12 PM
Hi, Thanks for the reply. I know that 300 GB of space will be used by HDFS, but my question is: with 300 GB of data, how will it impact the existing setup? In Ambari I can see the above details; how will they change, and with more and more applications coming in, how do I assess the impact?
09-25-2017
07:41 PM
I have to calculate how much space new applications will use on the existing cluster and how this will impact the overall utilized capacity within the cluster. The reasoning is that the intake process should address the impact this will have on disk utilization within the cluster. In Ambari I have the following details:
DFS used: 2.3 PB (72.19 %)
Non-DFS used: 2.6 TB (0.08 %)
Remaining: 918.4 TB (27.73 %)
Let's say I have an application A with 100 GB of data which is going to use HDP; how much will it consume on HDFS?
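A rough worked example under the usual assumptions (default replication factor 3, ignoring compression and intermediate/temporary data):

100 GB raw data x 3 replicas = 300 GB of DFS usage
Remaining: 918.4 TB - 0.3 TB ≈ 918.1 TB
DFS used: 2.3 PB + 0.3 TB, so the used percentage rises by roughly 0.3 TB / 3,300 TB ≈ 0.01 % (total capacity ≈ 918.4 TB / 0.2773)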
Labels:
09-19-2017
05:53 PM
Hi, can anyone please let me know how to proceed with this request? I can extract Ranger policies via the following command: curl -iv -u username:passwd -H "Content-type:application/json" -X GET http://sandbox.com:6080/service/public/v2/api/policy I need to automate the script so that there is no need to enter the username and password. Is that possible? Is there any API with which I can test it?
09-18-2017
06:23 PM
I have the same command: curl -iv -u admin:admin -H "Content-type:application/json" -X GET http://sandbox.com:6080/service/public/v2/api/policy My question: is it possible to automate the above command so that it doesn't ask for the Ranger admin username and password and instead authenticates via Kerberos?
09-18-2017
06:10 PM
Cannot automate the Ranger admin curl command so that it doesn't ask for the username and password for the Ranger credentials.
09-18-2017
05:39 PM
Hi Team, I use the following command to extract Ranger policies: curl -iv -u admin:admin -H "Content-type:application/json" -X GET http://sandbox.com:6080/service/public/api/policy > /tmp Is there any way I can automate the above script so that there is no need to key in the admin username and password? I am planning to schedule a cron job so that every 6 hours it extracts the policies and dumps the output to a path. Thanks
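A hedged sketch of one way to do this, assuming the Ranger admin UI is configured for Kerberos/SPNEGO authentication (if it is not, curl still needs basic-auth credentials, which could instead be read from a root-only protected file rather than typed in); the principal, keytab, script name, and output paths are placeholders:

#!/bin/bash
# Authenticate from a keytab, then call the Ranger policy API via SPNEGO (no password on the command line)
kinit -kt /etc/security/keytabs/ranger-export.keytab ranger-export@CLUSTER.COM
curl -s --negotiate -u : -H "Content-type:application/json" \
     -X GET http://sandbox.com:6080/service/public/v2/api/policy \
     -o /var/backups/ranger-policies-$(date +%F-%H%M).json

# crontab entry to run the script every 6 hours
# 0 */6 * * * /usr/local/bin/export_ranger_policies.sh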
Labels: Apache Ranger
09-14-2017
04:04 PM
We have lots of applications that want to use Hadoop, and I am doing capacity planning for the data they bring in. I need recommendations for the number of data nodes, tasks per node, and memory required in the HDP environment to process the data. Let's say I have 10 TB of data for one year; how do I calculate the above? The hardware I have is 48 CPU cores, 251.6 GB of RAM, and 23 TB of disk space per data node, and there are 50 data nodes.
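A rough worked sizing example under common assumptions (replication factor 3, roughly 25 % extra for intermediate and temporary data, and only part of each node's 23 TB reserved for DFS); the overhead and reserve percentages are assumptions rather than HDP requirements:

10 TB raw per year x 3 replicas              = 30 TB of DFS usage
+ ~25 % headroom for temp/intermediate data  ≈ 37.5 TB
Usable DFS per node (assume ~80 % of 23 TB)  ≈ 18.4 TB
Capacity needed for this data alone          ≈ 37.5 / 18.4 ≈ 2 data nodes' worth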
Labels: