Member since: 11-16-2017
Posts: 28
Kudos Received: 5
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1440 | 01-30-2020 11:15 PM |
| | 1390 | 01-28-2020 11:52 PM |
| | 1198 | 01-28-2020 03:39 AM |
| | 1214 | 02-27-2018 03:02 PM |
02-07-2020
08:34 AM
You need to add it to the custom hdfs-site configuration: dfs.namenode.heartbeat.recheck-interval
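For reference, the property takes a value in milliseconds; 300000 ms (5 minutes) is the usual default, and the value below is only an illustration of how the custom hdfs-site entry would look:

```
# Interval (ms) after which the NameNode re-checks DataNode heartbeats.
# Lowering it makes dead-node detection faster.
dfs.namenode.heartbeat.recheck-interval=300000
```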
01-31-2020
08:08 AM
Right now Ranger doesn't provide a Spark plugin. You can manage access using HDFS permissions (rwx).
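A minimal sketch of doing that at the HDFS level (the path, user, and group names are placeholders):

```bash
# Restrict a data directory to its owner and group; others get no access
hdfs dfs -chown -R etl_user:analysts /data/project
hdfs dfs -chmod -R 750 /data/project

# HDFS ACLs allow additional per-group grants if plain rwx is not enough
hdfs dfs -setfacl -m group:data_science:r-x /data/project
```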
01-31-2020
12:02 AM
Hi, you can set housekeeping to a short period of time and turn on audit to HDFS (if it's off). This way you have the option to quickly check recent logs from the Ranger UI, while the history is stored in HDFS where you can check it at any time.
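The relevant plugin audit settings look roughly like this (found under the ranger-*-audit configuration of each plugin in Ambari; the HDFS path is just an example):

```
# Short-term audits in Solr for quick lookups in the Ranger UI
xasecure.audit.destination.solr=true

# Long-term history archived to HDFS
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=hdfs://<namenode>:8020/ranger/audit
```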
01-30-2020
11:15 PM
Hi @Sambavi, you can install any required dependencies on all nodes and use them, but keep in mind that Pandas and NumPy don't provide distributed computing and won't work with big data sets. If your Zeppelin is configured to use YARN cluster mode, all data will be pulled into the Spark driver on whichever data node the driver is running on and processed there (if the data set is not big, you can increase the driver resources and it will work, but that doesn't look like a real solution). If you use client mode, everything will be pulled into the Zeppelin node. I recommend trying HandySpark: https://github.com/dvgodoy/handyspark
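As a rough illustration of the pattern in a Zeppelin PySpark note (the sales table is hypothetical): do the heavy aggregation in Spark, and only bring the small result into pandas on the driver.

```python
from pyspark.sql import functions as F

# Distributed aggregation over the (hypothetical) large 'sales' table
daily_totals = (
    spark.table("sales")
         .groupBy("sale_date")
         .agg(F.sum("amount").alias("total"))
)

# Only the small aggregated result is collected to the driver as a pandas DataFrame
pdf = daily_totals.toPandas()
pdf.plot(x="sale_date", y="total")
```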
01-29-2020
10:00 AM
After about 10 minutes the node will be marked as dead and re-replication of its data will begin. You can change this parameter if required.
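With the default settings (5-minute recheck interval, 3-second heartbeat interval), the dead-node timeout works out as:

```
timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
        = 2 * 300 s + 10 * 3 s
        = 630 s  (about 10.5 minutes)
```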
01-28-2020
11:52 PM
It depends on what you want to change. If you just want to add additional disks to all nodes, follow this:

The best way is to create partitions like /grid/0/hadoop/hdfs/data through /grid/10/hadoop/hdfs/data and mount them on the newly formatted disks. These mount options are one of the recommendations for HDFS data mounts, but you can change them:

/dev/sda1 /grid/0 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdb1 /grid/1 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdc1 /grid/2 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0

After that, add all the partition paths to the HDFS configuration, for example:

/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data

But don't delete the existing partition from the configuration, because you will lose the data blocks stored in /hadoop/hdfs/data. The paths don't really matter; just keep them separate and don't forget to re-balance data between the disks.
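If you are on Hadoop 3 / HDP 3, the built-in HDFS disk balancer can spread existing blocks across the newly added disks (the hostname is a placeholder and dfs.disk.balancer.enabled must be true):

```bash
# Generate a plan for one DataNode; the command prints where the plan file was written
hdfs diskbalancer -plan datanode1.example.com
# Execute the plan file printed above, then check progress
hdfs diskbalancer -execute /system/diskbalancer/<date>/datanode1.example.com.plan.json
hdfs diskbalancer -query datanode1.example.com
```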
01-28-2020
04:09 AM
Hi, check this article: https://community.cloudera.com/t5/Support-Questions/why-kafka-should-be-un-even-number/td-p/199062
01-28-2020
04:03 AM
1 Kudo
I recommend checking the Hortonworks project on GitHub: https://github.com/hortonworks-spark/spark-atlas-connector
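Roughly, you attach the connector jar and register its listeners when submitting a job; the jar name and job file below are illustrative, and the exact listener classes are documented in the project README:

```bash
spark-submit \
  --jars spark-atlas-connector-assembly.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  your_job.py
```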
01-28-2020
04:00 AM
You need to prepare and mount the disks before setting this configuration. DataNode directories:
/hadoop/hdfs/data/grid/1/
/hadoop/hdfs/data/grid/2/
/hadoop/hdfs/data/grid/3/
/hadoop/hdfs/data/grid/4/
/hadoop/hdfs/data/grid/5/
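A minimal sketch of preparing one of those mounts (the device name is a placeholder):

```bash
mkfs.ext4 /dev/sdb1
mkdir -p /hadoop/hdfs/data/grid/1
mount /dev/sdb1 /hadoop/hdfs/data/grid/1
chown -R hdfs:hadoop /hadoop/hdfs/data/grid/1
# Add a matching /etc/fstab entry so the mount survives a reboot
```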
01-28-2020
03:39 AM
1 Kudo
The best way is to join your nodes to the domain using the SSSD service; it solves the user home directory creation problem as well as group mapping.
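For example, on RHEL/CentOS nodes a realmd/SSSD join to Active Directory looks roughly like this (the domain and admin account are placeholders; oddjob-mkhomedir takes care of home directory creation):

```bash
yum install -y sssd realmd adcli oddjob oddjob-mkhomedir samba-common-tools
realm join --user=ad_admin EXAMPLE.COM

# Verify the join and that AD users and groups resolve on the node
realm list
id some_ad_user@example.com
```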
01-28-2020
03:21 AM
If you have HDP 3+ and want to use the Hive metastore, you will have versioning problems between Hive and Spark. Right now the Hive metastore versions available to Spark are 0.12.0 through 2.3.3. You can check for updates at this URL: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
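The Spark settings involved look roughly like this (the version and jar source are examples; the page above lists the supported options):

```
spark.sql.hive.metastore.version=2.3.3
# 'maven' downloads matching Hive jars; alternatively point this at a classpath containing them
spark.sql.hive.metastore.jars=maven
```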
01-06-2020
02:50 AM
As far as I know, for new releases you need a commercial subscription to access the Cloudera repositories.
01-29-2019
12:18 PM
I suggest you follow this manual: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_hadoop-high-availability/content/configure_ranger_admin_ha.html starting from step 31, and check all SPN entries in spnego.service.keytab.
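You can list the SPNs actually present in the keytab with klist (the path below is the usual HDP location):

```bash
klist -kt /etc/security/keytabs/spnego.service.keytab
# Expect HTTP/<hostname>@REALM entries for every Ranger Admin host and for the load balancer name
```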
01-15-2019
02:03 PM
It seems that you don't have a valid user in SAM. If you have a Kerberized cluster, I would recommend using the SAM keytab to validate your login from the browser and then creating an Administrator user.
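For example, on the SAM host you could obtain a ticket from the service keytab before logging in through the browser; the principal and keytab path are assumptions based on the usual HDF layout, where the SAM service is called streamline:

```bash
kinit -kt /etc/security/keytabs/streamline.service.keytab streamline/$(hostname -f)@EXAMPLE.COM
klist   # confirm the ticket before opening the SAM UI
```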
01-15-2019
01:54 PM
Hi, when you enable Kerberos on a cluster with Ranger HA you need to:
1. Create service principals for the load balancer nodes in AD or another directory manager (LB1, LB2, VIP);
2. Generate keytabs for these principals;
3. Merge the generated keytabs with spnego.service.keytab (a sketch of the merge is shown below).
Use hostnames for the principals and the LB VIP hostname for the Ranger URL.
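A rough sketch of the merge step with ktutil (the keytab file names are placeholders):

```
ktutil
ktutil:  rkt /etc/security/keytabs/spnego.service.keytab
ktutil:  rkt /etc/security/keytabs/http-lb-vip.keytab
ktutil:  wkt /etc/security/keytabs/spnego.service.keytab
ktutil:  quit
```

Then verify the result with klist -kt /etc/security/keytabs/spnego.service.keytab.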
10-29-2018
12:22 PM
In HDP and HDF you have the option to Kerberize all components and integrate with Active Directory or another directory manager (FreeIPA or OpenLDAP) for authentication. Authorization and auditing are handled by Apache Ranger. If encryption is required there is Ranger KMS, and Apache Knox covers perimeter security.
10-28-2018
09:20 PM
Can you provide more information:
1. Which OS are you trying to connect from?
2. Which transport protocol is set in the HiveServer2 configuration (binary or http)?
3. Is your cluster Kerberized or not?
10-28-2018
11:17 AM
Hi, it depends on the available resources of the nodes where you are planning to install all the services. If your master nodes have enough resources you can put HDFS, YARN, ZooKeeper, and HBase on the same nodes. For Kafka and NiFi I would recommend separate machines. If you have 3 master nodes:
1. HDFS (Active), YARN (Active), HBase (Active), ZooKeeper
2. HDFS (Standby), YARN (Standby), HBase (Standby), ZooKeeper
3. ZooKeeper
10-28-2018
11:13 AM
Hi, for HiveServer2 you can try: jdbc:hive2://<zk1>:2181,<zk2>:2181,<zk3>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
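For example with beeline (the ZooKeeper hostnames are placeholders, and the namespace must match hive.server2.zookeeper.namespace on your cluster):

```bash
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```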
03-21-2018
12:54 PM
Hi, it doesn't matter where you have Tableau Desktop installed. Either way you will publish the project from Desktop to Server.
03-21-2018
12:23 PM
1 Kudo
In general I wouldn't recommend turning on ACID transactions because they still have a lot of open issues. We had problems with the lock manager, compactors, and more.
03-21-2018
12:19 PM
If it's a Windows machine you need to:
1. Install the Spark ODBC driver.
2. Install the MIT Kerberos client.
3. Configure the MIT client (add environment variables: KRB5_CONFIG pointing to the krb5.conf file and KRB5CCNAME pointing to the credential cache file), as sketched below.
Then you will be able to connect to the Kerberized cluster.
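A sketch of setting those variables on Windows (the paths are only examples):

```
setx KRB5_CONFIG "C:\ProgramData\MIT\Kerberos5\krb5.ini"
setx KRB5CCNAME  "C:\Temp\krb5cache"
```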
03-09-2018
01:47 PM
1 Kudo
If you have virtualization with a fault tolerance option and shared storage (VMware ESXi, etc.), I would recommend installing the Ambari Server there.
03-09-2018
01:42 PM
Either way, it's better to back up the logs.
03-09-2018
01:39 PM
For Log Search the default retention period is 7 days, and for Ranger it is 90 days. I would recommend keeping them for the default period and also storing all logs in HDFS. Deleting old log files will not harm the system.
02-27-2018
03:02 PM
Hi, I would recommend installing two HAProxy instances as the load balancer and configuring keepalived (VRRP) between them. This way you get a virtual IP that the Ranger URL points to. Keep in mind that Ranger uses a relational database, which must also have master-master replication; if it's MySQL, that can be a Galera cluster. So you will have two HAProxy nodes with a virtual IP, two Ranger Admin instances, and a Galera cluster for MySQL. This configuration removes the single point of failure at every layer.
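A minimal sketch of the load-balancer side, assuming placeholder IPs, interface, and hostnames (Ranger Admin listens on 6080 by default):

```
# /etc/keepalived/keepalived.conf (on both HAProxy nodes; give the backup node a lower priority)
vrrp_instance RANGER_VIP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.100
    }
}

# /etc/haproxy/haproxy.cfg
frontend ranger_front
    bind 10.0.0.100:6080
    default_backend ranger_admins

backend ranger_admins
    balance roundrobin
    option httpchk GET /
    server ranger1 ranger1.example.com:6080 check
    server ranger2 ranger2.example.com:6080 check
```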