Member since: 11-16-2017
Posts: 28
Kudos Received: 5
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2448 | 01-30-2020 11:15 PM |
| | 2817 | 01-28-2020 11:52 PM |
| | 2562 | 01-28-2020 03:39 AM |
| | 2235 | 02-27-2018 03:02 PM |
02-07-2020
08:34 AM
You need to add it in the custom hdfs-site configuration: dfs.namenode.heartbeat.recheck-interval
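For illustration, a minimal sketch of how this property feeds into the DataNode dead-node timeout, assuming the standard HDFS formula (2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval) and stock default values:

```python
# Sketch: how dfs.namenode.heartbeat.recheck-interval affects when a DataNode
# is declared dead. Formula: 2 * recheck-interval + 10 * heartbeat-interval.

def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Return the DataNode dead-node timeout in seconds.

    Defaults correspond to dfs.namenode.heartbeat.recheck-interval = 300000 ms
    and dfs.heartbeat.interval = 3 s.
    """
    return 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s

print(dead_node_timeout_seconds())                             # 630.0 s, about 10.5 minutes
print(dead_node_timeout_seconds(recheck_interval_ms=60_000))   # 150.0 s with a 1-minute recheck interval
```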
01-31-2020
08:08 AM
Right now Ranger doesn't provide a Spark plugin. You can manage access using HDFS rwx permissions.
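As a rough sketch of that approach, the standard hdfs dfs permission commands can restrict access per directory; the path, owner, and group below are hypothetical examples:

```python
# Sketch: managing access with plain HDFS permissions instead of a Ranger plugin.
# /data/project, etl_user, and analysts are made-up placeholders.
import subprocess

def run(cmd):
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# Owner gets full access, the group gets read/execute, everyone else gets nothing.
run(["hdfs", "dfs", "-chown", "-R", "etl_user:analysts", "/data/project"])
run(["hdfs", "dfs", "-chmod", "-R", "750", "/data/project"])
```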
01-30-2020
11:15 PM
Hi @Sambavi, You can install any required dependencies on all nodes and use them, but keep in mind that Pandas and NumPy don't provide distributed computing, so they won't work with big data sets. If your Zeppelin is configured to use YARN cluster mode, Spark will pull all the data into the driver on whichever data node hosts it and try to process it there (if the data set isn't big, you can increase the driver resources and it will work, but that doesn't look like a real solution). If you use client mode, it will pull everything into the Zeppelin node. I recommend trying HandySpark: https://github.com/dvgodoy/handyspark
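A minimal PySpark sketch of the pattern described above: do the heavy lifting in Spark, and only convert a small, already-aggregated result to Pandas on the driver (the input path and column names are made-up examples):

```python
# Sketch: keep large data in Spark; only bring a small aggregated result to Pandas.
# The input path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pandas-on-driver-demo").getOrCreate()

df = spark.read.parquet("/data/events")          # large, distributed dataset

# Aggregate with Spark so only a tiny summary ever reaches the driver.
summary = df.groupBy("country").agg(F.count("*").alias("events"))

pdf = summary.toPandas()                         # small result -> Pandas on the driver
print(pdf.head())
```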
01-29-2020
10:00 AM
After about 10 minutes the node will be marked dead and the NameNode will begin re-replicating its data. You can change this parameter if required.
01-28-2020
11:52 PM
It depends on what you want to change. If you just want to add additional disks to all nodes, follow this: the best way is to create partitions like /grid/0/hadoop/hdfs/data through /grid/10/hadoop/hdfs/data and mount them on the newly formatted disks. The mount options below are one set of recommended parameters for HDFS data mounts, but you can change them:

/dev/sda1 /grid/0 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdb1 /grid/1 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdc1 /grid/2 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0

After that, just add all the partition paths to the HDFS configuration, e.g.:

/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data

But don't delete the existing directory from the configuration, because you would lose the blocks stored in /hadoop/hdfs/data. The exact paths don't really matter; just keep each one on its own disk, and don't forget to rebalance data between the disks, as sketched below.
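If you are on HDFS 3.x, the intra-DataNode disk balancer can spread existing blocks across the old and new disks. A rough sketch of that workflow, assuming dfs.disk.balancer.enabled=true; the hostname and plan path are placeholders (the -plan step prints the real plan location):

```python
# Sketch: rebalancing blocks across the disks of one DataNode with the HDFS disk
# balancer (HDFS 3.x). Hostname and plan path below are placeholders.
import subprocess

datanode = "dn1.example.com"

# 1. Generate a plan for the DataNode; the command prints where the plan JSON is written.
subprocess.run(["hdfs", "diskbalancer", "-plan", datanode], check=True)

# 2. Execute the generated plan (substitute the path reported by the previous step).
plan_file = "/system/diskbalancer/<timestamp>/dn1.example.com.plan.json"
subprocess.run(["hdfs", "diskbalancer", "-execute", plan_file], check=True)

# 3. Check progress on that DataNode.
subprocess.run(["hdfs", "diskbalancer", "-query", datanode], check=True)
```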
01-28-2020
04:00 AM
You need to prepare and mount the disks before setting this configuration. DataNode directories:
/hadoop/hdfs/data/grid/1/
/hadoop/hdfs/data/grid/2/
/hadoop/hdfs/data/grid/3/
/hadoop/hdfs/data/grid/4/
/hadoop/hdfs/data/grid/5/
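As a sanity check before applying the setting, a small illustrative sketch (not part of any Hadoop tooling) that verifies each configured directory is actually backed by its own mounted disk:

```python
# Sketch: verify every dfs.datanode.data.dir entry is a real mount point before
# applying the configuration. Purely illustrative helper, not Hadoop tooling.
import os

data_dirs = [
    "/hadoop/hdfs/data/grid/1/",
    "/hadoop/hdfs/data/grid/2/",
    "/hadoop/hdfs/data/grid/3/",
    "/hadoop/hdfs/data/grid/4/",
    "/hadoop/hdfs/data/grid/5/",
]

for d in data_dirs:
    path = d.rstrip("/")
    status = "OK, separate mount" if os.path.ismount(path) else "WARNING, not a mount point"
    print(f"{path}: {status}")
```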
01-28-2020
03:39 AM
1 Kudo
The best way is to join your nodes using the SSSD service; it will solve the user home directory creation problem as well as group mapping.
01-28-2020
03:21 AM
If you have HDP 3+ and want to use the Hive metastore, you will have versioning problems between Hive and Spark. Right now the Hive metastore versions available to Spark are 0.12.0 through 2.3.3. You can check for updates at this URL: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
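For reference, a minimal PySpark sketch of pointing Spark at a specific Hive metastore version using the configuration options from that page; the version value and jars setting are examples, adjust them to your cluster:

```python
# Sketch: configuring Spark to talk to a specific Hive metastore version.
# The version and jars values are examples; set them to match your cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-metastore-version-demo")
    .config("spark.sql.hive.metastore.version", "2.3.3")
    .config("spark.sql.hive.metastore.jars", "maven")   # or a classpath with matching Hive jars
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()
```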
01-06-2020
02:50 AM
As far as I know, for new releases you need to have a commercial subscription to access the Cloudera repositories.