Member since: 11-16-2017
Posts: 28
Kudos Received: 5
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1440 | 01-30-2020 11:15 PM |
| | 1390 | 01-28-2020 11:52 PM |
| | 1198 | 01-28-2020 03:39 AM |
| | 1214 | 02-27-2018 03:02 PM |
02-07-2020
08:34 AM
You need to add it to the custom hdfs-site configuration: dfs.namenode.heartbeat.recheck-interval
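For reference, the property takes a value in milliseconds; 300000 ms (5 minutes) is the usual default, and the value below is only an illustration of how the custom hdfs-site entry would look:

```
# Interval (ms) after which the NameNode re-checks DataNode heartbeats.
# Lowering it makes dead-node detection faster.
dfs.namenode.heartbeat.recheck-interval=300000
```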
01-31-2020
08:08 AM
Right now Ranger doesn't provide a Spark plugin. You can manage access using HDFS permissions (rwx).
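A minimal sketch of doing that at the HDFS level (the path, user, and group names are placeholders):

```bash
# Restrict a data directory to its owner and group; others get no access
hdfs dfs -chown -R etl_user:analysts /data/project
hdfs dfs -chmod -R 750 /data/project

# HDFS ACLs allow additional per-group grants if plain rwx is not enough
hdfs dfs -setfacl -m group:data_science:r-x /data/project
```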
01-31-2020
12:02 AM
Hi, you can set housekeeping to a short period of time and turn on audit to HDFS (if it's off). This way you have the option to quickly check recent logs from the Ranger UI, while the history is stored in HDFS where you can check it at any time.
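The relevant plugin audit settings look roughly like this (found under the ranger-*-audit configuration of each plugin in Ambari; the HDFS path is just an example):

```
# Short-term audits in Solr for quick lookups in the Ranger UI
xasecure.audit.destination.solr=true

# Long-term history archived to HDFS
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=hdfs://<namenode>:8020/ranger/audit
```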
01-30-2020
11:15 PM
Hi @Sambavi, you can install any required dependencies on all nodes and use them, but keep in mind that Pandas and NumPy don't provide distributed computing and won't work with big data sets. If your Zeppelin is configured to use YARN cluster mode, all data will be pulled into the Spark driver on whichever data node the driver is running on and processed there (if the data set is not big, you can increase the driver resources and it will work, but that doesn't look like a real solution). If you use client mode, everything will be pulled into the Zeppelin node. I recommend trying HandySpark: https://github.com/dvgodoy/handyspark
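As a rough illustration of the pattern in a Zeppelin PySpark note (the sales table is hypothetical): do the heavy aggregation in Spark, and only bring the small result into pandas on the driver.

```python
from pyspark.sql import functions as F

# Distributed aggregation over the (hypothetical) large 'sales' table
daily_totals = (
    spark.table("sales")
         .groupBy("sale_date")
         .agg(F.sum("amount").alias("total"))
)

# Only the small aggregated result is collected to the driver as a pandas DataFrame
pdf = daily_totals.toPandas()
pdf.plot(x="sale_date", y="total")
```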
01-29-2020
10:00 AM
After about 10 minutes the node will be marked as dead and re-replication of its data will begin. You can change this parameter if required.
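With the default settings (5-minute recheck interval, 3-second heartbeat interval), the dead-node timeout works out as:

```
timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
        = 2 * 300 s + 10 * 3 s
        = 630 s  (about 10.5 minutes)
```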
01-28-2020
11:52 PM
It depends on what you want to change. If you just want to add additional disks to all nodes, follow this:

The best way is to create partitions like /grid/0/hadoop/hdfs/data through /grid/10/hadoop/hdfs/data and mount them on the newly formatted disks. These mount options are one of the recommendations for HDFS data mounts, but you can change them:

/dev/sda1 /grid/0 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdb1 /grid/1 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0
/dev/sdc1 /grid/2 ext4 inode_readahead_blks=128,commit=30,data=writeback,noatime,nodiratime,nodev,nobarrier 0 0

After that, add all the partition paths to the HDFS configuration, for example:

/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data

But don't delete the existing partition from the configuration, because you will lose the data blocks stored in /hadoop/hdfs/data. The paths don't really matter; just keep them separate and don't forget to re-balance data between the disks.
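If you are on Hadoop 3 / HDP 3, the built-in HDFS disk balancer can spread existing blocks across the newly added disks (the hostname is a placeholder and dfs.disk.balancer.enabled must be true):

```bash
# Generate a plan for one DataNode; the command prints where the plan file was written
hdfs diskbalancer -plan datanode1.example.com
# Execute the plan file printed above, then check progress
hdfs diskbalancer -execute /system/diskbalancer/<date>/datanode1.example.com.plan.json
hdfs diskbalancer -query datanode1.example.com
```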
01-28-2020
04:09 AM
Hi, check this article: https://community.cloudera.com/t5/Support-Questions/why-kafka-should-be-un-even-number/td-p/199062
01-28-2020
04:03 AM
1 Kudo
I recommend checking the Hortonworks project on GitHub: https://github.com/hortonworks-spark/spark-atlas-connector
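Roughly, you attach the connector jar and register its listeners when submitting a job; the jar name and job file below are illustrative, and the exact listener classes are documented in the project README:

```bash
spark-submit \
  --jars spark-atlas-connector-assembly.jar \
  --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
  --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
  your_job.py
```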
01-28-2020
04:00 AM
You need to prepare and mount the disks before setting this configuration. DataNode directories:
/hadoop/hdfs/data/grid/1/
/hadoop/hdfs/data/grid/2/
/hadoop/hdfs/data/grid/3/
/hadoop/hdfs/data/grid/4/
/hadoop/hdfs/data/grid/5/
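A minimal sketch of preparing one of those mounts (the device name is a placeholder):

```bash
mkfs.ext4 /dev/sdb1
mkdir -p /hadoop/hdfs/data/grid/1
mount /dev/sdb1 /hadoop/hdfs/data/grid/1
chown -R hdfs:hadoop /hadoop/hdfs/data/grid/1
# Add a matching /etc/fstab entry so the mount survives a reboot
```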
01-28-2020
03:39 AM
1 Kudo
The best way is to join your nodes to the domain using the SSSD service; it solves the user home directory creation problem as well as group mapping.
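For example, on RHEL/CentOS nodes a realmd/SSSD join to Active Directory looks roughly like this (the domain and admin account are placeholders; oddjob-mkhomedir takes care of home directory creation):

```bash
yum install -y sssd realmd adcli oddjob oddjob-mkhomedir samba-common-tools
realm join --user=ad_admin EXAMPLE.COM

# Verify the join and that AD users and groups resolve on the node
realm list
id some_ad_user@example.com
```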
01-28-2020
03:21 AM
If you have HDP 3+ and want to use the Hive metastore, you will have versioning problems between Hive and Spark. Right now the Hive metastore versions available to Spark are 0.12.0 through 2.3.3. You can check for updates at this URL: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
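The Spark settings involved look roughly like this (the version and jar source are examples; the page above lists the supported options):

```
spark.sql.hive.metastore.version=2.3.3
# 'maven' downloads matching Hive jars; alternatively point this at a classpath containing them
spark.sql.hive.metastore.jars=maven
```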
01-06-2020
02:50 AM
As far as I know, for new releases you need a commercial subscription to access the Cloudera repositories.
01-29-2019
12:18 PM
I suggest you follow this manual: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_hadoop-high-availability/content/configure_ranger_admin_ha.html starting from step 31, and check all SPN entries in spnego.service.keytab.
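You can list the SPNs actually present in the keytab with klist (the path below is the usual HDP location):

```bash
klist -kt /etc/security/keytabs/spnego.service.keytab
# Expect HTTP/<hostname>@REALM entries for every Ranger Admin host and for the load balancer name
```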
01-15-2019
02:03 PM
It seems that you don't have a valid user in SAM. If you have a Kerberized cluster, I would recommend using the SAM keytab to validate your login from the browser and then creating an Administrator user.
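For example, on the SAM host you could obtain a ticket from the service keytab before logging in through the browser; the principal and keytab path are assumptions based on the usual HDF layout, where the SAM service is called streamline:

```bash
kinit -kt /etc/security/keytabs/streamline.service.keytab streamline/$(hostname -f)@EXAMPLE.COM
klist   # confirm the ticket before opening the SAM UI
```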
01-15-2019
01:54 PM
Hi, when you enable Kerberos on a cluster with Ranger HA you need to:
1. Create service principals for the load balancer nodes in AD or another directory manager (LB1, LB2, VIP);
2. Generate keytabs for these principals;
3. Merge the generated keytabs with spnego.service.keytab (a sketch of the merge is shown below).
Use hostnames for the principals and the LB VIP hostname for the Ranger URL.
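A rough sketch of the merge step with ktutil (the keytab file names are placeholders):

```
ktutil
ktutil:  rkt /etc/security/keytabs/spnego.service.keytab
ktutil:  rkt /etc/security/keytabs/http-lb-vip.keytab
ktutil:  wkt /etc/security/keytabs/spnego.service.keytab
ktutil:  quit
```

Then verify the result with klist -kt /etc/security/keytabs/spnego.service.keytab.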
10-29-2018
12:22 PM
In HDP and HDF you have the option to Kerberize all components and integrate with Active Directory or another directory manager (FreeIPA or OpenLDAP) for authentication. Authorization and auditing are handled by Apache Ranger. If encryption is required there is Ranger KMS, and Apache Knox covers perimeter security.
10-28-2018
09:20 PM
Can you provide more information:
1. Which OS are you trying to connect from?
2. Which transport protocol is set in the HiveServer2 configuration (binary or http)?
3. Is your cluster Kerberized or not?
10-28-2018
11:17 AM
Hi, it depends on the available resources of the nodes where you are planning to install all the services. If your master nodes have enough resources you can put HDFS, YARN, ZooKeeper, and HBase on the same nodes. For Kafka and NiFi I would recommend separate machines. If you have 3 master nodes:
1. HDFS (Active), YARN (Active), HBase (Active), ZooKeeper
2. HDFS (Standby), YARN (Standby), HBase (Standby), ZooKeeper
3. ZooKeeper
10-28-2018
11:13 AM
Hi, for HiveServer2 you can try: jdbc:hive2://<zk1>:2181,<zk2>:2181,<zk3>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
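For example with beeline (the ZooKeeper hostnames are placeholders, and the namespace must match hive.server2.zookeeper.namespace on your cluster):

```bash
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```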
03-21-2018
12:54 PM
Hi, it doesn't matter where you have Tableau Desktop installed. Either way you will publish the project from Desktop to Server.
03-21-2018
12:23 PM
1 Kudo
In general I wouldn't recommend turning on ACID transactions because they still have a lot of open issues. We had problems with the lock manager, compactors, and more.
03-21-2018
12:19 PM
If it's a Windows machine you need to:
1. Install the Spark ODBC driver.
2. Install the MIT Kerberos client.
3. Configure the MIT client (add environment variables: KRB5_CONFIG pointing to the krb5.conf file and KRB5CCNAME pointing to the credential cache file), as sketched below.
Then you will be able to connect to the Kerberized cluster.
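A sketch of setting those variables on Windows (the paths are only examples):

```
setx KRB5_CONFIG "C:\ProgramData\MIT\Kerberos5\krb5.ini"
setx KRB5CCNAME  "C:\Temp\krb5cache"
```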
03-09-2018
01:47 PM
1 Kudo
If you have virtualization with a fault tolerance option and shared storage (VMware ESXi, etc.), I would recommend installing the Ambari Server there.
03-09-2018
01:42 PM
Either way, it's better to back up the logs.
03-09-2018
01:39 PM
For Log Search the default retention period is 7 days, and for Ranger it is 90 days. I would recommend keeping them for the default period and also storing all logs in HDFS. Deleting old log files will not harm the system.
02-27-2018
03:02 PM
Hi, I would recommend installing two HAProxy instances as the load balancer and configuring keepalived (VRRP) between them. This way you get a virtual IP that the Ranger URL points to. Keep in mind that Ranger uses a relational database, which must also have master-master replication; if it's MySQL, that can be a Galera cluster. So you will have two HAProxy nodes with a virtual IP, two Ranger Admin instances, and a Galera cluster for MySQL. This configuration removes the single point of failure at every layer.
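A minimal sketch of the load-balancer side, assuming placeholder IPs, interface, and hostnames (Ranger Admin listens on 6080 by default):

```
# /etc/keepalived/keepalived.conf (on both HAProxy nodes; give the backup node a lower priority)
vrrp_instance RANGER_VIP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.0.100
    }
}

# /etc/haproxy/haproxy.cfg
frontend ranger_front
    bind 10.0.0.100:6080
    default_backend ranger_admins

backend ranger_admins
    balance roundrobin
    option httpchk GET /
    server ranger1 ranger1.example.com:6080 check
    server ranger2 ranger2.example.com:6080 check
```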