In my current cluster, I have some datanodes that have only 2 disks and some datanodes that have 3 disks. I was wondering if it is ok to have a different number of disks, but specify in the datanode configs 3 disks.
Also is it ok if some disks are 2T and some disks are 3T?
With Hadoop 3, there is an intra-node disk balancer (hdfs diskbalancer) in addition to the cluster-wide balancer that works across DataNodes (hdfs balancer). Together they help you distribute data evenly across the disks and nodes of your cluster. The recommended setup is still for all DataNodes to have the same number and size of disks. A mixed configuration is possible, but you will need to rebalance your DataNodes fairly often, which costs computation and network resources.
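Another thing to consider when your disks differ in size is the DataNode volume choosing policy (dfs.datanode.fsdataset.volume.choosing.policy), which defaults to round robin. Consider switching it to the available-space policy (AvailableSpaceVolumeChoosingPolicy), so that new blocks favor the disks with more free space.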
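To see why the policy matters with mixed 2T and 3T disks, here is a rough Python sketch that compares the two placement strategies. It is a simplified model, not Hadoop's actual implementation: the function name simulate, the 1 GB block size, and the threshold_gb and preference parameters are assumptions made for illustration, with defaults chosen to resemble Hadoop's 10 GB balanced-space threshold and 0.75 preference fraction.

```python
import random

def simulate(policy, free_gb, n_blocks, block_gb=1, threshold_gb=10,
             preference=0.75, seed=0):
    """Place n_blocks on the volumes; return the free space left per volume."""
    free = list(free_gb)
    rng = random.Random(seed)  # seeded so runs are repeatable
    cursor = 0                 # round-robin position shared by both policies

    def pick(candidates):
        """Round-robin over the volumes, restricted to candidates with room."""
        nonlocal cursor
        for _ in range(len(free)):
            i = cursor % len(free)
            cursor += 1
            if i in candidates and free[i] >= block_gb:
                return i
        return None

    for _ in range(n_blocks):
        everyone = set(range(len(free)))
        balanced = max(free) - min(free) <= threshold_gb
        if policy == "round_robin" or balanced:
            target = pick(everyone)
        else:
            # Split volumes into "low" and "high" free space, then send
            # most new blocks (preference) to the "high" group.
            low = {i for i in everyone if free[i] <= min(free) + threshold_gb}
            high = everyone - low
            target = pick(high if rng.random() < preference else low)
            if target is None:  # chosen group is full, fall back to any volume
                target = pick(everyone)
        if target is None:
            raise RuntimeError("no volume has room for the block")
        free[target] -= block_gb
    return free

if __name__ == "__main__":
    disks = [2000, 3000]  # one 2T disk and one 3T disk, in GB
    for policy in ("round_robin", "available_space"):
        print(policy, simulate(policy, disks, n_blocks=2400))
```

In this model, round robin writes the same amount to both disks, so the 2T disk fills up first while the 3T disk still has plenty of room. The available-space variant steers most writes to the emptier disk until the free space on both is roughly equal.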
I also suggest reading this article from Cloudera.