Yes. You can think of select() as the column counterpart to filter(): select() drops columns, filter() drops rows. Do both as early as possible, because the less data that reaches the shuffle, the cheaper the shuffle is. The groupBy() will most likely cause a shuffle by key, so be careful with it. If you can get the same result with reduceByKey(), use that instead, since it combines values within each partition before anything is shuffled.
If you mean DataFrame rather than Dataset, Spark SQL will handle much of this optimization for you. With plain RDDs, you have to apply these optimizations yourself.
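
To make the shuffle argument concrete, here is a rough sketch in plain Python. It is not Spark code: it only simulates two partitions and counts how many records would cross the shuffle. The data, the column layout (user, country, amount), and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: two "partitions" of (user, country, amount) records.
partitions = [
    [("a", "US", 10), ("b", "DE", 5), ("a", "US", 7), ("c", "US", 1)],
    [("a", "US", 3), ("b", "US", 4), ("c", "DE", 9), ("b", "US", 6)],
]


def narrow(partition):
    """Mimic filter() then select(): keep US rows, keep only (user, amount)."""
    return [(user, amount) for user, country, amount in partition if country == "US"]


def shuffle_group_by(parts):
    """groupBy style: every surviving record is shuffled, then summed per key."""
    shuffled = [record for part in parts for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_reduce_by_key(parts):
    """reduceByKey style: sum per key inside each partition, then shuffle."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Only one record per key per partition leaves the partition.
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Narrow first, so dropped rows and columns never reach the shuffle.
narrowed = [narrow(p) for p in partitions]
grouped_totals, grouped_shuffled = shuffle_group_by(narrowed)
reduced_totals, reduced_shuffled = shuffle_reduce_by_key(narrowed)
```

Both approaches give the same totals per user, but in this toy data the groupBy-style path shuffles 6 records while the reduceByKey-style path shuffles 4. The gap grows with the number of repeated keys per partition.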