Member since
11-12-2019
10-09-2022
07:22 PM
spark.session()
    .read()
    .option("encoding", "UTF-8")
    .option("delimiter", "^")
    .option("mode", "PERMISSIVE")
    .schema(SCHEMA_STORE.getIPDRschema())
    .csv(
        JavaConverters.collectionAsScalaIterableConverter(_files_to_process)
            .asScala()
            .toSeq())
    .withColumn("filename", org.apache.spark.sql.functions.input_file_name())
    .dropDuplicates();

This is written in Java; please convert it into Scala. Hope this will work 🙂
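A possible Scala equivalent of the Java snippet above, as a sketch: it assumes an existing SparkSession named spark, that the file list is already a Scala Seq[String] (here called filesToProcess, a hypothetical name, which removes the need for the JavaConverters call), and that SCHEMA_STORE.getIPDRschema() returns a StructType as in the Java version:

```scala
import org.apache.spark.sql.functions.input_file_name

// Sketch only: `spark` is an existing SparkSession and
// `filesToProcess: Seq[String]` holds the CSV paths (assumed names).
val df = spark.read
  .option("encoding", "UTF-8")
  .option("delimiter", "^")
  .option("mode", "PERMISSIVE")
  .schema(SCHEMA_STORE.getIPDRschema())
  .csv(filesToProcess: _*)                         // varargs overload takes the paths directly
  .withColumn("filename", input_file_name())       // record which file each row came from
  .dropDuplicates()
```

In Scala the parameterless methods read() and dropDuplicates() are conventionally called without parentheses where idiomatic, and the getter chain otherwise maps one-to-one.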
11-19-2019
06:27 AM
1 Kudo
Infuriatingly, the connector defaults to returning only 1000 rows. This doesn't seem to be documented anywhere I've found. The relevant configuration is exec.results.max, which can be passed in spark-shell by setting spark.datasource.hive.warehouse.exec.results.max. Add the following config to increase the maximum to 20000: --conf "spark.datasource.hive.warehouse.exec.results.max=20000" @russell786 @gopi_gogada
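Besides passing --conf at launch, the same property can in principle be set from inside a session — a sketch only, since whether the Hive Warehouse Connector honors a runtime conf.set (as opposed to a launch-time --conf) may depend on the HWC version:

```scala
// Sketch: raise the HWC result cap from code instead of the CLI.
// Assumes an existing SparkSession `spark`; verify on your HWC version
// that this takes effect when set after startup.
spark.conf.set("spark.datasource.hive.warehouse.exec.results.max", "20000")
```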