Support Questions

spark.sql.parquet.cacheMetadata does not work for the Spark HiveThriftServer



We use Spark SQL applications and other ETL tools to write Hive tables in Parquet format, and we run a Spark HiveThriftServer to query those tables. Sometimes a query against one of the tables fails with errors like:

Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00186-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00187-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00188-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00189-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00190-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00191-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00192-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00193-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00194-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00195-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00196-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
Input path does not exist: hdfs://bigdatacluster1/hiveweb/recommend.db/user2brandidtable/part-r-00197-a165f573-8454-4df1-b37b-a8cdd414f2de.gz.parquet
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.parquet.hadoop.ParquetInputFormat.listStatus(ParquetInputFormat.java:339)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1$$anon$4.listStatus(ParquetRelation.scala:358)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:294)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1.getPartitions(ParquetRelation.scala:363)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2087)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1499)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1504)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$collect$1.apply(DataFrame.scala:1504)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2100)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1504)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1481)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:226)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
So we set spark.sql.parquet.cacheMetadata to false and restarted the Spark HiveThriftServer, but that did not help: the queries still fail unless we execute "REFRESH TABLE xxx" manually. Does anyone know what causes this issue?
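For concreteness, the two steps described above (disabling the Parquet metadata cache, and manually refreshing the table) might look like the following sketch. The host, port, and install path are assumptions, not details from the original setup; the table name is taken from the HDFS paths in the error log.

```shell
# Sketch only; $SPARK_HOME, host, and port are assumed values.

# 1) Restart the Thrift Server with Parquet metadata caching disabled:
$SPARK_HOME/sbin/stop-thriftserver.sh
$SPARK_HOME/sbin/start-thriftserver.sh \
  --conf spark.sql.parquet.cacheMetadata=false

# 2) The manual workaround: after an ETL job rewrites the table,
#    invalidate the cached file listing before querying it:
beeline -u jdbc:hive2://localhost:10000 \
  -e "REFRESH TABLE recommend.user2brandidtable"
```

An alternative to refreshing by hand would be to have the ETL job itself issue the REFRESH TABLE statement against the Thrift Server as its final step, so readers never see the stale file listing.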