<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to change location of avro.schema.url location in Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-change-location-of-avro-schema-url-location-in-Hive/m-p/160576#M29178</link>
    <description>&lt;P&gt;I have Hive external tables using Avro serdes that look like this:&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE `rm`(
  `gw_id` bigint COMMENT '', 
  `rm_id` string COMMENT '', 
  `start_time` bigint COMMENT '', 
  `end_time` bigint COMMENT '', 
  `switch_failure_count` bigint COMMENT '')
PARTITIONED BY ( 
  `year` int, 
  `month` int, 
  `day` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
WITH SERDEPROPERTIES ( 
  'avro.schema.url'='hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://mpws:8020/jup1_stats/external_tables/rm'
TBLPROPERTIES (
  'last_modified_by'='mpetronic', 
  'last_modified_time'='1463833730', 
  'transient_lastDdlTime'='1463833730')
&lt;/PRE&gt;&lt;P&gt;The Avro schemas are stored on HDFS in /jup1_stats/schema/. I want to change that path from this:&lt;/P&gt;&lt;PRE&gt;/jup1_stats/schema/&lt;/PRE&gt;&lt;P&gt;to this:&lt;/P&gt;&lt;PRE&gt;/jup1_stats/avro/&lt;/PRE&gt;&lt;P&gt;So, I copied my avro schema files from /jup1_stats/schema to /jup1_stats/avro and did this:&lt;/P&gt;&lt;PRE&gt;alter table rm set serdeproperties ('avro.schema.url'='hdfs://mpws:8020/stats/avro/rm_1.avsc');&lt;/PRE&gt;&lt;P&gt;I ran a test query and it worked. But not so fast! Then I did the clean up and removed the /jup1_stats/schema directory and ran the test query again and got tons of stack traces and the query failed.&lt;/P&gt;&lt;PRE&gt;ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1463688329951_0007_4_00, diagnostics=[Task failed, taskId=task_1463688329951_0007_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.&amp;lt;init&amp;gt;(TezGroupedSplitsInputFormat.java:135)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
    at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
Caused by: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
    ... 25 more&lt;/PRE&gt;&lt;P&gt;It appears that Hive is still looking for the schema in the old location based on this error trace:&lt;/P&gt;&lt;PRE&gt;Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc&lt;/PRE&gt;&lt;P&gt;I also tried this but the queries still fail:&lt;/P&gt;&lt;PRE&gt;msck repair table rm;&lt;/PRE&gt;&lt;P&gt;If I recreate the schema files in the old location (/jup1_stats/schema), the query then works again. &lt;/P&gt;&lt;P&gt;These are external tables, and I did try dropping the table and recreating it with the new location in avro.schema.url and that works. But, I would like to understand why I cannot use "alter table" to achieve this same result.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Sat, 21 May 2016 19:46:08 GMT</pubDate>
    <dc:creator>Mark_Petronic</dc:creator>
    <dc:date>2016-05-21T19:46:08Z</dc:date>
    <item>
      <title>How to change location of avro.schema.url location in Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-change-location-of-avro-schema-url-location-in-Hive/m-p/160576#M29178</link>
      <description>&lt;P&gt;I have Hive external tables using Avro serdes that look like this:&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE `rm`(
  `gw_id` bigint COMMENT '', 
  `rm_id` string COMMENT '', 
  `start_time` bigint COMMENT '', 
  `end_time` bigint COMMENT '', 
  `switch_failure_count` bigint COMMENT '')
PARTITIONED BY ( 
  `year` int, 
  `month` int, 
  `day` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
WITH SERDEPROPERTIES ( 
  'avro.schema.url'='hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://mpws:8020/jup1_stats/external_tables/rm'
TBLPROPERTIES (
  'last_modified_by'='mpetronic', 
  'last_modified_time'='1463833730', 
  'transient_lastDdlTime'='1463833730')
&lt;/PRE&gt;&lt;P&gt;The Avro schemas are stored on HDFS in /jup1_stats/schema/. I want to change that path from this:&lt;/P&gt;&lt;PRE&gt;/jup1_stats/schema/&lt;/PRE&gt;&lt;P&gt;to this:&lt;/P&gt;&lt;PRE&gt;/jup1_stats/avro/&lt;/PRE&gt;&lt;P&gt;So, I copied my avro schema files from /jup1_stats/schema to /jup1_stats/avro and did this:&lt;/P&gt;&lt;PRE&gt;alter table rm set serdeproperties ('avro.schema.url'='hdfs://mpws:8020/stats/avro/rm_1.avsc');&lt;/PRE&gt;&lt;P&gt;I ran a test query and it worked. But not so fast! Then I did the clean up and removed the /jup1_stats/schema directory and ran the test query again and got tons of stack traces and the query failed.&lt;/P&gt;&lt;PRE&gt;ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1463688329951_0007_4_00, diagnostics=[Task failed, taskId=task_1463688329951_0007_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.&amp;lt;init&amp;gt;(TezGroupedSplitsInputFormat.java:135)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
    at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
Caused by: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
    ... 25 more&lt;/PRE&gt;&lt;P&gt;It appears that Hive is still looking for the schema in the old location based on this error trace:&lt;/P&gt;&lt;PRE&gt;Unable to read schema from given path: hdfs://mpws:8020/jup1_stats/schema/rm_1.avsc&lt;/PRE&gt;&lt;P&gt;I also tried this but the queries still fail:&lt;/P&gt;&lt;PRE&gt;msck repair table rm;&lt;/PRE&gt;&lt;P&gt;If I recreate the schema files in the old location (/jup1_stats/schema), the query then works again. &lt;/P&gt;&lt;P&gt;These are external tables, and I did try dropping the table and recreating it with the new location in avro.schema.url and that works. But, I would like to understand why I cannot use "alter table" to achieve this same result.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 21 May 2016 19:46:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-change-location-of-avro-schema-url-location-in-Hive/m-p/160576#M29178</guid>
      <dc:creator>Mark_Petronic</dc:creator>
      <dc:date>2016-05-21T19:46:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to change location of avro.schema.url location in Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-change-location-of-avro-schema-url-location-in-Hive/m-p/160577#M29179</link>
      <description>&lt;P&gt;I stumbled onto my own answer. Appears that you have to provide a partition spec to the alter command. I figured this out after poking around in mysql to look at my hive metastore to see if that would give me a clue. These queries are what made me think that the serde information is on a partition-by-partition basis:&lt;/P&gt;&lt;PRE&gt;mysql&amp;gt; select * from TBLS where TBL_NAME='rm'\G
*************************** 1. row ***************************
            TBL_ID: 170
       CREATE_TIME: 1463833647
             DB_ID: 11
  LAST_ACCESS_TIME: 0
             OWNER: mpetronic
         RETENTION: 0
             SD_ID: 227
          TBL_NAME: rm
          TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
    LINK_TARGET_ID: NULL
1 row in set (0.00 sec)

mysql&amp;gt; select * from SDS where CD_ID=170\G
*************************** 1. row ***************************
                    SD_ID: 227
                    CD_ID: 170
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
                 SERDE_ID: 227
*************************** 2. row ***************************
                    SD_ID: 228
                    CD_ID: 170
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm/year=2016/month=5/day=10
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
                 SERDE_ID: 228
*************************** 3. row ***************************
                    SD_ID: 229
                    CD_ID: 170
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm/year=2016/month=5/day=11
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
                 SERDE_ID: 229

&lt;/PRE&gt;&lt;P&gt;Once I saw all those 'LOCATION' values in the SDS table, I tried the following command to alter the table and then the query worked. Interesting. Does this imply that you could have a different schema for each partition? Can anyone comment as to why the avro schema is tied to a partition and not simply to the whole table? &lt;/P&gt;&lt;PRE&gt;alter table rm partition (year=2016) set serdeproperties ('avro.schema.url' = 'hdfs://mpws:8020/jup1_stats/avro/rm_1.avsc');&lt;/PRE&gt;&lt;P&gt;Since all my data is in partitions under the "year=2016" partition, I was able to just specify that one partition and it applied the change to all partitions under that.&lt;/P&gt;</description>
      <pubDate>Sat, 21 May 2016 21:32:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-change-location-of-avro-schema-url-location-in-Hive/m-p/160577#M29179</guid>
      <dc:creator>Mark_Petronic</dc:creator>
      <dc:date>2016-05-21T21:32:07Z</dc:date>
    </item>
  </channel>
</rss>

