<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Pig ORC output Schema inference by Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225954#M77623</link>
    <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I was able to resolve the issue. We had a FLATTEN operation in the Pig module which introduced the disambiguation operator (::) into the schema definition; we removed the disambiguation operator by providing an explicit schema while flattening.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Aparna&lt;/P&gt;</description>
    <pubDate>Thu, 26 Apr 2018 17:22:10 GMT</pubDate>
    <dc:creator>aparna24aravind</dc:creator>
    <dc:date>2018-04-26T17:22:10Z</dc:date>
    <item>
      <title>Pig ORC output Schema inference by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225952#M77621</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are trying to process ORC output (generated by a Pig module) using Spark.&lt;/P&gt;&lt;P&gt;It seems the tuple schema defined in the Pig module is causing an issue in Spark.&lt;/P&gt;&lt;P&gt;The exception is as follows:&lt;/P&gt;&lt;PRE&gt;val df = sqlContext.read.format("orc").load("&amp;lt;hdfs orc path&amp;gt;")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input ':' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 
'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 17)


== SQL ==
struct&amp;lt;val_tuple::id:string,val_tuple::recid:string,val_tuple::entry_time:string&amp;gt;
-----------------^^^
  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)

&lt;/PRE&gt;&lt;P&gt;The tuple schema defined in the Pig module has been causing this issue while reading the output ORC file with Spark.&lt;/P&gt;&lt;P&gt;Has anyone faced similar issues? Any help is highly appreciated.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Aparna&lt;/P&gt;</description>
      <pubDate>Tue, 24 Apr 2018 20:33:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225952#M77621</guid>
      <dc:creator>aparna24aravind</dc:creator>
      <dc:date>2018-04-24T20:33:24Z</dc:date>
    </item>
    <item>
      <title>Re: Pig ORC output Schema inference by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225953#M77622</link>
      <description>&lt;P&gt;Spark Version - 2.1&lt;/P&gt;&lt;P&gt;Pig Version - 0.16&lt;/P&gt;</description>
      <pubDate>Tue, 24 Apr 2018 20:38:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225953#M77622</guid>
      <dc:creator>aparna24aravind</dc:creator>
      <dc:date>2018-04-24T20:38:54Z</dc:date>
    </item>
    <item>
      <title>Re: Pig ORC output Schema inference by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225954#M77623</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I was able to resolve the issue. We had a FLATTEN operation in the Pig module which introduced the disambiguation operator (::) into the schema definition; we removed the disambiguation operator by providing an explicit schema while flattening. For example, roughly like this (field names taken from the error message; types assumed):&lt;/P&gt;&lt;PRE&gt;-- Providing an AS clause keeps the flattened fields free of the :: prefix
B = FOREACH A GENERATE FLATTEN(val_tuple) AS (id:chararray, recid:chararray, entry_time:chararray);&lt;/PRE&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Aparna&lt;/P&gt;</description>
      <pubDate>Thu, 26 Apr 2018 17:22:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-ORC-output-Schema-inference-by-Spark/m-p/225954#M77623</guid>
      <dc:creator>aparna24aravind</dc:creator>
      <dc:date>2018-04-26T17:22:10Z</dc:date>
    </item>
  </channel>
</rss>

