Member since: 06-24-2016
Posts: 10
Kudos Received: 5
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1713 | 04-26-2018 10:22 AM
 | 9362 | 01-05-2017 03:07 PM
04-26-2018
10:22 AM
Hi Team, I was able to resolve the issue. We had a FLATTEN operation in the Pig module that introduced the disambiguate operator (::) into the schema definition; removing the disambiguate operator by providing an explicit schema while flattening fixed it. Thanks, Aparna
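A minimal Pig Latin sketch of that fix (the relation names, field types, and output path are assumptions; the field names mirror the struct shown in the error from the original question):

```pig
-- Hypothetical relations: supplying an AS schema to FLATTEN gives the output
-- plain field names (id, recid, entry_time) instead of Pig's disambiguated
-- names (val_tuple::id, ...), so Spark can parse the ORC schema later.
B = FOREACH A GENERATE
        FLATTEN(val_tuple) AS (id:chararray, recid:chararray, entry_time:chararray);
STORE B INTO '/data/out' USING OrcStorage();
```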
04-24-2018
01:38 PM
Spark Version - 2.1 Pig Version - 0.16
04-24-2018
01:33 PM
Hi, we are trying to process ORC output (generated from the Pig module) using Spark. It looks like the tuple schema defined in the Pig module is causing an issue in Spark. The exception is thrown on:

val df = sqlContext.read.format("orc").load("<hdfs orc path>")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input ':' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 17)
== SQL ==
struct<val_tuple::id:string,val_tuple::recid:string,val_tuple::entry_time:string>
-----------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
The tuple schema defined in the Pig module is causing this issue when the output ORC file is read with Spark.
Has anyone faced similar issues? Any help is highly appreciated. Thanks, Aparna
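For context, flattening a tuple without an AS clause is roughly what produces the ::-prefixed field names seen in the struct above: Pig prefixes each field with the tuple's alias to disambiguate it, and those names end up in the ORC file's type description. A sketch with assumed relation names and output path:

```pig
-- Hypothetical pipeline: FLATTEN without an AS clause keeps Pig's disambiguated
-- field names (e.g. val_tuple::id, val_tuple::recid, val_tuple::entry_time),
-- which then appear in the ORC schema that Spark fails to parse.
B = FOREACH A GENERATE FLATTEN(val_tuple);
STORE B INTO '/data/out' USING OrcStorage();
```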
Labels:
- Apache Pig
- Apache Spark
01-05-2017
03:07 PM
Hi, I was able to resolve the issue. The disk utilization of the local directory (where logs and output files are created) on one of the nodes had exceeded the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage setting, so the NodeManager marked that disk as unhealthy.
I freed up some space and also raised the max disk utilization percentage to a much higher value. Thanks, Aparna
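For anyone hitting the same problem, the threshold is set in yarn-site.xml. A sketch of the override (the value below is only an illustrative example, not the value used above; the YARN default is 90.0):

```xml
<!-- yarn-site.xml: raise the per-disk utilization threshold that the NodeManager's
     disk health checker uses before marking a local or log directory as bad.
     95.0 is an illustrative value, not the one used in the fix above. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```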
11-23-2016
06:14 AM
Hi, please see my inline comments.

> Have you tried this in Spark? or NiFi?

No.

> How much is configured in YARN for your job resources?

Memory allocated for YARN containers on each node - 200 GB.

> Can you post additional logs? code? submit details?

I did not get any extra info other than FSError.

> Why is the key an Avro record and not the value?

I am using AvroKeyInputFormat.

> You should make sure you have enough space in HDFS and also in the regular file system, as some of the reduce stage will get mapped to regular disk.

I have enough space left: in HDFS only 3% is being used, and on the local FS only 15% is being used.

Ulimit:
- core file size (blocks, -c): 0
- data seg size (kbytes, -d): unlimited
- scheduling priority (-e): 0
- file size (blocks, -f): unlimited
- pending signals (-i): 1032250
- max locked memory (kbytes, -l): 64
- max memory size (kbytes, -m): unlimited
- open files (-n): 1024
- pipe size (512 bytes, -p): 8
- POSIX message queues (bytes, -q): 819200
- real-time priority (-r): 0
- stack size (kbytes, -s): 10240
- cpu time (seconds, -t): unlimited
- max user processes (-u): 1024
- virtual memory (kbytes, -v): unlimited
- file locks (-x): unlimited

Thanks
11-16-2016
07:11 AM
1 Kudo
Hi, I am trying to process Avro records using MapReduce, where the key of the map is an Avro record:

public void map(AvroKey<GenericData.Record> key, NullWritable value, Context context)

The job fails if the number of columns to be processed in each record goes beyond a particular value. For example, if the number of fields in each row is more than 100, the job fails. I tried increasing the map memory and Java heap space in the cluster, but it didn't help. Thanks in advance, Aparna
Labels:
- Apache Hadoop