Member since: 06-24-2016
Posts: 10
Kudos Received: 5
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1713 | 04-26-2018 10:22 AM
 | 9362 | 01-05-2017 03:07 PM
04-26-2018
10:22 AM
Hi Team, I was able to resolve the issue. We had a FLATTEN operation in the Pig module that introduced the disambiguate operator (::) into the schema definition; removing the disambiguate operator by providing an explicit schema while flattening fixed it. Thanks, Aparna
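A minimal Pig Latin sketch of that fix (the relation names, field types, and output path are assumptions; the field names mirror the struct shown in the error from the original question):

```pig
-- Hypothetical relations: supplying an AS schema to FLATTEN gives the output
-- plain field names (id, recid, entry_time) instead of Pig's disambiguated
-- names (val_tuple::id, ...), so Spark can parse the ORC schema later.
B = FOREACH A GENERATE
        FLATTEN(val_tuple) AS (id:chararray, recid:chararray, entry_time:chararray);
STORE B INTO '/data/out' USING OrcStorage();
```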
04-24-2018
01:38 PM
Spark Version - 2.1 Pig Version - 0.16
04-24-2018
01:33 PM
Hi, we are trying to process ORC output (generated from the Pig module) using Spark. It looks like the tuple schema defined in the Pig module is causing an issue in Spark. The exception is thrown on:

val df = sqlContext.read.format("orc").load("<hdfs orc path>")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input ':' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 17)
== SQL ==
struct<val_tuple::id:string,val_tuple::recid:string,val_tuple::entry_time:string>
-----------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
The tuple schema defined in the Pig module is causing this issue when the output ORC file is read with Spark.
Has anyone faced similar issues? Any help is highly appreciated. Thanks, Aparna
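For context, flattening a tuple without an AS clause is roughly what produces the ::-prefixed field names seen in the struct above: Pig prefixes each field with the tuple's alias to disambiguate it, and those names end up in the ORC file's type description. A sketch with assumed relation names and output path:

```pig
-- Hypothetical pipeline: FLATTEN without an AS clause keeps Pig's disambiguated
-- field names (e.g. val_tuple::id, val_tuple::recid, val_tuple::entry_time),
-- which then appear in the ORC schema that Spark fails to parse.
B = FOREACH A GENERATE FLATTEN(val_tuple);
STORE B INTO '/data/out' USING OrcStorage();
```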
Labels:
- Apache Pig
- Apache Spark
01-05-2017
03:07 PM
Hi, I was able to resolve the issue. The disk utilization of the local directory (where logs and output files are created) on one of the nodes had exceeded the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage setting, so the NodeManager marked that disk as unhealthy.
I freed up some space and also raised the max disk utilization percentage to a much higher value. Thanks, Aparna
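For anyone hitting the same problem, the threshold is set in yarn-site.xml. A sketch of the override (the value below is only an illustrative example, not the value used above; the YARN default is 90.0):

```xml
<!-- yarn-site.xml: raise the per-disk utilization threshold that the NodeManager's
     disk health checker uses before marking a local or log directory as bad.
     95.0 is an illustrative value, not the one used in the fix above. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```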
11-23-2016
06:14 AM
Hi, please see my inline comments.

> Have you tried this in Spark? or NiFi?

No.

> How much is configured in YARN for your job resources?

Memory allocated for YARN containers on each node - 200 GB.

> Can you post additional logs? code? submit details?

I did not get any extra info other than FSError.

> Why is the key an Avro record and not the value?

I am using AvroKeyInputFormat.

> You should make sure you have enough space in HDFS and also in the regular file system, as some of the reduce stage will get mapped to regular disk.

I have enough space left: in HDFS only 3% is being used, and on the local FS only 15% is being used.

Ulimit:
- core file size (blocks, -c): 0
- data seg size (kbytes, -d): unlimited
- scheduling priority (-e): 0
- file size (blocks, -f): unlimited
- pending signals (-i): 1032250
- max locked memory (kbytes, -l): 64
- max memory size (kbytes, -m): unlimited
- open files (-n): 1024
- pipe size (512 bytes, -p): 8
- POSIX message queues (bytes, -q): 819200
- real-time priority (-r): 0
- stack size (kbytes, -s): 10240
- cpu time (seconds, -t): unlimited
- max user processes (-u): 1024
- virtual memory (kbytes, -v): unlimited
- file locks (-x): unlimited

Thanks
11-16-2016
07:11 AM
1 Kudo
Hi, I am trying to process Avro records using MapReduce, where the key of the map is an Avro record:

public void map(AvroKey<GenericData.Record> key, NullWritable value, Context context)

The job fails if the number of columns to be processed in each record goes beyond a particular value. For example, if the number of fields in each row is more than 100, the job fails. I tried increasing the map memory and Java heap space in the cluster, but it didn't help. Thanks in advance, Aparna
Labels:
- Apache Hadoop