Member since: 09-11-2013
Posts: 20
Kudos Received: 1
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1403 | 04-23-2015 10:15 AM |
| | 3031 | 12-02-2014 10:25 PM |
| | 10863 | 09-13-2014 01:26 PM |
08-01-2017
01:12 PM
The documentation is not clear about this. Defect IMPALA-5740 will address this question: https://issues.apache.org/jira/browse/IMPALA-5740 Thanks, Luis Martinez.
09-29-2016
02:01 AM
Hi, https://issues.cloudera.org/browse/IMPALA-2023 is fixed in Impala Shell v2.2, and I am using Impala Shell v2.3.0. Does anyone know how to declare and set the value of a variable in a script? Thanks in advance.
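For reference, a minimal sketch of how the variable substitution added by IMPALA-2023 works, assuming an impala-shell build recent enough to include the fix; the variable name `tbl`, table name `my_table`, and file `query.sql` are hypothetical placeholders, not from the original post:

```
# Define a variable on the command line (tbl and my_table are placeholders)
# and reference it as ${var:tbl} inside the script file.
impala-shell --var=tbl=my_table -f query.sql

# Inside query.sql:
#   SELECT COUNT(*) FROM ${var:tbl};

# Within an interactive session, a variable can also be set with:
#   SET VAR:tbl=my_table;
```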
12-01-2015
11:43 AM
Hi, we are also facing the same issue of an invalid file footer. Two tables are created as follows:

```sql
CREATE EXTERNAL TABLE ABC_TEXT (
  NAME STRING,
  ID INT,
  PHONE INT)
PARTITIONED BY (Customer_id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
STORED AS TEXTFILE
LOCATION '/USER/ABC_TEXT';

CREATE EXTERNAL TABLE ABC_PARQUET (
  NAME STRING,
  ID INT,
  PHONE INT)
PARTITIONED BY (Customer_id INT)
STORED AS PARQUET
LOCATION '/USER/ABC_PARQUET';
```

Then we run the insert script, which inserts the data without errors, but querying the Parquet table returns the following error:

```
Error: Caused by: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500312) Error in fetching data rows: Invalid file footer
```

Please let me know what I am doing wrong.
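The insert script itself was not posted; as a purely hypothetical sketch, an insert with dynamic partitioning from the text table into the Parquet table would look something like this:

```sql
-- Hypothetical reconstruction: the original insert script was not posted.
-- Copies all rows from the text table into the Parquet table, routing
-- each row to a partition based on its Customer_id value.
INSERT OVERWRITE TABLE ABC_PARQUET PARTITION (Customer_id)
SELECT NAME, ID, PHONE, Customer_id FROM ABC_TEXT;
```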
04-23-2015
10:15 AM
1 Kudo
Here are 2 ways to constrain the output to only be a single file:

1. Set the query option NUM_NODES=1, so that all work is done on the coordinator node.
2. Put a large LIMIT on the query, bigger than the number of rows you are actually inserting, so that all the intermediate results are combined on the coordinator node.

(I have not looked into the mechanics enough to say which way is more efficient.)

Here's an example where, by default, a CREATE TABLE AS SELECT operation would produce 4 output files, because I'm on a 4-node cluster. (The source table BILLION_NUMBERS has 113 files and 2.79 GB, enough data so that it won't go into a single output file by accident.) Setting NUM_NODES=1 produces a single output file. Setting NUM_NODES back to 0 and then doing CTAS+LIMIT produces a single output file.

```
[localhost:21000] > show table stats billion_numbers;
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| -1    | 113    | 2.79GB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
[localhost:21000] > set;
Query options (defaults shown in []):
...
NUM_NODES: [0]
...
[localhost:21000] > create table num_nodes_0 as select * from billion_numbers;
+----------------------------+
| summary                    |
+----------------------------+
| Inserted 1000000000 row(s) |
+----------------------------+
[localhost:21000] > show table stats num_nodes_0;
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| -1    | 4      | 2.79GB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
[localhost:21000] > set num_nodes=1;
[localhost:21000] > create table num_nodes_1 as select * from oreilly.billion_numbers;
+----------------------------+
| summary                    |
+----------------------------+
| Inserted 1000000000 row(s) |
+----------------------------+
[localhost:21000] > show table stats num_nodes_1;
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| -1    | 1      | 2.79GB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
[localhost:21000] > set num_nodes=0;
[localhost:21000] > create table ctas_with_limit as select * from billion_numbers limit 100000000000000;
+----------------------------+
| summary                    |
+----------------------------+
| Inserted 1000000000 row(s) |
+----------------------------+
[localhost:21000] > show table stats ctas_with_limit;
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format | Incremental stats |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
| -1    | 1      | 2.79GB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+-------+--------+--------+--------------+-------------------+--------+-------------------+
```
12-02-2014
10:25 PM
From the latest 2.x documentation: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_udf.html#udfs_hive_unique_2 "Hive UDAFs and UDTFs are not supported." John
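For contrast, ordinary scalar Hive UDFs can be registered in Impala with CREATE FUNCTION; a minimal sketch, where the jar path and class name are hypothetical placeholders:

```sql
-- Hypothetical example: scalar Hive UDFs (unlike UDAFs and UDTFs) can be
-- used from Impala. The jar path and class name below are placeholders.
CREATE FUNCTION my_lower(STRING) RETURNS STRING
LOCATION '/user/hive/udfs/hive-udfs.jar'
SYMBOL='com.example.MyLower';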
10-16-2013
11:25 PM
We do it using an Oozie coordinator which runs a workflow: parse data, add the partition, invalidate metadata. And we met another problem :) http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/INVALIDATE-METADATA-suddenly-caused-javax-jdo/m-p/2353#M66 Right now it's not a tragedy; we are waiting for bug fixes.
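The last two workflow steps amount to statements like the following; a minimal sketch, where the table name logs and partition column dt are hypothetical:

```sql
-- Hypothetical illustration of the workflow's final steps; the table name
-- (logs) and partition column (dt) are placeholders.
ALTER TABLE logs ADD PARTITION (dt='2013-10-16');
-- Make the new partition and its files visible to Impala.
INVALIDATE METADATA logs;
```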
09-27-2013
02:06 PM
This certainly seems like it's worth filing a bug. Could you submit a bug with a test case at issues.cloudera.org?