About Tim Armstrong

Donal_RC · ‎06-21-2024

@stigahuang wrote: Looks like there're no places to control the max JVM heap size of impalads in Cloudera Manager. (Only one for the catalogd) How can we set JAVA_TOOL_OPTIONS for impalads (coordinators)?

In CM 7.6.7, there is a configuration option called "impalad_embedded_jvm_heapsize". I'm not sure when it was added.

impalad_embedded_jvm_heapsize: Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.

DianaTorres · ‎03-05-2024

@lv_antel Welcome to the Cloudera Community! As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.

Honorio · ‎01-16-2024

There is a workaround for this. It is not a definitive solution, but it can help.

First, I created a table like your example (I used ";" as the separator):

insert overwrite t_1
select 'Asia' as cont, 'Japan;China;Singapore;' Country_list
union
select 'Europe' as cont, 'UK;Spain;Italy;German;Norway;' Country_list

Next, I created an external table. It must be stored as a text file:

CREATE EXTERNAL TABLE IF NOT EXISTS t_transpose ( field_transpose string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ";" STORED AS TEXTFILE;

Then insert into this table like this:

insert overwrite t_transpose select REGEXP_REPLACE(Country_list, ';', concat("|", cont, '\n' ) ) as transpose from t_1;

After that, a select like this gives the final result:

select split_part(field_transpose,"|",1), split_part(field_transpose,"|",2) from t_transpose;

PS: The final result could have some blank lines; just filter or ignore them. I also added one more ";" at the end of each line compared with the example given.

netease · ‎11-20-2022

Can you fix it later?

newbieone · ‎08-18-2022

Can we set 100g? Or maybe 1000g?

Maxsparrow · ‎04-30-2021

@JasonBourne - if you have the same issue, here's a GitHub issue discussing it and linking to a pull request to fix it: https://github.com/cloudera/thrift_sasl/issues/28 You can see in the commits (here: https://github.com/cloudera/thrift_sasl/commits/master), they are testing a new release for a fix, but it looks like it's not quite done yet. Hopefully soon.

Tim Armstrong · ‎01-20-2021

There's a 64 KB limit on strings in Kudu, but otherwise you can store any binary data in them. https://docs.cloudera.com/documentation/kudu/5-10-x/topics/kudu_known_issues.html#schema_design_limitations

Tim Armstrong · ‎01-19-2021

Upgrading to a newer version of Impala will solve most scalability issues that you'd see on Impala 2.9, mostly because of the improvements described in https://blog.cloudera.com/scalability-improvement-of-apache-impala-2-12-0-in-cdh-5-15-0/.

parthk · ‎12-22-2020

@Tim Armstrong Thanks for helping out here. My apologies for the misunderstanding regarding the packing information.

Tim Armstrong · ‎12-21-2020

We have some background on schema evolution in Parquet in the docs - https://docs.cloudera.com/runtime/7.2.2/impala-reference/topics/impala-parquet.html. See "Schema Evolution for Parquet Tables". Some of the details are specific to Impala, but the concepts are the same across engines, including Hive and Spark, that use Parquet tables.

At a high level, you can think of the data files being immutable while the table schema evolves. If you add a new column at the end of the table, for example, that updates the table schema but leaves the Parquet files unchanged. When the table is queried, the table schema and Parquet file schema are reconciled and the new column's values will be all NULL.

If you want to modify the existing rows and include new non-NULL values, that would require rewriting the data, e.g. with an INSERT OVERWRITE statement for a partition or a CREATE TABLE .. AS SELECT to create an entirely new table. Keep in mind that traditional Parquet tables are not optimized for workloads with updates - Apache Kudu in particular, and also transactional tables in Hive 3+, have support for row-level updates that is more convenient/efficient.

We definitely don't require rewriting the whole table every time you want to add a column; that would be impractical for large tables!
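As a rough illustration of the points above, the following sketch walks through adding a column and then backfilling one partition. The table name `sales_parquet`, its columns, the partition value, and the placeholder `'unknown'` are all hypothetical and not taken from the original question. It also assumes the engine allows reading from the partition that is being overwritten in the same statement; writing to a staging table first is a more cautious alternative.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE sales_parquet (id BIGINT, amount DOUBLE)
PARTITIONED BY (sale_year INT)
STORED AS PARQUET;

-- Adding a column at the end changes only the table schema;
-- the existing Parquet data files are left untouched.
ALTER TABLE sales_parquet ADD COLUMNS (region STRING);

-- Rows stored in files written before the ALTER come back with region = NULL.
SELECT id, amount, region
FROM sales_parquet
WHERE sale_year = 2020;

-- Backfilling non-NULL values requires rewriting data, here a single partition.
-- Assumption: reading and overwriting the same partition in one statement is allowed.
INSERT OVERWRITE TABLE sales_parquet PARTITION (sale_year = 2020)
SELECT id, amount, 'unknown' AS region
FROM sales_parquet
WHERE sale_year = 2020;
```

Only the partitions that need non-NULL values have to be rewritten this way; partitions left alone keep returning NULL for the new column.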

Member Since	‎07-29-2015 04:07 PM
Last Visited	‎02-11-2021 06:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: impala java heap config

Re: Is CDP open source?

Re: lateral view explode in impala?

Re: IMPALAD_QUERY_MONITORING_STATUS has become bad

Re: Impala - Memory limit exceeded

Re: issue trying Impyla

Re: Does Impala support XMLTYPE and BLOB

Re: Impala ECONNRESET errors

Re: Impala Queries which were previously working a...

Re: How to add a new column to an existing parquet...