Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5798 | 12-18-2020 01:46 PM
 | 3729 | 12-16-2020 12:11 PM
 | 2645 | 12-07-2020 01:47 PM
 | 1892 | 12-07-2020 09:21 AM
 | 1224 | 10-14-2020 11:15 AM
06-21-2024 02:41 AM
1 Kudo
@stigahuang wrote: "Looks like there are no places to control the max JVM heap size of impalads in Cloudera Manager (only one for the catalogd). How can we set JAVA_TOOL_OPTIONS for impalads (coordinators)?"

In CM 7.6.7, there is a configuration option called "impalad_embedded_jvm_heapsize"; I'm not sure when it was added. Its description reads: "Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx."
03-05-2024 12:14 PM
@lv_antel Welcome to the Cloudera Community! As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.
01-16-2024 01:49 PM
1 Kudo
There is a workaround for this. It is not a definitive solution, but it can help; the final result is what the last SELECT below returns.

First, I created a table like your example (I used ";" as the separator):

```sql
insert overwrite t_1
select 'Asia' as cont, 'Japan;China;Singapore;' as Country_list
union
select 'Europe' as cont, 'UK;Spain;Italy;German;Norway;' as Country_list;
```

Next, I created an external table. It must be stored as TEXTFILE:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS t_transpose (
  field_transpose string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ";"
STORED AS TEXTFILE;
```

Then insert into this table like this:

```sql
insert overwrite t_transpose
select REGEXP_REPLACE(Country_list, ';', concat("|", cont, '\n')) as transpose
from t_1;
```

After that, you can select from it as in my earlier example:

```sql
select split_part(field_transpose, "|", 1),
       split_part(field_transpose, "|", 2)
from t_transpose;
```

PS: The final result may contain some blank lines; just filter them out or ignore them. Also, compared with the example you provided, I added one extra ";" at the end of each country list so that the last country is transposed as well.
11-20-2022 11:43 PM
Can you fix it later?
04-30-2021 06:42 AM
@JasonBourne - if you have the same issue, here's a GitHub issue discussing it and linking to a pull request with a fix: https://github.com/cloudera/thrift_sasl/issues/28. You can see in the commits (here: https://github.com/cloudera/thrift_sasl/commits/master) that they are testing a new release for the fix, but it looks like it's not quite done yet. Hopefully soon.
01-20-2021 09:38 AM
There's a 64KB limit on strings in Kudu, but otherwise you can store any binary data in them. https://docs.cloudera.com/documentation/kudu/5-10-x/topics/kudu_known_issues.html#schema_design_limitations
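For illustration, here's a minimal sketch of storing binary payloads in a Kudu STRING column via Impala (the blob_store table and its columns are hypothetical, not from the original post):

```sql
-- Hypothetical Kudu table: each 'payload' cell can hold arbitrary binary
-- data, but each value must stay under the ~64KB string-size limit above.
CREATE TABLE blob_store (
  id BIGINT,
  payload STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;
```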
01-19-2021 09:45 AM
Upgrading to a newer version of Impala will solve most of the scalability issues you'd see on Impala 2.9, largely thanks to the work described in https://blog.cloudera.com/scalability-improvement-of-apache-impala-2-12-0-in-cdh-5-15-0/.
12-22-2020 06:24 AM
@Tim Armstrong Thanks for helping out here. My apologies for the misunderstanding w.r.t. the packing information.
12-21-2020 09:01 AM
We have some background on schema evolution in Parquet in the docs: https://docs.cloudera.com/runtime/7.2.2/impala-reference/topics/impala-parquet.html (see "Schema Evolution for Parquet Tables"). Some of the details are specific to Impala, but the concepts are the same across engines that use Parquet tables, including Hive and Spark.

At a high level, you can think of the data files as immutable while the table schema evolves. If you add a new column at the end of the table, for example, that updates the table schema but leaves the Parquet files unchanged. When the table is queried, the table schema and the Parquet file schemas are reconciled, and the new column's values will all be NULL.

If you want to modify the existing rows to include new non-NULL values, that requires rewriting the data, e.g. with an INSERT OVERWRITE statement for a partition or a CREATE TABLE ... AS SELECT to create an entirely new table. Keep in mind that traditional Parquet tables are not optimized for workloads with updates; Apache Kudu in particular, and also transactional tables in Hive 3+, support row-level updates more conveniently and efficiently. We definitely don't require rewriting the whole table every time you want to add a column; that would be impractical for large tables!
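To make the sequence concrete, here's a minimal sketch of the scenario described above (the sales table, its columns, and the backfilled copy are hypothetical, not from the original post):

```sql
-- Adding a column only updates the table schema; the existing Parquet
-- data files are left untouched.
ALTER TABLE sales ADD COLUMNS (discount DECIMAL(5,2));

-- Old files are reconciled against the new schema at query time,
-- so 'discount' reads back as NULL for pre-existing rows.
SELECT id, amount, discount FROM sales;

-- Backfilling non-NULL values means rewriting the data, e.g. by
-- creating a new table with CREATE TABLE ... AS SELECT:
CREATE TABLE sales_backfilled STORED AS PARQUET AS
SELECT id, amount, CAST(0 AS DECIMAL(5,2)) AS discount FROM sales;
```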