Member since: 11-04-2015
Posts: 260
Kudos Received: 44
Solutions: 33
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2654 | 05-16-2024 03:10 AM |
| | 1564 | 01-17-2024 01:07 AM |
| | 1573 | 12-11-2023 02:10 AM |
| | 2308 | 10-11-2023 08:42 AM |
| | 1608 | 09-07-2023 01:08 AM |
08-22-2022
01:02 AM
Hi @Shaswat ,

Without reviewing everything else that may be wrong, the "port=21000" setting is definitely not correct. Impala has two "frontend" ports that clients can connect to:
- Port 21000 is used only by impala-shell.
- Port 21050 is used by all other client applications: JDBC, ODBC, Hue, and Python applications using Impyla, which is what the example above uses.

Please see the Impyla docs for more details; a minimal connection sketch is below.

Best regards,
Miklos
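A minimal Impyla connection sketch, assuming the hostname below is replaced with your own Impala daemon (the host is a placeholder, not from the original post):

```python
from impala.dbapi import connect

# 21050 is the HiveServer2-compatible port used by JDBC/ODBC/Hue/Impyla clients;
# 21000 is reserved for impala-shell and will not work here.
conn = connect(host='impalad-host.example.com', port=21050)  # placeholder host
cur = conn.cursor()
cur.execute('SELECT 1')
print(cur.fetchall())
cur.close()
conn.close()
```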
08-02-2022
01:16 AM
Hi @Neel_Sharma ,

The message suggests that the query tried to read the table's data files as if the table were Parquet-based (Parquet may be the default table format in Hive on CDP, but only if no format is specified at creation time). However, the table creation script you shared suggests the table should be text (CSV) based.

Can you please verify the actual table format with:

DESCRIBE FORMATTED GeoIP2_ISP_Blocks_IPv4;

Also, are you in the right database?

For the second issue: how do you create the external tables from the tab-delimited files, and how are the files uploaded to HDFS?

Thanks
Miklos
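If it helps, here is a sketch of running that check through HiveServer2 with Impyla; the host, port, and auth settings are assumptions you would adapt to your cluster:

```python
from impala.dbapi import connect

# placeholders: point these at your own HiveServer2 endpoint and auth setup
conn = connect(host='hs2-host.example.com', port=10000, auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute('USE default')  # make sure you are in the intended database
cur.execute('DESCRIBE FORMATTED GeoIP2_ISP_Blocks_IPv4')
for name, value, _comment in cur.fetchall():
    # the storage section tells you whether the table is text- or parquet-backed
    if name and name.strip().rstrip(':') in ('SerDe Library', 'InputFormat', 'OutputFormat'):
        print(name.strip(), value)
cur.close()
conn.close()
```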
06-24-2022
12:36 AM
1 Kudo
Hi, The "Requested array size exceeds VM limit" means that your code tries to instantiate an array which has more than 2^31-1 elements (~2 billion) which is the max size of an array in Java. You cannot solve this with adding more memory. You need to split the work between executors and not process data on a single JVM (Driver side).
06-23-2022
01:47 AM
1 Kudo
wholeTextFiles is also not a scalable solution. As the documentation notes, "Small files are preferred, as each file will be loaded fully in memory."
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.SparkContext.wholeTextFiles.html
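If you only need line-level access, a more scalable pattern is to read the files with spark.read.text, which keeps the data distributed instead of loading each whole file into memory. A sketch under that assumption (the paths are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.getOrCreate()

# one row per line, processed in parallel across executors
df = spark.read.text("hdfs:///data/logs/*.txt")

# if you still need to know which file each line came from
df = df.withColumn("source_file", input_file_name())
df.show(5, truncate=False)
```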
06-22-2022
02:45 AM
Hi @Yosieam ,

Using the collect() method is not recommended: it pulls all the data to the Spark driver, which means the whole dataset has to fit into the driver's memory. Please rewrite your code to avoid collect().
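A minimal sketch of the alternative pattern (the paths and column names are illustrative placeholders): keep the computation on the executors and write the result out with the DataFrame API instead of collecting it.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("hdfs:///data/events")

# transformations are executed in a distributed fashion on the executors
result = df.filter(df.status == "OK").groupBy("user_id").count()

# the output is also written by the executors; nothing is collected to the driver
result.write.mode("overwrite").parquet("hdfs:///data/events_summary")
```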
06-22-2022
02:41 AM
Hi @shivam0408 ,

Can you clarify which CDH/HDP/CDP version you are using and what the datatype of the "DATETIME" column is? What is the desired end result of this command - to drop all the partitions?
06-17-2022
03:52 AM
Can you review the whole logfile? The NPE above may just be a side effect of an earlier failure.
06-17-2022
01:39 AM
Hi @Uday_Singh2022 ,

Yes, Flume is not a supported component in CDP. You can find documentation on Flume on its official website: https://flume.apache.org/

Have you considered using CDF / NiFi for this use case? https://docs.cloudera.com/cdf-datahub/latest/nifi-hbase-ingest/topics/cdf-datahub-nifi-hbase-ingest.html

Thanks,
Miklos
06-17-2022
01:34 AM
Hi @PCP2 ,

Can you clarify which HDP/CDH/CDP version you are using? Is this a one-off issue, an intermittent one, or does it always happen? Does it affect only a single job? What kind of action is Oozie trying to launch?

Thanks,
Miklos
06-10-2022
12:23 AM
Hi @luckes ,

Please check whether your source code file (test.java) is saved with UTF-8 encoding, and how you are compiling the class (for example, when using Maven you may need to specify UTF-8 as the compiler encoding). These special characters are easily lost if the encoding is not set properly at any step.

Alternatively, you can use the unicode escape notation \uXXXX to make sure the character is interpreted correctly by Java. For example, 张 is U+5F20 (https://www.compart.com/en/unicode/U+5F20), so in source code it looks like:

statement.setString(2, "\u5f20\u4e09");

Of course, it is rare that one needs to hardcode special characters in source code; usually they are read from a data file, where you can specify which encoding to use when reading.