Member since: 06-13-2017
Posts: 25
Kudos Received: 3
Solutions: 0
12-02-2017
04:15 PM
Usually people use HDFS, S3, or Kudu, or you can use Alluxio (formerly Tachyon) as off-heap storage, which is faster and more scalable.
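For example, with the Alluxio client jar on Spark's classpath you can read and write alluxio:// paths much like HDFS. A minimal sketch (the master host and paths below are made up):

// assumes the Alluxio client jar is on the Spark classpath and an Alluxio
// master is reachable at alluxio-master:19998 (hypothetical host and paths)
val df = spark.read.parquet("alluxio://alluxio-master:19998/data/events")
df.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/data/events_curated")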
08-31-2017
05:08 AM
I don't think there is Kudu support in PySpark yet; see KUDU-1603.
08-30-2017
12:28 AM
1 Kudo
Hard to tell based on the information you provided, but see if you can increase Pentaho's memory settings (edit spoon.bat). If that doesn't work, check the memory setting of Impala's catalogd. Hope this helps.
08-30-2017
12:22 AM
There is a uuid() function in Impala that you can use to generate surrogate keys for Kudu, or you can write an Impala UDF to generate unique BIGINTs.
08-27-2017
04:51 AM
Sounds good to me.
08-22-2017
04:36 AM
I don't think this can be done in Spark alone. You have to use JDBC API style syntax (import java.sql.*) for this and wrap your DML statements inside a transaction, i.e. setAutoCommit(false), commit if everything is OK, and roll everything back if any one of the DML statements fails.
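A minimal sketch of that pattern with foreachPartition (the MySQL URL, table, and column names are made up for illustration; df is your DataFrame):

import java.sql.DriverManager

df.foreachPartition { rows =>
  // one connection and one transaction per partition
  val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "*****")
  conn.setAutoCommit(false)
  try {
    val stmt = conn.prepareStatement("INSERT INTO mytable (id, name) VALUES (?, ?)")
    rows.foreach { row =>
      stmt.setInt(1, row.getAs[Int]("id"))
      stmt.setString(2, row.getAs[String]("name"))
      stmt.executeUpdate()
    }
    conn.commit()      // everything OK: commit
  } catch {
    case e: Exception =>
      conn.rollback()  // any failure: roll back this partition's writes
      throw e
  } finally {
    conn.close()
  }
}

Note that with foreachPartition each partition gets its own transaction, so a failure only rolls back that partition; a truly all-or-nothing write across the whole DataFrame would have to go through a single connection on the driver.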
08-20-2017
05:57 PM
Hi, you might be able to do this if your destination database and its JDBC driver support transactions (rollbacks and commits).
07-28-2017
06:15 PM
1 Kudo
Not the fastest way to do it, but you can create a Hive table on top of the HBase table and use Spark JDBC to create your HBase DataFrame. You can then join that DataFrame with your Oracle DataFrame.
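A rough sketch of that approach (host names, table names, and the join key are hypothetical; spark is an existing SparkSession):

import java.util.Properties

// Hive table defined on top of the HBase table, read through the Hive JDBC driver
val hiveProps = new Properties()
val hbaseDF = spark.read.jdbc("jdbc:hive2://hiveserver2:10000/default", "hbase_backed_table", hiveProps)

// Oracle table read through the Oracle JDBC driver
val oracleProps = new Properties()
oracleProps.put("user", "scott")
oracleProps.put("password", "*****")
val oracleDF = spark.read.jdbc("jdbc:oracle:thin:@//orahost:1521/ORCL", "CUSTOMERS", oracleProps)

// join the two DataFrames on a shared key
val joined = hbaseDF.join(oracleDF, Seq("customer_id"))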
06-20-2017
07:11 PM
Impala 2.9 has several Impala-Kudu performance improvements. A partial list:
- IMPALA-4859 - Push down IS NULL / IS NOT NULL to Kudu
- IMPALA-3742 - INSERTs into Kudu tables should partition and sort
- IMPALA-5156 - Drop VLOG level passed into Kudu client - "In some simple concurrency testing, Todd found that reducing the vlog level resulted in an increase in throughput from ~17 qps to 60 qps."
Also make sure you have a large enough MEM_LIMIT and limit the number of joins in your queries. Good luck 🙂
06-19-2017
05:24 PM
The best way to deal with small files is to not have to deal with them at all. You might want to explore using Kudu or HBase as your storage engine instead of HDFS (Parquet).
06-18-2017
12:23 AM
Actual source here: https://github.com/apache/spark/blob/v2.1.1/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala
More info here: https://spark.apache.org/docs/latest/streaming-programming-guide.html
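The gist of that example is roughly the following (see the linked source for the full listing):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// count words arriving on a TCP socket, in 1-second batches
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()
ssc.awaitTermination()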
06-16-2017
12:24 AM
1 Kudo
If writing to Parquet you just have to do something like:
df.write.mode("append").parquet("/user/hive/warehouse/Mytable")
And if you want to prevent the "small file" problem:
df.coalesce(1).write.mode("append").parquet("/user/hive/warehouse/Mytable")
06-15-2017
05:30 PM
You need to configure NTP correctly. "Four NTP servers is the recommended minimum. Four servers protects against one incorrect timesource, or 'falseticker'." See https://access.redhat.com/solutions/58025 for tips on configuring NTP.
06-14-2017
08:10 PM
Hi, you need to increase your YARN container memory settings in Cloudera Manager; they have to be bigger than your --executor-memory.
06-14-2017
06:32 PM
Does it have to be a sequence, or would a unique value be sufficient? If the latter, Impala has a uuid() function that you can use. Or, if a BIGINT is required, you can hash the uuid() to get a BIGINT value.
06-14-2017
06:06 PM
Simplifying the SQL statement by denormalizing (using several temp/intermediate tables) is a common way of tuning extremely large queries. Regarding your second question: there shouldn't be any difference (the way you wrote the query), but using the API gives you flexibility, such as taking advantage of caching.
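For example, with the DataFrame API you can cache an intermediate result once and reuse it across several queries (table and column names below are made up):

import spark.implicits._

// materialize a filtered intermediate result once and reuse it
val recent = spark.table("events").filter($"event_date" >= "2017-01-01").cache()
recent.createOrReplaceTempView("recent_events")

spark.sql("SELECT country, COUNT(*) AS cnt FROM recent_events GROUP BY country").show()
spark.sql("SELECT product, SUM(amount) AS total FROM recent_events GROUP BY product").show()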
06-14-2017
12:07 AM
Yes, you have to use foreachRDD. From https://stackoverflow.com/questions/44088090/spark-streaming-saving-data-to-mysql-with-foreachrdd-in-scala:

import java.util.Properties

// JDBC writer configuration
val connectionProperties = new Properties()
connectionProperties.put("user", "root")
connectionProperties.put("password", "*****")

structuredData.foreachRDD { rdd =>
  val df = rdd.toDF() // create a DataFrame from the schema RDD
  df.write.mode("append")
    .jdbc("jdbc:mysql://192.168.100.8:3306/hadoopguide", "topics", connectionProperties)
}
06-14-2017
12:04 AM
You might have to add your GPFS libraries to your SPARK_CLASSPATH and LD_LIBRARY_PATH.
06-13-2017
11:57 PM
Hi, let me clarify: you're not able to access Kudu tables created via Impala, is that correct?
06-13-2017
11:46 PM
Just cast it back to the correct data type using selectExpr:
val convertedDF = myDF.selectExpr("id", "name", "cast(age as tinyint) age")
06-13-2017
11:44 PM
One way is to use selectExpr with a cast:
val convertedDF = joined.selectExpr("id", "cast(mydoublecol as double) mydoublecol")