Member since: 06-13-2017
Posts: 25
Kudos Received: 3
Solutions: 0
08-30-2017
12:28 AM
1 Kudo
Hard to tell from the information you provided, but see if you can increase Pentaho's memory settings (edit spoon.bat). If that doesn't help, check the memory settings for Impala's catalogd. Hope this helps.
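If you end up editing spoon.bat, a minimal sketch of the change, assuming a recent Pentaho Data Integration release where the heap is set through PENTAHO_DI_JAVA_OPTIONS (the sizes below are just placeholders):

REM In spoon.bat: raise the JVM heap Spoon runs with; adjust -Xmx to what your machine can spare
set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx4096m"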
08-30-2017
12:22 AM
There is a uuid() function in Impala that you can use to generate surrogate keys for Kudu. Or you can write an Impala UDF to generate unique BIGINTs.
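For example, something along these lines in Impala SQL, where my_kudu_table and staging are hypothetical table names (uuid() returns a STRING, so the key column would be a STRING):

-- Generate a string surrogate key per row while loading the Kudu table
INSERT INTO my_kudu_table
SELECT uuid() AS id, col1, col2
FROM staging;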
06-19-2017
05:24 PM
The best way to deal with small files is to not have to deal with them at all. You might want to explore using Kudu or HBase as your storage engine instead of HDFS (Parquet).
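If Kudu looks like a fit, a rough sketch from Impala of what the switch could look like; the table names and columns here are made up:

-- Hypothetical Kudu-backed table to replace a directory of small Parquet files
CREATE TABLE events_kudu (
  id BIGINT,
  payload STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- One-off backfill from the existing Parquet table
INSERT INTO events_kudu SELECT id, payload FROM events_parquet;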
06-16-2017
12:24 AM
1 Kudo
If writing to Parquet, you just have to do something like:
df.write.mode("append").parquet("/user/hive/warehouse/Mytable")
And if you want to prevent the "small file" problem:
df.coalesce(1).write.mode("append").parquet("/user/hive/warehouse/Mytable")
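For context, a minimal self-contained Scala sketch of the same append; the source path and table directory are made up:

import org.apache.spark.sql.SparkSession

object AppendParquetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("append-parquet").getOrCreate()

    // Hypothetical source data to append
    val df = spark.read.json("/tmp/incoming")

    // coalesce(1) writes a single file per batch, which keeps small files from piling up
    df.coalesce(1)
      .write
      .mode("append")
      .parquet("/user/hive/warehouse/Mytable")

    spark.stop()
  }
}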
06-15-2017
05:30 PM
You need to configure NTP correctly: "Four NTP servers is the recommended minimum. Four servers protects against one incorrect timesource, or 'falseticker'." See https://access.redhat.com/solutions/58025 for tips on configuring NTP.
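A minimal /etc/ntp.conf sketch along those lines; the pool hostnames are only examples, so substitute your own internal or regional time sources:

# Four upstream servers so a single falseticker can be outvoted
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst

# Track local clock drift between polls
driftfile /var/lib/ntp/drift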
06-14-2017
06:32 PM
Does it have to be a sequence, or would any unique value be sufficient? If the latter, Impala has a uuid() function that you can use. And if a BIGINT is required, you can hash the uuid() to get a BIGINT value.
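As a quick illustration, Impala's built-in fnv_hash() returns a BIGINT, so the hashing step could look like this (the column alias is arbitrary):

-- Derive a BIGINT surrogate key by hashing a generated uuid
SELECT fnv_hash(uuid()) AS id_bigint;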
06-14-2017
12:04 AM
You might have to add your GPFS libraries to SPARK_CLASSPATH and LD_LIBRARY_PATH.
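Roughly, in spark-env.sh; the paths below are placeholders, so point them at wherever your GPFS connector jar and native libraries actually live:

# Hypothetical locations for the GPFS Hadoop connector jar and native libs
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/path/to/gpfs/hadoop-gpfs-connector.jar"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/gpfs/lib"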
06-13-2017
11:44 PM
One way is to use selectExpr with a cast:
val convertedDF = joined.selectExpr("id", "cast(mydoublecol as double) mydoublecol")
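A small self-contained Scala sketch of the same idea; the DataFrame and column names here are made up:

import org.apache.spark.sql.SparkSession

object CastColumnExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cast-column").getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrame where mydoublecol arrived as a string
    val joined = Seq(("a", "1.5"), ("b", "2.25")).toDF("id", "mydoublecol")

    // selectExpr casts with SQL syntax while keeping the original column name
    val convertedDF = joined.selectExpr("id", "cast(mydoublecol as double) mydoublecol")
    convertedDF.printSchema()

    spark.stop()
  }
}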