About Khalef

Khalef · 06-28-2016

Thanks Lars.

Khalef · 06-24-2016

Hi Tim, that is unfortunate. However, I am creating external tables with configured locations, and that works well: to push new daily data, all I need to do now is create the Parquet file and alter the table with the new partition details.

The challenge now is the aggregated tables, which are created like this:

CREATE TABLE FOO AS SELECT..FROM ABC, XYZ,..WHERE...;

I have to recreate these every time, and there is no way to simply alter them and make them aware of the new data, because the query is evaluated at creation time. Is there any other way to create aggregated tables that are updated automatically?

Cheers,
Khalef
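As a rough illustration of the daily step described above (registering a new day's directory with an external table), the shell function below prints the corresponding Impala ALTER TABLE ... ADD PARTITION statement. It is a sketch only: the function name, its argument order, and the assumption that year, month and day are STRING partition columns are hypothetical, not taken from the original setup.

```shell
# Hypothetical helper: print the Impala statement that registers one day's
# directory as a new partition of an external table.
# Assumes STRING partition columns named year, month and day.
# $1: table name, $2: table root directory on HDFS, $3: YYYY, $4: MM, $5: DD
add_partition_sql() {
  printf "ALTER TABLE %s ADD PARTITION (year='%s', month='%s', day='%s') LOCATION '%s/year=%s/month=%s/day=%s';\n" \
    "$1" "$3" "$4" "$5" "$2" "$3" "$4" "$5"
}
```

The output could be passed to impala-shell -q, for example impala-shell -q "$(add_partition_sql introducer_groups /user/caf/macleasing/stage/ml/introducerGroups 2016 05 30)", where introducer_groups is a placeholder table name. This only covers the external table; tables built with CREATE TABLE ... AS SELECT still hold a snapshot of the data at creation time.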

Khalef · 06-24-2016

Hi Lars,

Yes, I believe I have read most of the articles and documentation written on the Kite SDK. However, my partition fields (year, month, day) are not part of my data files, and there is no date or timestamp field that tells me whether the data is from today or from a month ago. My partition config (if I can use one) would be:

[
  { "type" : "provided", "name" : "year" },
  { "type" : "provided", "name" : "month" },
  { "type" : "provided", "name" : "day" }
]

When I run csv-import or json-import on my files, I don't see how to tell kite-dataset explicitly that I want to store the imported file in the partition year=2016, month=05, day=30.

Right now this is what I am doing: I create a dataset, import the file, create the partition directory, and then move the Parquet files into it:

> kite-dataset csv-schema ods_ml_au.Introducer_Group_30_05_2016.psv --class IntroducerGroup --delimiter '|' -o introducerGroup.avsc
> hdfs dfs -put introducerGroup.avsc /user/caf/macleasing/format
> kite-dataset create dataset:hdfs:/user/caf/macleasing/stage/ml/introducerGroups -s hdfs:/user/caf/macleasing/format/introducerGroup.avsc -f parquet
> hdfs dfs -put ods_ml_au.Introducer_Group_30_05_2016.psv /user/caf/macleasing/source
> kite-dataset csv-import hdfs:/user/caf/macleasing/source/ods_ml_au.Introducer_Group_30_05_2016.psv dataset:hdfs:/user/caf/macleasing/stage/ml/introducerGroups --delimiter '|'
> hdfs dfs -mkdir -p /user/caf/macleasing/stage/ml/introducerGroups/year=2016/month=05/day=30
> hdfs dfs -mv /user/caf/macleasing/stage/ml/introducerGroups/*.parquet /user/caf/macleasing/stage/ml/introducerGroups/year=2016/month=05/day=30/

How can I avoid explicitly creating the directory and moving the files? I want to use my partition config instead.

Cheers,
Khalef
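Until the provided partitions can be filled in during import, the manual mkdir/mv workaround in the commands above can at least be scripted. The sketch below assumes, hypothetically, that every source file name ends in _DD_MM_YYYY.psv and derives the year=/month=/day= directory from that date. partition_path and move_into_partition are illustrative names; the script does not use Kite's provided partitioners at all, it only automates the last two hdfs dfs commands.

```shell
# Sketch only: partition_path and move_into_partition are hypothetical helpers.
# Assumes every source file name ends in _DD_MM_YYYY.psv,
# e.g. ods_ml_au.Introducer_Group_30_05_2016.psv.

# Print "year=YYYY/month=MM/day=DD" for a matching name; print nothing otherwise.
partition_path() {
  basename "$1" |
    sed -n 's|.*_\([0-9][0-9]\)_\([0-9][0-9]\)_\([0-9][0-9][0-9][0-9]\)\.psv$|year=\3/month=\2/day=\1|p'
}

# Move the Parquet files that csv-import wrote at the dataset root into the
# partition directory derived from the source file name.
# $1: source file name, $2: dataset directory on HDFS
move_into_partition() {
  part=$(partition_path "$1")
  if [ -z "$part" ]; then
    echo "no _DD_MM_YYYY.psv date found in $1" >&2
    return 1
  fi
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -mkdir -p "$2/$part" &&
    hdfs dfs -mv "$2/*.parquet" "$2/$part/"
}
```

Run after the csv-import step, a call such as move_into_partition ods_ml_au.Introducer_Group_30_05_2016.psv /user/caf/macleasing/stage/ml/introducerGroups would replace the two manual commands. Deriving the date from the file name is a workaround for the data files carrying no date field of their own.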

Khalef · 06-23-2016

Hi, I am using the Kite SDK on the QuickStart VM to create some datasets, but I cannot see how to pass a value for a provided partition when I run csv-import or json-import. How can we achieve that? Thanks.

Khalef · 06-23-2016

Thank you, Tim, for the quick reply. When can we expect the feature to be available? It is my best selling point to the team for the Cloudera option.

Khalef · 06-23-2016

I read in a presentation that derived tables would be available in versions later than 2.4. I am using 2.5 on the QuickStart VM, but the CREATE DERIVED TABLE statement does not seem to be supported. Is that expected?

Last Visited	08-28-2016 08:03 PM

Member Since	06-23-2016 08:16 AM
Posts	9

Cloudera Community

Re: KITE SDK 'Provided partitioners do not referen...

Re: KITE SDK 'Provided partitioners do not referen...

Re: How to create impala derived tables

Re: KITE SDK 'Provided partitioners do not referen...

KITE SDK 'Provided partitioners do not reference a...

Re: How to create impala derived tables

How to create impala derived tables