Member since: 04-26-2017
Posts: 16
Kudos Received: 1
Solutions: 0
02-08-2019
01:29 PM
Tim, thanks for the suggestion. I checked the startup logs for the impalad that was complaining, and there were some odd permission issues, which we fixed; things are looking good now. However, one thing is puzzling me. We have an impala user in the impala_prod LDAP group, but there is also a local impala user in the local impala group on the node running impalad. When we removed the scratch dirs and started impalad again, the scratch dirs were created with the user from the local account but the group from LDAP. Is this possible? - Ravi
02-07-2019
09:52 PM
Hi All, when I run a large query in Impala that does a lot of aggregation and sorting, it fails with the error below. The error is vague and not much help. Our cluster has 32 Impala daemons, and every node is configured with 8 disks of 4 TB each. We defined scratch directories on all the disks/nodes in Cloudera Manager. I looked at the logs as indicated in the error message, but I couldn't find anything useful.

Could not create files in any configured scratch directories (--scratch_dirs=) on backend 'data405.prod.com:22000'. See logs for previous errors that may have prevented creating or writing scratch files.

Impala version: 2.12.0-cdh5.15.0

Any pointers would be greatly appreciated. Thanks, Ravi
Labels:
- Apache Impala
12-29-2017
03:13 PM
Hi, a column in my table holds Unix time in milliseconds. When I try to use from_unixtime() on it, it returns NULL. The documentation mentions that from_unixtime() handles only Unix time in seconds. Is there a specific problem handling milliseconds?

select from_unixtime(1513895588243, "yyyy-MM-dd HH:mm:ss.SSSSSS");

Result:
from_unixtime( 1513895588243 , 'yyyy-mm-dd hh:mm:ss.ssssss')
NULL

I am expecting '2017-12-21 22:33:08.243000000'. I wrote the query below, which gives the result expected above:

select 1513895588243, cast(concat(cast(from_unixtime(CAST(1513895588243/1000 as BIGINT), 'yyyy-MM-dd HH:mm:ss') as String), '.', substr(cast(1513895588243 as String), 11, 3)) as timestamp);

Result:
1513895588243   2017-12-21 22:33:08.243000000

But the query above is not efficient. Is there a more efficient workaround for this?
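A shorter route, as a sketch only: split the value into whole seconds and the millisecond remainder, then add the remainder back as an interval. This assumes to_timestamp() and milliseconds_add() are available in your Impala version:

select milliseconds_add(
         to_timestamp(cast(1513895588243 div 1000 as bigint)),
         1513895588243 % 1000);

-- should yield 2017-12-21 22:33:08.243000000, the same as the concat
-- version, without the round trip through strings.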
Labels:
- Apache Hive
- Apache Impala
12-14-2017
12:28 PM
This is something I found on Stack Overflow: https://stackoverflow.com/questions/36508553/how-to-specify-consumer-group-in-kafka-spark-streaming-using-direct-stream. Is there any other way of monitoring the backlog with this approach? I tried exploring the Spark Metrics API, but even that is of no great use in this case.
12-14-2017
12:00 PM
Thanks for the response, Zhang. But I don't have a consumer group in Kafka, since I am using the direct stream approach (no receiver). I tried specifying Kafka properties such as "group.id" and "consumer.id" in my Spark application, but with no luck; they didn't show up in my Kafka consumer list. P.S.: I am using the old Kafka consumer API.
12-13-2017
10:46 PM
I am trying to build my own tool(scripts) to monitor Kafka Backlog for spark streaming application with Kafka. I am using createDirectStream api of Spark, so I don't have any consumer created in Kafka. Because of this I am not able to monitor the backlog in kafka. Is there a way I can monitor kafka backlog in this approach?
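One angle that may help, sketched under the assumption that the stream comes from the 0.8 spark-streaming-kafka integration (old consumer API, KafkaUtils.createDirectStream): each batch RDD carries its Kafka offset ranges, so the application itself can record the offsets it has processed, and an external script can diff them against each partition's latest offset (fetched, for example, with kafka.tools.GetOffsetShell) to compute the backlog. The name directStream below is a placeholder for the JavaPairInputDStream returned by createDirectStream:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

directStream.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaPairRDD<String, String> rdd) throws Exception {
        // RDDs produced by createDirectStream implement HasOffsetRanges.
        OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
        for (OffsetRange r : ranges) {
            // untilOffset() is the exclusive upper bound of this batch;
            // persist it (ZooKeeper, a DB, even a log line) and diff it
            // against the partition's log-end offset to get the lag.
            System.out.println(r.topic() + "-" + r.partition()
                + ": fromOffset=" + r.fromOffset()
                + " untilOffset=" + r.untilOffset());
        }
    }
});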
Labels:
- Apache Kafka
- Apache Spark
09-07-2017
03:53 PM
Hi, I tried this in CDH 5.12 and it worked perfectly fine. I am not sure what problem you are facing, though. Thanks, Ravi
09-07-2017
03:39 PM
Hi Community, Spark DataFrames by default use null for values that are unknown, missing, or irrelevant. Given this, when we define a schema for a DataFrame to upsert into Kudu (note that in general the DataFrame will not have values for every column defined in the schema), I observed weird behaviour in the Kudu table: updating a table in Kudu from Spark replaces the columns that are not set in the upsert (but are present in the schema) with NULL. This happens because the Spark DataFrame treats the values missing from the schema as NULL, and the upsert then writes those NULLs. Is this a bug, or am I missing something here? Any inputs on working around this? Thanks, Ravi
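For comparison, a minimal sketch with the plain Kudu Java client rather than Spark (the master address, table name, and column names here are made up for illustration): an Upsert only writes the columns you explicitly set, so untouched columns keep their previous values, whereas a DataFrame row always carries a value, possibly NULL, for every column in the schema, which is what clobbers the existing data.

import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.PartialRow;
import org.apache.kudu.client.Upsert;

public class PartialUpsertSketch {
    public static void main(String[] args) throws Exception {
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master:7051").build(); // hypothetical master
        KuduTable table = client.openTable("events");                     // hypothetical table
        KuduSession session = client.newSession();

        Upsert upsert = table.newUpsert();
        PartialRow row = upsert.getRow();
        row.addInt("id", 1);          // key column
        row.addString("b", "Hello");  // only column "b" is written;
                                      // columns not set here are left as-is
        session.apply(upsert);

        session.close();
        client.close();
    }
}

With kudu-spark, a workaround along the same lines might be to drop the all-NULL columns from the DataFrame before the upsert, so they are never part of the written row at all.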
Labels:
- Apache Kudu
- Apache Spark
09-07-2017
03:16 PM
1 Kudo
Hi, you need to generate an RDD of structured data and write it to HDFS. Sample code in Java is as follows:

records.foreachRDD(new VoidFunction2<JavaRDD<String>, Time>() {
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaRDD<String> rdd, Time time) throws Exception {
        // Skip empty batches so we don't create empty output directories.
        if (rdd.count() > 0) {
            // Write each batch to its own HDFS directory, keyed by batch time.
            rdd.saveAsTextFile(outputPath + "/" + time.milliseconds());
        }
    }
});

Hope this helps. Thanks, Ravi
09-07-2017
02:49 PM
Hi, can you post the code showing how you are trying to write the output? Thanks, Ravi
06-20-2017
03:03 PM
Hi Tim, do we have the flexibility of declaring a variable globally in a UDF, that is, outside the function? The reason I am declaring a static variable is to retain the timestamp value across records so that I can compare timestamps. Is there an alternative approach for this? Thanks
06-20-2017
02:11 PM
Hi All, we are using Impala for various processing in our systems. We recently got a requirement to handle updates to events: we have an 'e_update' table which holds the partial updates received for various events. The fields that are not updated are stored as NULL values. Example:

ID (Int) | Date_Time (timestamp) | A (Int) | B (String) | C (String)
1 | 0 | 1 | NULL | NULL
1 | 1 | 2 | Hi | NULL
1 | 3 | 4 | Hello | Hi
1 | 2 | 5 | NULL | NULL
1 | 4 | NULL | NULL | Zero

P.S.: Please consider Date_Time as valid timestamp-type values; for easy understanding they are shown here as 0, 1, 2, 3, 4, 5.

As seen in the table above, each event has a unique id, and as we get an update to a particular event we store the date_time at which the update happened along with the partially updated values. Apart from the updated values, the rest are stored as NULL. We never delete the data. We are planning to mimic in-place updates on the table, so that the query below retrieves the result table that follows:

> SELECT id, current_val(A,date_time) as A, current_val(B,date_time) as B, current_val(C,date_time) as C from e_update GROUP BY ID;

where current_val is a custom Impala UDA we are planning to implement, i.e. get the latest non-NULL value for the column.

ID (Int) | A (Int) | B (String) | C (String)
1 | 4 | Hello | Zero

Implemented current_val UDA (the code below is only for int-type inputs):

uda-currentval.h

// This is a sample for retrieving the current value of the e_update table
void CurrentValueInit(FunctionContext* context, IntVal* val);
void CurrentValueUpdate(FunctionContext* context, const IntVal& input, const TimestampVal& ts, IntVal* val);
void CurrentValueMerge(FunctionContext* context, const IntVal& src, IntVal* dst);
IntVal CurrentValueFinalize(FunctionContext* context, const IntVal& val);

uda-currentval.cc

// -----------------------------------------------------------------------------------------------
// This is a sample for retrieving the current value of the e_update table
// -----------------------------------------------------------------------------------------------
void CurrentValueInit(FunctionContext* context, IntVal* val) {
  val->is_null = false;
  val->val = 0;
}

void CurrentValueUpdate(FunctionContext* context, const IntVal& input, const TimestampVal& ts, IntVal* val) {
  // Intended to remember the latest timestamp seen so far; note this static
  // pointer is never pointed at any storage before being dereferenced.
  static TimestampVal* tsTemp;
  tsTemp->date = 0;
  tsTemp->time_of_day = 0;
  // Always true, since tsTemp was just zeroed above.
  if (tsTemp->date == 0 && tsTemp->time_of_day == 0) {
    tsTemp->date = ts.date;
    tsTemp->time_of_day = ts.time_of_day;
    val->val = input.val;
    return;
  }
  if (ts.date > tsTemp->date && ts.time_of_day > tsTemp->time_of_day) {
    tsTemp->date = ts.date;
    tsTemp->time_of_day = ts.time_of_day;
    val->val = input.val;
    return;
  }
}

void CurrentValueMerge(FunctionContext* context, const IntVal& src, IntVal* dst) {
  dst->val += src.val;
}

IntVal CurrentValueFinalize(FunctionContext* context, const IntVal& val) {
  return val;
}

We are able to build the library and create the aggregate function in Impala, but when we run a SELECT query similar to the one above, it brings down a couple of Impala daemons and the query terminates with the error below.

WARNINGS: Cancelled due to unreachable impalad(s): hadoop102.**.**.**.com:22000

We have impalad running on 14 instances. Can someone help us resolve this problem, and suggest a better way to achieve a solution for the scenario explained?
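A minimal sketch of one possible fix, untested and under stated assumptions: the static TimestampVal* tsTemp above is dereferenced without ever being initialized, which by itself can bring down the daemon, and a static cannot carry per-group state in any case, since it is shared across all groups and threads. The usual UDA pattern is to keep such state inside the intermediate value, for example a fixed-size StringVal allocated with FunctionContext::Allocate. This changes the intermediate type to STRING in CREATE AGGREGATE FUNCTION, and a complete implementation would also need a Serialize function, as in the Impala UDA samples.

#include <cstring>

#include "udf/udf.h"

using namespace impala_udf;

// Per-group state carried inside the StringVal intermediate.
struct CurrentValueState {
  int32_t date;          // days, as in TimestampVal
  int64_t time_of_day;   // nanoseconds within the day, as in TimestampVal
  int32_t value;         // latest value seen so far
  bool initialized;      // false until the first non-NULL update
};

void CurrentValueInit(FunctionContext* context, StringVal* val) {
  // Allocate the per-group state; freed in Finalize.
  val->is_null = false;
  val->len = sizeof(CurrentValueState);
  val->ptr = context->Allocate(val->len);
  memset(val->ptr, 0, val->len);
}

void CurrentValueUpdate(FunctionContext* context, const IntVal& input,
                        const TimestampVal& ts, StringVal* val) {
  if (input.is_null || ts.is_null) return;  // ignore NULL updates
  CurrentValueState* s = reinterpret_cast<CurrentValueState*>(val->ptr);
  bool later = ts.date > s->date ||
               (ts.date == s->date && ts.time_of_day > s->time_of_day);
  if (!s->initialized || later) {
    s->date = ts.date;
    s->time_of_day = ts.time_of_day;
    s->value = input.val;
    s->initialized = true;
  }
}

void CurrentValueMerge(FunctionContext* context, const StringVal& src,
                       StringVal* dst) {
  const CurrentValueState* s =
      reinterpret_cast<const CurrentValueState*>(src.ptr);
  CurrentValueState* d = reinterpret_cast<CurrentValueState*>(dst->ptr);
  bool later = s->date > d->date ||
               (s->date == d->date && s->time_of_day > d->time_of_day);
  // Keep whichever partial result saw the later timestamp.
  if (s->initialized && (!d->initialized || later)) *d = *s;
}

IntVal CurrentValueFinalize(FunctionContext* context, const StringVal& val) {
  const CurrentValueState* s =
      reinterpret_cast<const CurrentValueState*>(val.ptr);
  IntVal result = s->initialized ? IntVal(s->value) : IntVal::null();
  context->Free(val.ptr);
  return result;
}

Keeping the timestamp next to the value in the state is what lets Merge pick the later of two partial results instead of summing them, which is what the += in the original Merge was doing.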
Labels:
- Apache Impala
04-26-2017
08:17 PM
Hi, I am trying to install the Arcadia Enterprise parcel in my cluster but am facing the error below.

Error for parcel ARCADIAENTERPRISE-3.3.0.0-1485371982.cdh5-el6.parcel : Hash file is not found.

I am sure the parcel files (.parcel, .parcel.sha) are placed in /opt/cloudera/parcel-repo, but I am still facing this problem. Cluster: CDH 5.10.1.
Labels:
- Cloudera Manager