Member since: 11-24-2015
223 Posts
10 Kudos Received
0 Solutions
04-15-2021
05:56 PM
Hi, we are trying to back up a Kudu table as below:

spark2-submit --principal <user> --keytab <keytab> --master yarn --deploy-mode cluster --queue <queue> --executor-memory 12G --executor-cores 4 --driver-memory 4G --driver-cores 1 --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.13.0.7.1.5.0-257.jar --kuduMasterAddresses $KUDU_MASTERS --rootPath hdfs:///backups --forceFull true impala::<table>

It is extremely slow. Any suggestions on how to make it run faster? Appreciate the feedback.
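One thing that may be worth trying first (a sketch, not a confirmed fix - it assumes the YARN queue has spare capacity and that the job currently runs with the default executor count): ask for more executors explicitly so the backup scans run with more parallelism. --num-executors is a standard spark-submit option; the value 20 below is only an illustration.

spark2-submit --principal <user> --keytab <keytab> \
  --master yarn --deploy-mode cluster --queue <queue> \
  --num-executors 20 --executor-memory 12G --executor-cores 4 \
  --driver-memory 4G --driver-cores 1 \
  --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.13.0.7.1.5.0-257.jar \
  --kuduMasterAddresses $KUDU_MASTERS --rootPath hdfs:///backups \
  --forceFull true impala::<table>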
Labels:
- Apache Kudu
04-06-2021
10:49 AM
BTW, files with the ".tmp" extension could be under any subdirectory of "/backups". Thanks.
04-06-2021
10:45 AM
Hi, in BDR HDFS replication I want to exclude all files that end with ".tmp" under the directory "/backups/".
I would appreciate it if somebody could give the expression to add in the BDR "Add Exclusion" field.
Thanks
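For reference, a pattern worth trying - this assumes (my assumption, not something I have confirmed) that the BDR exclusion filter is a regular expression matched against the full file path, so it would also catch files in subdirectories:

.*\.tmp$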
Labels:
- Cloudera Manager
- HDFS
04-23-2020
06:53 AM
Funnily enough, nothing turned up in the log. But we got this alert: "Content: The health test result for HTTPFS_SCM_HEALTH has become bad: This role's process exited. This role is supposed to be started." We are using Cloudera 5.15.1 on RHEL.
04-22-2020
06:53 AM
We are experiencing out-of-memory errors with HttpFS. This happens when users use Hue to access a particular large folder in HDFS. We increased "Java Heap Size of HttpFS" to 1 GB, but we are still facing the issue. There is also a "Java Client Heap Size" parameter - will increasing that help in our case? Appreciate the insights.
Labels:
- HDFS
04-16-2020
05:50 AM
We are trying to implement alerting in our cluster, and alerting is set up in Cloudera Manager, so when I stop a service in Cloudera Manager an alert is sent to my email. However, I hear that stopping a service from CM is not the same as the service crashing on its own, especially with regard to canary alerts, which we will not get if we stop a service through Cloudera Manager. So will I not get canary alerts for a service if the service is stopped through Cloudera Manager? I would also like to know how to stop a service through the Cloudera Manager API. I would appreciate it if a forum member could give the command to stop, say, Oozie or HBase through the Cloudera Manager API. Appreciate the help.
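For the API part, a minimal sketch, assuming Cloudera Manager 5.x (API version v19), an admin user, a cluster named "Cluster 1", and a service named "oozie" - substitute your own host, credentials, cluster name, and service name:

curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v19/clusters/Cluster%201/services/oozie/commands/stop"

The same endpoint with /commands/start should start the service again.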
Labels:
- Cloudera Manager
03-11-2019
02:02 PM
@Kuldeep Kulkarni, there are many lines with "input data" on the page you referred to - I am not sure which ones to ignore. Should I ignore the sections for datasets/input-events/output-events? That would leave only the workflow section - is that right? Can't I use the coordinator from your shell action example? But in that one I don't see "<app-path>${workflowAppUri}</app-path>". Appreciate the clarification.
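For what it's worth, a minimal daily coordinator sketch with only the workflow section (no datasets or input/output events); it assumes startTime, endTime, and workflowAppUri are defined in job.properties:

<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>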
03-08-2019
07:07 PM
@Kuldeep Kulkarni, does your example (https://community.hortonworks.com/articles/27497/oozie-coordinator-and-based-on-input-data-events.html) set the job to run once a day? If not, can you please let me know how to do that? I want to run a job once daily. Thanks.
03-08-2019
06:58 PM
Hi Kuldeep, thanks so much for the clarification. I will try to follow your instructions and let you know how it goes. Thanks again.
03-08-2019
02:34 PM
@Kuldeep Kulkarni, I created a Python action based on https://community.hortonworks.com/content/supportkb/151119/how-to-run-a-python-script-using-oozie-shell-actio.html. But how do I integrate coordinator.xml with that? I tried creating the file, but the job is not executing according to it. Is there somewhere in job.properties or workflow.xml where you reference coordinator.xml? Appreciate the feedback.
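A job.properties sketch along those lines - the coordinator is not referenced from workflow.xml; instead, oozie.coord.application.path points at the HDFS directory that holds coordinator.xml. The hosts and dates below are placeholders to adapt, and the path matches the one used elsewhere in this thread:

nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8032
queueName=default
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/root/apps/shell
workflowAppUri=${nameNode}/user/root/apps/shell
startTime=2019-03-09T00:00Z
endTime=2020-03-09T00:00Z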
03-06-2019
07:21 PM
Yes, I checked the nodes and found the output on one of them. I reran to make sure. So the files are not really needed on the local Linux box - job.properties, the shell script, workflow.xml, coordinator.xml - they need to be only in HDFS? Next, how do I execute Python code from Oozie? I also want it to run daily. Appreciate the insights.
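A workflow.xml sketch for running a Python script through the Oozie shell action - myscript.py is a hypothetical name for a script uploaded to the same HDFS application directory, and the daily schedule comes from the coordinator rather than the workflow:

<workflow-app name="python-shell-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="python-node"/>
  <action name="python-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>myscript.py</exec>
      <file>${workflowAppUri}/myscript.py#myscript.py</file>
      <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>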
03-06-2019
07:10 PM
>this file should be created locally on the node manager where your shell script was run
I have the file locally on a node; it is from this node that I execute the oozie command. Are you saying that the shell script should be in HDFS, and that YARN will execute the shell script on whichever NodeManager the job runs on, rather than on the host from which I run the oozie command? Appreciate the feedback.
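A sketch of the upload-and-submit steps, assuming the application directory used elsewhere in this thread (/user/root/apps/shell) and a hypothetical script name test.sh; job.properties stays local on the host where the oozie CLI is run:

hdfs dfs -put -f workflow.xml coordinator.xml test.sh /user/root/apps/shell/
oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run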
03-05-2019
06:18 PM
In the YARN log for the job I can see:

=================================================================
>>> Invoking Shell command line now >>

Exit code of the Shell command 0

<<< Invocation of Shell command completed <<<

<<< Invocation of Main class completed <<<

Oozie Launcher, capturing output data:
=======================
03-05-2019
02:54 PM
I created a similar job as in https://community.hortonworks.com/content/supportkb/48985/how-to-setup-oozie-shell-action.html. On execution the YARN log shows success, but I can't see the /tmp/output file being created anywhere - I checked on the local Linux host as well as in HDFS. One question I have about job.properties:

oozie.coord.application.path=${nameNode}/user/root/apps/shell

Should the configuration files (job.properties, coordinator.xml, etc.) reside in the above directory? I have them there. Not sure what is happening. Appreciate the insights.
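One way to track down /tmp/output (a guess at the cause, not a confirmed diagnosis): the shell action runs inside a YARN container, so the file is created on whichever NodeManager host ran that container, not on the submission host. The aggregated logs name that host; the application id placeholder below needs to be filled in from the Oozie launcher or the YARN UI:

yarn logs -applicationId <application_id> | grep "Container: "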
Tags:
- Oozie
Labels:
- Apache Oozie
12-28-2018
11:22 AM
So we can use regex with Flume/HBase only for delimited or fixed-length data? We cannot use regex with data that has no delimiters or has variable-length fields? Appreciate the clarification.
Labels:
- Apache Flume
- Apache HBase
12-20-2018
07:24 PM
This is the HBase sink part of the Flume configuration:

r_hbase.sinks.sink1.channel = channel1
r_hbase.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
r_hbase.sinks.sink1.table = mev:rtable
r_hbase.sinks.sink1.columnFamily = me_data
r_hbase.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
r_hbase.sinks.sink1.serializer.schemaURL = hdfs://host.com:8020/tmp/avroschemas-new/rtable.json
r_hbase.sinks.sink1.serializer.columns = col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
12-20-2018
07:15 PM
We have data coming into HDFS through Flume. This data is serially encoded, and we use an Avro schema to decode it. Now we want to put this data from Flume into an HBase table instead of HDFS. We use the parameter below to do the decoding based on the Avro schema:

ragent.sinks.sink1.serializer.schemaURL = hdfs://host.com:8020/tmp/avroschemas-new/rtable.json

Will this same argument work when the target is HBase instead of HDFS, along with the HBase table, column family, etc.? For some reason the data is not going into HBase properly, and I am not sure whether the Avro schema is the issue. Appreciate the insights.
Labels:
- Apache Flume
- Apache HBase
12-19-2018
06:28 PM
@Naveen my HBase table is created with only a single column family:

create 'mbev:hb_test' , 'me_data'

Then data is fed into the HBase table through Flume. This data was originally sent to a Hive table with 10 columns, but because of a small-files issue I am redirecting the same data to the HBase table above. My Flume config has the following lines:

r_hbase.sinks.sink1.table = mbev:hb_test
r_hbase.sinks.sink1.columnFamily = me_data
r_hbase.sinks.sink1.serializer.columns = col1,col2,col3,col4,col5,col6,col7,col8,col9,col10

So I can see data coming into the HBase table. It looks like this:

hbase(main):008:0> scan "mbev:hb_test"
ROW                   COLUMN+CELL
 default0998c6b9-2fa9 column=me_data:pCol, timestamp=1545242255268, value=.z3knt
 -122e-1536-0o42ef7fb dErc90GqYg5a3n-zTQ\a04NQA32018-12-19-12.57.30.000123\x09FY
 a3e                  YY\q1V10006002079317\x01M\x00\x09KEF\x00\x00

So how do I create the Hive table to see the data in the HBase table? I tried as we did above initially with only ":key,me_data:id", but I don't see the data in the Hive table. I also tried:

create external table tmp_test4 (col1 string, col2 string, col3 string, col4 string, col5 string, col6 string, col7 string, col8 string, col9 string, col10 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,me_data:col1,me_data:col2,me_data:col3,me_data:col4,me_data:col5,me_data:col6,me_data:col7,me_data:col8,me_data:col9,me_data:col10")
TBLPROPERTIES("hbase.table.name" = "mbev:hb_test");

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 10 elements while hbase.columns.mapping has 11 elements (counting the key if implicit))

Appreciate your help.
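The error itself is about the column count: with ":key" included, hbase.columns.mapping has 11 entries, so the Hive table needs 11 columns, one of them for the row key. A sketch of that fix ("rowkey" is just an illustrative name; note also that, per the scan above, Flume is currently writing everything into the single qualifier me_data:pCol, so the per-column mapping will only show data once the serializer actually writes col1 through col10):

create external table tmp_test4 (rowkey string, col1 string, col2 string, col3 string, col4 string, col5 string, col6 string, col7 string, col8 string, col9 string, col10 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,me_data:col1,me_data:col2,me_data:col3,me_data:col4,me_data:col5,me_data:col6,me_data:col7,me_data:col8,me_data:col9,me_data:col10")
TBLPROPERTIES ("hbase.table.name" = "mbev:hb_test");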
12-18-2018
08:49 PM
BTW, the create table statement shows two columns: create external table hv_test (id string, idate string). So how can this mapping show only one column: WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,me_data:id")? Don't we likewise need to map idate?
12-18-2018
04:51 PM
In my HBase table, data for 10 columns actually lands from Flume. In the Flume configuration I have defined: host1.sinks.sink1.serializer.columns = col1, col2 ... In HBase the table is only defined as above: create 'mbev:hb_test' , 'me_data'. So how do I define the Hive table for the data in those 10 columns?
12-17-2018
06:07 PM
I created a table in HBase, created a table in Hive mapping it to the HBase table, and inserted a row into the HBase table. But the Hive table doesn't show any rows. Any idea what I am doing wrong? Appreciate the feedback.

hbase(main):019:0> create 'mbev:hb_test' , 'me_data'
0 row(s) in 1.2670 seconds

hbase(main):006:0> put "mbev:hb_test",'1',"me_data:id",'1'
0 row(s) in 0.1700 seconds

hbase(main):007:0> scan "mbev:hb_test"
ROW    COLUMN+CELL
 1     column=me_data:id, timestamp=1545064141017, value=1
1 row(s) in 0.0590 seconds

hive> create external table hv_test (id string, idate string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,me_data:idate")
TBLPROPERTIES("hbase.table.name" = "mbev:hb_test");

hive> select * from hv_test;
OK
Time taken: 0.23 seconds
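One difference that stands out (a guess, not a confirmed diagnosis): the put writes to the qualifier me_data:id, but the mapping only references me_data:idate, so the one populated qualifier is never mapped. A sketch that maps the qualifier actually written - the first Hive column always maps to :key, i.e. the row key:

drop table hv_test;
create external table hv_test (rk string, id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,me_data:id")
TBLPROPERTIES ("hbase.table.name" = "mbev:hb_test");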
Labels:
- Apache HBase
- Apache Hive
11-30-2018
04:28 PM
I tried the below but it didn't work:

hive> create view testview as select * from test1 where id = "{$hiveconf:id}";
OK
Time taken: 0.13 seconds
hive> set id=1;
hive> select * from testview;

The query above did not return any rows.
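Two details that may matter here (offered as a guess): the substitution syntax is ${hiveconf:id}, with the dollar sign outside the braces, and the value is substituted when the statement is parsed, so whatever id happens to be at view-creation time gets frozen into the view definition. A sketch that parameterizes the query itself instead of the view:

set id=1;
select * from test1 where id = '${hiveconf:id}';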
11-30-2018
04:22 PM
I have two tables, test1 and test2:

hive> select * from test1;
OK
id  year
1   2017
2   2017

hive> select * from test2;
OK
no  year
2   2017
3   2017

Query:

select id, year from test1 where id > 1
union all
select no, year from test2 where no > 1

Question 1: if I put the above query in a view, can I pass a parameter to it to use in the where clause (for id and no)?
Question 2: can I frame the above query without the union all?
Appreciate the feedback.
Tags:
- Data Processing
- Hive
Labels:
- Apache Hive
11-16-2018
06:18 PM
OK, this can be done simply as: partitioned by (yr string, mth string). Thanks.
11-16-2018
06:11 PM
I tried this but it wouldn't work:

create table test_part_bkt_tbl (id string, cd string, dttm string)
partitioned by (yr string)
clustered by (month(dttm)) into 12 buckets;
11-16-2018
05:54 PM
If I partition a table by year, can I further bucket it by month? The idea is that year would be the top level and the months would be at a level beneath it, so the directory structure would be: 2018 -> 1, 2, 3 ... 12 and 2019 -> 1, 2, 3 ... 12. Is this what bucketing is about, or should I be doing this with partitions themselves? Appreciate the insights.
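For the layout described (year directories with month subdirectories), a sketch using two partition columns; bucketing, by contrast, hashes rows into a fixed number of files inside each partition directory rather than creating subdirectories, so it would not produce this structure. The table name below is only illustrative:

create table test_part_tbl (id string, cd string, dttm string)
partitioned by (yr string, mth string);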