Member since 01-07-2020
64 Posts
0 Kudos Received
0 Solutions
11-29-2022
02:01 AM
I want to download specific workflows from Oozie in Hue 4, but I cannot find such an option. Can you please help?
Labels:
- Apache Oozie
11-23-2022
07:38 AM
Hi @Shahrukh_shaikh. I do not have them now. What do you mean by a data issue? When I run it through the terminal, everything runs smoothly.
11-23-2022
06:29 AM
I have an Oozie job that runs a Hive query. When the query's turn comes, Oozie throws this error:
Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex re-running, vertexName=Map 1, vertexId=vertex_1668428709182_0049_1_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1668428709182_0049_1_00
Vertex re-running, vertexName=Map 1, vertexId=vertex_1668428709182_0049_1_00
Vertex failed, vertexName=Map 1, vertexId=vertex_1668428709182_0049_1_00, diagnostics=[Vertex vertex_1668428709182_0049_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1668428709182_0049_1_00 [Map 1] failed as task task_1668428709182_0049_1_00_000000 failed after vertex succeeded.]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
I cannot make much sense of this error, but when I run the job from the terminal it finishes successfully.
Labels:
- Apache Hive
- Apache Oozie
11-22-2022
10:54 PM
I am about to upgrade from CDH to CDP and I have some questions regarding the new version of Hive. Until now I have used Hive as the ETL service because it is more stable, though slower, than Impala. The tables that BI users see are in Impala. My questions are:
1) Is Hive 3 fast enough to compete with Impala?
2) For BI use, is it more appropriate to point to Hive or to Impala? (I read that Hive 3 uses a cache that makes repeated BI requests faster.)
3) For a Kafka flow, is it appropriate to create an ACID table in Hive 3 and store the fetched data live?
Labels:
- Apache Hive
- Apache Impala
11-18-2022
01:34 AM
I am trying to run the Impala shell and I receive the error below:
Error connecting: TypeError, __init__() got an unexpected keyword argument 'ssl_version'
This has been happening since the upgrade to Impala 3.4.
Labels:
- Apache Impala
11-16-2022
05:00 AM
I want to upgrade from CDH to CDP, but I have many workflows in Oozie in Hue. Is there a risk of losing them after the upgrade? How can I save these workflows so that I can upload them again after the upgrade? Thanks in advance.
Tags:
- Oozie
Labels:
- Apache Oozie
07-08-2022
06:48 AM
I have a table in Impala, and every day I want to check the source table with Sqoop to see whether any ids are missing. For this purpose I do the following:
1) Sqoop import all the ids from the impala table to a staging table
2) select id from sqoop_table where id not in (select id from impala_table)
3) Save the result to a .txt file
4) Create a variable and store the .txt, processed with sed, to turn the results from vertical to horizontal
From this step on I have issues. When I try to pass this variable to Sqoop to fetch only the missing ids, it throws an error that the argument list is too long.
The thing is that I cannot change the maximum capacity of variables. The average number of ids for 2 days is 40k.
Is there any other way to compare the remote table with my Impala table and fetch only the missing records?
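One possible way around the argument-length limit is to split the missing ids into batches and issue one smaller query per batch, rather than passing all ~40k ids in a single variable. The following is only a minimal Python sketch under stated assumptions: the ids are integers that already fit in memory, and the names `find_missing`, `batches`, `where_clauses` and the batch size of 1000 are hypothetical choices for illustration, not Sqoop settings.

```python
from typing import Iterable, Iterator, List


def find_missing(source_ids: Iterable[int], impala_ids: Iterable[int]) -> List[int]:
    """Return the ids present in the source but absent from the Impala table."""
    return sorted(set(source_ids) - set(impala_ids))


def batches(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def where_clauses(ids: List[int], size: int = 1000) -> List[str]:
    """Build one short `id IN (...)` predicate per batch (batch size is an assumption)."""
    return [
        "id IN ({})".format(",".join(str(i) for i in batch))
        for batch in batches(ids, size)
    ]


if __name__ == "__main__":
    # Toy data standing in for the ids of the source table and the Impala table.
    missing = find_missing(range(1, 11), [1, 2, 3, 5, 8])
    for clause in where_clauses(missing, size=3):
        print(clause)
```

Each generated predicate is short enough to be passed on its own, for example as part of a free-form query placed ahead of `$CONDITIONS`, so no single argument has to carry the whole id list.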
Labels:
- Apache Impala
- Apache Sqoop
06-20-2022
11:18 PM
Hi, I have some schedulers in Oozie through Hue and some of them sometimes fail. However, when I run them manually afterwards, they finish successfully. Is there any way to set a retry policy in my workflows? Here is the error that I am getting:
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:410)
    at org.apache.oozie.action.hadoop.LauncherAM.access$300(LauncherAM.java:55)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:223)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
Caused by: org.apache.oozie.action.hadoop.LauncherMainException
    at org.apache.oozie.action.hadoop.ShellMain.run(ShellMain.java:76)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:104)
    at org.apache.oozie.action.hadoop.ShellMain.main(ShellMain.java:63)
    ... 16 more
Failing Oozie Launcher, Main Class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher, uploading action data to HDFS sequence file:
Stopping AM
Labels:
- Apache Oozie
06-20-2022
08:04 AM
Hi, I run a Sqoop import to fetch data from a table in SQL Server. The Sqoop job contains a query that runs every 6 minutes and fetches data from 2 hours before until now. The weird thing is that Sqoop doesn't fetch all the data; how much it fetches seems to be random. For example, during this 2-hour scanning window I fetched one record 28 times: 19 times all the rows were aligned with SQL Server, but 9 times only half of them were fetched. My Sqoop command is below:
sqoop import --connect 'jdbc:sqlserver\
--username \
--password-alias \
--num-mappers 10 \
--split-by an_int \
--fields-terminated-by '|' \
--query "select * from table where timestamp > '${offset}' and \$CONDITIONS" \
--delete-target-dir \
--target-dir
The amount of data for 2 hours is ~800k.
Tags:
- Sqoop
Labels:
- Apache Sqoop
04-01-2022
08:14 AM
I am trying to run an insert statement in Hive, but it returns this error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
The same insert runs perfectly through Impala. What exactly is this error?
Labels:
- Apache Hive
03-10-2022
04:15 AM
Hi,
Does anyone know when CDP Certified Data Developer will be released?
Thanks in advance.
Tags:
- CDE
- CDP
- certification
02-21-2022
12:22 AM
I am trying to run a script in Oozie and every time I receive the error below regarding impala.dbapi. The module is imported correctly in the script.
Stdoutput Traceback (most recent call last):
Stdoutput File "/tmp/sorting_table.py", line 8, in <module>
Stdoutput from impala.dbapi import connect
Stdoutput ImportError: No module named impala.dbapi
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:410)
    at org.apache.oozie.action.hadoop.LauncherAM.access$300(LauncherAM.java:55)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:223)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
Caused by: org.apache.oozie.action.hadoop.LauncherMainException
    at org.apache.oozie.action.hadoop.ShellMain.run(ShellMain.java:76)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:104)
    at org.apache.oozie.action.hadoop.ShellMain.main(ShellMain.java:63)
Script import libraries:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from datetime import datetime,timedelta
import ssl
from impala.dbapi import connect
import thrift_sasl
import os
Labels:
- Apache Impala
02-15-2022
12:26 AM
Hi, after some actions on the cluster, Oozie in Hue no longer has the shell action in the ACTIONS tab. It only has HIVE SCRIPT, HIVESERVER2 SCRIPT, SUB WORKFLOW, SSH, FS, EMAIL, STREAMING, GENERIC, KILL. Is there any way to run my shell script now? My Oozie version is 5.1.0-cdh6.3.4.
Labels:
- Apache Oozie
01-20-2022
06:14 AM
Hi, is there any way to make Oozie run a workflow only if the previous execution has finished?
Labels:
- Apache Oozie
12-13-2021
11:21 PM
I have an ETL flow which transfers data from one Hive table to another through PySpark. The tables are partitioned. However, I see that the partition paths in HDFS contain many small Parquet files. I want to ask:
1) How can I merge these files?
2) Is there any maximum or recommended size for Hive partitions?
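For the first question, one common approach is to rewrite each partition with a smaller, fixed number of output files. The sketch below only shows how that number could be derived; the helper name `target_file_count` is hypothetical and the 128 MiB target is an assumption (a common HDFS block size), not an official recommendation.

```python
import math


def target_file_count(partition_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Number of output files so that each is roughly `target_file_bytes`.

    Always returns at least 1, even for an empty partition.
    """
    if partition_bytes <= 0:
        return 1
    return max(1, math.ceil(partition_bytes / target_file_bytes))


if __name__ == "__main__":
    # Hypothetical partition holding 1 GiB of data spread over many small files.
    n = target_file_count(1024 ** 3)
    print(n)
    # In PySpark the count could then be applied before writing to a
    # separate staging path (not the path being read), for example:
    #   df.repartition(n).write.mode("overwrite").parquet(staging_path)
```

Writing to a staging path first avoids overwriting the files that the same job is still reading.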
Labels:
- Apache Hive
- Apache Spark
12-10-2021
02:55 AM
Hi, I want to create a Hive table that stores data in ORC format with Snappy compression. Will Power BI be able to read from that table? Also, would you suggest any other format or compression for my table?
Labels:
- Apache Hive
12-02-2021
01:21 AM
Hi, I have a Sqoop job that transfers data from MySQL to Hive with incremental import. Until now it has been working fine, but I want to ask how I can tell the job to rerun if it fails. Is that possible? I saw this article: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#validation but I am not sure. Can someone give an example?
Labels:
- Apache Sqoop
11-11-2021
02:58 AM
Hi, after restarting Impala I ran some queries, and two of them threw an error for the to_timestamp function that I use in my view. If I rerun these queries, everything runs smoothly. Why is this happening? It is as if Impala cannot read the functions on the first run of the queries. Thanks in advance.
Labels:
- Apache Impala
11-10-2021
04:26 AM
Hi @ChethanYM,
1) Unfortunately I haven't kept the full log from the query.
2) Exactly, this is my issue.
3) From impala shell.
4) If it is QUERY_TIMEOUT_S, then it has the default value.
Regards, Teo
11-09-2021
03:23 AM
Hi, I am running a query and, despite the fact that it reached the limit at 8 mins and appears as FINISHED in the query details in CM, it only stops at 19 mins. Why is this happening? I saw the information below in the query details:
First row fetched: 7.9m (75ms)
Last row fetched: 8.0m (7.21s)
Released admission control resources: 19.5m (11.6m)
Unregister query: 19.5m (103ms)
Labels:
- Apache Impala
11-08-2021
12:49 AM
Hi, what is the difference between Impala executors and Impala coordinators? Which one should I increase in order to run my query faster? Thanks in advance.
Labels:
- Apache Impala
11-03-2021
03:43 AM
Hi @balajip, I know how to create a UDF. My problem is that every time I restart Impala the UDF is gone. Is there any way to keep the UDF after the restart, or do I have to create it every time?
11-03-2021
12:56 AM
Hi, I have created a UDF in Impala, but every time I restart it the UDF is gone. Is there any way to fix this issue?
create function my_udf(String,String,String) returns boolean location 'udfs-1.0-SNAPSHOT-jar-with-dependencies.jar' SYMBOL='MyUDF';
Tags:
- impala
Labels:
- Apache Impala
10-19-2021
06:14 AM
Hi, I am new to Kafka and I have some questions. I want to create a pipeline which extracts data from MySQL to Kafka. After a lot of searching I found something about CDC (change data capture), where the updated record is sent to Kafka whenever something in the source table changes. I do not want such a thing. I only want to extract data according to the primary key. Is there any course I could take to learn this, or can anybody provide any tips? Thanks in advance.
Labels:
- Apache Kafka
09-30-2021
12:18 AM
I have some tables in Hive and I want to find the size of each table through the metastore (MySQL). I am trying the query below, but it returns partition_params, notification_log, sds, etc. My tables are stored in TABLES.TBLS, but when I run the query below with from information_schema.TABLES.TBLS it returns this:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '.TBLS
ORDER BY
(DATA_LENGTH + INDEX_LENGTH)
DESC
LIMIT 0, 200' at line 5
SELECT
    TABLE_NAME AS `Table`,
    ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM
    information_schema.TABLES
ORDER BY
    (DATA_LENGTH + INDEX_LENGTH) DESC;
Labels:
- Apache Hive
09-29-2021
01:01 AM
Hi, I am trying to create a dashboard in CM with Impala charts in order to see the maximum memory of Impala and the memory that my queries are using. I am running tsquery for
total_impala_admission_controller_local_backend_mem_reserved_across_impala_daemon_pools
and
total_impala_admission_controller_local_backend_mem_usage_across_impala_daemon_pools
But it seems that the charts represent exactly the same thing. Why does this happen?
Labels:
- Apache Impala
09-28-2021
06:12 AM
But it is not sorted:
2019-09-18 08:44:10
2020-08-05 13:15:48
2020-08-05 13:24:00
2020-10-15 18:29:34
2020-09-09 09:35:04
It is supposed to be ascending, but it is not.
09-27-2021
11:20 PM
The table properties include sort by on a timestamp column. Afterwards I run select * from db.table1 and it returns the data as shown above.
09-27-2021
07:16 AM
I have some tables in Impala whose create table statements include sort by (timestamp column), but when I query them the data comes back unsorted. Why is this happening? For example:
2019-09-18 08:44:10
2020-08-05 13:15:48
2020-08-05 13:24:00
2020-10-15 18:29:34
2020-09-09 09:35:04
Tags:
- impala
Labels:
- Apache Impala