Member since 01-07-2020
64 Posts
1 Kudo Received
0 Solutions
04-03-2024
02:08 AM
Hi @drgenious First, please test your script outside of Oozie. If it works outside of Oozie, it should work from Oozie as well. As for the error "No module named impala.dbapi", there may be a version dependency issue between impyla and its related libraries; see ---> https://github.com/cloudera/impyla/issues/227
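As a quick local check (a minimal stdlib-only sketch; the module names listed are the usual impyla dependencies and may differ in your environment), you can verify which of the relevant modules resolve for a given interpreter without actually importing them:

```python
import importlib.util

def diagnose_impyla():
    """Report whether impyla and its common thrift dependencies resolve
    for this interpreter (True = importable, False = missing)."""
    mods = ["impala", "thrift", "thriftpy2"]
    return {m: importlib.util.find_spec(m) is not None for m in mods}

print(diagnose_impyla())
```

Run this with the same Python binary your Oozie shell action uses; if "impala" comes back False there, the workflow will hit "No module named impala.dbapi" no matter what works interactively.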
02-21-2024
06:43 AM
The dev team let us know that they had modified the column definition. We ran MSCK REPAIR TABLE and were then able to run the SELECT DISTINCT query. Vertex errors may not be related to memory issues. Hope this helps the community.
02-15-2024
08:27 AM
Hi @drgenious Please share the Sqoop console logs (run with --verbose) and the YARN application logs for review.
02-15-2024
07:54 AM
Hi @drgenious Log in to the Hue web UI, go to the Documents page, right-click the document associated with your workflow, and use the Download option to export the workflow.
11-21-2023
05:43 AM
The error message indicates an inconsistency between the expected schema for the column 'db.table.parameter_11' and the actual schema found in the Parquet file 'hdfs:/path/table/1_data.0.parq'. The column is expected to be a STRING, but the Parquet schema says it is an optional int64 (integer) column. To resolve this, you'll need to investigate and correct the schema mismatch. Here are some steps you can take:
1. Verify the expected schema: check the definition of the 'db.table.parameter_11' column in the Impala metadata or Hive metastore and confirm it is defined as a STRING type.
2. Inspect the Parquet file schema: use a tool like parquet-tools to inspect the schema of the Parquet file directly:
parquet-tools schema 1_data.0.parq
Look for the 'db.table.parameter_11' column and check its data type in the Parquet schema.
3. Compare expected vs. actual schema: compare the expected type for 'db.table.parameter_11' with the one found in the Parquet file and identify any differences.
4. Investigate data inconsistencies: if there are inconsistencies, investigate how they might have occurred. It's possible there was a schema evolution or a mismatch during the data writing process.
5. Resolve the schema mismatch: depending on your findings, update the metadata in Impala or Hive to match the actual schema, or adjust the Parquet files to match the table definition.
6. Update Impala statistics: after resolving the mismatch, it's good practice to refresh statistics for the affected table with Impala's COMPUTE STATS command, so Impala has up-to-date statistics for query optimization.
If the data type in the Parquet schema is incorrect, investigate how the data was written and whether there were any issues during that process. Correcting the schema mismatch and updating Impala statistics should resolve the issue.
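The compare step above can be sketched as follows (a hypothetical example: the column name and the two type strings are illustrative, standing in for DESCRIBE output and parquet-tools output respectively):

```python
def find_mismatches(expected, actual):
    """Return {column: (expected_type, parquet_type)} for columns whose
    declared type differs from the type recorded in the Parquet file."""
    return {col: (expected[col], actual[col])
            for col in expected
            if col in actual and expected[col] != actual[col]}

# hypothetical inputs: 'expected' transcribed from DESCRIBE db.table,
# 'parquet' transcribed from `parquet-tools schema 1_data.0.parq`
expected = {"parameter_11": "string"}
parquet = {"parameter_11": "int64"}
print(find_mismatches(expected, parquet))  # {'parameter_11': ('string', 'int64')}
```

Any column this reports is one where the table definition and the file disagree, which is exactly the situation Impala rejects at scan time.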
12-22-2022
06:00 PM
Hi @drgenious , We have a doc that explains the steps to migrate Oozie workflows from CDH to CDP; please have a look at it: https://docs.cloudera.com/cdp-one/saas/cdp-one-data-migration/topics/cdp-saas-oozie-migration-workflows-in-cdh.html If you still have questions, please reach out to our support through the Cloudera portal.
12-18-2022
05:16 AM
@drgenious This is an OS-level issue that will need to be addressed at the OS level by the system admin. The bottom line here is that thrift-0.9.2 needs to be uninstalled. There are various things that could be happening:
1) Multiple Python versions.
2) Multiple pip versions.
3) A broken installation.
Solution 1: create a Python virtual environment and run impala-shell from inside it:
virtualenv venv -p python2
cd venv
source bin/activate
(venv) impala-shell
Solution 2:
(i) Remove the easy-install.pth files available in:
/usr/lib/python2.6/site-packages/
/usr/lib64/python2.6/site-packages/
(ii) Try running impala-shell again.
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
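To see how the wrong thrift can be picked up, here is a stdlib-only sketch (the site-packages paths are the ones from Solution 2 and may differ on your system) that lists stray thrift installs and easy-install.pth files under each directory:

```python
import glob

def find_stray_installs(roots=("/usr/lib/python2.6/site-packages",
                               "/usr/lib64/python2.6/site-packages")):
    """List thrift* entries and easy-install.pth files under each root,
    so you can see which copies an interpreter might pick up."""
    hits = []
    for root in roots:
        hits += glob.glob(root + "/thrift*")
        hits += glob.glob(root + "/easy-install.pth")
    return hits

print(find_stray_installs())
```

If this reports a thrift-0.9.2 egg alongside the version impala-shell expects, that duplicate is the one to uninstall.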
11-23-2022
01:00 AM
@drgenious 1. Impala is generally faster. Impala does not use YARN, it caches catalog metadata locally (which makes metadata lookups faster), and its backend is built in C++, which is very fast. 2. Impala is not fault tolerant; it is best suited for ad-hoc queries. ETL is best suited for Hive because Hive is fault tolerant: if a task fails due to a network/disk failure, Hive will retry, but the Impala query would fail. 3. For streaming/ingestion flows like Kafka, you need to write to EXTERNAL tables, not managed (ACID) tables. Managed tables can be used if you want to alter the data with UPDATE/DELETE. Please let me know if you have any queries. Please click "Accept As Solution" if your query is answered.
07-08-2022
06:48 AM
I have a table in Impala and I want to check the source table with Sqoop every day to see if there are any missing ids. For this purpose I have done:
1. Sqoop import all the ids from the source table into a staging table.
2. select id from sqoop_table where id not in (select id from impala_table)
3. Save the result to a .txt file.
4. Create a variable and store the sed-processed .txt in it, to turn the results from vertical into horizontal.
From this step I have issues: when I try to pass this variable to Sqoop to fetch only the missing ids, it throws an "argument list too long" error.
The thing is that I cannot change the maximum size of shell variables, and the average number of ids for 2 days is 40k.
Is there any other way to compare the remote table with my Impala table and fetch only the missing records?
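One way around the "argument list too long" limit (a sketch, not the poster's actual code: the function name and chunk size are illustrative) is to split the id list into bounded IN-clauses and run one query per chunk, instead of interpolating one huge shell variable:

```python
def chunk_in_clauses(ids, chunk_size=1000, column="id"):
    """Yield SQL '<column> IN (...)' predicates with at most chunk_size
    ids each, suitable for a per-chunk --where / free-form query."""
    ids = list(ids)
    for i in range(0, len(ids), chunk_size):
        batch = ids[i:i + chunk_size]
        yield "%s IN (%s)" % (column, ",".join(str(x) for x in batch))

clauses = list(chunk_in_clauses(range(2500), chunk_size=1000))
print(len(clauses))  # 3
```

Another option that avoids shell variables entirely is to load the missing ids into a temporary table on the source side and let the database do the join.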
Labels:
- Apache Impala
- Apache Sqoop
07-04-2022
03:16 AM
@drgenious, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.