About drgenious

jAnshula · ‎04-03-2024

Hi @drgenious, first, please test your script outside of Oozie. If it works outside of Oozie, it should work from Oozie as well. As for the error "No module named impala.dbapi", it could be a version dependency issue between impyla and its related libraries; see https://github.com/cloudera/impyla/issues/227
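A quick way to run that test is to ask the interpreter itself whether it can locate the modules. The snippet below is only a sketch: the list of modules to probe (thrift, thrift_sasl) is an assumption about what impyla typically needs, and the helper name module_available is made up for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers the case where a parent package
        # (for example "impala") is itself not installed.
        return False


if __name__ == "__main__":
    # Shows which Python is actually running; this can differ between
    # an interactive shell and an Oozie action.
    print("interpreter:", sys.executable)
    # Module list is an assumption, adjust it to your environment.
    for mod in ("impala.dbapi", "thrift", "thrift_sasl"):
        status = "found" if module_available(mod) else "MISSING"
        print(mod, "->", status)
```

Running it once from the shell and once from the Oozie action, then comparing the interpreter path and the found/MISSING lines, should show whether both are using the same Python environment.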

Rami_Sunkara · ‎02-21-2024

We learned from the dev team that they had modified the column definition. After running MSCK REPAIR TABLE, we were able to run the SELECT DISTINCT query. Vertex errors may not be related to memory issues. Hope this helps the community.

jAnshula · ‎02-15-2024

Hi @drgenious, please share the Sqoop console logs captured with --verbose, along with the YARN application logs, for review.

jAnshula · ‎02-15-2024

Hi @drgenious,
1. Log in to the Hue web UI.
2. Go to the Documents page.
3. Right-click the document associated with your workflow.
4. Use the Download option to export the workflow.

ggangadharan · ‎11-21-2023

The error message indicates a mismatch between the expected schema for the column 'db.table.parameter_11' and the actual schema found in the Parquet file 'hdfs:/path/table/1_data.0.parq'. The table defines the column as STRING, but the Parquet file stores it as an optional int64 (integer) column. To resolve this, investigate and correct the schema mismatch:
1. Verify the expected schema: check the definition of the 'db.table.parameter_11' column in the Impala metadata or Hive metastore and confirm that it is defined as STRING.
2. Inspect the Parquet file schema: use a tool such as parquet-tools to inspect the schema of the Parquet file directly. Run the following command in a terminal, then look for the parameter_11 column and check its data type:
    parquet-tools schema 1_data.0.parq
3. Compare expected vs. actual schema: compare the table definition of the column with the type found in the Parquet file and note any differences in data types.
4. Investigate data inconsistencies: if the types differ, find out how that happened. A schema evolution or a mismatch during the data writing process are possible causes.
5. Resolve the schema mismatch: depending on your findings, either update the metadata in Impala or Hive to match the actual file schema, or rewrite the Parquet file with the expected schema.
6. Update Impala statistics: after resolving the mismatch, it is good practice to refresh the statistics for the affected table with the COMPUTE STATS command in Impala, so that Impala has up-to-date statistics for query optimization.
If the data type in the Parquet schema is incorrect, you may need to investigate how the data was written and whether there were any issues during that process. Correcting the schema mismatch and updating Impala statistics should help resolve the issue.
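The comparison of expected and actual types can also be scripted. The following is a rough sketch, not an official tool: it parses text in the layout that parquet-tools schema prints and checks it against the column types you expect. The PARQUET_TO_IMPALA mapping is a deliberately simplified assumption (logical types such as DECIMAL or DATE are ignored), and the function names are invented for this example.

```python
import re

# Simplified, assumed mapping from Parquet physical types to the Impala
# column types that can read them. Logical/annotated types are ignored.
PARQUET_TO_IMPALA = {
    "boolean": {"BOOLEAN"},
    "int32": {"TINYINT", "SMALLINT", "INT"},
    "int64": {"BIGINT"},
    "float": {"FLOAT"},
    "double": {"DOUBLE"},
    "binary": {"STRING"},
    "int96": {"TIMESTAMP"},
}

# Matches lines such as "  optional int64 parameter_11;"
FIELD_RE = re.compile(r"^\s*(optional|required|repeated)\s+(\w+)\s+(\w+)")


def parse_schema(text):
    """Map column name -> Parquet physical type from schema text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(3)] = match.group(2)
    return fields


def find_mismatches(schema_text, expected):
    """Return (column, expected Impala type, Parquet type) for each conflict."""
    actual = parse_schema(schema_text)
    mismatches = []
    for column, impala_type in expected.items():
        parquet_type = actual.get(column)
        if parquet_type is None:
            continue  # column not present in this file, so no type conflict
        allowed = PARQUET_TO_IMPALA.get(parquet_type, set())
        if impala_type.upper() not in allowed:
            mismatches.append((column, impala_type.upper(), parquet_type))
    return mismatches


if __name__ == "__main__":
    # Hypothetical schema text, shaped like the case in this thread.
    sample = "message schema {\n  optional int64 parameter_11;\n}"
    print(find_mismatches(sample, {"parameter_11": "STRING"}))
```

Feeding it the saved output of the parquet-tools command for each file in the table directory would show which files carry the int64 column, which narrows down when the writer changed.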

arunek95 · ‎12-22-2022

Hi @drgenious, we have a doc that explains the steps to migrate Oozie workflows from CDH to CDP; please have a look at it: https://docs.cloudera.com/cdp-one/saas/cdp-one-data-migration/topics/cdp-saas-oozie-migration-workflows-in-cdh.html If you still have questions, please reach out to our support through the Cloudera portal.

Kartik_Agarwal · ‎12-18-2022

@drgenious This is an OS-level issue that will need to be addressed by the system admin. The bottom line is that thrift-0.9.2 needs to be uninstalled. Several things could be happening:
1) Multiple Python versions.
2) Multiple pip versions.
3) Broken installation.
Solution 1: Create a Python virtual environment and run impala-shell from it:
    virtualenv venv -p python2
    cd venv
    source bin/activate
    impala-shell    # run at the (venv) prompt
Solution 2:
(i) Remove the easy-install.pth files in /usr/lib/python2.6/site-packages/ and /usr/lib64/python2.6/site-packages/
(ii) Try running impala-shell again.
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

asish · ‎11-23-2022

@drgenious
1. Impala is generally faster. Impala does not use YARN. Impala caches catalog data locally, so metadata lookups are faster. The Impala backend is written in C++, which is very fast.
2. Impala is not fault tolerant. It is best suited for ad hoc queries, while ETL is best suited for Hive, because Hive is fault tolerant. If a query fails due to a network or disk failure, Hive will retry, but Impala will fail.
3. For streaming/ingestion such as a Kafka flow, you need to put the data in EXTERNAL tables, not in managed (ACID) tables. Managed tables can be used if you want to alter the data with Update/Delete.
Please let me know if you have any questions. Please click "Accept As Solution" if your query is answered.

drgenious · ‎07-08-2022

I have a table in Impala, and every day I want to check the source table with Sqoop to see whether any ids are missing. So far I have done the following:
1. Sqoop import all the ids into a staging table.
2. Run: select id from sqoop_table where id not in (select id from impala_table)
3. Save the result to a .txt file.
4. Create a variable and store the sed-processed .txt in it, to turn the results from vertical to horizontal.
From this step on I have issues. When I try to pass this variable to Sqoop to fetch only the missing ids, it fails with an error that the argument list is too long. I cannot change the maximum size of variables. The average number of ids over 2 days is 40k. Is there any other way to compare the remote table with my Impala table and fetch only the missing records?
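One possible workaround for the argument-length limit, sketched here under assumptions rather than tested against a real cluster, is to split the missing ids into batches and build one filter per batch instead of a single huge variable. The sketch assumes the ids are integers stored one per line; the batch size of 1000 and the function names are arbitrary choices for illustration.

```python
def read_ids(lines):
    """Parse one integer id per line, skipping blank lines."""
    stripped = (line.strip() for line in lines)
    # int() also rejects anything that is not a plain number.
    return [int(value) for value in stripped if value]


def batched(ids, size):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def where_clauses(ids, column="id", size=1000):
    """Build one 'column IN (...)' filter string per batch of ids."""
    clauses = []
    for chunk in batched(ids, size):
        id_list = ", ".join(str(value) for value in chunk)
        clauses.append(f"{column} IN ({id_list})")
    return clauses


if __name__ == "__main__":
    # Small inline sample instead of the real .txt file.
    sample_lines = ["101\n", "102\n", "\n", "103\n"]
    for clause in where_clauses(read_ids(sample_lines), size=2):
        print(clause)
```

Each resulting string is short enough to pass as the filter of a separate sqoop import run (for example through its --where option), so 40k ids become a few dozen imports rather than one oversized argument.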

VidyaSargur · ‎07-04-2022

@drgenious, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Last Visited	‎06-26-2023 10:46 AM

Member Since	‎01-07-2020 06:44 AM
Posts	64
Kudos received	1

Cloudera Community

Re: Impala.dbapi error

Re: Execution Error, return code 2 from org.apache...

Re: Sqoop fetches less data

Re: Export specific wfs from oozie in HUE 4

Re: How to fix Parquet schema: optional int64 amou...

Re: Migrate oozie jobs from cdh to cdp

Re: unexpected keyword argument 'ssl_version'

Re: impala vs hive 3 clarifications

Fetch missing ids from impala with sqoop

Re: Retry in oozie through hue