About RAGHUY

MattWho · ‎05-20-2024

@galt @RAGHUY Let me add some correction and clarity to the accepted solution.

The accepted solution states: "Export and Modify Flow Configuration: Export the NiFi flow configuration, typically in XML format. This can be done via the NiFi UI or by utilizing NiFi's REST API. Then, manually adjust the XML to change the ID of the connection to the desired value."

It is not clear what is being done here. The only way to export a flow configuration from NiFi in XML format is by generating a NiFi template (deprecated, and removed in Apache NiFi 2.x versions). Even if you were to generate a template and export it via the NiFi UI or NiFi's REST API, modifying it will not change what is on the canvas. Suppose you modified the connection component UUID in all places in the template: after uploading that template back into NiFi, you would still need to drop the template onto the canvas, which results in every component in that template getting a new UUID. So this does not work. Newer versions of NiFi (1.18+) support flow definitions, which are in JSON format, but the same issue persists when using flow definitions in this manner.

In a scenario like the one described in this post, where a user removed a connection by mistake and then re-created it, the best option is to restore the previous flow. Whenever a change is made to the canvas, NiFi automatically archives the current flow.xml.gz (legacy) and flow.json.gz (current) files into an archive sub-directory and generates new flow.xml.gz/flow.json.gz files. The best and safest approach is to shut down all nodes in your NiFi cluster, navigate to the NiFi conf directory, and swap the current flow.xml.gz/flow.json.gz files with the archived flow.xml.gz/flow.json.gz files that still contain the connection with the original ID.

When that is not possible (for example, the change went unnoticed for too long and all archived versions contain the new connection UUID), you need to manually modify the flow.xml.gz/flow.json.gz files.
Shut down all your NiFi nodes to avoid any changes being made on the canvas while performing the following steps.

Option 1:
1. Make a backup of the current flow.xml.gz and flow.json.gz files.
2. Search each file for the original UUID to make sure it does not exist.
3. On one node, manually modify the flow.xml.gz and flow.json.gz files by locating the current bad UUID and replacing it with the original UUID.
4. Copy the modified flow.xml.gz and flow.json.gz files to all nodes in the cluster, replacing the original files. This is possible since all nodes run the same version of the flow.

Option 2:
1. Same as option 1.
2. Same as option 1.
3. Same as option 1.
4. Start NiFi only on the node where you modified the flow.xml.gz and flow.json.gz files.
5. On all other nodes, which are still stopped, remove or rename the flow.xml.gz and flow.json.gz files.
6. Start all the remaining nodes. Since they do not have a flow.xml.gz or flow.json.gz to load, they will inherit the flow from the cluster as they join it.

NOTE: The flow.xml.gz was replaced by the newer flow.json.gz format starting with Apache NiFi 1.16. When NiFi 1.16 or newer is started and only a flow.xml.gz file exists, it will load from flow.xml.gz and then generate the new flow.json.gz format. Apache NiFi 1.16+ will load only from the flow.json.gz on startup when that file exists, but will still write out both the flow.xml.gz and flow.json.gz formats any time a change is made to the canvas. With Apache NiFi 2.x+ versions the flow.xml.gz format will go away.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
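The manual edit described in the steps above (back up, confirm the original UUID is absent, replace the bad UUID) could be scripted roughly as follows. This is only a sketch: the function name and the .bak suffix are hypothetical, it treats the flow file as plain gzipped text, and it assumes NiFi is already stopped on every node.

```python
import gzip
import shutil
from pathlib import Path


def replace_connection_uuid(flow_path, bad_uuid, original_uuid):
    """Swap bad_uuid for original_uuid in a gzipped flow file.

    Returns the number of replacements made. Hypothetical helper, not part of NiFi.
    """
    flow_path = Path(flow_path)
    with gzip.open(flow_path, "rt", encoding="utf-8") as handle:
        text = handle.read()

    # The original UUID must not already be in use anywhere in the flow.
    if original_uuid in text:
        raise ValueError(f"{original_uuid} already exists in {flow_path.name}")

    count = text.count(bad_uuid)
    if count == 0:
        raise ValueError(f"{bad_uuid} not found in {flow_path.name}")

    # Keep a backup next to the file before modifying it.
    backup = flow_path.with_name(flow_path.name + ".bak")
    shutil.copy2(flow_path, backup)

    # Rewrite the file with every occurrence of the bad UUID replaced.
    with gzip.open(flow_path, "wt", encoding="utf-8") as handle:
        handle.write(text.replace(bad_uuid, original_uuid))
    return count
```

The same call would be made once for flow.xml.gz and once for flow.json.gz, and the modified files then distributed as described in option 1 or option 2.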

MattWho · ‎05-20-2024

@SAMSAL This is not a new problem, but rather something that has existed with NiFi on Windows for a very long time. You'll need to avoid using spaces in directory names, or wrap the directory name in quotes, to avoid the issue. See NIFI-200 - Bootstrap loader doesn't handle directories with spaces in it on Windows.

Thank you,
Matt

RAGHUY · ‎05-19-2024

@ChineduLB Apache Impala does not support multi-statement transactions, so you cannot perform an atomic transaction that spans several INSERT statements. You can achieve a similar effect by combining the INSERT INTO commands into a single INSERT INTO ... SELECT statement that uses UNION ALL. This way, all partitions are loaded within the same query run.

You can consolidate your insert statements into one query:

    INSERT INTO client_view_tbl PARTITION (cobdate, region)
    SELECT col, col2, col3, '20240915' AS cobdate, 'region1' AS region
    FROM region1_table
    WHERE cobdate = '20240915'
    UNION ALL
    SELECT col, col2, col3, '20240915' AS cobdate, 'region2' AS region
    FROM region2_table
    WHERE cobdate = '20240915'
    UNION ALL
    SELECT col, col2, col3, '20240915' AS cobdate, 'region3' AS region
    FROM region3_table
    WHERE cobdate = '20240915';

Single Query Execution: This approach consolidates multiple INSERT statements into one, which can improve performance and ensure consistency within the query execution context.

Simplified Management: Managing a single query is easier than handling multiple INSERT statements.

Ensure that your source tables (region1_table, region2_table, region3_table) and the client_view_tbl table have compatible schemas, especially regarding the columns being selected and inserted. Be mindful of the performance implications when dealing with large datasets, and test the combined query to ensure it performs well under your data volume.

By using this combined INSERT INTO ... SELECT ... UNION ALL approach, you can populate multiple partitions of the client_view_tbl table in one query.

Please accept it as a solution if it helps.
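If the list of regions changes often, the consolidated statement could be generated rather than written by hand. The sketch below is an assumption-laden illustration: build_partition_insert is a hypothetical helper, it uses plain string formatting (so table names and dates must come from trusted input), and it only builds the SQL text without submitting it to Impala.

```python
def build_partition_insert(target, columns, cobdate, sources):
    """Build one INSERT ... SELECT ... UNION ALL statement.

    sources maps a region name to the table holding that region's rows.
    Hypothetical helper; all inputs are assumed to be trusted identifiers.
    """
    if not sources:
        raise ValueError("at least one source table is required")
    column_list = ", ".join(columns)
    # One SELECT branch per region, each tagging its rows with the partition values.
    branches = [
        f"SELECT {column_list}, '{cobdate}' AS cobdate, '{region}' AS region\n"
        f"FROM {table}\n"
        f"WHERE cobdate = '{cobdate}'"
        for region, table in sources.items()
    ]
    body = "\nUNION ALL\n".join(branches)
    return f"INSERT INTO {target} PARTITION (cobdate, region)\n{body};"


if __name__ == "__main__":
    print(
        build_partition_insert(
            "client_view_tbl",
            ["col", "col2", "col3"],
            "20240915",
            {"region1": "region1_table", "region2": "region2_table"},
        )
    )
```

Generating the text keeps every branch structurally identical, which is what the UNION ALL requires.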

snm1523 · ‎05-13-2024

Thank you for the response @RAGHUY. Would these be the only steps we need to perform, or have you, based on your experience, identified anything additional that is not mentioned in the documentation you shared?

Thanks,
snm1523

RAGHUY · ‎05-12-2024

@ChineduLB One way to return the rows from table1 only when all six tables contain data for the date is to guard the query with EXISTS checks:

    SELECT t1.*
    FROM table1 t1
    WHERE t1.date_partition = 'your_date'  -- Replace 'your_date' with the specific date you're interested in
      AND EXISTS (SELECT 1 FROM table2 WHERE date_partition = 'your_date')
      AND EXISTS (SELECT 1 FROM table3 WHERE date_partition = 'your_date')
      AND EXISTS (SELECT 1 FROM table4 WHERE date_partition = 'your_date')
      AND EXISTS (SELECT 1 FROM table5 WHERE date_partition = 'your_date')
      AND EXISTS (SELECT 1 FROM table6 WHERE date_partition = 'your_date');

If any of the tables has no rows for that date, the query returns no rows.
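The intent of the query above (return rows from table1 only when every table has data for the date) can be tried out locally before running anything on the cluster. The sketch below uses Python's built-in sqlite3 purely as a stand-in for Impala, so it demonstrates the EXISTS guard pattern and nothing about Impala's behavior; rows_if_all_present is a hypothetical helper, and the table names are assumed to be trusted.

```python
import sqlite3


def rows_if_all_present(conn, tables, date_value):
    """Return rows of tables[0] for date_value only if every table has rows for it."""
    # One EXISTS guard per additional table; all guards must hold.
    guards = " AND ".join(
        f"EXISTS (SELECT 1 FROM {name} WHERE date_partition = ?)"
        for name in tables[1:]
    )
    sql = f"SELECT * FROM {tables[0]} WHERE date_partition = ?"
    if guards:
        sql += " AND " + guards
    # One bound parameter for the main filter plus one per guard.
    params = [date_value] * len(tables)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for name in ("table1", "table2"):
        conn.execute(f"CREATE TABLE {name} (date_partition TEXT, val INTEGER)")
    conn.execute("INSERT INTO table1 VALUES ('2024-05-12', 1)")
    # table2 is still empty, so nothing is returned.
    print(rows_if_all_present(conn, ["table1", "table2"], "2024-05-12"))
```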

RAGHUY · ‎05-12-2024

@ChineduLB Impala doesn't directly support returning the rows of a nested SELECT statement from a CASE expression. However, you can achieve similar logic with subqueries for the conditions: you can use subqueries within the WHEN clause to evaluate conditions based on data retrieved from other tables.

    SELECT
      CASE
        WHEN (SELECT COUNT(*) FROM table1) > 0 THEN (SELECT * FROM table1)
        WHEN (SELECT COUNT(*) FROM table2) > 0 AND (SELECT COUNT(*) FROM table3) > 0 THEN (SELECT * FROM table3)
        ELSE NULL
      END AS result_table;

This query checks whether table1 has any rows. If it does, it selects from table1. Otherwise, it checks whether both table2 and table3 have rows. If both have data, it selects from table3. If none of the conditions are met, it returns NULL. Keep in mind that a CASE expression yields a single value, so each THEN subquery must return exactly one column and one row.

RAGHUY · ‎05-12-2024

@Marks_08
1. Verify that no firewall is blocking incoming connections on ports 10000 (HiveServer2 Thrift) and 10002 (HiveServer2 web UI). You can use tools like netstat -atup or lsof -i :10000 to check whether a process is listening on these ports. If a firewall is restricting access, configure it to allow connections on these ports from the machine where you're running Beeline.
2. Double-check the HiveServer2 configuration (hive-site.xml and hive-env.sh) in Cloudera Manager. Ensure that the hive.server2.thrift.port property is set to 10000 in hive-site.xml. Verify that the HIVE_SERVER2_THRIFT_BIND_HOST environment variable (if set) in hive-env.sh allows connections from your Beeline machine. Make sure the HiveServer2 service has the necessary permissions to bind to these ports.
3. Connect with Beeline and specify the Kerberos principal explicitly: beeline -u "jdbc:hive2://<HOST>:10000/;principal=hive/<HOST_FQDN>@<REALM>"
4. Try restarting the Hive and HiveServer2 services in Cloudera Manager. This can sometimes resolve conflicts or configuration issues.
5. Check the HiveServer2 log files (usually in /var/log/hive-server2/hive-server2.log) for any error messages that might indicate why it's not listening on the expected ports.
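For the connectivity check in step 1, a quick probe from the Beeline machine can complement netstat and lsof, which only show the server side. The following is a minimal sketch using Python's standard socket module; port_is_open is a hypothetical helper and the host name in the usage example is a placeholder. A successful TCP connect only shows that something is listening and reachable, not that HiveServer2 is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unresolvable hosts.
        return False


if __name__ == "__main__":
    # "hs2-host.example.com" is a placeholder for the HiveServer2 host.
    for port in (10000, 10002):
        state = "open" if port_is_open("hs2-host.example.com", port) else "closed or filtered"
        print(f"port {port}: {state}")
```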

RAGHUY · ‎05-01-2024

@VenkataAvinash The error you're encountering (java.lang.RuntimeException: org.apache.storm.thrift.TApplicationException: Internal error processing submitTopologyWithOpts) indicates that there's an issue with submitting the Storm topology, but it doesn't directly point to the specific cause. However, based on your configuration and the error message, it seems like there might be an issue with the Kerberos authentication setup or configuration for the Storm Nimbus service.

=> Review Kerberos Configuration: Double-check the Kerberos configuration for Storm Nimbus and ensure that it matches the settings in your storm.yaml file. Verify that the Kerberos principal (hdfs/hari-cluster-test1-master0.avinash.ceje-5ray.a5.cloudera.site@AVINASH.CEJE-5RAY.A5.CLOUDERA.SITE) and keytab file (/root/hdfs.keytab) are correctly specified.
=> Check Keytab Permissions: Ensure that the keytab file /root/hdfs.keytab has the correct permissions set and is accessible by the Storm Nimbus service.
=> Verify Service Principals: Confirm that the Kerberos principal (hdfs/hari-cluster-test1-master0.avinash.ceje-5ray.a5.cloudera.site@AVINASH.CEJE-5RAY.A5.CLOUDERA.SITE) is correctly configured for the Storm Nimbus service and that it has the necessary permissions to access HDFS.
=> Check Nimbus Logs: Check the Nimbus logs (nimbus.log) for any additional error messages or stack traces that might provide more insight into the issue.
=> Classpath Issues: Confirm that the versions of Storm, HDFS, and Kerberos libraries on your cluster are compatible with each other. Refer to the documentation for each component for known compatibility issues.
=> Try submitting a simpler topology without the HDFS bolt initially to see if the basic Kerberos configuration works. This can help isolate the issue further.
=> Consider using a tool like klist to verify that your user has successfully obtained a Kerberos ticket before submitting the topology.

RAGHUY · ‎05-01-2024

@wallacei Error: sqlline-thin.py is configured to use Protobuf serialization for communication with PQS. Protobuf relies on pre-defined class names to parse responses from the server. The error message suggests that sqlline-thin.py is unable to find the class name for a specific response message from PQS.

===================

Check PQS Configuration: Ensure PQS is configured to use Protobuf serialization as well. This might involve checking configuration files or options during PQS startup.

Verify Library Versions: Make sure the versions of sqlline-thin.py and the Phoenix libraries (including PQS) are compatible. Inconsistent versions might lead to class name mismatch issues. You can check the documentation for sqlline-thin.py for specific version compatibility information.

Consider sqlline.py (Regular JDBC): As your sqlline.py script works with regular JDBC, it suggests the basic Phoenix connection is functional. You might consider using sqlline.py for now while troubleshooting the Protobuf issue with sqlline-thin.py.

Alternative Tools: If sqlline-thin.py continues to cause problems, explore alternative tools for connecting to Phoenix, like the Phoenix JDBC thin client or a GUI client like Squirrel SQL.

Double-check the connection URL in sqlline-thin.py. Ensure it points to the correct PQS endpoint (http://localhost:8765 by default).

RAGHUY · ‎05-01-2024

@VTHive Assuming you have a table named your_table with a column named condition, you can extract the variable names using SQL:

    SELECT SUBSTRING_INDEX(TRIM(SUBSTRING_INDEX(condition, '=', 1)), ' ', -1) AS variable_name
    FROM your_table
    WHERE condition LIKE '%=%'
    UNION
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(condition, ' in ', 1), ' ', -1) AS variable_name
    FROM your_table
    WHERE condition LIKE '% in %'
    UNION
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(condition, ' ne ', 1), ' ', -1) AS variable_name
    FROM your_table
    WHERE condition LIKE '% ne %';

The query extracts the variable names from the conditions in the condition column of your table. It handles conditions with the =, in, and ne operators; in each branch, the text before the first occurrence of the operator is taken and its last space-separated token is returned as the variable name. Adjust the table and column names to fit your actual schema.
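If the condition strings can also be processed outside SQL, the same idea (take the identifier that sits directly before an =, in, or ne operator) can be sketched with a regular expression. This Python version is an illustration under assumptions: extract_variable_names is a hypothetical helper, variable names are assumed to look like ordinary identifiers, and operator-like text inside quoted string values would produce false matches.

```python
import re

# An identifier followed by "=", or by whitespace and the word "in" or "ne".
_VARIABLE_BEFORE_OPERATOR = re.compile(
    r"\b([A-Za-z_]\w*)\s*(?:=|(?<=\s)(?:in|ne)\b)",
    re.IGNORECASE,
)


def extract_variable_names(condition):
    """Return the variable names in condition, in order of first appearance."""
    names = []
    for name in _VARIABLE_BEFORE_OPERATOR.findall(condition):
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    print(extract_variable_names("age = 30 and state in ('CA','NY') and status ne 'closed'"))
```

Unlike the SQL version, this picks up every variable in a compound condition rather than only the one before the first operator.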

Last Visited	‎07-25-2026 05:27 AM

Member Since	‎10-11-2022 11:06 PM
Posts	137
Kudos received	22
