Member since 02-27-2020
173 Posts
42 Kudos Received
48 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1091 | 11-29-2023 01:16 PM |
| | 1173 | 10-27-2023 04:29 PM |
| | 1156 | 07-07-2023 10:20 AM |
| | 2518 | 03-21-2023 08:35 AM |
| | 922 | 01-25-2023 08:50 PM |
06-08-2021
10:00 AM
1 Kudo
Hello @subrkann, As of CDP Private Cloud Base release 7.1.6, IBM PPC hardware is supported with RHEL 7.9. However, Impala is not supported on IBM Power System at this time (see here for that exception, under the 7.1.4 table). CDP Private Cloud Base will support Impala there as well in the future, but I can't provide a timeline for that. Hope this clarifies things. Regards, Alex
06-07-2021
07:00 AM
Hello Ahmed, Please see this post, which gives an example of using a Python 3 script with Oozie. Note that you specify the python3 executable on the first line of your .py script (the shebang line). https://community.cloudera.com/t5/Community-Articles/Oozie-Python-workflow-example-walkthrough/ta-p/245240 Let me know if that helps, Alex
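To illustrate the point about the first line, here is a minimal sketch. It assumes `python3` is on the `PATH` of the node that runs the Oozie action; the interpreter path and the `describe_runtime` helper are illustrative and not taken from the linked article.

```python
#!/usr/bin/env python3
# The shebang above has to be the very first line of the .py file.
# "/usr/bin/env python3" is an assumption; use an absolute interpreter
# path instead if that is how Python 3 is installed on your nodes.
import sys


def describe_runtime():
    """Return the major version of the interpreter running this script."""
    return sys.version_info.major


if __name__ == "__main__":
    # Printing the version makes it easy to confirm from the action's
    # stdout which interpreter actually ran the script.
    print("Running under Python", describe_runtime())
```

If the script reports a major version of 2, the shebang is not being honored or points at the wrong interpreter.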
06-03-2021
06:49 AM
OK, after you execute COMPACT on your table, can you also run SHOW COMPACTIONS; to see what state the compaction operation ends up in? Another place to look is the HMS logs. Search for your table name and see which compaction events have and have not occurred for your table. Please provide the pertinent log lines here. Also check whether you have this parameter set in hive-site.xml: hive.metastore.housekeeping.threads.on=true This is responsible for timing out stale transactions on the table. If it's not on, stale transactions are never cleaned up and, as a consequence, Hive does not remove the old delta files. If you change this value, you'll need to restart the stale service.
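One way to check the hive-site.xml setting without reading the file by eye is a small script like the sketch below. The XML fragment is a made-up sample in the usual Hadoop `<property>` layout, and `get_property` and `housekeeping_enabled` are hypothetical helper names; on a real cluster you would read the file from wherever your deployment keeps it.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml fragment, used only for illustration.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.housekeeping.threads.on</name>
    <value>true</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def housekeeping_enabled(xml_text):
    """True only if the housekeeping parameter is present and set to true."""
    value = get_property(xml_text, "hive.metastore.housekeeping.threads.on")
    return value is not None and value.strip().lower() == "true"


print(housekeeping_enabled(SAMPLE_HIVE_SITE))
```

A result of `False` covers both cases worth investigating: the parameter is missing from the file, or it is present but not set to true.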
06-02-2021
07:45 AM
Hello again, Based on the output, it looks like the table itself is using the default compaction settings. That's good. The next place to check is the global Hive parameters and their values. Note that some of these settings are turned off by default (e.g. hive.compactor.initiator.on = false). See the documentation below: https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/managing-hive/topics/hive-compact-properties.html Hope this helps, Alex
06-01-2021
12:56 PM
If your files contain one record each, then yes, these NiFi processors will be equivalent.
06-01-2021
12:31 PM
PublishKafka sends the entire flow file to Kafka as a single message. PublishKafkaRecord splits the flow file into its individual records (e.g. rows) and sends each record as a separate message.
References:
1. PublishKafka: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-0-9-nar/1.6.0/org.apache.nifi.processors.kafka.pubsub.PublishKafka/
2. PublishKafkaRecord: https://nifi.apache.org/docs/nifi-docs/components/nifi-docs/components/org.apache.nifi/nifi-kafka-2-0-nar/1.9.0/org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_2_0/index.html
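The difference can be modeled with a toy sketch. This is not NiFi or Kafka code: both function names are hypothetical, and "record" is assumed here to mean one non-empty line, whereas the real processor relies on a configured record reader.

```python
def publish_whole_file(flow_file_content):
    """Model of PublishKafka as described above: one message per flow file."""
    return [flow_file_content]


def publish_per_record(flow_file_content):
    """Model of PublishKafkaRecord: one message per record.

    A record is assumed to be one non-empty line of the flow file.
    """
    return [line for line in flow_file_content.splitlines() if line.strip()]


flow_file = "id=1,name=a\nid=2,name=b\nid=3,name=c\n"
whole = publish_whole_file(flow_file)
per_record = publish_per_record(flow_file)
print(len(whole), len(per_record))  # prints "1 3"
```

When a flow file holds exactly one record, both functions return a single message, which is the case where the two processors behave equivalently.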
06-01-2021
10:45 AM
Hello, Could you please provide the output of SHOW TBLPROPERTIES compaction_check.check_file; Also, which version of CDH/HDP or CDP are you using? Thank you, Alex
05-27-2021
11:59 AM
Hi Chandu, I think you need to define what you mean by "long running jobs" and also look at some ways to kill jobs outside of Zeppelin (e.g. this thread). Keep in mind that it's one thing to close a Zeppelin session, but it's another to stop, say, a Spark Streaming application that was launched from Zeppelin and is running on a YARN cluster indefinitely. If you are running a local job in Zeppelin, then the 2 parameters listed should do the trick. Regards, Alex
05-25-2021
12:47 PM
Hi Chandu, The 2nd and 3rd parameters you listed relate to a Zeppelin feature called Interpreter Lifecycle Management, introduced in the 0.8.0 release. The lifecycle manager periodically (every checkinterval) checks whether the session is idle, and once the session has been idle for a certain time (threshold), the manager terminates the interpreter session. The default value for the threshold is 1 hour. The other parameter, zeppelin.interpreter.connect.timeout, is responsible for truncating output for a given cell. If output is being continuously produced by the Zeppelin interpreter and it doesn't stop after the default value of 30000 milliseconds (30 seconds), then Zeppelin will truncate the output right there. At least that's my understanding. Hope it helps, Alex
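The interplay between checkinterval and threshold can be sketched as a toy model. This is not Zeppelin's implementation: the function names, the 60-second check interval, and the assumption that checks fire at exact multiples of the interval are all illustrative.

```python
def should_terminate(last_activity_ms, now_ms, threshold_ms):
    """True once the session has been idle for at least the threshold."""
    return now_ms - last_activity_ms >= threshold_ms


def first_termination_check(last_activity_ms, check_interval_ms, threshold_ms):
    """Return the time (ms) of the first periodic check that finds the session idle long enough.

    Checks are assumed to fire at multiples of check_interval_ms,
    which must be greater than zero.
    """
    now_ms = check_interval_ms
    while not should_terminate(last_activity_ms, now_ms, threshold_ms):
        now_ms += check_interval_ms
    return now_ms


ONE_HOUR_MS = 60 * 60 * 1000   # the 1 hour default threshold mentioned above
CHECK_INTERVAL_MS = 60 * 1000  # hypothetical check interval

# Last activity at 30 s: the session passes the one-hour idle mark at
# 3,630,000 ms, but is only terminated at the next check after that.
print(first_termination_check(30_000, CHECK_INTERVAL_MS, ONE_HOUR_MS))  # prints 3660000
```

The point of the model is that termination happens at a check, not at the exact moment the threshold is crossed, so a session can outlive the threshold by up to one check interval.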
05-25-2021
10:43 AM
Hello, Hive compaction runs first, then a cleaner thread waits for all readers to finish reading the old base/delta files. Only when it determines that nobody is reading the old files anymore are the old files deleted. Please run SHOW COMPACTIONS and look at the state of the compaction in question. It would also help to know which version of CDH/HDP/CDP you are on. If CDP, is it public or private cloud? Hope this helps, Alex