About Daming Xue

Daming Xue · ‎04-09-2021

Hello,

Hive timestamps support up to 9 decimal places (nanoseconds). For your case, you could check whether timestamps with non-zero nanoseconds, e.g. 1750-01-01 00:00:00.123456789, are exported correctly. As for your example, 00:00:00.0 is equal to 00:00:00, so you didn't lose any precision: the nanosecond part is simply zero.
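To make the point about precision concrete, here is a minimal sketch in plain Java. It uses java.sql.Timestamp (the type a JDBC client receives for timestamp columns) rather than Hive itself, and the class and method names are made up for illustration. It only shows that a 9-digit fraction is kept and that ".0" and no fraction denote the same value.

```java
import java.sql.Timestamp;

// Hypothetical demo class; not part of Hive or of the original question.
final class TimestampPrecisionDemo {

    // Fractional-second part, in nanoseconds, of a "yyyy-MM-dd HH:mm:ss[.f...]" literal.
    static int nanosOf(String literal) {
        return Timestamp.valueOf(literal).getNanos();
    }

    // True when both literals parse to the same timestamp, nanoseconds included.
    static boolean sameInstant(String a, String b) {
        return Timestamp.valueOf(a).equals(Timestamp.valueOf(b));
    }

    public static void main(String[] args) {
        // All nine fractional digits survive parsing.
        System.out.println(nanosOf("1750-01-01 00:00:00.123456789")); // 123456789

        // A ".0" fraction and no fraction at all are the same value.
        System.out.println(sameInstant("1750-01-01 00:00:00.0", "1750-01-01 00:00:00")); // true

        // Java renders a zero fraction with a single trailing "0" digit.
        System.out.println(Timestamp.valueOf("1750-01-01 00:00:00"));
    }
}
```

If the exported file shows fewer digits than the source row for a value with a non-zero fraction, that would point to the export step rather than to the timestamp type.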

Daming Xue · ‎04-09-2021

Hello,

According to the state management section of the documentation, the processor only lists files that are new or modified compared to the last run:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.5.0/org.apache.nifi.processors.azure.storage.ListAzureBlobStorage/

State management:
Scope: CLUSTER
Description: After performing a listing of blobs, the timestamp of the newest blob is stored. This allows the Processor to list only blobs that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.
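The behaviour described in the quoted documentation can be sketched roughly as follows. This is not NiFi's implementation: the class, the Blob type, and the in-memory field standing in for cluster state are all hypothetical, and the real processor persists its state through NiFi's state management rather than in a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of timestamp-based incremental listing.
final class TimestampListingSketch {

    // Stand-in for a blob entry: a name and a last-modified time in epoch milliseconds.
    static final class Blob {
        final String name;
        final long lastModified;

        Blob(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // Stored state: timestamp of the newest blob seen in any previous run.
    private long newestSeen = -1L;

    // Returns the names of blobs added or modified after the stored timestamp,
    // then advances the stored timestamp to the newest blob in this listing.
    List<String> listNew(List<Blob> current) {
        List<String> fresh = new ArrayList<>();
        long newest = newestSeen;
        for (Blob b : current) {
            if (b.lastModified > newestSeen) {
                fresh.add(b.name);
                newest = Math.max(newest, b.lastModified);
            }
        }
        newestSeen = newest;
        return fresh;
    }

    public static void main(String[] args) {
        TimestampListingSketch lister = new TimestampListingSketch();
        List<Blob> container = new ArrayList<>();
        container.add(new Blob("a.csv", 100L));
        container.add(new Blob("b.csv", 200L));
        System.out.println(lister.listNew(container)); // [a.csv, b.csv]

        container.add(new Blob("c.csv", 300L));
        System.out.println(lister.listNew(container)); // [c.csv]
    }
}
```

The strict "greater than" comparison is a simplification chosen for this sketch; a blob whose timestamp equals the stored one would be skipped here.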

Daming Xue · ‎04-09-2021

Hello,

The HUE distcp editor is designed to replicate data within the cluster and/or with the object store. You can click the "..." button next to the input box to see which directories your account has access to, but only within the scope of the current cluster.

For data replication between two clusters, use Cloudera Manager / Replication Manager:
https://docs.cloudera.com/cdp/latest/data-migration/topics/rm-dc-data-replication.html

Daming Xue · ‎04-09-2021

Hello,

Have you tried tuning the performance?
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/tune_llap.html

Daming Xue · ‎04-09-2021

Hello,

According to the NiFi documentation, every time the processor performs a listing of blobs, it automatically picks up the data added since the last run:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.5.0/org.apache.nifi.processors.azure.storage.ListAzureBlobStorage/index.html

State management:
Scope: CLUSTER
Description: After performing a listing of blobs, the timestamp of the newest blob is stored. This allows the Processor to list only blobs that have been added or modified after this date the next time that the Processor is run. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data.

Daming Xue · ‎04-09-2021

Hello,

Do you have an active subscription with Cloudera? All platform binaries are now behind a paywall and are only available for Cloudera customers to download. Details:
https://www.cloudera.com/downloads/paywall-expansion.html

Daming Xue · ‎03-30-2021

Hello,

On the NiFi Controller Service Details page, under the Properties tab, set a value for "Database User".

Daming Xue · ‎03-30-2021

Hello,

Try setting "Database User", e.g. hive. According to the HiveServer2 JDBC documentation, a user ID is required:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-UsingJDBC

The default <port> is 10000. In non-secure configurations, specify a <user> for the query to run as. The <password> field value is ignored in non-secure mode.

Connection cnct = DriverManager.getConnection("jdbc:hive2://<host>:<port>", "<user>", "");

Daming Xue · ‎03-30-2021

Hello,

This is related to HDFS centralized cache management. As described in the documentation:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html

In this architecture, the NameNode is responsible for coordinating all the DataNode off-heap caches in the cluster. The NameNode periodically receives a cache report from each DataNode which describes all the blocks cached on a given DN. The NameNode manages DataNode caches by piggybacking cache and uncache commands on the DataNode heartbeat.

If the metric is going up, one possibility is that your NameNode is too busy to handle the requests.

Daming Xue · ‎03-30-2021

Hello,

You can follow the guide below to install the CDP Private Cloud Base trial:
https://docs.cloudera.com/cdp-private-cloud/latest/release-guide/topics/cdpdc-trial-download-information.html

Member Since	‎08-16-2015 06:33 AM
Last Visited	‎02-27-2023 01:53 AM
Posts	97
Kudos received	16

Cloudera Community

Re: scientific data in hadoop

Re: Increase max row size in HIVE

Re: Why do I have "incompatible Parquet schema" er...

Re: Flume re-engineering in CDP 7.x

Re: Service monitor keeps crashing

Re: Hive output write on HDFS csv file Milisecond ...

Re: Apache Nifi: ListAzureBlobStorage doesn't ta...

Re: How to use distcp editor of HUE?

Re: Why does hive LLAP seem slower than hiveserver...

Re: Apache Nifi - Trigger/notification for new dat...

Re: Ambari setup

Re: Get DATA from hive using apache nifi

Re: Get DATA from hive using apache nifi

Re: what is meaning of datanode numBlocksFailedToU...

Re: How do we install Cloudera Community Edition