Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 983 | 06-04-2025 11:36 PM |
|  | 1564 | 03-23-2025 05:23 AM |
|  | 778 | 03-17-2025 10:18 AM |
|  | 2806 | 03-05-2025 01:34 PM |
|  | 1850 | 03-03-2025 01:09 PM |
07-21-2020
06:50 AM
@Stephbat Please can you check these two values: dfs.datanode.max.locked.memory and the ulimit. The dfs.datanode.max.locked.memory property determines the maximum amount of memory a DataNode will use for caching. The "locked-in-memory size" corresponds to the memlock ulimit (ulimit -l) of the DataNode user, which needs to be increased to match this parameter. Your current dfs.datanode.max.locked.memory is 2 GB, while the RLIMIT_MEMLOCK is only 16 MB.

If you get the error "Cannot start datanode because the configured max locked memory size… is more than the datanode's available RLIMIT_MEMLOCK ulimit," it means the operating system is imposing a lower limit on the amount of memory that can be locked than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. Usually this value is configured in /etc/security/limits.conf, but it will vary depending on your operating system and distribution, so adjust the values accordingly. Remember that you will also need memory for other things, such as the DataNode and application JVM heaps and the operating system page cache.

Once adjusted, the DataNode should start like a charm 🙂 Hope that helps
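As a minimal sketch of that adjustment (assuming the DataNode runs as a user named hdfs and the 2 GB value above; adjust both to your environment):

```
# Hedged sketch: "hdfs" as the DataNode user is an assumption.
# limits.conf takes the memlock value in KB (2 GB = 2097152 KB), or "unlimited".
# Add a line like this to /etc/security/limits.conf:
#   hdfs  -  memlock  2097152

# After re-logging in (or restarting the DataNode service), verify the limit
# as the DataNode user:
su - hdfs -c 'ulimit -l'    # should print 2097152 or "unlimited"
```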
07-21-2020
04:56 AM
1 Kudo
@focal_fossa My guess is that you are running out of memory. How much memory do you have? Copying local files to HDFS with put or copyFromLocal does not go through a MapReduce job; the hadoop client streams the data directly to HDFS, so the client process and its heap are involved. My guess is that the Ambari Files View copy likewise goes through a client process behind the scenes and is subject to similar memory limits.

Another alternative is DistCp (distributed copy), a tool for large inter-/intra-cluster copying. It uses MapReduce for its distribution, error handling, recovery, and reporting: it expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list. DistCp can at times run out of memory for big datasets: if the number of individual files/directories being copied from the source path(s) is extremely large, it might run out of memory while building the list of paths to copy. This is not unique to the new DistCp implementation. To get around this, consider increasing the -Xmx JVM heap-size parameter, as follows:

$ export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"
$ hadoop distcp /source /target

Hope that helps
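For large copies, a hedged sketch of the same commands with an explicit map count and incremental copy (the paths, heap size, and map count below are placeholders, not values from this thread):

```
# Larger client heap so DistCp can build the copy list for many files
export HADOOP_CLIENT_OPTS="-Xms64m -Xmx1024m"

# -m caps the number of map tasks; -update copies only missing/changed files
hadoop distcp -m 20 -update /source /target
```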
07-21-2020
03:52 AM
@saur Can you explain the latest developments? I compiled a document for you; did you go through it step by step?
07-21-2020
02:12 AM
@saur Any updates on this?
07-21-2020
01:56 AM
@focal_fossa Can you share the method you used to extend your VM disk? What is the VM disk file extension, .vmdk or .vdi? Note that VirtualBox does not allow resizing .vmdk images. Does your disk show dynamically allocated storage as shown below? Please revert
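If it turns out to be a .vmdk, a hedged sketch of the usual workaround (the file names and the 100 GB size are placeholders, not values from this thread):

```
# Convert the VMDK to VDI, then resize the VDI (size is given in MB)
VBoxManage clonemedium disk sandbox.vmdk sandbox.vdi --format VDI
VBoxManage modifymedium disk sandbox.vdi --resize 102400

# Reattach sandbox.vdi to the VM's storage controller, then grow the
# partition and filesystem inside the guest.
```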
07-20-2020
02:39 PM
@focal_fossa AFAIK these sandboxes use dynamically allocated storage. You can verify that by generating and loading data for TPC-DS. General usage is tpcds-setup.sh scale_factor [directory]. For example, the command below will generate 200 GB of TPC-DS data in /user/data (HDFS): ./tpcds-setup.sh 200 /user/data This should prove that the disk allocation is dynamic. See https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh for how to build and run it. Hope that helps
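A hedged sketch of the full sequence, based on the hive-testbench repository linked above (the scale factor 200 and target directory come from this post; check the repo's README for the exact script names in your branch):

```
# Clone the benchmark kit, build the data generator, then generate ~200 GB
# of TPC-DS data into /user/data on HDFS
git clone https://github.com/hortonworks/hive-testbench.git
cd hive-testbench
./tpcds-build.sh
./tpcds-setup.sh 200 /user/data
```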
07-20-2020
09:02 AM
@rwinters The simplest method is to edit /etc/postgresql/9.2/main/start.conf and replace auto with manual or disabled. You can also resolve this by adding the following to the end of your shell's initialisation file (e.g. ~/.bashrc if you're using bash): PATH=/usr/pgsql-10/bin:$PATH Then reboot the server and PostgreSQL 10 should be the default. Hope that helps
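A minimal sketch of that PATH change, assuming a bash shell and the standard PGDG package layout (/usr/pgsql-10/bin); adjust the path to wherever your PostgreSQL 10 binaries actually live:

```
# Put the PostgreSQL 10 binaries ahead of the older version on PATH
echo 'export PATH=/usr/pgsql-10/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
psql --version    # should now report psql (PostgreSQL) 10.x
```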
07-19-2020
09:06 AM
@tanishq1197 The password should be the SCM password. At the "Enter SCM password:" prompt the default is SCM, and with that it should progress successfully.
07-18-2020
12:02 PM
3 Kudos
@Henry2410 MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software. Snowflake, on the other hand, is described as "the data warehouse built for the cloud". There is not really an equivalence between MySQL and Snowflake use cases; what you are really asking is whether Snowflake can play the role of an OLTP database. Snowflake is not an OLTP database, it is an OLAP database, so generally speaking I would say no. Snowflake is a cloud-based warehouse and is used most of the time for OLAP purposes.

Coming back to your question, Snowflake can work under the following conditions: if you have only inserts into the target table and not many updates, you can achieve good performance by using CLUSTER BY and other inline views. Having said that, to explore your use case a little further, I would ask yourself or your stakeholders the following questions:

1. Do you need millisecond response times for INSERTs, UPDATEs, and SELECTs?
2. Does your application or tool require indexes?
3. Does your application need referential integrity and uniqueness constraints enforced?

If you said yes to ANY of 1, 2, or 3, go with MySQL. If you said no to ALL of 1, 2, and 3, then Snowflake might be viable, but even then I would not recommend it, as that is not what Snowflake was built for.
07-18-2020
07:08 AM
@LeticiaAraujo Log4j properties control the logging behaviour of each service running in your Hadoop cluster. This is something every site can customize according to its needs: for example, enabling DEBUG will generate detailed but huge logs if you want to analyze an issue with a service, and you can also control log rotation, compression format, log file size, and so on. Logging is a topic on its own, and the log4j documentation is a good source for further reading; the most common tasks are enabling DEBUG and configuring log rotation and its date format.
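A minimal sketch of the kind of overrides involved, following the conventions of Hadoop's stock log4j.properties (the RFA appender name and the size/backup values shown are illustrative, not recommendations):

```
# Raise the root logger to DEBUG and keep logs rotated by size
hadoop.root.logger=DEBUG,RFA

log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```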