Member since
09-02-2016
523
Posts
89
Kudos Received
42
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2724 | 08-28-2018 02:00 AM |
|  | 2696 | 07-31-2018 06:55 AM |
|  | 5686 | 07-26-2018 03:02 AM |
|  | 2982 | 07-19-2018 02:30 AM |
|  | 6466 | 05-21-2018 03:42 AM |
04-17-2018
01:19 AM
2 Kudos
This is a common misreading of the "free" output. The first line (starting with "Mem") shows that you have 62G of memory and 56G used, but that memory is not all held by processes. At the end of the line you will see 39G listed as cached. In short, Linux uses part of the free RAM to cache data from frequently used files, in order to save interactions with the hard disk. As soon as an application requests memory and there is no "free" memory left, Linux automatically drops these caches. You cannot turn this feature off; the only thing you can do is drop the currently cached data, and Linux will start caching again the very next second. In any case, when the output of "free" looks like the one you provided, you should always refer to the second line, "-/+ buffers/cache: 16G 49G". That is the real status: 16G used and 49G free. Finally, CM displays the disk and memory usage of the host (in the Hosts view) regardless of which process is using it, so it matches the output of "free".
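As a rough illustration, the "real" used/free figures on the "-/+ buffers/cache" line can be recomputed from the kernel's memory counters. The numbers below are made-up sample values in kB (roughly 62G total, 6G free, 1G buffers, 39G cached), not your host's actual figures:

```shell
# Sketch with made-up sample values mimicking /proc/meminfo (kB)
meminfo='MemTotal: 65011712 kB
MemFree: 6291456 kB
Buffers: 1048576 kB
Cached: 40894464 kB'

# Extract a counter by key from the sample text
get() { printf '%s\n' "$meminfo" | awk -v k="$1:" '$1 == k { print $2 }'; }

total=$(get MemTotal)
free_kb=$(get MemFree)
cache=$(( $(get Buffers) + $(get Cached) ))

# The "-/+ buffers/cache" line: used minus cache, free plus cache
real_used_gb=$(( (total - free_kb - cache) / 1024 / 1024 ))
real_free_gb=$(( (free_kb + cache) / 1024 / 1024 ))
echo "real used: ${real_used_gb}G, real free: ${real_free_gb}G"
```

On a live host you can read the same counters directly from /proc/meminfo, or just run `free -g` and look at the second line.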
04-12-2018
01:22 AM
@denole Yes, I agree with your point. My intention was to test this when we have the QuickStart VM available for testing.
04-10-2018
10:24 AM
Is there a way to change HDFS permissions when importing files via Sqoop? https://stackoverflow.com/questions/49759591/sqoop-how-to-change-hdfs-permissions-on-imported-files
02-15-2018
08:56 AM
Hi, I have followed all the steps to install Spark2 on the Cloudera VM, but I get the following error when I try to host the Spark2 parcel: "CDH (5.8 and higher) parcel required for SPARK2 (2.2.0.cloudera2-1.cdh5.12.0.p0.232957) is not available." I am using CDH version 5.12 and the http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera2/ link for the parcel. Please help me with the configuration.
02-13-2018
06:53 PM
@Cloudera learning - Did you have a chance to raise the DataNode bandwidth, the DataNode heap size, and the replication work multiplier before kicking off the decommission? This will certainly improve performance. Also, if your decommission is running forever, I would suggest you recommission the node and then perform the decommission again.
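For reference, two of the tunings mentioned above map (as far as I know) to the HDFS properties sketched below; the DataNode heap size is set separately in Cloudera Manager under the DataNode role's resource configuration. The values here are only illustrative, not recommendations:

```xml
<!-- hdfs-site.xml sketch: illustrative values only -->
<configuration>
  <!-- Transfer bandwidth cap a DataNode may use for balancing/decommission traffic -->
  <property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>104857600</value> <!-- 100 MB/s -->
  </property>
  <!-- How aggressively the NameNode schedules re-replication work per iteration -->
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>10</value>
  </property>
</configuration>
```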
02-12-2018
07:22 PM
I am doing some practice exercises. For some questions there can be several valid solutions; for example, I can use RDD operations to do filtering, sorting, and grouping, while with DataFrames and Spark SQL it is even easier for me to get the same result. My question is: will the exam require that some questions be solved using RDDs rather than DataFrames and Spark SQL, or vice versa? Thank you.
02-01-2018
01:26 PM
@NewBee22 For your first question: go to Cloudera Manager -> HDFS -> Configuration -> search for 'Log Directory' -> change the directory path wherever applicable. For your second question: go to Cloudera Manager -> HDFS -> Configuration -> search for 'maximum log'. Here you can change both the maximum log file size and the number of log files to retain. For example: Maximum Audit Log File Size - 100 MiB (you can reduce the size); Number of Audit Logs to Retain - 10 (you can reduce how many logs to retain). Finally, you can do this for all the applicable services, like YARN, etc.
01-25-2018
12:51 AM
Thank you for the answer. JAVA_HOME is not set, and I've installed the JDK version from the CDH repository. Then I guess Cloudera Manager also installed JDK 6 during the installation/configuration phase. So you recommend uninstalling JDK 6 and 7 and installing the latest supported JDK on each host?
01-22-2018
07:58 AM
Hi bgooley, thank you for the answer! I've set up the Sqoop 1 Client Gateway and deployed the configuration. The Teradata Connector parcel was already activated during the cluster configuration phase. What puzzles me is that the Cloudera documentation for the manual installation says that the following property must be added to the sqoop-site.xml file in order to use the Teradata connector with Oozie:

<configuration>
  <property>
    <name>sqoop.connection.factories</name>
    <value>com.cloudera.connector.teradata.TeradataManagerFactory</value>
  </property>
</configuration>

I've followed the Teradata connector installation path through Cloudera Manager, so I was expecting the property to have been injected automatically by CM, but it seems to be missing from both sqoop-site.xml files:

/run/cloudera-scm-agent/process/ccdeploy_sqoop-conf_etcsqoopconf.cloudera.sqoop_client_1328941158299821742/sqoop-conf/sqoop-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>sqoop.connection.factories</name>
    <value></value>
  </property>
  <property>
    <name>sqoop.tool.plugins</name>
    <value></value>
  </property>
</configuration>

/etc/sqoop/conf.cloudera.sqoop_client/sqoop-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>sqoop.connection.factories</name>
    <value></value>
  </property>
  <property>
    <name>sqoop.tool.plugins</name>
    <value></value>
  </property>
</configuration>

In this case, should I inject it manually through the configuration in the Cloudera Manager web console, as explained by @saranvisa? Thanks for the help
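If it does need to be injected manually, one way (the exact safety-valve label varies by CM version, so treat the navigation here as an assumption) would be to paste the snippet from the documentation into the Sqoop 1 Client's advanced configuration snippet for sqoop-site.xml and then redeploy the client configuration:

```xml
<!-- Paste into: Sqoop 1 Client -> Configuration ->
     Client Advanced Configuration Snippet (Safety Valve) for sqoop-site.xml -->
<property>
  <name>sqoop.connection.factories</name>
  <value>com.cloudera.connector.teradata.TeradataManagerFactory</value>
</property>
```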