Member since: 02-15-2016
Posts: 33
Kudos Received: 6
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4600 | 01-18-2018 01:39 PM |
| | 3870 | 07-06-2017 09:57 AM |
| | 5302 | 05-24-2017 01:31 PM |
01-25-2018
11:10 AM
I am glad it's showing the increased values now. The following link might help, if you haven't already seen it: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_nn_memory_config.html
01-23-2018
07:58 AM
Services won't expire; the license does. Most of these services are open source, but the Cloudera Management Service is not. This holds true if you are using their EDH edition of the Hadoop distro; the Cloudera Express edition is free, I think. As far as I know, any node running services other than Gateway roles, Flume roles, and CM needs to be licensed, and if those roles are colocated on a machine with other services, that machine also needs to be licensed. I would suggest reaching out to your account rep for further details.
01-18-2018
01:39 PM
As far as I understand, block capacity means the total number of blocks HDFS can hold, irrespective of their size. For example, a 128 MB file consumes 1 HDFS block (assuming the HDFS block size is set to 128 MB) from a DataNode perspective, but on the NameNode it needs 2 namespace objects (one for the file inode and one for the block). Since all of that is stored in memory, the block capacity should increase after increasing the NameNode heap size. Yes, you will have to restart HDFS and dependent services to see the increased capacity. However, it might take some time for it to reflect...
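For a rough sense of scale, here is a back-of-envelope sketch. The ~1 GB of NameNode heap per million namespace objects figure is the commonly cited rule of thumb from Cloudera's NameNode memory sizing docs; the file count, heap values, and hadoop-env.sh approach below are purely illustrative assumptions, not your cluster's numbers:

```bash
# Back-of-envelope sizing (illustrative numbers only):
#   10 million files, one 128 MB block each
#   => ~20 million namespace objects (1 file inode + 1 block per file)
#   => ~20 GB of NameNode heap at the commonly cited ~1 GB per million objects.
#
# On a hand-configured cluster the heap is raised in hadoop-env.sh:
export HADOOP_NAMENODE_OPTS="-Xms20g -Xmx20g ${HADOOP_NAMENODE_OPTS}"

# On a Cloudera Manager cluster, change the NameNode Java heap size in the
# HDFS service configuration instead, then restart HDFS and dependent services.
```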
01-18-2018
12:57 PM
Cloudera support will definitely expire. As for the cluster, I think most of the cluster services will keep working, except the licensed pieces, and you would be legally non-compliant with the licensing terms.
01-18-2018
12:47 PM
If you run the Host Inspector, it will show you a detailed report of everything it found on that host as well as all the other hosts, and from that report you can figure out what's wrong. The other option could be to restart the agent on that particular node.
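If the agent itself looks unhealthy, restarting it on the affected host is usually enough. A minimal sketch, assuming a standard Cloudera Manager agent install (service name and log path may differ on your OS):

```bash
# On the problem host: check the Cloudera Manager agent, then restart it.
sudo service cloudera-scm-agent status
sudo service cloudera-scm-agent restart

# If the host still reports problems, the agent log is the next place to look.
sudo tail -n 100 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
```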
11-20-2017
09:34 AM
1 Kudo
Hi Everyone, I have a requirement to do full table loads for ~60 tables from an Oracle database, and I have a shell script that runs sqoop on each of those tables. It takes a long time to load all of them because some are huge, so I started tuning the sqoop job for each table. While doing that, I stumbled upon the "--fetch-size" option and have some questions about it:
- Does it change "oracle.row.fetch.size" for the JDBC connection?
- Is there a maximum limit for this parameter?
- Does it impact the source DB or the Hadoop-side resources?
- Are there any guidelines for finding an optimum value for this parameter?
Thanks & Regards, Mohit Garg
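P.S. To make the question concrete, this is where --fetch-size goes in the import command. Everything below (connection string, credentials, table name, mapper count, and the 10000 value itself) is a placeholder, not a recommendation:

```bash
# Illustrative full-table import; all values are placeholders.
# --fetch-size is documented as the number of entries to read from the database at once.
sqoop import \
  --connect "jdbc:oracle:thin:@//oracle-host:1521/ORCL" \
  --username "${ORA_USER}" \
  --password-file "/user/${USER}/.ora_password" \
  --table MY_SCHEMA.MY_TABLE \
  --fetch-size 10000 \
  --num-mappers 4 \
  --target-dir "/data/staging/MY_TABLE" \
  --delete-target-dir
```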
Labels:
- Apache Sqoop
07-20-2017
10:21 AM
Thanks Tristan! I had found that mistake and corrected it. Thanks for your response. Regards, MG
07-06-2017
09:57 AM
1 Kudo
Hi Everyone, Not sure if anyone else has faced this issue, but after much research I was able to connect to Kerberized Hive successfully. I appended "-Djavax.security.auth.useSubjectCredsOnly=false" to the .jinit() call:

.jinit(classpath=cp, parameters="-Djavax.security.auth.useSubjectCredsOnly=false")

Basically, it removes the requirement that the GSS mechanism obtain its credentials from an existing Subject and allows it to use the specified authentication mechanism, which in this case is Kerberos.
06-30-2017
02:55 PM
Here's an update: I was able to fix the initial issue by adding all the jars in the /opt/cloudera/parcels/CDH/... directory. However, now it's failing with a "Kerberos TGT not found" error, although I am doing kinit before connecting. Is there something I am missing?

"javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]"
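For reference, the check I run before starting the R session is roughly the following; the principal and keytab path are placeholders:

```bash
# Confirm a valid ticket exists in the cache the R/JVM process will read.
kinit -kt /home/cdsw/my_user.keytab my_user@EXAMPLE.COM   # placeholder principal/keytab
klist                                                     # should show a valid krbtgt ticket
echo "${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"                # the credential cache the JVM needs to see
```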
06-30-2017
01:41 PM
I am trying to access Hive tables from R using JDBC, but it's failing while establishing the connection on org.apache.hadoop.security.UserGroupInformation. I thought "hadoop-core.jar" would provide this class, but it's still failing. Does anyone have any idea?

library("DBI")
library("rJava")
library("RJDBC")

# Jars to put on the JVM classpath at initialization
cp = c("/home/cdsw/impala_jars/hadoop-common.jar",
       "/home/cdsw/hive_jars/libthrift-0.9.0.jar",
       "/home/cdsw/hive_jars/hive_service.jar",
       "/home/cdsw/hive_jars/hadoop-core.jar",
       "/home/cdsw/hive_jars/TCLIServiceClient.jar",
       "hive_jars/hive-jdbc-1.1.0-cdh5.10.1-standalone.jar")
.jinit(classpath=cp)

# Attempt to add the jars under these two directories to the class path as well
for(l in list.files('/home/cdsw/hive_jars')){
  .jaddClassPath(paste("/home/cdsw/hive_jars",l,sep=""))}
for(l in list.files('/home/cdsw/impala_jars')){
  .jaddClassPath(paste("/home/cdsw/impala_jars",l,sep=""))}
.jclassPath()

# Hive JDBC driver and connection to the Kerberized HiveServer2
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            "hive_jars/hive-jdbc-1.1.0-cdh5.10.1-standalone.jar",
            identifier.quote="`")
con <- dbConnect(drv, "jdbc:hive2://localhost:10000/default;principal=hive/{HOST}@{REALM}")
Labels: