Member since 02-27-2017 · 171 Posts · 9 Kudos Received · 0 Solutions
08-11-2017
02:03 AM
I think you've got the point: dfs.datanode.data.dir in Ambari is a global setting, so it assumes every host has these directories (/grid/data1, /grid/data2, /grid/data3); in your case you need to create a config group to suit your environment. There are two ways to deal with the existing data under those directories, but first increase the dfs.datanode.balance.bandwidthPerSec value (bytes/sec) in the HDFS settings in the Ambari UI to match your network speed; this will help speed up the process. The safe way is to decommission the DataNodes, reconfigure your group setting, and then recommission the nodes one by one: https://community.hortonworks.com/articles/69364/decommission-and-reconfigure-data-node-disks.html The unsafe way is to change the setting and remove the directories directly, relying on your replication setting, then wait for re-replication to complete by checking that the under-replicated blocks value reported by hdfs dfsadmin -report drops to 0.
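If you'd rather not re-run hdfs dfsadmin -report by hand, here is a minimal Scala sketch that polls the same counter through the Hadoop client API. This is my own illustration, not from the linked article; it assumes the Hadoop client jars and a cluster-pointing core-site.xml/hdfs-site.xml are on the classpath, and the 30-second poll interval is arbitrary.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.hdfs.DistributedFileSystem

object WaitForReplication {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml/hdfs-site.xml from the classpath; fs.defaultFS
    // must point at the cluster so the cast below succeeds.
    val conf = new Configuration()
    val fs = FileSystem.get(conf).asInstanceOf[DistributedFileSystem]
    // Same counter that `hdfs dfsadmin -report` prints as "Under replicated blocks".
    var pending = fs.getUnderReplicatedBlocksCount
    while (pending > 0) {
      println(s"Under-replicated blocks: $pending")
      Thread.sleep(30000) // arbitrary 30s poll interval
      pending = fs.getUnderReplicatedBlocksCount
    }
    println("Replication complete.")
    fs.close()
  }
}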
07-26-2017
05:47 AM
@rahul gulati I don't think we can handle \n characters with RegexSerDe, since by default Hive treats every '\n' as a line delimiter. You might need to handle new lines using the Omniture Data SerDe; refer to the link for details.
06-14-2017
07:08 AM
hi @rahul gulati, apparently the number of partitions for your DataFrame / RDD is causing the issue. This can be controlled by adjusting the spark.default.parallelism parameter in the Spark context or by using .repartition(<desired number>). When you run in spark-shell, please check the mode and the number of cores allocated for the execution, and adjust the value to whichever works for the shell mode. Alternatively, you can observe the same in the Spark UI and come to a conclusion on partitions; see the sketch after the quote below.

From the Spark website, on spark.default.parallelism: for distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:
Local mode: number of cores on the local machine
Others: total number of cores on all executor nodes or 2, whichever is larger
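Here is a minimal spark-shell sketch of the .repartition approach; the DataFrame df and the target count 8 are illustrative, not from your job. You can also launch the shell with --conf spark.default.parallelism=<n> to change the default.

import spark.implicits._

// spark-shell predefines `spark`; build a small DataFrame to inspect.
val df = (1 to 1000).toDF("n")
println(df.rdd.getNumPartitions)        // default, influenced by spark.default.parallelism

// Explicitly set the partition count instead of relying on the default.
val repartitioned = df.repartition(8)
println(repartitioned.rdd.getNumPartitions)  // 8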
05-10-2017
03:31 AM
Using HDP 2.5 with Spark 2. If you define the code as follows:

val spark = SparkSession
  .builder
  .appName("my app")
  .getOrCreate()

import spark.implicits._

val test = spark.sqlContext.sql("select max(test_dt) as test_dt from abc").as[String]
val test1 = spark.sqlContext.table("testing")

The following two statements will compile:

val output2 = test1.filter(test1("audit_date").gt(test).toString())
val output2 = test1.filter(test1("audit_date").gt(test))

Of course, you can always convert test to a String and use the variable in the filter clause.
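As a hedged sketch of that last suggestion (maxDt and output3 are illustrative names of mine, and it assumes the test/test1 definitions above):

// Materialize the single-row Dataset[String] on the driver, then
// filter with the plain String value.
val maxDt: String = test.first()
val output3 = test1.filter(test1("audit_date").gt(maxDt))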
05-05-2017
05:28 PM
@rahul gulati
Yes, it's okay to install the MIT KDC on the Ambari server node. But in a real production cluster, we should clearly separate these two roles onto two different nodes. Hope this helps!
04-11-2017
04:58 PM
Hello @rahul gulati, it's not good community practice to ask a similar question multiple times. I've provided an answer in the comments here; please check.
04-03-2017
11:53 AM
1 Kudo
@rahul gulati this is how I connect to Hive via Knox through beeline:

beeline --silent=true -u "jdbc:hive2://<knox_host>:8443/;ssl=true;sslTrustStore=/usr/hdp/current/knox-server/data/security/keystores/gateway.jks;trustStorePassword=knoxsecret;transportMode=http;httpPath=gateway/default/hive;hive.server2.use.SSL=true" -d org.apache.hive.jdbc.HiveDriver -n sam -p sam-password

And here are a few references, too:
https://cwiki.apache.org/confluence/display/KNOX/Examples+Hive
https://community.hortonworks.com/questions/16887/beeline-connect-via-knox-ssl-issue.html
02-12-2019
07:02 AM
@Priyanka This is a closed thread (from 2017); can you open a new one and copy-paste this content?
03-21-2019
04:58 AM
Is there an example of creating a Hive table to show the files of the hdfs_path entity? Thank you.
09-05-2018
12:05 PM
@Artem Ervits
> Both can add and remove instances as well as provision new instances with new machine type easily.
Could you please point out where that option is located in the Cloudbreak UI or CLI? Thank you!