Member since 02-27-2017 · 171 Posts · 9 Kudos Received · 0 Solutions
08-11-2017
02:03 AM
I think you've got the point: dfs.datanode.data.dir in Ambari is a global setting, so it assumes every host has these directories (/grid/data1, /grid/data2, /grid/data3); in your case you need to create a config group to suit your environment. There are two ways to deal with the existing data under those directories, but first increase the dfs.datanode.balance.bandwidthPerSec value (bytes/sec) in the HDFS settings in the Ambari UI to match your network speed; this will help speed up the process. The safe way is to decommission the DataNodes, reconfigure your group setting, and then recommission the nodes one by one: https://community.hortonworks.com/articles/69364/decommission-and-reconfigure-data-node-disks.html The unsafe way is to change the setting and remove the directories directly, relying on your replication setting, then wait for re-replication to complete by checking that the under-replicated blocks value reported by hdfs dfsadmin -report drops to 0.
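If you'd rather not re-run hdfs dfsadmin -report by hand, here is a minimal Scala sketch that polls the same counter through the Hadoop client API. This is my own illustration, not from the linked article; it assumes the Hadoop client jars and a cluster-pointing core-site.xml/hdfs-site.xml are on the classpath, and the 30-second poll interval is arbitrary.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.hdfs.DistributedFileSystem

object WaitForReplication {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml/hdfs-site.xml from the classpath; fs.defaultFS
    // must point at the cluster so the cast below succeeds.
    val conf = new Configuration()
    val fs = FileSystem.get(conf).asInstanceOf[DistributedFileSystem]
    // Same counter that `hdfs dfsadmin -report` prints as "Under replicated blocks".
    var pending = fs.getUnderReplicatedBlocksCount
    while (pending > 0) {
      println(s"Under-replicated blocks: $pending")
      Thread.sleep(30000) // arbitrary 30s poll interval
      pending = fs.getUnderReplicatedBlocksCount
    }
    println("Replication complete.")
    fs.close()
  }
}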
07-26-2017
05:47 AM
@rahul gulati I don't think we can handle \n characters with RegexSerDe, since by default Hive treats every '\n' as a line delimiter. You might need to handle new lines using the Omniture Data SerDe; refer to the link for details.
06-14-2017
07:08 AM
hi @rahul gulati, apparently the number of partitions for your DataFrame / RDD is causing the issue. This can be controlled by adjusting the spark.default.parallelism parameter in the Spark context or by using .repartition(<desired number>). When you run in spark-shell, please check the mode and the number of cores allocated for the execution, and adjust the value to whichever works for the shell mode. Alternatively, you can observe the same in the Spark UI and come to a conclusion on partitions; see the sketch after the quote below.

From the Spark website, on spark.default.parallelism: for distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:
Local mode: number of cores on the local machine
Others: total number of cores on all executor nodes or 2, whichever is larger
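Here is a minimal spark-shell sketch of the .repartition approach; the DataFrame df and the target count 8 are illustrative, not from your job. You can also launch the shell with --conf spark.default.parallelism=<n> to change the default.

import spark.implicits._

// spark-shell predefines `spark`; build a small DataFrame to inspect.
val df = (1 to 1000).toDF("n")
println(df.rdd.getNumPartitions)        // default, influenced by spark.default.parallelism

// Explicitly set the partition count instead of relying on the default.
val repartitioned = df.repartition(8)
println(repartitioned.rdd.getNumPartitions)  // 8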
05-10-2017
03:31 AM
Using HDP 2.5 with Spark 2. If you define the code as follows:

val spark = SparkSession
  .builder
  .appName("my app")
  .getOrCreate()

import spark.implicits._

val test = spark.sqlContext.sql("select max(test_dt) as test_dt from abc").as[String]
val test1 = spark.sqlContext.table("testing")

The following two statements will compile:

val output2 = test1.filter(test1("audit_date").gt(test).toString())
val output2 = test1.filter(test1("audit_date").gt(test))

Of course, you can always convert test to a String and use the variable in the filter clause.
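As a hedged sketch of that last suggestion (maxDt and output3 are illustrative names of mine, and it assumes the test/test1 definitions above):

// Materialize the single-row Dataset[String] on the driver, then
// filter with the plain String value.
val maxDt: String = test.first()
val output3 = test1.filter(test1("audit_date").gt(maxDt))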
05-05-2017
05:28 PM
@rahul gulati
Yes, it's okay to install the MIT KDC on the Ambari server node. But in a real production cluster, we should clearly separate these two roles onto two different nodes. Hope this helps!
04-11-2017
04:58 PM
Hello @rahul gulati, it's not good community practice to ask a similar question multiple times. I've provided an answer in the comments here; please check.
04-03-2017
11:53 AM
1 Kudo
@rahul gulati this is how I connect to Hive via Knox through beeline:

beeline --silent=true -u "jdbc:hive2://<knox_host>:8443/;ssl=true;sslTrustStore=/usr/hdp/current/knox-server/data/security/keystores/gateway.jks;trustStorePassword=knoxsecret;transportMode=http;httpPath=gateway/default/hive;hive.server2.use.SSL=true" -d org.apache.hive.jdbc.HiveDriver -n sam -p sam-password

And here are a few references, too:
https://cwiki.apache.org/confluence/display/KNOX/Examples+Hive
https://community.hortonworks.com/questions/16887/beeline-connect-via-knox-ssl-issue.html
02-12-2019
07:02 AM
@Priyanka This is a closed thread (from 2017); can you open a new one and copy-paste this content?
03-21-2019
04:58 AM
Is there an example of creating a Hive table to show the files of the hdfs_path entity? Thank you.
09-05-2018
12:05 PM
@Artem Ervits
> Both can add and remove instances as well as provision new instances with new machine type easily.
Could you please point out where that option is located in the Cloudbreak UI or CLI? Thank you!