Member since 06-07-2016
923 Posts
322 Kudos Received
115 Solutions
My Accepted Solutions
Views | Posted
---|---
3351 | 10-18-2017 10:19 PM
3713 | 10-18-2017 09:51 PM
13419 | 09-21-2017 01:35 PM
1386 | 08-04-2017 02:00 PM
1846 | 07-31-2017 03:02 PM
07-19-2017
05:38 PM
@Suhel How many users connect to your HiveServer2 concurrently? That determines how much memory it needs. Per Hortonworks recommendations, 20 concurrent connections need a mere 6 GB of heap, 10 concurrent connections need 4 GB, and a single connection needs 2 GB, so you definitely don't want to go below 2 GB.
When the heap is very large, you run into what are called "stop-the-world" garbage collection pauses. You can search for more on this, but basically the JVM needs to move objects and update the references to them. If it moves an object before updating the references, and the running application accesses the object through an old reference, there is trouble. If it updates the references first and then moves the object, the updated references are wrong until the object has moved, and any access in the meantime causes problems. That is why the application is paused while this happens.
For both the CMS and Parallel collectors, the young-generation collection algorithm is similar and it is stop-the-world; that is, the application is stopped while collection is happening. When you allocate too much memory, like 24 GB, stop-the-world pauses take longer, which can make your application fail (for example, through timeouts).
Also, your metastore does not need the same memory as HiveServer2; they are two different processes. If the metastore is also running into similar issues, you can set it to 8 GB or less, which is still a lot of memory for just the metastore.
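To make the sizing tiers above concrete, here is a minimal sketch that maps a concurrent-connection count to the heap figures quoted in this answer. The class and method names are hypothetical. Rounding in-between counts (for example, 5 connections) up to the next tier is my assumption. No figure is quoted above 20 connections, so the sketch refuses to guess there. `Runtime.maxMemory()` reports roughly the heap this JVM was started with (`-Xmx`), which is useful for comparison.

```java
/**
 * Hypothetical helper: maps concurrent HiveServer2 connections to the heap
 * sizes quoted in the answer (1 -> 2 GB, up to 10 -> 4 GB, up to 20 -> 6 GB).
 * Rounding intermediate counts up to the next tier is an assumption.
 */
public class HiveServer2HeapTiers {

    static int recommendedHeapGb(int concurrentConnections) {
        if (concurrentConnections < 1) {
            throw new IllegalArgumentException("need at least one connection");
        }
        if (concurrentConnections == 1) {
            return 2;
        }
        if (concurrentConnections <= 10) {
            return 4;
        }
        if (concurrentConnections <= 20) {
            return 6;
        }
        // The answer gives no figure above 20 connections, so don't guess.
        throw new IllegalArgumentException("no figure quoted above 20 connections");
    }

    public static void main(String[] args) {
        // Approximately the -Xmx this JVM was started with, for comparison.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("This JVM's max heap: " + maxHeapMb + " MB");
        for (int c : new int[] {1, 10, 20}) {
            System.out.println(c + " connection(s) -> " + recommendedHeapGb(c) + " GB");
        }
    }
}
```

The point of the lookup is the shape of the curve. Heap grows slowly with concurrency, so jumping straight to something like 24 GB buys little and lengthens GC pauses.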
07-19-2017
05:21 PM
1 Kudo
@Bala Vignesh N V Why not use a filter like the following?
val header = data.first
val rows = data.filter(line => line != header)
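For readers without a Spark shell handy, here is a plain-Java analogue of that Scala snippet. A `java.util.List` stands in for the RDD, and no Spark API is used. The class and method names are hypothetical. `data.first` corresponds to taking element 0, and the filter keeps every line that differs from the header.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Plain-Java analogue of:
 *   val header = data.first
 *   val rows = data.filter(line => line != header)
 * A java.util.List stands in for the RDD; no Spark classes are involved.
 */
public class DropHeader {

    static List<String> dropHeader(List<String> data) {
        if (data.isEmpty()) {
            return data;  // nothing to strip
        }
        String header = data.get(0);  // like data.first
        // Keeps every line that differs from the header. This also drops later
        // lines identical to the header, e.g. repeated headers when several
        // files with the same header are read together.
        return data.stream()
                .filter(line -> !line.equals(header))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = List.of("id,name", "1,alice", "2,bob", "id,name", "3,carol");
        System.out.println(dropHeader(data));
    }
}
```

Comparing by value rather than by position is what makes the filter approach convenient. It needs no per-partition index bookkeeping, and it also removes duplicate headers coming from multiple input files.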
07-19-2017
05:16 PM
@Jobin George On your new node, do you have a flow.xml.gz? If so, can you delete it and try adding the node again?
07-18-2017
05:39 PM
Please see the following link. In your code, you'll need to call "repartition". What I mean is that if you force more of the data to the same reducer, you will create fewer files. Call repartition on a key so that all the data for that key lands in the same partition. https://dzone.com/articles/optimize-spark-with-distribute-by-cluster-by
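The reason repartitioning by a key reduces the file count is that hash partitioning sends every record with the same key to the same partition. Each key's data is then written by one task instead of being scattered across many. Below is a minimal plain-Java illustration of that idea. The class name and sample data are hypothetical. It uses `String.hashCode()`, whereas Spark's DataFrame `repartition` uses its own Murmur3-based hash, so actual partition numbers will differ. Only the "same key, same partition" property carries over.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative hash partitioner (not Spark's implementation): records with
 * the same key always land in the same partition.
 */
public class PartitionByKey {

    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Each record is {key, payload}; returns numPartitions buckets. */
    static List<List<String[]>> partition(List<String[]> records, int numPartitions) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String[] record : records) {
            parts.get(partitionFor(record[0], numPartitions)).add(record);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
                new String[] {"2017-07-18", "a"},
                new String[] {"2017-07-19", "b"},
                new String[] {"2017-07-18", "c"},
                new String[] {"2017-07-19", "d"});
        int nonEmpty = 0;
        for (List<String[]> p : partition(records, 8)) {
            if (!p.isEmpty()) {
                nonEmpty++;
            }
        }
        // Two distinct keys means at most two non-empty partitions,
        // and so at most two output files for this data.
        System.out.println("non-empty partitions: " + nonEmpty);
    }
}
```

Hive's DISTRIBUTE BY relies on the same principle: rows are routed to reducers by a hash of the distribution columns.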
07-18-2017
02:10 AM
@Krishna S To use these components without HDFS, you need a file system that supports the Hadoop FileSystem API. Some such systems are Amazon S3, WASB, EMC Isilon and a few others (these systems might not implement 100 percent of the Hadoop API, so please verify). You can also install Hadoop in standalone mode, which does not use HDFS and works against the local file system instead. I am not sure NFS on its own supports the Hadoop API, but with the Hadoop NFS gateway you can mount HDFS as part of the client's local file system. Here is a link on using this feature: https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.htm
07-18-2017
01:46 AM
na Also use DISTRIBUTE BY so that data for the same partition goes to the same reducer.
07-18-2017
01:33 AM
2 Kudos
@Upendra N I think you probably realize that what makes SCD Type 2 difficult in Hadoop (Hive/Pig) is that you cannot update records. (With Hive ACID tables you can, but under the hood Hive is doing work that you can also do yourself.) Rather than repeating the process here, here is a link that describes implementing SCD Type 2 in Hadoop using Hive. Hope this helps. https://www.softserveinc.com/en-us/tech/blogs/process-slowly-changing-dimensions-hive/
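To illustrate "doing it yourself" without in-place updates, here is a minimal sketch of one common approach. It is not necessarily the one in the linked article. Build a complete new version of the dimension from the old rows plus the incoming changes, then write it back in one go (in Hive, typically an INSERT OVERWRITE). The single-attribute row layout, field names, and the use of `null` for an open-ended end date are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of SCD Type 2 as a full rebuild instead of row updates. */
public class Scd2Rebuild {

    static final class DimRow {
        final String key, value, startDate, endDate;  // endDate == null: open-ended
        final boolean current;

        DimRow(String key, String value, String startDate, String endDate, boolean current) {
            this.key = key;
            this.value = value;
            this.startDate = startDate;
            this.endDate = endDate;
            this.current = current;
        }

        @Override
        public String toString() {
            return key + "," + value + "," + startDate + "," + (endDate == null ? "" : endDate) + "," + current;
        }
    }

    static List<DimRow> rebuild(List<DimRow> existing, Map<String, String> changes, String effectiveDate) {
        List<DimRow> out = new ArrayList<>();
        Set<String> keysWithCurrentRow = new HashSet<>();
        for (DimRow row : existing) {
            String newValue = changes.get(row.key);
            if (row.current && newValue != null && !newValue.equals(row.value)) {
                // Close the old version and append the new current version.
                out.add(new DimRow(row.key, row.value, row.startDate, effectiveDate, false));
                out.add(new DimRow(row.key, newValue, effectiveDate, null, true));
            } else {
                out.add(row);  // history rows and unchanged current rows carry over
            }
            if (row.current) {
                keysWithCurrentRow.add(row.key);
            }
        }
        // Keys with no current row yet become brand-new current rows.
        for (Map.Entry<String, String> e : changes.entrySet()) {
            if (!keysWithCurrentRow.contains(e.getKey())) {
                out.add(new DimRow(e.getKey(), e.getValue(), effectiveDate, null, true));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<DimRow> existing = List.of(
                new DimRow("c1", "Austin", "2017-01-01", null, true),
                new DimRow("c2", "Boston", "2017-01-01", null, true));
        Map<String, String> changes = new LinkedHashMap<>();
        changes.put("c1", "Dallas");  // changed attribute
        changes.put("c3", "Denver");  // new key
        rebuild(existing, changes, "2017-07-18").forEach(System.out::println);
    }
}
```

Because the output is a fresh full snapshot, the only write is a bulk overwrite. That fits append/overwrite-oriented storage without needing ACID updates.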
07-10-2017
08:32 PM
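@Prakhar Agrawal Your code defines only two properties. Where in your processor code do you handle the additional properties you are getting errors for, for example "databaseName"? I don't see that property in your code:
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Free-text input to print; attaching a validator is the idiomatic NiFi pattern
static final PropertyDescriptor MyPropertyDescriptor = new PropertyDescriptor.Builder()
        .name("Print User Input")
        .description("It prints the user input")
        .required(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();

// Number of rows to print; must be a positive integer
static final PropertyDescriptor n = new PropertyDescriptor.Builder()
        .name("Num Rows to Print")
        .description("number of rows to be printed")
        .required(true)
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();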
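The mechanism behind that kind of error: a NiFi processor only recognizes the property names returned by its `getSupportedPropertyDescriptors()`. Any other configured name, unless the processor supports dynamic properties, is reported as not a supported property. The sketch below is a plain-Java analogue of that check and uses no NiFi classes. The class and method names are hypothetical. The two names in `SUPPORTED` mirror the descriptors above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java analogue (no NiFi classes): a component only "knows" the
 * property names it declares; anything else is flagged as unsupported.
 */
public class SupportedPropertiesCheck {

    // Mirrors the two PropertyDescriptor names declared in the processor.
    static final Set<String> SUPPORTED = Set.of("Print User Input", "Num Rows to Print");

    /** Returns the configured property names the processor does not declare. */
    static List<String> unsupported(Map<String, String> configured) {
        List<String> problems = new ArrayList<>();
        for (String name : configured.keySet()) {
            if (!SUPPORTED.contains(name)) {
                problems.add(name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> configured = Map.of(
                "Print User Input", "hello",
                "databaseName", "sales");
        for (String name : unsupported(configured)) {
            System.out.println("'" + name + "' is not a supported property");
        }
    }
}
```

In the real processor, the fix is to declare a descriptor for every property you configure (and return it from `getSupportedPropertyDescriptors()`), or to remove the stray property from the processor's configuration.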
07-10-2017
05:22 PM
@Prakhar Agrawal Can you please share the code for the PropertyDescriptors in your custom processor and show how you handle them in the onTrigger() method?
07-07-2017
03:55 AM
@Karan Alang I have not done this myself, but it looks like a collector with that name already exists. Can you change the name and try again?