Member since: 10-06-2017
Posts: 11
Kudos Received: 0
Solutions: 0
01-22-2018
10:37 AM
Cluster information: a 10-node cluster; each machine has 16 cores and 126.04 GB of RAM. My question: how do I pick num-executors, executor-memory, executor-cores, driver-memory, and driver-cores? The job will run with YARN as the resource scheduler.
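For reference, here is a rough sketch of one commonly quoted rule of thumb for sizing executors on YARN, applied to the numbers in this question. Everything in it is an assumption rather than something stated in the post: reserving 1 core and 1 GB per node for the OS and daemons, using 5 cores per executor, leaving one executor slot for the YARN ApplicationMaster, and allowing roughly 10% of executor memory for off-heap overhead. The class and method names are made up for illustration, and the result is a starting point to tune, not a definitive answer.

```java
public class ExecutorSizing {
    // Assumed rule-of-thumb constants; none of these come from the post.
    static final int RESERVED_CORES_PER_NODE = 1;   // left for OS and Hadoop daemons
    static final int RESERVED_MEM_GB_PER_NODE = 1;  // left for OS and Hadoop daemons
    static final int CORES_PER_EXECUTOR = 5;        // heuristic value for --executor-cores
    static final double OVERHEAD_FRACTION = 0.10;   // assumed memory overhead per executor

    // How many executors fit on one node after reserving cores.
    public static int executorsPerNode(int coresPerNode) {
        return (coresPerNode - RESERVED_CORES_PER_NODE) / CORES_PER_EXECUTOR;
    }

    // Cluster-wide executor count, leaving one slot for the ApplicationMaster.
    public static int numExecutors(int nodes, int coresPerNode) {
        return nodes * executorsPerNode(coresPerNode) - 1;
    }

    // Heap size per executor in whole GB, after setting aside the overhead.
    public static int executorMemoryGb(int memGbPerNode, int coresPerNode) {
        int containerGb = (memGbPerNode - RESERVED_MEM_GB_PER_NODE)
                / executorsPerNode(coresPerNode);
        return (int) Math.floor(containerGb / (1.0 + OVERHEAD_FRACTION));
    }

    public static void main(String[] args) {
        int nodes = 10;
        int coresPerNode = 16;
        int memGbPerNode = 126; // 126.04 GB rounded down

        System.out.println("--num-executors " + numExecutors(nodes, coresPerNode));
        System.out.println("--executor-cores " + CORES_PER_EXECUTOR);
        System.out.println("--executor-memory "
                + executorMemoryGb(memGbPerNode, coresPerNode) + "g");
    }
}
```

Under these assumptions each node hosts 3 executors. Driver memory and driver cores are not covered by this sketch; they depend mostly on how much data the job collects back to the driver.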
Labels: Apache Spark
11-09-2017
08:49 AM
I don't think the performance will be the same. Consider the two statements below. Say I have a DataFrame with columns a, b, c, d, e, and I want to select only a, b, c. 1) df.select("a","b","c"): this selects only the required columns and never touches the others. 2) df.select("a","b","c","d","e").drop("d","e"): this first selects all the columns from the DataFrame and then drops the unwanted ones. I think the second statement will be somewhat slower. If I am wrong, please clarify.
11-07-2017
11:29 AM
I haven't yet. I will go through it.
11-07-2017
10:33 AM
Is there a way to set a processor's properties dynamically through a REST API call? Say I have a processor named ListFile. It has many properties, but the one I want to set through the REST API is Input Directory.
Labels: Apache NiFi
11-02-2017
08:01 AM
Thanks, Matt. That works, but what if my table has 100 million rows? Does the ExecuteSQL processor pull all the rows in a single execution or not? If it doesn't pull all the rows in one execution, then I can't use the above suggestion, right? Is there a way to tackle this?
11-01-2017
11:33 AM
Okay, thank you. One more thing: is there any way to run the ExecuteSQL processor only once? It is returning duplicate rows.
11-01-2017
10:31 AM
At the outset, thank you so much for the quick reply. We want to build a NiFi job to which we pass a table name, and it should list all the columns of that table. We will then filter a few columns and store the required column data in a flat file. This process should work for any number of tables, irrespective of schema. We are looking to build such a generic NiFi processor.
11-01-2017
09:40 AM
Thank you for the reply. But the database-related processors produce output in Avro format, right? I have gone through the InferAvroSchema processor, and its input has to be a JSON or CSV file, so I am wondering whether this is possible or not.
11-01-2017
08:31 AM
The problem is that I need to fetch the schema for a table and then build the select query from that schema, so that it works for any table with any number of columns. The idea is to build a generic processor that extracts records from a table irrespective of its schema, instead of designing a different workflow for each table.
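To make the second step concrete, here is a minimal sketch of building the select query once the column names are known. It assumes the column list has already been fetched by some other means (for example from JDBC metadata), and it quotes identifiers with ANSI double quotes, which not every database accepts by default. The class name SelectQueryBuilder and its methods are hypothetical, not part of NiFi.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SelectQueryBuilder {

    // Builds "SELECT <cols> FROM <table>" for whatever columns are passed in,
    // so the same code works for any table regardless of its schema.
    public static String buildSelect(String table, List<String> columns) {
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String columnList = columns.stream()
                .map(SelectQueryBuilder::quote)
                .collect(Collectors.joining(", "));
        return "SELECT " + columnList + " FROM " + quote(table);
    }

    // Assumption: ANSI-style quoting, with embedded double quotes doubled.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}
```

Filtering down to the required columns then amounts to trimming the list before calling buildSelect.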
Labels: Apache NiFi
10-06-2017
10:41 AM
You can use the delete(path, true) method, which takes two arguments: the path to delete and a boolean that enables recursive deletion. Example: Path path = new Path(outputDirectory); path.getFileSystem(conf).delete(path, true);