Member since: 08-03-2019
Posts: 186
Kudos Received: 34
Solutions: 26
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1959 | 04-25-2018 08:37 PM
 | 5882 | 04-01-2018 09:37 PM
 | 1593 | 03-29-2018 05:15 PM
 | 6766 | 03-27-2018 07:22 PM
 | 2007 | 03-27-2018 06:14 PM
04-01-2018
03:05 PM
1 Kudo
@Selvaraju Sellamuthu Try using the following properties to control the mapper count for your job.
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
These parameters control the number of mappers for splittable formats with Tez. Please share your results after running the job with these properties.
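As a rough illustration (assuming a splittable format and no other grouping constraints): with the 1 GB max split size above, a 10 GB input cannot be grouped into fewer than about 10 splits, so you would get roughly 10 or more mappers; lowering tez.grouping.max-size raises that minimum, while raising tez.grouping.min-size lets Tez pack many small files into fewer mappers.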
03-31-2018
05:31 AM
@Félicien Catherin If I understand your question properly, you want to check all the attributes of a flow file and then take some action based on them. For this, you can use the getAttributes() function in your script. It returns a map with the attribute name as the key and the attribute value as the value. For example:
flowFile = session.get()
attrMap = flowFile.getAttributes()
You can iterate over the map to check whether a certain attribute exists, or take whatever other action you need. Hope that helps!
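For instance, a minimal ExecuteScript sketch in Jython could look like the following (the attribute name "my.attribute" and the routing decision are just placeholders for illustration):
flowFile = session.get()
if flowFile is not None:
    attrMap = flowFile.getAttributes()
    # log every attribute name/value pair
    for entry in attrMap.entrySet():
        log.info("attribute {0} = {1}".format(entry.getKey(), entry.getValue()))
    # route based on whether a specific attribute is present
    if attrMap.containsKey("my.attribute"):
        session.transfer(flowFile, REL_SUCCESS)
    else:
        session.transfer(flowFile, REL_FAILURE)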
03-31-2018
05:22 AM
Please mark the answer as accepted if it resolved your problem. That way, other community users with similar issues can find the resolution faster.
03-30-2018
10:33 AM
No problem. If your existing approach is not working, please share some info about your ConvertRecord processor configuration, along with a sample of the data it succeeds on and the data it fails on, and I can look into it further.
03-30-2018
03:05 AM
@Krishna R In your Hive terminal, set the following properties:
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
This enables compression and sets the compression codec, gzip in this case. Now you can insert the data into an HDFS directory and the output will be in gzip format.
insert overwrite directory 'myHDFSDirectory' row format delimited fields terminated by ',' select * from myTable;
This stores the output of the select * query in the HDFS directory. Let me know if that works for you.
03-30-2018
02:55 AM
@Stefan Constantin If you are using NiFi 1.2+, I would highly recommend NOT using EvaluateJSONPath. As for the alternate approach, ConvertRecord, where you are facing an ArrayIndexOutOfBoundsException: the problem is probably with your schema. Allow null values for the columns in the schema used by your ConvertRecord processor, like this:
{
"type":"record",
"name":"nifi_logs",
"fields":[
{"name":"column1","type":["null","string"]},
{"name":"column2","type":["null","string"]},
{"name":"column3","type":["null","string"]},
{"name":"column4","type":["null","string"]}
]
}
Try this and let me know if you still face any problems. Cheers!
03-29-2018
05:15 PM
@subbiram Padala I don't think so! The certification is purely going to be based on your Spark skills, and external jars are generally left out. Also, if there were any such requirement, it would be explicitly stated that you need to keep the header, and as I mentioned earlier, that is unlikely to be the case. Keep your spirits up! All the best with your exam.
03-29-2018
05:41 AM
Have you seen the filter condition in my answer above?
val rdd = data.filter(row => row != header)
Now use a similar filter condition to filter out your null records, if there are any, according to your use case.
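For example, a minimal sketch (assuming the same three-column, comma-separated lines as in the related thread, and treating blank fields as nulls; the val name is just illustrative):
val cleaned = rdd
  .filter(line => line != null && line.trim.nonEmpty)            // drop null or blank lines
  .map(_.split(",", -1))                                          // -1 keeps trailing empty fields
  .filter(cols => cols.length == 3 && cols.forall(_.nonEmpty))    // drop rows with missing values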
03-28-2018
03:26 PM
@swathi thukkaraju You can do it without using the CSV package. Use the following code.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType,StringType,StructField,StructType}
val schema = new StructType()
  .add(StructField("name", StringType, true))
  .add(StructField("age", IntegerType, true))
  .add(StructField("state", StringType, true))
val data = sc.textFile("/user/206571870/sample.csv")   // read the raw CSV lines
val header = data.first()                               // the first line is the header
val rdd = data.filter(row => row != header)             // drop the header row
val rowsRDD = rdd.map(x => x.split(",")).map(x => Row(x(0), x(1).toInt, x(2)))   // build Rows matching the schema
val df = sqlContext.createDataFrame(rowsRDD, schema)
After this, run df.show and you will be able to see your data in a relational format. Now you can run whatever queries you want on your "DataFrame", for example filtering based on state, saving to HDFS, etc. PS: If you want to persist your DataFrame as a CSV file, Spark 1.6 DOES NOT support it out of the box; you either need to convert it to an RDD and then save, or use the CSV package from Databricks. Let me know if that helps!
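For instance, a minimal sketch of the two options mentioned in the PS (the output paths below are just placeholders):
// Option 1: convert back to an RDD of comma-separated lines and save as text
df.rdd.map(row => row.mkString(",")).saveAsTextFile("/tmp/sample_out")
// Option 2: use the Databricks spark-csv package (requires the com.databricks:spark-csv dependency)
df.write.format("com.databricks.spark.csv").option("header", "true").save("/tmp/sample_out_csv")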
03-27-2018
09:30 PM
@Manikandan Jeyabal What are the Spark and Hive versions? If you have Hive 2.x and a Spark version below 2.2, this is a known issue that was fixed in Spark 2.2. Here is the Jira link.