Member since: 04-11-2018
Posts: 47
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 19921 | 06-12-2018 11:26 AM |
06-25-2018 01:53 PM
We have a 3-node NiFi cluster and we want to make some changes to the flow that will require a NiFi cluster restart. What is the correct way to restart the complete NiFi cluster? It should not be one node at a time. Thanks, R
Labels:
- Apache NiFi
06-12-2018 11:26 AM
Hi, I found the correct way to do it. There is no need for any workaround; we can directly append the data into a Parquet Hive table using saveAsTable("mytable") from Spark 2.0 (this was not available in Spark 1.6). Below is the code in case someone needs it: df.write.partitionBy("mycol1","mycol2").mode(SaveMode.Append).format("parquet").saveAsTable("myhivetable")
If the table does not exist, this creates it and writes the data into the Hive table. If the table already exists, it appends the data into the Hive table and the specified partitions.
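For anyone who wants a complete, runnable version, here is a minimal sketch of the same approach (Spark 2.x Scala; the SparkSession setup, the proc_year/proc_month partition columns, and the table name are illustrative assumptions, not part of the original answer):

```scala
// Minimal sketch (Spark 2.x, Scala): append a DataFrame into a partitioned
// Parquet Hive table with saveAsTable. Table and column names are hypothetical.
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendToHiveTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-to-hive-parquet")
      .enableHiveSupport()          // needed so saveAsTable goes through the Hive metastore
      .getOrCreate()

    // Hypothetical source data; in practice this would come from files, Kafka, etc.
    val df = spark.createDataFrame(Seq(
      ("a", 2018, 6),
      ("b", 2018, 7)
    )).toDF("value", "proc_year", "proc_month")

    df.write
      .partitionBy("proc_year", "proc_month")  // note: partitionBy, not "paritionedBy"
      .mode(SaveMode.Append)                   // creates the table if missing, appends otherwise
      .format("parquet")
      .saveAsTable("myhivetable")

    spark.stop()
  }
}
```

With saveAsTable the partition columns are taken from partitionBy by name, so their position in the dataframe does not matter, unlike insertInto, which matches columns by position.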
06-11-2018 04:08 PM
@sunile.manjee there might be multiple workarounds for this; however, I am not looking for workarounds. I am expecting a concrete solution that does not have performance complications. We have the option to write the dataframe into the Hive table directly, so why should we not go for that instead of writing the data into HDFS and then loading it into a Hive table? Moreover, my Hive table is partitioned on processing year and month.
06-11-2018 03:11 PM
@sunile.manjee thanks for your response. The Hive table has its input format, output format, and SerDe set to ParquetHiveSerDe; however, my concern is why the files are not created with the .parquet extension, and why, whenever I cat those .c000 files, I am unable to find the Parquet schema that I can find after cat-ing normal .parquet files.
06-11-2018 02:19 PM
Hi, I am writing a Spark dataframe into a Parquet Hive table like below: df.write.format("parquet").mode("append").insertInto("my_table") But when I go to HDFS and check the files that were created for the Hive table, I can see that they are not created with the .parquet extension; they are created with the .c000 extension. I am also not sure whether my data is correctly written into the table or not (I can see the data from a Hive SELECT). How should we write the data into .parquet files in a Hive table? Appreciate your help on this! Thanks,
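If it helps, one way to check whether those .c000 part files really contain Parquet data is to point the Parquet reader at one of them and print the schema. A minimal sketch, assuming a hypothetical warehouse path and part-file name (replace with an actual path under the table's HDFS location):

```scala
// Minimal sketch: confirm that the .c000 part files written by insertInto are
// valid Parquet by reading one back and printing its schema. The path is hypothetical.
import org.apache.spark.sql.SparkSession

object InspectParquetPart {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-parquet-part")
      .enableHiveSupport()
      .getOrCreate()

    // Point the Parquet reader directly at one part file under the table's location.
    val part = spark.read.parquet(
      "hdfs:///apps/hive/warehouse/my_table/part-00000-example.c000")  // placeholder path

    part.printSchema()                 // expected columns showing up means the file is Parquet
    part.show(5, truncate = false)

    spark.stop()
  }
}
```

The file extension itself does not determine the format; if printSchema shows the expected columns, the data was written as Parquet.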
Labels:
- Apache Hive
- Apache Spark
06-08-2018 10:41 AM
Hi, I want to write a Spark dataframe into a Hive table. The Hive table is partitioned on year and month and the file format is Parquet. Currently I am writing the dataframe into the Hive table using insertInto() and mode("append"). I am able to write the data into the Hive table, but I am not sure that is the correct way to do it. Also, while writing I am getting this exception: "parquet.hadoop.codec.CompressionCodecNotSupportedException: codec not supported: org.apache.hadoop.io.compress.DefaultCodec". Could you please help me with this? Thanks for your time,
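A minimal sketch of one way to avoid the DefaultCodec error by pinning the Parquet compression codec before the write (the staging_table and my_partitioned_table names are hypothetical, and snappy is just one supported codec):

```scala
// Minimal sketch: write into a partitioned Parquet Hive table with an explicitly
// supported compression codec. Table names and codec choice are assumptions.
import org.apache.spark.sql.SparkSession

object WritePartitionedParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-partitioned-parquet")
      .enableHiveSupport()
      .getOrCreate()

    // Pin Parquet compression to a codec Parquet supports (snappy, gzip, uncompressed)
    // instead of falling back to Hadoop's DefaultCodec.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
    // Needed when insertInto writes partitions dynamically.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    val df = spark.table("staging_table")   // hypothetical source dataframe

    // insertInto matches columns by position, so the partition columns
    // (year, month) must be the last columns of df, in the table's order.
    df.write.mode("append").insertInto("my_partitioned_table")

    spark.stop()
  }
}
```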
Labels:
- Apache Hive
- Apache Spark
06-06-2018 01:40 PM
@Shu thanks for your answer. I have one doubt about it: the Command Arguments property should not take the shell script as input; instead, we should have the shell script in the Command Path property, which is tried and tested.
06-06-2018 09:01 AM
My requirement is that as soon as the source puts the files, my Spark job should get triggered and process them. Currently I am thinking of doing it like below:
1. The source will push the files into the local directory /temp/abc.
2. NiFi ListFile and FetchFile will take care of ingesting those files into HDFS.
3. On the success relationship of PutHDFS, I am thinking of setting up ExecuteStreamCommand.
Could you please suggest whether there is a better approach to do this? What should the configuration for ExecuteStreamCommand be? Thanks in advance, R
Labels:
- Apache NiFi
- Apache Spark
05-31-2018 05:45 PM
@gnovak, I am still wondering why it created the directory on my local machine? Kind of weird... Related to this, I have another issue: I am also reading files from an HDFS directory using wholeTextFiles(). My HDFS input directory has text files and subdirectories in it. On my local development machine I was able to read the files, and wholeTextFiles() was not considering the subdirectories; however, when I deployed the same code to the cluster, it started to consider the subdirectories as well. Do you have any idea about this? Appreciate your help on this.
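In case it is useful, one way to make the behaviour consistent across environments is to pass a glob that matches only the top-level files, so subdirectories are never listed. A minimal sketch, assuming a hypothetical hdfs:///data/input directory containing *.txt files:

```scala
// Minimal sketch: restrict wholeTextFiles to the top-level files of a directory
// with a glob, so subdirectories are not picked up. Paths are hypothetical.
import org.apache.spark.sql.SparkSession

object ReadTopLevelFiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-top-level-files")
      .getOrCreate()
    val sc = spark.sparkContext

    // Matching only "*.txt" directly under the input directory avoids descending
    // into subdirectories, regardless of how the cluster lists the path.
    val files = sc.wholeTextFiles("hdfs:///data/input/*.txt")

    files.map { case (path, content) => s"$path -> ${content.length} chars" }
         .collect()
         .foreach(println)

    spark.stop()
  }
}
```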