Member since: 04-11-2018
Posts: 47
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 14055 | 06-12-2018 11:26 AM |
10-08-2018
11:54 AM
Hi, to convert a CSV file to a DataFrame we normally have to know the delimiter character at coding time, but in my case the delimiter is not known in advance. The source file will be delimited by some character, and the code should be able to infer the delimiter from the file and then convert the file into a DataFrame. For now I have written a Java snippet to detect the delimiter character first and then read the file. Is there any predefined function that covers this? Thanks, R
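A minimal Scala sketch of one way to do this, assuming Spark 2.x and a known set of candidate delimiters; the path and the `inferDelimiter` helper are illustrative, not a built-in API:

```scala
import org.apache.spark.sql.SparkSession

object DelimiterInference {
  // Candidate delimiters to test -- an assumption, adjust to your sources
  private val candidates = Seq(',', '\t', '|', ';')

  // Pick the candidate that occurs most often in the sampled line
  def inferDelimiter(sampleLine: String): Char =
    candidates.maxBy(c => sampleLine.count(_ == c))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-delimiter-inference").getOrCreate()
    val path = "hdfs:///data/incoming/source_file.txt" // hypothetical input path

    // Sample only the first line of the file to guess the delimiter
    val firstLine = spark.sparkContext.textFile(path).first()
    val delim = inferDelimiter(firstLine)

    // Read the full file as a DataFrame using the inferred delimiter
    val df = spark.read
      .option("header", "true")
      .option("delimiter", delim.toString)
      .csv(path)

    df.show(5)
    spark.stop()
  }
}
```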
Labels:
- Apache Spark
07-18-2018
02:00 PM
@Felix Albani Thanks for your answer. In my scenario we have 3 edge nodes, and the same Spark job jar is deployed on all three. Say there is maintenance work going on node 1 and my Spark jar is not available there; the job should then be triggered by the scheduler from another available node, node 2 or node 3. Currently we deploy in cluster mode only, but my confusion is how the code should get triggered from the next available edge node. What is the best approach to schedule a Spark job across multiple edge nodes?
07-17-2018
09:56 AM
I have a Spark job deployed on a Hadoop cluster. In my case more than one edge node points to the same Hadoop cluster, and my requirement is that if edge node 1 has an issue, the Spark job should be triggered from another available edge node. What is the best way to do this?
Labels:
- Apache Hadoop
- Apache Spark
07-17-2018
09:50 AM
@Matt Clarke, thanks for your answer. I am not aware of the file sequence.
07-03-2018
12:21 PM
Hi, I am expecting two files, abc.txt and pqr.txt, and my next processor should be triggered only after both files have been received. Currently I am using ListFile and FetchFile to pick up changes in the source directory. Once I have received both mandatory files, the next step is to process them. I am not sure how to configure NiFi to hold the files in the queue until all mandatory files have arrived. How can this be achieved in NiFi?
Labels:
- Apache NiFi
06-26-2018
02:24 PM
@Raymond Honderdors, thanks for your answer.
06-26-2018
03:19 AM
Hi, I am validating XML coming through the NiFi flow against an XSD. Schema validation is done with the validateXML processor, and I have supplied the XSD file to validate against. Now, whenever there is a change to the XSD file, do we need to restart the NiFi cluster, or is it enough to stop and start the validateXML processor again?
Labels:
- Apache NiFi
06-25-2018
01:53 PM
We have a 3-node NiFi cluster and we want to make some changes to a flow that will require a NiFi cluster restart. What is the correct way to restart the complete NiFi cluster? It should not be done one node at a time. Thanks, R
Labels:
- Apache NiFi
06-25-2018
08:31 AM
@Matt Clarke, could you please help me with this?
06-22-2018
10:13 AM
Hello friends, @Pierre Villard (tagging you specifically because I like your explanations). Currently I am getting messages from a queue and I want to trigger a notification email only once per file type. I used the PutEmail processor, but my flow triggers an email notification every time a new flow file comes in, which is its expected behavior. I thought about using MergeContent here, but again I am not sure how to configure it. Descriptive requirement: I receive files for departments such as computer, mechanical, electronics, etc. I want to send a notification email only once per day per file type, as soon as the first file for a specific department is received. Thanks for your help,
Labels:
- Apache NiFi
06-12-2018
11:26 AM
Hi, I found the correct way to do it. There is no need for any workaround; we can append the data directly into a partitioned Parquet Hive table using saveAsTable("mytable"), available from Spark 2.0 (it was not there in Spark 1.6). Below is the code in case someone needs it:
df.write.partitionBy("mycol1","mycol2").mode(SaveMode.Append).format("parquet").saveAsTable("myhivetable")
If the table does not exist, it is created and the data is written into it. If the table already exists, the data is appended to the table and the specified partitions.
06-11-2018
04:08 PM
@sunile.manjee there might be multiple workarounds for this, but I am not looking for workarounds. I am expecting a concrete solution that does not have performance implications. We have the option to write the DataFrame straight into the Hive table, so why should we not go with that instead of writing the data to HDFS and then loading it into the Hive table? Moreover, my Hive table is partitioned on processing year and month.
06-11-2018
03:11 PM
@sunile.manjee thanks for your response. The Hive table has the input format, output format and SerDe set to ParquetHiveSerDe; however, my concern is why the files are not created with a .parquet extension, and whenever I cat those .c000 files I cannot find the Parquet schema that I can find when I cat normal .parquet files.
06-11-2018
02:19 PM
Hi, I am writing a Spark DataFrame into a Parquet Hive table like below:
df.write.format("parquet").mode("append").insertInto("my_table")
But when I go to HDFS and check the files created for the Hive table, I can see they are not created with a .parquet extension; they are created with a .c000 extension. I am also not sure whether my data was written into the table correctly (I can see the data with a Hive SELECT). How should we write the data into .parquet files in a Hive table? Appreciate your help on this! Thanks,
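For what it's worth, the .c000 suffix is just part-file naming and does not by itself mean the data is not Parquet; a quick hedged check is to read the table's HDFS directory back with the Parquet reader. The warehouse path below is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object VerifyParquetOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("verify-parquet-output")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS location of the Hive table's data files
    val tablePath = "hdfs:///apps/hive/warehouse/my_table"

    // If the part files are valid Parquet, this prints the schema and row count
    // regardless of the .c000 extension.
    val dfBack = spark.read.parquet(tablePath)
    dfBack.printSchema()
    println(s"rows: ${dfBack.count()}")

    spark.stop()
  }
}
```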
Labels:
- Apache Hive
- Apache Spark
06-08-2018
10:41 AM
Hi, I want to write a Spark DataFrame into a Hive table. The Hive table is partitioned on year and month and the file format is Parquet. Currently I am writing the DataFrame into the Hive table using insertInto() with mode("append"). I am able to write the data into the Hive table, but I am not sure that is the correct way to do it. Also, while writing I get this exception: "parquet.hadoop.codec.CompressionCodecNotSupportedException: codec not supported: org.apache.hadoop.io.compress.DefaultCodec". Could you please help me with this? Thanks for your time,
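One hedged guess at the cause: Hive's output-compression settings defaulting to DefaultCodec, which Parquet does not accept. A sketch that switches to a Parquet-supported codec before the write; the table and column names are illustrative, and the config keys should be checked against your Spark/Hive versions:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object WritePartitionedParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-partitioned-parquet")
      .enableHiveSupport()
      .getOrCreate()

    // Use a codec Parquet supports instead of the Hive default (DefaultCodec)
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
    spark.sql("SET hive.exec.compress.output=false")

    // Allow writing into the year/month partitions dynamically
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // insertInto matches columns by position, so the DataFrame must have the
    // same column order as the table, with the partition columns (year, month) last.
    val df = spark.table("staging_table") // hypothetical source of the data
    df.write.mode(SaveMode.Append).insertInto("my_partitioned_table")

    spark.stop()
  }
}
```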
Labels:
- Apache Hive
- Apache Spark
06-06-2018
01:40 PM
@Shu thanks for your answer. I have one doubt about it: the command arguments should not take the shell script as input; instead, we should put the shell script in the command path, which is tried and tested.
06-06-2018
09:01 AM
My requirement is that as soon as the source puts the files, my Spark job should be triggered to process them. Currently I am thinking of doing it like this:
1. The source will push the files to a local directory, /temp/abc.
2. NiFi ListFile and FetchFile will take care of ingesting those files into HDFS.
3. On the success relationship of PutHDFS, I am thinking of setting up ExecuteStreamCommand.
Could you please suggest whether there is a better approach? What should the configuration of ExecuteStreamCommand be? Thanks in advance, R
Labels:
- Apache NiFi
- Apache Spark
05-31-2018
05:45 PM
@gnovak, I am still wondering why it created the directory on my local machine. Kind of weird... Related to this I have another issue: I am also reading files from an HDFS directory using wholeTextFiles(). My HDFS input directory has text files and sub-directories in it. On my local development machine I was able to read the files and wholeTextFiles() did not consider the sub-directories; however, when I deployed the same code to the cluster, it started picking up the sub-directories as well. Do you have any idea about this? Appreciate your help on this.
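If the goal is to skip sub-directories on the cluster as well, one option (an assumption, not a guaranteed fix) is to pass a glob that matches only the files directly under the input directory, since wholeTextFiles() accepts path patterns in the same way as textFile(); the path and .txt suffix below are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object ReadTopLevelFilesOnly {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("whole-text-files-glob").getOrCreate()
    val sc = spark.sparkContext

    // Glob only the .txt files directly under the directory, so sub-directories
    // are never walked, whether the job runs locally or on the cluster.
    val files = sc.wholeTextFiles("hdfs:///apps/input/*.txt")

    files.keys.collect().foreach(println)
    spark.stop()
  }
}
```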
05-31-2018
05:39 PM
@gnovak thanks for your time 🙂
05-31-2018
02:28 PM
@gnovak, in order to satisfy my need I am doing FileSystem.rename(src, tgt). If the target path does not exist, will it be created? My understanding is that it will create the target path. In my case I am able to move the file as expected on my local machine, but after deploying the same code on the cluster I am not able to move the file to the desired location. It does not give me any exception; it simply does not do the job.
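For reference, a hedged sketch of the pattern to try here: FileSystem.rename() generally does not create a missing destination directory and reports failure through its Boolean return value rather than an exception, which would match the silent behaviour described above. The paths are illustrative:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MoveHdfsFile {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    val src = new Path("/apps/pqr/abc.txt") // illustrative source file
    val tgtDir = new Path("/apps/lmn")      // illustrative target directory

    // Create the target directory first; rename() tends to return false
    // (without throwing) when the destination parent does not exist.
    if (!fs.exists(tgtDir)) fs.mkdirs(tgtDir)

    // Move the file, keeping its original name under the target directory
    val moved = fs.rename(src, new Path(tgtDir, src.getName))
    if (!moved) println(s"Move of $src to $tgtDir failed")
  }
}
```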
05-25-2018
09:03 AM
@gnovak Thanks for understanding my question correctly. I have done the same in my Scala code; I just wanted to get others' opinions on this.
05-25-2018
09:01 AM
@Geoffrey Shelton Okot, thanks for your time, but I was not looking for the command-line option (everyone knows that one).
05-24-2018
06:29 AM
@Geoffrey Shelton Okot, I have a few files in an HDFS directory and simply want to move files from one HDFS directory to another. For example: the file abc.txt is in the pqr directory and I want to move it to the lmn directory, i.e. move /apps/pqr/abc.txt to /apps/lmn/abc.txt.
05-23-2018
07:48 AM
@Felix Albani, I am still not getting your point; it should not throw an exception in the case of IF NOT EXISTS. As per my understanding, when we say IF NOT EXISTS it should execute the statement silently, without throwing any exception, when the database already exists, and that is exactly why we use IF NOT EXISTS. My purpose here is to create the database if it does not exist, and otherwise not create it.
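Just to make the expectation concrete, a minimal sketch of the statement in question (the database name is illustrative, and SparkSession assumes Spark 2.x); with IF NOT EXISTS this is expected to be a no-op when the database already exists:

```scala
import org.apache.spark.sql.SparkSession

object CreateDatabaseIfMissing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-database-if-missing")
      .enableHiveSupport()
      .getOrCreate()

    // Expected to succeed silently whether or not the database already exists
    spark.sql("CREATE DATABASE IF NOT EXISTS my_db")

    spark.stop()
  }
}
```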
05-23-2018
07:41 AM
I have files in one HDFS folder and, after checking a few things, I want to move a file to another directory on HDFS. Currently I am using a FileSystem object with rename(); it does the job, but it is effectively renaming the file with the complete path. Is there any other way to do it? Appreciate your help. Thanks,
Labels:
- Apache Hadoop
- Apache Spark
05-22-2018
11:49 AM
@Felix Albani, any updates on this?
05-18-2018
03:35 PM
Hi, I need to create an empty bucketed Hive table from Spark with the Parquet file format. Currently Spark throws an exception for the conventional CLUSTERED BY syntax. Thanks
Labels:
- Apache Hive
- Apache Spark
05-16-2018
08:16 AM
Thanks for your answer. As I mentioned, there are multiple ways to do it, but I am looking for the best approach from a performance standpoint.
05-15-2018
03:57 PM
Hi folks, currently I have a scenario where I have to get only the latest record per id from a Hive table, based on a timestamp. I am looking for the best approach to do it. My data is in a Hive internal table stored as Parquet files, similar to this, using Spark + Hive. Thanks,
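A sketch of one common approach: a window function that ranks the rows for each id by timestamp and keeps only the newest one. It assumes Spark 2.x with Hive support, and the table and column names (my_table, id, event_ts) are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object LatestRecordPerId {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("latest-record-per-id")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.table("my_table") // illustrative Hive table name

    // Rank the rows within each id by timestamp, newest first
    val byIdNewestFirst = Window.partitionBy("id").orderBy(col("event_ts").desc)

    // Keep only the top-ranked (latest) row per id
    val latest = df
      .withColumn("rn", row_number().over(byIdNewestFirst))
      .filter(col("rn") === 1)
      .drop("rn")

    latest.show()
    spark.stop()
  }
}
```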
Labels:
- Apache Hive
- Apache Spark