Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11230 | 04-15-2020 05:01 PM |
| | 7135 | 10-15-2019 08:12 PM |
| | 3120 | 10-12-2019 08:29 PM |
| | 11505 | 09-21-2019 10:04 AM |
| | 4346 | 09-19-2019 07:11 AM |
10-28-2018
04:29 PM
@Lenu K We can export to a Hive ORC table as follows:
hive> CREATE TABLE <db_name>.<orc_table_name> STORED AS ORC AS SELECT * FROM <db_name>.<hbase_hive_table>;
The above CTAS is a generic statement; you can also create a partitioned table, or use DISTRIBUTE BY / SORT BY to control how the files are created in the directories.
10-28-2018
02:05 PM
@vivek jain Could you make the hbaseConf.set property changes directly in the hbase-site.xml file that Spark uses, instead of setting those property values inside the Spark job, and then run spark-submit with the newly changed hbase-site.xml?
10-28-2018
01:52 PM
@Lenu K One way to avoid full table scans is to use the RowKey in your Hive filter query. If you are filtering on other columns (not only the row key), it would be a lot more efficient to export all the HBase table data into a Hive ORC table and then run all your queries on the exported table. Refer to this and this links for tuning queries against an HBase-Hive table. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer.
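As a minimal sketch of the first option, the query below filters on the column that is mapped to the HBase row key, which is what lets the storage handler avoid a full scan. All names (host, database, table, column, key value) are hypothetical, and PyHive is used here only as a convenient way to submit the HiveQL; beeline or any other Hive client works the same way.

```python
# Sketch only: assumes HiveServer2 is reachable and pyhive is installed.
from pyhive import hive

conn = hive.Connection(host="hiveserver2-host", port=10000, username="etl_user")
cursor = conn.cursor()

# 'row_key' stands for whatever column your Hive table maps to :key
# in hbase.columns.mapping; filtering on it is a keyed lookup, not a scan.
cursor.execute(
    "SELECT * FROM my_db.hbase_hive_table WHERE row_key = 'customer_0001'"
)
for row in cursor.fetchall():
    print(row)
```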
10-24-2018
12:57 PM
@Sandip Dobariya Try these configurations in the ReplaceText processor. With these configs we apply the NiFi Expression Language replace function only on the first line of the extracted content, not on all lines, and the expression removes the spaces in that first line.
Search Value: (?s)(^[^\n]*)(.*$)
Replacement Value: ${'$1':replace(" ","")}$2
Replacement Strategy: RegexReplace
Evaluation Mode: Entire text
Input:
Date,Location,Name,Manager,Division,Revenue,PVNMS OnLine,NMS LHSC,RMS OnLineRMS LHSC
Output:
Date,Location,Name,Manager,Division,Revenue,PVNMSOnLine,NMSLHSC,RMSOnLineRMSLHSC
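For reference, here is a minimal sketch in plain Python (not NiFi) of what the regex and expression above do: group 1 captures the first line, group 2 captures everything after it, and only group 1 has its spaces removed. The second input line is made-up sample data just to show that it is left untouched.

```python
import re

text = ("Date,Location,Name,Manager,Division,Revenue,PVNMS OnLine,NMS LHSC,RMS OnLineRMS LHSC\n"
        "1,NY,Bob,Ann,East,100,1,2,3")  # second line is hypothetical sample data

# Same pattern as the Search Value; the lambda mirrors ${'$1':replace(" ","")}$2.
result = re.sub(r"(?s)(^[^\n]*)(.*$)",
                lambda m: m.group(1).replace(" ", "") + m.group(2),
                text)

print(result)
# Date,Location,Name,Manager,Division,Revenue,PVNMSOnLine,NMSLHSC,RMSOnLineRMSLHSC
# 1,NY,Bob,Ann,East,100,1,2,3
```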
10-24-2018
12:44 PM
1 Kudo
@HENI MAHER
Use the UpdateRecord processor with the concat function on the Date and TIME fields, and then in the Record Writer Avro schema don't mention the original fields. Refer to this link for more details regarding the UpdateRecord processor.
(or)
1. Another way is to extract the attribute values from the content and keep them as flowfile attributes (using ExtractText, EvaluateJsonPath, etc.), then
2. Use the UpdateAttribute processor: add a new property with the value ${Date}${TIME}, and in the Delete Attributes Expression property add your original attribute names.
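As a minimal sketch in plain Python (hypothetical field names and values), this is the record shape both options above aim for: one combined field built from Date and TIME, with the two originals dropped.

```python
record = {"Date": "2018-10-24", "TIME": "12:44:00", "amount": 42}  # made-up sample record

record["DateTime"] = record["Date"] + " " + record["TIME"]  # same idea as concat(Date, TIME)
del record["Date"], record["TIME"]                          # originals are not carried forward

print(record)  # {'amount': 42, 'DateTime': '2018-10-24 12:44:00'}
```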
10-23-2018
11:22 PM
1 Kudo
@Sandip Dobariya If your CSV file size is not huge, you can use one of the ways mentioned in this link. (or) Use a record-oriented processor (ConvertRecord, etc.), which is a more efficient way of doing this task: configure the ConvertRecord processor with CSVReader and CSVRecordSetWriter controller services (Include Header Line set to 'false'), then use a ReplaceText processor to prepend the header line to the CSV file. Refer to this link for more reference.
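Here is a minimal sketch in plain Python (made-up data and header) of what the ConvertRecord + ReplaceText combination does: write the records without the original header line, then prepend the header you actually want.

```python
import csv
import io

original = "old_a,old_b\n1,x\n2,y\n"   # hypothetical input CSV
new_header = "id,code"                 # hypothetical header to prepend

rows = list(csv.reader(io.StringIO(original)))[1:]   # drop the original header
                                                     # (what Include Header Line = false does)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(new_header.split(","))               # prepend the new header (the ReplaceText step)
writer.writerows(rows)

print(out.getvalue())
# id,code
# 1,x
# 2,y
```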
10-23-2018
09:47 PM
@Carlos Cardoso There is an AccessControlException in your shared logs:
org.apache.hadoop.hive.metastore.api.MetaException: org.apache.hadoop.security.AccessControlException: Permission denied: user=nifi, access=EXECUTE, inode="/warehouse/tablespace/managed/hive":hive:hadoop:drwx------
Make sure the nifi user has appropriate permissions on the "/warehouse/tablespace/managed/hive" directory, then try to ingest data into the table again. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users looking for a solution to this kind of issue.
10-23-2018
09:41 PM
1 Kudo
@NARENDRA KALLI Try escaping with double backslashes (\\):
$ hadoop fs -rm -r /hdfspath/test/\\$\\{\\db\\}
10-20-2018
04:34 PM
@Carlton Patterson
This is not possible with the default save/csv/json functions, but using the Hadoop FileSystem API we can rename the file. Example:
>>> df = spark.sql("select int(1) id, string('ll') name")   # create a dataframe
>>> df.coalesce(1).write.mode("overwrite").csv("/user/shu/test/temp_dir")   # write the df to a temp dir
>>> from py4j.java_gateway import java_import
>>> java_import(spark._jvm, 'org.apache.hadoop.fs.Path')
>>> fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
>>> file = fs.globStatus(spark._jvm.Path('/user/shu/test/temp_dir/part*'))[0].getPath().getName()   # get the part filename in temp_dir
>>> fs.rename(spark._jvm.Path('/user/shu/test/temp_dir/' + file), spark._jvm.Path('/user/shu/test/mydata.csv'))   # rename it to the desired directory path and filename
>>> fs.delete(spark._jvm.Path('/user/shu/test/temp_dir'), True)   # delete the temp directory
- If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users looking for a solution to this kind of issue.
10-19-2018
01:56 AM
@Nisha Patel If you look into the configs of the PublishKafkaRecord processor, there are Record Reader/Writer controller services, so if your Record Writer is CSVRecordSetWriter then you have configured the Include Header Line property value as true, i.e. you are writing the header on each record. When you then use the MergeContent processor, you are going to have a header line included for each record.
To resolve this issue, change the Include Header Line value to false (so we are no longer writing a header on each record) and then in the MergeContent processor set the Header property value to your header. This way, after merging completes, the processor adds the header to the file.
Is there a specific reason why you are using the PublishKafkaRecord processor? You could use the PublishKafka processor instead (since you are splitting into individual records, there is no need for record-oriented processors in this case unless you have some valid reason), which doesn't require any Record Reader/Writer controller services; the message that we publish into the Kafka topic will be routed to the success relationship. Then use the MergeContent processor to merge all these flowfiles into one and add the header to the merged file.
Flow: replace the PublishKafkaRecord processor, so the flow becomes PublishKafka processor -> MergeContent processor.