Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11986 | 04-15-2020 05:01 PM |
| | 7952 | 10-15-2019 08:12 PM |
| | 3593 | 10-12-2019 08:29 PM |
| | 12979 | 09-21-2019 10:04 AM |
| | 4845 | 09-19-2019 07:11 AM |
09-09-2019
07:54 PM
@ANMAR Try this regex in the ExtractText processor: (?:"x":.\w+?)(\d+) This regex extracts only the digits from the "x" value; the ReplaceText processor then adds that captured value as the "y" key.
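If you want to sanity-check that regex outside NiFi, here is a minimal Scala sketch using the sample record from this thread (variable names are illustrative only):

```scala
// Verify the ExtractText regex against the sample input from this thread.
val input = """{"x":"avc123.abc.com"}"""
val pattern = """(?:"x":.\w+?)(\d+)""".r

// Group 1 holds the digits embedded in the "x" value.
pattern.findFirstMatchIn(input).map(_.group(1)).foreach(println) // prints 123
```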
09-08-2019
08:14 PM
1 Kudo
@ANMAR You need to use an ExtractText processor with a matching regex to extract only the integer value.

Add a new property in the ExtractText processor:

val
(\d+)

Then use a ReplaceText processor with the configs below:

Search Value: }
Replacement Value: ,"y":"${val}"}
Character Set: UTF-8
Maximum Buffer Size: 1 MB
Replacement Strategy: Literal Replace
Evaluation Mode: Entire text

The ReplaceText processor appends a "y" key carrying the value we extracted.

Input data: {"x":"avc123.abc.com"}
Output: {"x":"avc123.abc.com","y":"123"}
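For illustration only, this minimal Scala sketch (not NiFi itself) mirrors what the ExtractText + ReplaceText pair does to the sample record; all names here are illustrative:

```scala
// ExtractText equivalent: the `val` property's regex captures the digits.
val input = """{"x":"avc123.abc.com"}"""
val extracted = """(\d+)""".r.findFirstIn(input).getOrElse("")

// ReplaceText equivalent: literal-replace the closing brace with ,"y":"<val>"}
val output = input.replace("}", s""","y":"$extracted"}""")

println(output) // {"x":"avc123.abc.com","y":"123"}
```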
09-06-2019
11:16 PM
@RandomT You can check compression on .avro files using avro-tools:

bash$ avro-tools getmeta <file_path>

For more details refer to this link.

sqlContext.setConf("spark.sql.avro.compression.codec", "snappy") sets a global config, so every Avro write will be snappy-compressed; use this method if you are writing all of your data snappy-compressed.

If you are compressing only selected data, use exampleDF.write.option("compression", "snappy").avro("output path") instead, for better control over compression.
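A minimal sketch of both approaches, assuming Spark 2.4+ with the built-in Avro source (on older versions the com.databricks:spark-avro package is needed; the setConf key is the same); spark, exampleDF, and the output paths are placeholders:

```scala
// Option 1: global config; every Avro write afterwards is snappy-compressed.
spark.conf.set("spark.sql.avro.compression.codec", "snappy")
exampleDF.write.format("avro").save("/tmp/avro_out_global")

// Option 2: per-write option; compresses only this write, for finer control.
exampleDF.write
  .option("compression", "snappy")
  .format("avro")
  .save("/tmp/avro_out_selective")
```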
07-31-2019
02:39 AM
@Erkan ŞİRİN Try specifying the defaultFS and resourcemanager addresses:

val spark = SparkSession.builder().master("yarn")
  .config("spark.hadoop.fs.defaultFS", "<name_node_address>")
  .config("spark.hadoop.yarn.resourcemanager.address", "<resourcemanager_address>")
  .appName("<job_name>")
  .enableHiveSupport()
  .getOrCreate()

Then add the spark-yarn_x.x jar as a Maven dependency and try to run again.
07-28-2019
10:45 PM
@Erkan ŞİRİN Did you try using yarn-client (or) yarn-cluster instead of yarn in .master? If the error still persists, add spark-yarn.jar to the build path, then try to submit the job again. Refer to this link for more details about a similar issue.
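For illustration, a minimal sketch of that change, assuming Spark 1.x where yarn-client is still a valid master string (Spark 2.x uses master "yarn" plus a deploy mode instead):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Same job, but with the yarn-client master string suggested above.
val conf = new SparkConf()
  .setMaster("yarn-client") // instead of "yarn"
  .setAppName("<job_name>")
val sc = new SparkContext(conf)
```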
07-25-2019
03:32 AM
@Shailuk Could you set the password in the ListSFTP processor and then try running the processor again?
07-23-2019
03:42 AM
1 Kudo
@Shailuk Schedule the GetSFTP processor to run on the Primary node with a Run Schedule of 0 sec; the processor will then run every possible second and pull files from the configured directory.

**NOTE** If we don't delete the file from the path, the GetSFTP processor will pull the same file again and again, because GetSFTP doesn't store state.

Correct approach: Use ListSFTP + FetchSFTP processors. Configure ListSFTP to run on the Primary node with a Run Schedule of 0 sec; this processor stores state and runs incrementally, listing only the newly added files in the directory. FetchSFTP fetches the files from the directory, and a PutFile processor then stores them on the local machine.
07-18-2019
04:15 PM
@Duraisankar S If the answer helped resolve the issue, please log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
07-18-2019
03:25 AM
@Duraisankar S You can run a major compaction on the partition in Hive; after the major compaction is done, a base-**** directory will be created. Spark is then able to read the specific partitions that have base-*** directories in them. Spark is not able to read delta directories, though: there is an open Jira [SPARK-15348] about Spark not being able to read ACID tables. I think that starting from HDP-3.x, the HiveWareHouseConnector supports reading Hive ACID tables.
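A hedged sketch of what this can look like on HDP 3.x with the HiveWarehouseConnector on the classpath; the table and partition names are placeholders, and an existing SparkSession named spark is assumed:

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Assumes `spark` is an existing SparkSession configured for HWC.
val hive = HiveWarehouseSession.session(spark).build()

// Trigger a major compaction so base-* directories get created for the partition.
hive.executeUpdate("ALTER TABLE db.acid_table PARTITION (dt='2019-07-01') COMPACT 'major'")

// Once the compaction has finished, read the ACID table through the connector.
val df = hive.executeQuery("SELECT * FROM db.acid_table WHERE dt = '2019-07-01'")
df.show()
```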
07-12-2019
02:07 PM
@Sampath Kumar Since you have Ranger authorization enabled, this is expected behavior: DFS commands are restricted in Hive whenever authorization is enabled.