Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11986 | 04-15-2020 05:01 PM |
| | 7952 | 10-15-2019 08:12 PM |
| | 3593 | 10-12-2019 08:29 PM |
| | 12979 | 09-21-2019 10:04 AM |
| | 4845 | 09-19-2019 07:11 AM |
09-09-2019
07:54 PM
@ANMAR Try this regex in the ExtractText processor: (?:"x":.\w+?)(\d+) This regex extracts only the digits from the "x" value; the ReplaceText processor then adds that captured value as the "y" key.
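If you want to sanity-check that regex outside NiFi, here is a minimal Scala sketch using the sample record from this thread (variable names are illustrative only):

```scala
// Verify the ExtractText regex against the sample input from this thread.
val input = """{"x":"avc123.abc.com"}"""
val pattern = """(?:"x":.\w+?)(\d+)""".r

// Group 1 holds the digits embedded in the "x" value.
pattern.findFirstMatchIn(input).map(_.group(1)).foreach(println) // prints 123
```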
09-08-2019
08:14 PM
1 Kudo
@ANMAR You need to use an ExtractText processor with a matching regex to extract only the integer value.

Add a new property in the ExtractText processor:

val
(\d+)

Then use a ReplaceText processor with the configs below:

Search Value: }
Replacement Value: ,"y":"${val}"}
Character Set: UTF-8
Maximum Buffer Size: 1 MB
Replacement Strategy: Literal Replace
Evaluation Mode: Entire text

The ReplaceText processor appends a "y" key carrying the value we extracted.

Input data: {"x":"avc123.abc.com"}
Output: {"x":"avc123.abc.com","y":"123"}
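For illustration only, this minimal Scala sketch (not NiFi itself) mirrors what the ExtractText + ReplaceText pair does to the sample record; all names here are illustrative:

```scala
// ExtractText equivalent: the `val` property's regex captures the digits.
val input = """{"x":"avc123.abc.com"}"""
val extracted = """(\d+)""".r.findFirstIn(input).getOrElse("")

// ReplaceText equivalent: literal-replace the closing brace with ,"y":"<val>"}
val output = input.replace("}", s""","y":"$extracted"}""")

println(output) // {"x":"avc123.abc.com","y":"123"}
```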
09-06-2019
11:16 PM
@RandomT You can check compression on .avro files using avro-tools:

bash$ avro-tools getmeta <file_path>

For more details refer to this link.

sqlContext.setConf("spark.sql.avro.compression.codec", "snappy") sets a global config, so every Avro write will be snappy-compressed; use this method if you are writing all of your data snappy-compressed.

If you are compressing only selected data, use exampleDF.write.option("compression", "snappy").avro("output path") instead, for better control over compression.
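A minimal sketch of both approaches, assuming Spark 2.4+ with the built-in Avro source (on older versions the com.databricks:spark-avro package is needed; the setConf key is the same); spark, exampleDF, and the output paths are placeholders:

```scala
// Option 1: global config; every Avro write afterwards is snappy-compressed.
spark.conf.set("spark.sql.avro.compression.codec", "snappy")
exampleDF.write.format("avro").save("/tmp/avro_out_global")

// Option 2: per-write option; compresses only this write, for finer control.
exampleDF.write
  .option("compression", "snappy")
  .format("avro")
  .save("/tmp/avro_out_selective")
```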
07-31-2019
02:39 AM
@Erkan ŞİRİN Try specifying the defaultFS and resourcemanager addresses:

val spark = SparkSession.builder().master("yarn")
  .config("spark.hadoop.fs.defaultFS", "<name_node_address>")
  .config("spark.hadoop.yarn.resourcemanager.address", "<resourcemanager_address>")
  .appName("<job_name>")
  .enableHiveSupport()
  .getOrCreate()

Then add the spark-yarn_x.x jar as a Maven dependency and try to run again.
07-28-2019
10:45 PM
@Erkan ŞİRİN Did you try using yarn-client (or) yarn-cluster instead of yarn in .master? If the error still persists, add spark-yarn.jar to the build path, then try to submit the job again. Refer to this link for more details about a similar issue.
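For illustration, a minimal sketch of that change, assuming Spark 1.x where yarn-client is still a valid master string (Spark 2.x uses master "yarn" plus a deploy mode instead):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Same job, but with the yarn-client master string suggested above.
val conf = new SparkConf()
  .setMaster("yarn-client") // instead of "yarn"
  .setAppName("<job_name>")
val sc = new SparkContext(conf)
```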
07-25-2019
03:32 AM
@Shailuk Could you set the password in the ListSFTP processor and then try running the processor again?
07-23-2019
03:42 AM
1 Kudo
@Shailuk Schedule the GetSFTP processor to run on the Primary node with a Run Schedule of 0 sec; the processor will then run every possible second and pull files from the configured directory.

**NOTE** If we don't delete the file from the path, the GetSFTP processor will pull the same file again and again, because GetSFTP doesn't store state.

Correct approach: Use ListSFTP + FetchSFTP processors. Configure ListSFTP to run on the Primary node with a Run Schedule of 0 sec; this processor stores state and runs incrementally, listing only the newly added files in the directory. FetchSFTP fetches the files from the directory, and a PutFile processor then stores them on the local machine.
07-18-2019
04:15 PM
@Duraisankar S If the answer helped resolve the issue, please log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
07-18-2019
03:25 AM
@Duraisankar S You can run a major compaction on the partition in Hive; after the major compaction is done, a base-**** directory will be created. Spark is then able to read the specific partitions that have base-*** directories in them. Spark is not able to read delta directories, though: there is an open Jira [SPARK-15348] about Spark not being able to read ACID tables. I think that starting from HDP-3.x, the HiveWareHouseConnector supports reading Hive ACID tables.
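A hedged sketch of what this can look like on HDP 3.x with the HiveWarehouseConnector on the classpath; the table and partition names are placeholders, and an existing SparkSession named spark is assumed:

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Assumes `spark` is an existing SparkSession configured for HWC.
val hive = HiveWarehouseSession.session(spark).build()

// Trigger a major compaction so base-* directories get created for the partition.
hive.executeUpdate("ALTER TABLE db.acid_table PARTITION (dt='2019-07-01') COMPACT 'major'")

// Once the compaction has finished, read the ACID table through the connector.
val df = hive.executeQuery("SELECT * FROM db.acid_table WHERE dt = '2019-07-01'")
df.show()
```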
07-12-2019
02:07 PM
@Sampath Kumar Since you have Ranger authorization enabled, this is expected behavior: DFS commands are restricted in Hive whenever authorization is enabled.