Member since: 06-08-2017
Posts: 1049
Kudos Received: 517
Solutions: 312

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9626 | 04-15-2020 05:01 PM
 | 5714 | 10-15-2019 08:12 PM
 | 2292 | 10-12-2019 08:29 PM
 | 9274 | 09-21-2019 10:04 AM
 | 3394 | 09-19-2019 07:11 AM
09-09-2019
08:01 PM
1 Kudo
@ANMAR Try this regex: (?:\"key2\"\\s*:\\s*)(.*?), which extracts only the value of the "key2" key, i.e. "value2". If you don't want the quotes included in the extracted value, use this regex instead: (?:\"key2\"\\s*:\\s*)"(.*?)", which extracts only the value of the "key2" key, i.e. value2.
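As a quick illustration of what the second pattern captures, here is a minimal Scala sketch (the sample JSON record is made up purely for illustration, runnable in the Scala REPL):

```scala
// Sample JSON, assumed purely for illustration
val json = "{\"key1\":\"value1\",\"key2\":\"value2\",\"key3\":\"value3\"}"

// Same idea as the second regex above: capture the value of "key2" without the quotes
val pattern = "(?:\"key2\"\\s*:\\s*)\"(.*?)\"".r

pattern.findFirstMatchIn(json).map(_.group(1)) match {
  case Some(value) => println(s"key2 = $value") // prints: key2 = value2
  case None        => println("key2 not found")
}
```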
09-09-2019
07:54 PM
@ANMAR Try this regex in the ExtractText processor: (?:"x":.\w+?)(\d+) It extracts only the digits from the value of the "x" key; you can then add that captured value as the "y" key using a ReplaceText processor.
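For reference, a small Scala sketch of what that capture group matches; the sample record is borrowed from the 09-08-2019 answer below:

```scala
// Sample record, taken from the related answer below
val input = "{\"x\":\"avc123.abc.com\"}"

// Same pattern as above: skip past "x" and the leading letters, capture the digits
val digits = "(?:\"x\":.\\w+?)(\\d+)".r

digits.findFirstMatchIn(input).foreach { m =>
  println(m.group(1)) // prints: 123
}
```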
09-08-2019
08:14 PM
1 Kudo
@ANMAR You need to use an ExtractText processor with a matching regex to extract only the integer value.

Add a new property in the ExtractText processor:
val: (\d+)

Then use a ReplaceText processor with the following configuration:
Search Value: }
Replacement Value: ,"y":"${val}"}
Character Set: UTF-8
Maximum Buffer Size: 1 MB
Replacement Strategy: Literal Replace
Evaluation Mode: Entire text

ExtractText captures the digits into the val attribute, and ReplaceText then appends a "y" key with that extracted value.

Input data: {"x":"avc123.abc.com"}
Output: {"x":"avc123.abc.com","y":"123"}
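Outside of NiFi, the same extract-and-append step can be sketched in a few lines of Scala (purely illustrative; this is not how the processors are implemented internally):

```scala
// ExtractText step: capture the digits, like the "val" property above
val input    = "{\"x\":\"avc123.abc.com\"}"
val captured = "(\\d+)".r.findFirstIn(input).getOrElse("")

// ReplaceText step: replace the closing "}" with ,"y":"<captured>"}
val output = input.replace("}", ",\"y\":\"" + captured + "\"}")

println(output) // {"x":"avc123.abc.com","y":"123"}
```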
09-06-2019
11:16 PM
@RandomT You can check the compression on .avro files using avro-tools:
bash$ avro-tools getmeta <file_path>
For more details refer to this link.
sqlContext.setConf sets a global config, so every write will be Snappy compressed; use this method if you are writing all of your data Snappy compressed. If you are compressing only selected data, then use exampleDF.write.option("compression", "snappy").avro("output path") for better control over compression.
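A short Scala sketch of both write paths; the config key, output paths, and DataFrame are placeholders, and this assumes an Avro data source (built-in or the spark-avro package) is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-compression-demo").getOrCreate()
import spark.implicits._

val exampleDF = Seq(("a", 1), ("b", 2)).toDF("name", "value")

// Global setting: every Avro write from this session is Snappy compressed
spark.conf.set("spark.sql.avro.compression.codec", "snappy")
exampleDF.write.format("avro").save("/tmp/avro_global_snappy")

// Per-write setting: only this particular write is Snappy compressed
exampleDF.write
  .format("avro")
  .option("compression", "snappy")
  .save("/tmp/avro_selective_snappy")
```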
08-12-2019
01:26 AM
@Raymond Cui Try adding a new attribute in an UpdateAttribute processor, e.g. epochtime, with the value ${file.creationTime:toDate("yyyy-MM-dd'T'HH:mm:ss+0000"):toNumber()}
NiFi will then match the format and convert the creation time to epoch time.
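For comparison, the same parse-to-epoch conversion in plain Scala (the sample timestamp is assumed; note that the +0000 in the format string is matched as literal text):

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Example file.creationTime value, assumed for illustration
val creationTime = "2019-08-11T14:30:00+0000"

// Mirrors the format string used in the NiFi toDate() call above
val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss+0000")
fmt.setTimeZone(TimeZone.getTimeZone("UTC"))

val epochMillis = fmt.parse(creationTime).getTime
println(epochMillis) // 1565533800000
```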
08-06-2019
02:07 AM
@Satish Karuturi This is expected behaviour for the ExecuteStreamCommand processor, and the best practice is to place the shell script file on all nodes of the NiFi cluster. Since you have a 2-node NiFi cluster, you cannot control which node will be the primary node, and the ExecuteStreamCommand processor will run only on the primary node. If the primary node changes, NiFi will pick up the shell script from the new active primary node and continue to execute it without any issues, as long as the script is present on every node. In addition, you can also use the ExecuteProcess processor to execute a shell script in NiFi.
07-31-2019
01:33 PM
@Rohini Mathur Please check this and this link to get the location of the Hive table.
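As a quick check without the links, DESCRIBE FORMATTED prints the table location among the detailed table information; a minimal Scala sketch for spark-shell, with placeholder database and table names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("table-location").enableHiveSupport().getOrCreate()

// Look for the "Location" row in the detailed table information
spark.sql("DESCRIBE FORMATTED my_db.my_table").show(100, truncate = false)
```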
07-31-2019
02:53 AM
@Rohini Mathur Using a shell script: one way of doing this would be a shell script that first gets all tables from the database with show tables from <db_name>;, stores the table names in a variable, then loops through each table and executes show create table <table_name>;. Using Spark: another way would be to use spark.catalog.listTables("<db_name>") to list all the tables in the database, filter out only the managed tables, and execute show create table on that list (see the sketch below). Using the Hive metastore DB: Hive stores all table information in the metastore (MySQL etc.), so you can also get table information directly from the metastore.
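A rough Scala sketch of the Spark approach, runnable in spark-shell with Hive support (the database name is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("show-create-tables").enableHiveSupport().getOrCreate()

// List tables in the database and keep only the managed ones
val managedTables = spark.catalog
  .listTables("my_db")
  .collect()
  .filter(_.tableType == "MANAGED")

// Print the CREATE TABLE statement for each managed table
managedTables.foreach { t =>
  spark.sql(s"SHOW CREATE TABLE my_db.${t.name}")
    .collect()
    .foreach(row => println(row.getString(0)))
}
```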
07-31-2019
02:39 AM
@Erkan ŞİRİN, Try specifying the defaultFS and resourcemanager addresses:
val spark = SparkSession.builder().master("yarn")
  .config("spark.hadoop.fs.defaultFS", "<name_node_address>")
  .config("spark.hadoop.yarn.resourcemanager.address", "<resourcemanager_address>")
  .appName("<job_name>")
  .enableHiveSupport()
  .getOrCreate()
Then add the spark-yarn_x.x dependency to your Maven build and try to run again.
07-28-2019
10:45 PM
@Erkan ŞİRİN Did you try using yarn-client (or) yarn-cluster instead of yarn in .master? If the error still exists, add the spark-yarn jar to the build path and try to submit the job again. Refer to this link for more details about a similar issue.