Regex doesn't work on ExtractText Processor?

Rising Star

Hello,

I am trying to extract the value of the "image" key from a JSON record of key/value pairs. I used an ExtractText processor, but there was no match, although the same regex did match when I tried it in an online regex extractor (https://onlinetexttools.com/extract-regex-matches-from-text).

 

Regex I used in the ExtractText processor: (?<=\"image\"\s:\s")[A-Z-a-z-0-9\-\:\/\.\_]+

 

My JSON record:

{
  "id" : "03ee73b8-a553-4575-8dfa-d0da4e7939e9",
  "url" : "https://ll.thespacedevs.com/2.0.0/launch/03ee73b8-a553-4575-8dfa-d0da4e7939e9/",
  "launch_library_id" : null,
  "slug" : "falcon-9-block-5-galaxy-33-34",
  "name" : "Falcon 9 Block 5 | Galaxy 33 & 34",
  "status" : {
    "id" : 2,
    "name" : "TBD"
  },
  "net" : "2022-10-05T23:07:00Z",
  "window_end" : "2022-10-06T00:14:00Z",
  "window_start" : "2022-10-05T23:07:00Z",
  "inhold" : false,
  "tbdtime" : false,
  "tbddate" : false,
  "probability" : null,
  "holdreason" : "",
  "failreason" : "",
  "hashtag" : null,
  "launch_service_provider" : {
    "id" : 121,
    "url" : "https://ll.thespacedevs.com/2.0.0/agencies/121/",
    "name" : "SpaceX",
    "type" : "Commercial"
  },
  "rocket" : {
    "id" : 7549,
    "configuration" : {
      "id" : 164,
      "launch_library_id" : 188,
      "url" : "https://ll.thespacedevs.com/2.0.0/config/launcher/164/",
      "name" : "Falcon 9",
      "family" : "Falcon",
      "full_name" : "Falcon 9 Block 5",
      "variant" : "Block 5"
    }
  },
  "mission" : {
    "id" : 5976,
    "launch_library_id" : null,
    "name" : "Galaxy 33 & 34",
    "description" : "Galaxy 33, 34 are two geostationary communications satellites manufactured by Northrop Grumman and operated by Intelsat.",
    "launch_designator" : null,
    "type" : "Communications",
    "orbit" : {
      "id" : 2,
      "name" : "Geostationary Transfer Orbit",
      "abbrev" : "GTO"
    }
  },
  "pad" : {
    "id" : 80,
    "url" : "https://ll.thespacedevs.com/2.0.0/pad/80/",
    "agency_id" : 121,
    "name" : "Space Launch Complex 40",
    "info_url" : null,
    "wiki_url" : "https://en.wikipedia.org/wiki/Cape_Canaveral_Air_Force_Station_Space_Launch_Complex_40",
    "map_url" : "http://maps.google.com/maps?q=28.56194122,-80.57735736",
    "latitude" : "28.56194122",
    "longitude" : "-80.57735736",
    "location" : {
      "id" : 12,
      "url" : "https://ll.thespacedevs.com/2.0.0/location/12/",
      "name" : "Cape Canaveral, FL, USA",
      "country_code" : "USA",
      "map_image" : "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launch_images/location_12_20200803142519.jpg",
      "total_launch_count" : 858,
      "total_landing_count" : 24
    },
    "map_image" : "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launch_images/pad_80_20200803143323.jpg",
    "total_launch_count" : 154
  },
  "webcast_live" : false,
  "image" : "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launcher_images/falcon_9_block__image_20210506060831.jpg",
  "infographic" : null,
  "program" : [ ]
}

 

Expected output:

 https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launcher_images/falcon_9_block__i...

 

 

Thanks for your help.

2 ACCEPTED SOLUTIONS

Super Collaborator

Hi,

 

You don't have to use the ExtractText processor for this. Use the EvaluateJsonPath processor with the following configuration:

 

SAMSAL_0-1664202157672.png
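
The screenshot is not reproduced in this text. Assuming the configuration sets Destination to flowfile-attribute and adds a property (hypothetically named image) with the JSONPath $.image, the path selects only the top-level "image" key and ignores the nested "map_image" keys. A minimal Python sketch of that selection, using an abbreviated copy of the record from the question:

```python
import json

# Abbreviated copy of the record from the question (only a few keys kept).
record_text = """
{
  "pad" : {
    "map_image" : "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launch_images/pad_80_20200803143323.jpg"
  },
  "webcast_live" : false,
  "image" : "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launcher_images/falcon_9_block__image_20210506060831.jpg"
}
"""

record = json.loads(record_text)

# Rough equivalent of the JSONPath $.image: the top-level "image" key only.
# (The JSONPath itself is an assumption; the real setting is in the screenshot.)
image_url = record["image"]
print(image_url)
```

Because the document is parsed as JSON, the result does not depend on whether the content is pretty-printed or compact.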

 

If you find this helpful, please accept the solution.

Thanks

Explorer

Hi @rafy

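I tried the same regex with the same sample in NiFi 1.13.2 and 1.16.3, and both returned the image URL string.

The difference may come from JSON pretty-printing in NiFi: the original JSON lacks the spaces and line breaks shown in your sample, so a regex that expects whitespace around the colon will not match it.

An expression that works with the raw response from https://ll.thespacedevs.com/2.0.0/launch/ is: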
(?<=\"image\":\")[A-Z-a-z-0-9\-\:\/\.\_]+

And as @SAMSAL said, EvaluateJsonPath is the right tool for this job.
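
To illustrate the whitespace point, here is a small sketch that runs both expressions from this thread against the same key/value pair in pretty-printed and compact form. It uses Python's re module rather than the Java regex engine NiFi uses, so treat it as an approximation; tolerant_regex and first_match are hypothetical additions that are not part of the original posts.

```python
import re

URL = ("https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com"
       "/media/launcher_images/falcon_9_block__image_20210506060831.jpg")

# The same key/value pair in the two layouts discussed in this thread.
pretty = '{\n  "image" : "' + URL + '"\n}'  # spaces around the colon
compact = '{"image":"' + URL + '"}'          # no whitespace at all

CHARS = r'[A-Z-a-z-0-9\-\:\/\.\_]+'
spaced_regex = re.compile(r'(?<=\"image\"\s:\s")' + CHARS)  # from the question
compact_regex = re.compile(r'(?<=\"image\":\")' + CHARS)    # from this answer

# Hypothetical whitespace-tolerant variant: the URL is capture group 1.
tolerant_regex = re.compile(r'"image"\s*:\s*"([^"]+)"')


def first_match(regex, text):
    """Return the first matched URL, or None when the regex does not match."""
    m = regex.search(text)
    if m is None:
        return None
    # Use group 1 when the pattern defines a capture group, else the whole match.
    return m.group(1) if regex.groups else m.group(0)


for label, text in (("pretty", pretty), ("compact", compact)):
    for name, regex in (("spaced", spaced_regex),
                        ("compact", compact_regex),
                        ("tolerant", tolerant_regex)):
        print(label, name, first_match(regex, text) == URL)
```

Each lookbehind version matches only the layout it was written for, while the variant with \s* handles both.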

Rising Star

Thank you all.

I eventually used a JSON path to extract the URL. I had lost my way and was applying a complex solution to a simple problem.
