After getting tweets and accessing the tweet text via GetTwitter and EvaluateJsonPath in NiFi, how do I remove special characters (\n, \t, $, and #) from the text so I can run NLP on the tweet itself?

New Contributor

Contributor

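Use a ReplaceText processor with a regex that matches the characters you want to remove. If you have already extracted the text from the JSON, you can use something like ([^a-zA-Z-]). Note that this class matches everything except ASCII letters and hyphens, so it also catches spaces, digits, and punctuation.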
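
Since ReplaceText evaluates its search value as a Java regular expression, one quick way to check a pattern before wiring it into the flow is to run it through java.util.regex. The sketch below compares the broad class from the answer with a narrower one, [\n\t$#], that targets only the characters named in the question. The narrower pattern, the class name, and the sample tweet are assumptions for illustration, not part of the original answer.

```java
import java.util.regex.Pattern;

// Sketch only: compares two candidate "Search Value" regexes for ReplaceText.
class SpecialCharDemo {
    // Broad class from the answer: anything that is not an ASCII letter or a hyphen.
    // This also matches spaces, digits, and punctuation.
    static final Pattern BROAD = Pattern.compile("[^a-zA-Z-]");

    // Narrower class (an assumption): only newline, tab, '$' and '#'.
    static final Pattern TARGETED = Pattern.compile("[\\n\\t$#]");

    // Replaces every match of the pattern with the given replacement string.
    static String clean(Pattern p, String text, String replacement) {
        return p.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String tweet = "Loving #NiFi!\n$5 off\ttoday";
        System.out.println(clean(BROAD, tweet, " "));
        System.out.println(clean(TARGETED, tweet, " "));
    }
}
```

On the sample, the broad pattern also replaces the "!" and the digit "5", while the targeted one keeps them. Which you want depends on whether your NLP step benefits from numbers and punctuation.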

In general, ReplaceText and EvaluateJsonPath are good processors to start with. A flow might look like this:

  1. Use EvaluateJsonPath to extract the text to an attribute.
  2. Replace the flow file content with the attribute, formatted as you want (with this strategy, the original JSON from Twitter is discarded).
  3. Clean up the flow file content with a regex like ([^a-zA-Z-]), replacing each match with nothing or a space.
  4. Submit the result to NLP or wherever it's going.
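
The steps above can be sketched outside NiFi as plain Java, assuming the JSON extraction in step 1 has already happened. A Map stands in for the flow file attributes, and the attribute name tweet.text (for example, an EvaluateJsonPath property set to $.text) is hypothetical. Collapsing repeated spaces after step 3 is an extra assumption, not part of the answer; in NiFi it would be one more ReplaceText.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of steps 2-4 outside NiFi; not NiFi's own API.
class TweetFlowSketch {
    // Step 2: the new content is just the attribute value; the original JSON is gone.
    static String replaceContentWithAttribute(Map<String, String> attrs, String name) {
        return attrs.getOrDefault(name, "");
    }

    // Step 3: replace every character that is not an ASCII letter or hyphen with a space,
    // then collapse runs of spaces and trim (the collapse is an extra assumption).
    static String cleanUp(String content) {
        return content.replaceAll("[^a-zA-Z-]", " ")
                      .replaceAll(" +", " ")
                      .trim();
    }

    // Step 4 placeholder: hypothetical hand-off to whatever NLP step comes next.
    static void submitToNlp(String text) {
        System.out.println("NLP input: " + text);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("tweet.text", "Loving #NiFi!\n$5 off\ttoday"); // hypothetical attribute
        String content = replaceContentWithAttribute(attrs, "tweet.text");
        submitToNlp(cleanUp(content)); // prints "NLP input: Loving NiFi off today"
    }
}
```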

New Contributor

How would you remove hyperlinks?

@Chris Gambino

Contributor

@Monil Patel

Do you want to remove hyperlinks in their entirety? A Java/JavaScript-style regex that detects any URL would be a good start. This article covers a few strategies for it:

http://www.regexguru.com/2008/11/detecting-urls-in-a-block-of-text/
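
As a minimal sketch of the idea (deliberately simpler than the patterns in the linked article), the following assumes a URL starts with http://, https://, or www. and runs until the next whitespace character, which covers the t.co links Twitter inserts into tweets. The class name and sample text are hypothetical; the same expression, (?i)\b(?:https?://|www\.)\S+, could serve as a ReplaceText search value.

```java
import java.util.regex.Pattern;

// Sketch only: a simple URL matcher, not a complete URL grammar.
class UrlStripper {
    // (?i) = case-insensitive; \b = word boundary before the scheme or "www.";
    // \S+ = everything up to the next whitespace character.
    static final Pattern URL = Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");

    // Removes every URL match, leaving the surrounding spaces in place.
    static String removeUrls(String text) {
        return URL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeUrls("Read this https://t.co/AbC123 now"));
    }
}
```

Because \S+ is greedy, a sentence-ending period or closing parenthesis attached to a URL is removed along with it, and the spaces on either side of the URL are left behind.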

New Contributor

Sorry, I meant removing URLs in their entirety so that they won't affect running NLP on the text itself.

Contributor

Use the ReplaceText processor with a regex search term that matches URLs, and replace every match with nothing.
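
Order matters when combining URL removal with the earlier character cleanup: if the special characters are stripped first, https://t.co/AbC123 turns into fragments like "https t co AbC" that no URL pattern will match. The sketch below models two ReplaceText steps in sequence, assuming a simple URL pattern (http://, https://, or www. up to the next whitespace) and the [^a-zA-Z-] class from the first answer; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: two cleanup passes, modelling two ReplaceText processors in a row.
class TweetPipelineOrder {
    // Assumed URL pattern: scheme or "www." followed by non-whitespace.
    static final Pattern URL = Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");
    // Character class from the first answer: not an ASCII letter or hyphen.
    static final Pattern NON_LETTER = Pattern.compile("[^a-zA-Z-]");

    // Pass 1: drop whole URLs.
    static String removeUrls(String s) {
        return URL.matcher(s).replaceAll("");
    }

    // Pass 2: replace other characters with spaces, collapse repeats, trim.
    static String removeSpecials(String s) {
        return NON_LETTER.matcher(s).replaceAll(" ").replaceAll(" +", " ").trim();
    }

    public static void main(String[] args) {
        String tweet = "New #NiFi release https://t.co/AbC123";
        System.out.println(removeSpecials(removeUrls(tweet))); // URLs first: "New NiFi release"
        System.out.println(removeUrls(removeSpecials(tweet))); // wrong order: URL fragments survive
    }
}
```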