I want to perform a join based on Levenshtein distance. I have 2 tables:

Data: a CSV file stored in an HDFS repository. One of its columns is a disease description (15K rows).

df7_ct_map: a table I load from Hive. One of its columns is a disease indication (20K rows).

I'm trying to join the two tables by matching each description with an indication (both are free-text descriptions of sicknesses). Ideally the two texts are identical, but when they differ I want to select the indication that shares the largest number of words with the description.
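To make the matching rule concrete, here is a minimal plain-Python sketch of what I have in mind. It is not the distributed join itself. The function names (`levenshtein`, `common_words`, `best_match`) and the sample strings are hypothetical, and breaking ties between equally good word overlaps by the smaller edit distance is my own assumption about how the two criteria could be combined.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def common_words(a: str, b: str) -> int:
    """Number of distinct words shared by the two texts, ignoring case."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def best_match(description, indications):
    """Pick the indication to join on for one description.

    Preference order (an assumption, see the note above):
    exact match, then most shared words, then smallest edit distance.
    """
    if not indications:
        return None
    return max(
        indications,
        key=lambda ind: (
            ind == description,
            common_words(description, ind),
            -levenshtein(description.lower(), ind.lower()),
        ),
    )


# Hypothetical sample data standing in for the two columns.
descriptions = ["chronic kidney disease", "type 2 diabetes mellitus"]
indications = [
    "acute kidney injury",
    "chronic kidney disease stage 3",
    "Diabetes mellitus type 2",
    "heart disease",
]

for d in descriptions:
    print(d, "->", best_match(d, indications))
```

Comparing every description with every indication means roughly 15K x 20K = 300M pairs, so a pure-Python loop like this is only meant to pin down the rule. If both tables end up as Spark DataFrames (which I have not stated above), Spark SQL ships a built-in `levenshtein` function that could replace the hand-written one.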