About suri789

suri789 · ‎06-30-2022

so plainfield and s plainfield are the same.

suri789 · ‎06-30-2022

Thanks, jagadeesan, but you're still getting the duplicate values.

suri789 · ‎06-09-2022

I have a PySpark dataframe with names like these:

N. Plainfield
North Plainfield
West Home Land
NEWYORK
newyork
So. Plainfield
S. Plainfield

Some of them contain dots and spaces between initials and some do not. How can they be converted to the following (no dots, no spaces between initials, and one space between initials and name)?

n plainfield
north plainfield
west homeland
newyork
newyork
so plainfield
s plainfield

I tried the following, but it only replaces dots and doesn't remove spaces between initials:

names_modified = names.withColumn("name_clean", regexp_replace("name", r"\.", ""))

After removing the whitespace and dots, is there any way to get the distinct values, like this?

north plainfield
west homeland
newyork
so plainfield
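A minimal sketch of one way to approach this, not a definitive answer. Lowercasing and stripping dots alone cannot make "n plainfield" equal "north plainfield" or turn "home land" into "homeland", so the sketch assumes two small lookup tables, `ABBREVIATIONS` and `PHRASE_FIXES`. Both tables, and the helper names `normalize_name` and `add_clean_column`, are hypothetical and would need to be adapted to the real data.

```python
import re

# Assumption: leading abbreviations and their canonical forms. This table is
# hypothetical; extend it to match the real data.
ABBREVIATIONS = {"n": "north", "s": "so"}

# Assumption: phrases whose inner space should be dropped (hypothetical).
PHRASE_FIXES = {"home land": "homeland"}


def normalize_name(name):
    """Lowercase, drop dots, collapse spaces, then apply the two tables."""
    if name is None:
        return None
    # Turn dots into spaces so "N.Plainfield" and "N. Plainfield" agree.
    cleaned = re.sub(r"\s+", " ", name.lower().replace(".", " ")).strip()
    for phrase, fixed in PHRASE_FIXES.items():
        cleaned = cleaned.replace(phrase, fixed)
    # Expand only the first word if it is a known abbreviation.
    first, _, rest = cleaned.partition(" ")
    first = ABBREVIATIONS.get(first, first)
    return f"{first} {rest}".strip()


def add_clean_column(df, source="name", target="name_clean"):
    """Apply normalize_name to a Spark DataFrame column via a UDF."""
    # Imported lazily so normalize_name can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    clean = udf(normalize_name, StringType())
    return df.withColumn(target, clean(df[source]))


# Plain-Python check of the normalization on the names from the question.
sample = ["N. Plainfield", "North Plainfield", "West Home Land",
          "NEWYORK", "newyork", "So. Plainfield", "S. Plainfield"]
distinct_names = sorted({normalize_name(n) for n in sample})
print(distinct_names)
```

On a dataframe like the one in the question, this could be used as `add_clean_column(names).select("name_clean").distinct()`. A UDF is slower than built-in column functions, so for large data the same steps could probably be rewritten with `lower`, `regexp_replace` and `trim`; the UDF is used here only to keep the logic in one testable function.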

suri789 · ‎06-09-2022

I am filtering out duplicates and creating an autogenerated ID. Here is the code:

df3 = df.distinct()
df3.createOrReplaceTempView('df3')
x = spark.sql('select row_number() over (order by ZipCode, District, Division, Region) As GeographyID, District, Division, Region, RegionName, ZipCode, City, State from df3')
x.show(50)

After filtering the duplicates, I still have a problem with the city data: the same city appears under different spellings, so I cannot get the distinct city values. Here is an example of the City data:

City
N. Plainfield
North Plainfield

How can I deal with this kind of string value to get the distinct values?

Last Visited	‎07-19-2022 12:40 PM

Member Since	‎06-09-2022 12:26 AM
Posts	4

Cloudera Community

Re: How to remove the space and dots and convert i...

Re: How to remove the space and dots and convert i...

How to remove the space and dots and convert into ...

How can I get the DISTINCT values with same name a...