Reducing the length of a line

I am working on a real estate price prediction project. In the dataset I downloaded, the locality column lists the various localities, but some entries are very long because extra information is included along with the locality name.
One example is:

“Rohini Sector 24 carpet area 650 sqft status Ready to Move floor 4 out of 4 floors transaction New Property furnishing Semi-Furnished facing East overlooking Garden/Park, Main Road car parking 1 Open bathroom 2 balcony 1 ownership Freehold Newly Constructed Property Newly Constructed Property East Facing Property 2BHK Newly build property for Sale. A House is waiting for a Friendly Family to make it a lovely home.So please come and make his house feel alive once again. read more Contact Agent View Phone No. Share Feedback Garima properties Certified Agent Trusted by Users Genuine Listings Market Knowledge”

Instead of just “Rohini Sector 24”, the whole line above is given.
I want to shorten such locality names in my dataset.
Kindly suggest how to do this.

The dataset that I use

Hi @ambika.insa1994,
I think you’ll need to write some manual logic here and pick up the words that are already present in other records of the same column. This would require a decent amount of effort, though.
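
One way to sketch that manual logic is below. It assumes that entries with only a few words are already clean locality names, and that a noisy entry begins with one of those names. The helper name shorten_by_known_localities, the max_clean_words cutoff, and the sample rows are made up for illustration; only the column name 'Locality' comes from this thread.

```python
import pandas as pd


def shorten_by_known_localities(localities: pd.Series, max_clean_words: int = 5) -> pd.Series:
    """Replace long entries with the longest known locality they start with."""
    word_counts = localities.str.split().str.len()
    # Assumption: entries with few words are already clean locality names.
    known = sorted(set(localities[word_counts <= max_clean_words]), key=len, reverse=True)

    def shorten(text):
        if not isinstance(text, str):
            return text  # leave missing values untouched
        for name in known:
            # Require a full-word match so "Rohini Sector 2" does not
            # match an entry that starts with "Rohini Sector 24".
            if text == name or text.startswith(name + " "):
                return name
        return text  # no known locality found; keep the original

    return localities.apply(shorten)


# Made-up sample rows, only to show the behaviour.
df = pd.DataFrame({"Locality": [
    "Rohini Sector 24",
    "Dwarka Sector 10",
    "Rohini Sector 24 carpet area 650 sqft status Ready to Move floor 4 out of 4 floors",
]})
df["Locality_clean"] = shorten_by_known_localities(df["Locality"])
print(df["Locality_clean"].tolist())
```

Entries that do not start with any known locality are left unchanged, so you can inspect those separately.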

The other way is to bypass and remove these noisy examples if you have a good amount of data available for training.
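
Dropping the noisy rows could look roughly like this. The word-count cutoff MAX_WORDS is a hypothetical value, and the sample rows are made up, so check the threshold against your own data before relying on it.

```python
import pandas as pd

# Made-up sample rows, only to show the behaviour.
df = pd.DataFrame({"Locality": [
    "Rohini Sector 24",
    "Dwarka Sector 10",
    "Rohini Sector 24 carpet area 650 sqft status Ready to Move floor 4 out of 4 floors",
]})

MAX_WORDS = 6  # hypothetical cutoff; pick it by inspecting your own data

# Count whitespace-separated words in each entry.
word_counts = df["Locality"].str.split().str.len()

# Keep only the rows whose locality is short enough to look like a plain name.
df_clean = df[word_counts <= MAX_WORDS].reset_index(drop=True)
print(len(df), "->", len(df_clean))
```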

If you see a pattern in the locality, that is, the first few words give you what you are looking for, you can split the text like so:

" ".join("Rohini Sector 24 carpet area 650 sqft status Ready to Move floor 4 out of 4 floors transaction New Property furnishing Semi-Furnished facing East overlooking Garden/Park, Main Road car parking 1 Open bathroom 2 balcony 1 ownership Freehold Newly Constructed Property Newly Constructed Property East Facing Property 2BHK Newly build property for Sale. A House is waiting for a Friendly Family to make it a lovely home.So please come and make his house feel alive once again. read more Contact Agent View Phone No. Share Feedback Garima properties Certified Agent Trusted by Users Genuine Listings Market Knowledge".split()[:3])

Using DataFrames, the same idea (keep the first three words and join them back together) is:
df['Locality'].str.split().str[:3].str.join(' ')

If that doesn’t accomplish your goal, then you may have to use a regex.
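
A minimal regex sketch, assuming the extra text always starts with one of a few listing keywords. The keywords "carpet area", "status" and "floor" are taken from the example in the question; the list is almost certainly incomplete for the full dataset, and the second sample row is made up.

```python
import pandas as pd

# Sample rows, only to show the behaviour.
df = pd.DataFrame({"Locality": [
    "Rohini Sector 24 carpet area 650 sqft status Ready to Move floor 4 out of 4 floors",
    "Dwarka Sector 10",
]})

# Assumption: the noise begins at the first of these keywords.
# Extend the alternation with whatever keywords appear in your data.
noise_pattern = r"(?i)\s+(?:carpet area|status|floor)\b.*$"

# Remove everything from the first keyword to the end of the entry.
df["Locality_clean"] = (
    df["Locality"].str.replace(noise_pattern, "", regex=True).str.strip()
)
print(df["Locality_clean"].tolist())
```

Unlike taking the first three words, this does not depend on how many words the locality name has, but it only works for entries that contain one of the listed keywords.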