I'm very new to Python. I work in hydrology and I want to learn Python to help me process hydrological data.
At the moment I'm writing a script to extract information from an Informatica Big Data set. I have three CSV files: Complete_borelist.csv, Borelist_not_interested.csv and Elevation_info.csv.
I want to create a file that has all the bores that are in Complete_borelist.csv but not in Borelist_not_interested.csv. For the bores that satisfy this first criterion, I also want to pull some information from Complete_borelist.csv and Elevation_info.csv.
My Python script is as follows:

    not_interested_list = []
    outfile1 = open('output.csv', 'w')
    outfile1.write('Station_ID,Name,Easting,Northing,Location_name,Elevation')
    outfile1.write('\n')

    with open('Borelist_not_interested.csv', 'r') as f1:
        for line in f1:
            if not line.startswith('Station'):  # ignore header
                line = line.rstrip()
                words = line.split(',')
                station = words
                not_interested_list.append(station)

    with open('Complete_borelist.csv', 'r') as f2:
        next(f2)  # ignore header
        for line in f2:
            line = line.rstrip()
            words = line.split(',')
            station = words
            if not station in not_interested_list:
                loc_name = words
                easting = words
                northing = words
                outfile1.write(station + ',' + easting + ',' + northing + ',' + loc_name + ',')
                with open('Elevation_info.csv', 'r') as f3:
                    next(f3)  # ignore header
                    for line in f3:
                        line = line.rstrip()
                        data = line.split(',')
                        bore_id = data
                        if bore_id == station:
                            elevation = data
                            outfile1.write(elevation)
                            outfile1.write('\n')
    outfile1.close()

I have two issues with the script:
The first is that Elevation_info.csv doesn't have information for all the bores in Complete_borelist.csv. When my loop gets to a station for which it can't find an elevation record, the script doesn't write "null"; instead it continues writing the information for the next station on the same line. Can anyone help me fix this, please?
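One likely cause, judging from the order of statements in the script: `outfile1.write('\n')` seems to sit inside the `if bore_id == station:` branch, so a row is only terminated when a match is found. A minimal sketch of a fix that keeps the original nested-loop structure uses Python's `for ... else` (the `else` branch runs only when the loop ends without `break`) and always writes the newline. The function name `write_row_with_elevation` is hypothetical, and the column positions (bore ID in column 0, elevation in column 1 of Elevation_info.csv) are assumptions, since the real indices aren't shown. In the script, `elevation_lines` would be `f3`, reopened for each station as it is now, after `next(f3)` has skipped the header.

```python
import io


def write_row_with_elevation(outfile, station, easting, northing, loc_name,
                             elevation_lines):
    """Write one output row, using 'null' when no elevation record matches.

    Assumption: in each elevation line the bore ID is column 0 and the
    elevation is column 1 (the real column indices are not shown).
    """
    outfile.write(station + ',' + easting + ',' + northing + ',' + loc_name + ',')
    for line in elevation_lines:
        data = line.rstrip().split(',')
        if data[0] == station:
            outfile.write(data[1])
            break  # stop at the first matching record
    else:
        # Runs only if the loop finished without `break`: no match found.
        outfile.write('null')
    # End the row in every case, so the next station starts on a new line.
    outfile.write('\n')


# Demo with made-up values standing in for real bore records.
out = io.StringIO()
elevation_lines = ['B1,12.5', 'B2,30.1']
write_row_with_elevation(out, 'B1', '100', '200', 'Creek', elevation_lines)
write_row_with_elevation(out, 'B9', '300', '400', 'Hill', elevation_lines)
print(out.getvalue(), end='')
# B1,100,200,Creek,12.5
# B9,300,400,Hill,null
```

This fixes the missing "null", but it still re-reads Elevation_info.csv once per station, so it does nothing for speed.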
The second is that my complete bore list has more than 200,000 rows, and my script runs through them very slowly. Does anyone have any suggestions to make it run faster?
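Most of the time is probably spent in two places: `station in not_interested_list` scans the whole list for every row, and Elevation_info.csv is reopened and read from the top for every station that passes the filter. A sketch of the usual alternative reads the two smaller files once into a `set` and a `dict`, then makes a single pass over the complete list with the standard `csv` module. The function name `filter_bores` is hypothetical, the column layout in the docstring is an assumption to adjust to the real files, and the sample rows in the demo are made up.

```python
import csv
import io


def filter_bores(complete_rows, not_interested_rows, elevation_rows, out):
    """Write bores from the complete list that are not in the not-interested list.

    Hypothetical column layout (adjust the indices to the real files):
      complete list:       Station_ID, Name, Easting, Northing, Location_name, ...
      not-interested list: Station_ID, ...
      elevation info:      Station_ID, Elevation, ...
    Each *_rows argument is any iterable of CSV lines that starts with a header.
    """
    not_interested_reader = csv.reader(not_interested_rows)
    next(not_interested_reader)  # skip header
    # A set gives constant-time membership tests instead of scanning a list.
    not_interested = {row[0] for row in not_interested_reader}

    elevation_reader = csv.reader(elevation_rows)
    next(elevation_reader)  # skip header
    # Read the elevation file once into a dict keyed by bore ID.
    elevations = {row[0]: row[1] for row in elevation_reader}

    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Station_ID', 'Name', 'Easting', 'Northing',
                     'Location_name', 'Elevation'])

    complete_reader = csv.reader(complete_rows)
    next(complete_reader)  # skip header
    # One pass over the large file; every lookup below is a hash lookup.
    for row in complete_reader:
        station, name, easting, northing, loc_name = row[:5]
        if station in not_interested:
            continue
        # dict.get falls back to 'null' when a bore has no elevation record.
        writer.writerow([station, name, easting, northing, loc_name,
                         elevations.get(station, 'null')])


# Demo with made-up sample rows standing in for the three CSV files.
complete = io.StringIO(
    "Station_ID,Name,Easting,Northing,Location_name\n"
    "B1,Bore 1,100,200,Creek\n"
    "B2,Bore 2,300,400,Hill\n"
    "B3,Bore 3,500,600,Plain\n"
)
not_interested = io.StringIO("Station_ID\nB2\n")
elevation = io.StringIO("Station_ID,Elevation\nB1,12.5\n")
out = io.StringIO()
filter_bores(complete, not_interested, elevation, out)
print(out.getvalue(), end='')
# Station_ID,Name,Easting,Northing,Location_name,Elevation
# B1,Bore 1,100,200,Creek,12.5
# B3,Bore 3,500,600,Plain,null
```

With real files, pass open file objects instead of the `io.StringIO` samples, for example `open('Complete_borelist.csv', newline='')` for the inputs and `open('output.csv', 'w', newline='')` for the output, since the `csv` module expects file objects opened with `newline=''`. Because `dict.get` falls back to `'null'`, this version also covers the missing-elevation case.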