I am working on a .csv that has columns in which numerical data includes letters. I want to strip the letters so that the column can be a float or int.
I have tried the following:
-
using the loop/def process to strip object columns of string data, in "MPG" column and leave only numerical values.
-
it should print the names of the columns where there is at least one entry ending in the characters 'mpg'
CODING IN JUPYTER NOTEBOOK CELLS:
Step 1:
MPG_cols = [] for colname in df.columns[df.dtypes == 'object']: if df[colname].str.endswith('mpg').any(): MPG_cols.append(colname) print(MPG_cols)
- using
.str
so I can use an element-wise string method - only want to consider the string columns
THIS GIVES ME OUTPUT:
[Power]. #good so far
STEP 2:
#define the value to be removed using loop def remove_mpg(pow_val): """For each value, take the number before the 'mpg' unless it is not a string value. This will only happen for NaNs so in that case we just return NaN. """ if isinstance(pow_val, str): i=pow_val.replace('mpg', '') return float(pow_val.split(' ')[0]) else: return np.nan position_cols = ['Vehicle_type'] for colname in MPG_cols: df[colname] = df[colname].apply(remove_mpg) df[Power_cols].head()
The Error I get:
ValueError Traceback (most recent call last) <ipython-input-37-45b7f6d40dea> in <module> 15 16 for colname in MPG_cols: ---> 17 df[colname] = df[colname].apply(remove_mpg) 18 19 df[MPG_cols].head() ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds) 3846 else: 3847 values = self.astype(object).values -> 3848 mapped = lib.map_infer(values, f, convert=convert_dtype) 3849 3850 if len(mapped) and isinstance(mapped[0], Series): pandas/_libs/lib.pyx in pandas._libs.lib.map_infer() <ipython-input-37-45b7f6d40dea> in remove_mpg(pow_val) 8 if isinstance(pow_val, str): 9 i=pow_val.replace('mpg', '') ---> 10 return float(pow_val.split(' ')[0]) 11 else: 12 return np.nan ValueError: could not convert string to float: 'null'
I applied similar code to a different column and it worked on that column, but not here.
Any guidance will be greatly appreciated.
Bests,
https://stackoverflow.com/questions/67442417/need-to-strip-csv-column-number-data-of-letters-pandas May 08, 2021 at 06:37AM
没有评论:
发表评论