2021年5月7日星期五

Need to strip CSV Column Number Data of Letters - Pandas

I am working on a .csv that has columns in which numerical data includes letters. I want to strip the letters so that the column can be a float or int.

I have tried the following:

  • using the loop/def process to strip object columns of string data, in "MPG" column and leave only numerical values.

  • it should print the names of the columns where there is at least one entry ending in the characters 'mpg'

CODING IN JUPYTER NOTEBOOK CELLS:

Step 1:

MPG_cols = []  for colname in df.columns[df.dtypes == 'object']:        if df[colname].str.endswith('mpg').any():           MPG_cols.append(colname)  print(MPG_cols)  
  • using .str so I can use an element-wise string method
  • only want to consider the string columns

THIS GIVES ME OUTPUT:

[Power]. #good so far

STEP 2:

#define the value to be removed using loop    def remove_mpg(pow_val):      """For each value, take the number before the 'mpg'      unless it is not a string value. This will only happen      for NaNs so in that case we just return NaN.      """      if isinstance(pow_val, str):          i=pow_val.replace('mpg', '')           return float(pow_val.split(' ')[0])       else:                      return np.nan        position_cols = ['Vehicle_type']     for colname in MPG_cols:      df[colname] = df[colname].apply(remove_mpg)    df[Power_cols].head()   

The Error I get:


ValueError                                Traceback (most recent call last)  <ipython-input-37-45b7f6d40dea> in <module>       15        16 for colname in MPG_cols:  ---> 17     df[colname] = df[colname].apply(remove_mpg)       18        19 df[MPG_cols].head()    ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in        apply(self, func, convert_dtype, args, **kwds)     3846             else:     3847                 values = self.astype(object).values  -> 3848                 mapped = lib.map_infer(values, f,     convert=convert_dtype)     3849      3850         if len(mapped) and isinstance(mapped[0], Series):    pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()    <ipython-input-37-45b7f6d40dea> in remove_mpg(pow_val)        8     if isinstance(pow_val, str):        9         i=pow_val.replace('mpg', '')  ---> 10         return float(pow_val.split(' ')[0])       11     else:       12                     return np.nan    ValueError: could not convert string to float: 'null'  

I applied similar code to a different column and it worked on that column, but not here.

Any guidance will be greatly appreciated.

Bests,

https://stackoverflow.com/questions/67442417/need-to-strip-csv-column-number-data-of-letters-pandas May 08, 2021 at 06:37AM

没有评论:

发表评论