Saturday, April 24, 2021

How to open a file whose name is stored in a pandas cell, manipulate the contents and store in a new column

Dataframe Example

index  fileName      startline  endline
0      293104.java   30         40
1      288951.java   183        247
2      2378709.java  98         117

Goal

I want to open and read the contents of the file in fileName, and extract the lines in the range created by the values in the startline and endline columns.

I then want to store that in a new column called snippet.

Example of snippet creation logic

def snippetMaker(fileName, startLine, endLine):
    file = open(fileName,'r').read()
    snippet = file.split('\n')[startLine:endLine]
    cleanSnippet = str(snippet).replace('[','').replace(']','').replace(',',' ')
    return cleanSnippet

Current approach

I have seen map() used with functions like the one above (provided the function accepts iterable arguments and returns a list), with the result then assigned to a dataframe column, as below.

df['snippet'] = snippetMaker(df['fileName'], df['startline'], df['endline'])

I am having trouble reconfiguring the above snippetMaker function to work in such a way.
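
One way this might be approached, as a minimal sketch rather than a definitive answer: snippetMaker expects a single file name and two integers, so passing whole columns fails. A wrapper that accepts three iterables, zips them, and returns a list gives the shape described above. The names `snippet_maker` and `make_snippets` and the demo file are hypothetical, and joining the lines with spaces is a simplification swapped in for the original str(...).replace(...) chain.

```python
import os
import tempfile


def snippet_maker(file_name, start_line, end_line):
    """Return lines [start_line:end_line] of one file, joined by spaces."""
    with open(file_name, "r") as handle:
        lines = handle.read().split("\n")
    return " ".join(lines[start_line:end_line])


def make_snippets(file_names, start_lines, end_lines):
    """Accept three equal-length iterables and return one snippet per row."""
    return [
        snippet_maker(name, start, end)
        for name, start, end in zip(file_names, start_lines, end_lines)
    ]


# Demo on a throwaway file (illustrative data only, not from the question).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.java")
with open(demo_path, "w") as handle:
    handle.write("\n".join(f"line {i}" for i in range(10)))

demo_snippets = make_snippets([demo_path, demo_path], [2, 5], [4, 6])
```

Because the wrapper only needs iterables, dataframe columns can be passed directly: df['snippet'] = make_snippets(df['fileName'], df['startline'], df['endline']).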

Other details

I do not want to use iterrows(), because the dataframe contains over 8 million rows.
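
At that scale, repeated file reads may cost more than the row iteration itself. The sketch below assumes (this is not stated in the question) that many rows point at the same file, and caches each file's lines so it is read only once. The names `read_lines` and `cached_snippets` and the cache size are hypothetical choices for illustration.

```python
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=1024)  # cache size is an arbitrary, illustrative choice
def read_lines(file_name):
    """Read a file once and return its lines as an immutable tuple."""
    with open(file_name, "r") as handle:
        return tuple(handle.read().split("\n"))


def cached_snippets(file_names, start_lines, end_lines):
    """Return one space-joined snippet per (file, start, end) triple."""
    return [
        " ".join(read_lines(name)[start:end])
        for name, start, end in zip(file_names, start_lines, end_lines)
    ]


# Demo: two rows that refer to the same throwaway file.
cache_demo_path = os.path.join(tempfile.mkdtemp(), "demo.java")
with open(cache_demo_path, "w") as handle:
    handle.write("a\nb\nc\nd\ne")

cached_result = cached_snippets([cache_demo_path, cache_demo_path], [0, 3], [2, 5])
```

If each file appears only once in the dataframe, the cache brings no benefit and the plain per-row version is simpler.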

https://stackoverflow.com/questions/67248859/how-to-open-a-file-whose-name-is-stored-in-a-pandas-cell-manipulate-the-content April 25, 2021 at 09:06AM
