Friday, May 7, 2021

Add the day information to a timestamp in a dataframe

I am trying to read a CSV file into a dataframe. The file has a Timing column with values such as 04:04.0 and 19:04.0.

The cell values contain only the time of day and are missing the date. I would like to read this CSV file into a dataframe and convert the timing information into a format like 2021-05-07 04:04:00, i.e., I would like to add the date. How can I achieve that?

I used the following code, but PySpark just fills in the date as 1970-01-01, which looks like some kind of system default.

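    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_timestamp

    spark = SparkSession.builder.getOrCreate()
    # Use the pre-Spark-3.0 datetime parsing behavior
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
    df_1 = spark.read.csv('test1.csv', header=True)
    # Only hours and minutes are parsed here; no date is supplied
    df_1 = df_1.withColumn('Timestamp', to_timestamp(col('Timing'), 'HH:mm'))
    df_1.show(truncate=False)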

And I got the following result.

    +-------+-------------------+
    | Timing|          Timestamp|
    +-------+-------------------+
    |04:04.0|1970-01-01 04:04:00|
    |19:04.0|1970-01-01 19:04:00|
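
For reference, here is a minimal sketch of one possible way to get the intended result. It assumes the date is known up front (2021-05-07 is used here only because it is the example date above) and that the first five characters of Timing are hours and minutes, which is how the 'HH:mm' pattern in the attempt reads them. The names combine_day_and_time and add_day_column are hypothetical helpers, not part of PySpark. The first function shows the logic in plain Python; the second applies the same idea to a Spark dataframe by prepending the date string before parsing.

```python
from datetime import date, datetime


def combine_day_and_time(timing: str, day: date) -> datetime:
    """Attach a known date to a time-of-day string such as '04:04.0'.

    Assumption: the first five characters are HH:mm; the trailing '.0'
    is ignored.
    """
    time_of_day = datetime.strptime(timing[:5], "%H:%M").time()
    return datetime.combine(day, time_of_day)


def add_day_column(df, day: str = "2021-05-07"):
    """Hypothetical helper: add a 'Timestamp' column built from 'Timing'.

    `day` is an assumed, externally known date in yyyy-MM-dd form.
    """
    # Imported here so the plain-Python helper above works without Spark.
    from pyspark.sql import functions as F

    # Prepend the date to the HH:mm part, then parse both together.
    date_and_time = F.concat(F.lit(day + " "), F.substring("Timing", 1, 5))
    return df.withColumn(
        "Timestamp", F.to_timestamp(date_and_time, "yyyy-MM-dd HH:mm")
    )


if __name__ == "__main__":
    print(combine_day_and_time("04:04.0", date(2021, 5, 7)))
```

With a dataframe read as in the code above, the call would be df_1 = add_day_column(df_1, "2021-05-07"). Because the substring drops the trailing ".0", this variant should not depend on the legacy time parser setting.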
https://stackoverflow.com/questions/67444579/add-the-day-information-to-timestep-in-a-dataframe May 08, 2021 at 02:06PM
