I am trying to read a CSV file into a dataframe. The file has a Timing column with values such as 04:04.0 and 19:04.0.
Each cell value contains only the time of day and is missing the day. I would like to read this CSV file into a dataframe and convert the timing information into a format like 2021-05-07 04:04.00,
i.e., I would like to add the day information. How can I achieve that?
I used the following code, but PySpark just fills in the day as 1970-01-01, which looks like some kind of system default.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df_1 = spark.read.csv('test1.csv', header=True)
df_1 = df_1.withColumn('Timestamp', to_timestamp(col('Timing'), 'HH:mm'))
df_1.show(truncate=False)
And I got the following result.
+-------+-------------------+
| Timing|          Timestamp|
+-------+-------------------+
|04:04.0|1970-01-01 04:04:00|
|19:04.0|1970-01-01 19:04:00|
https://stackoverflow.com/questions/67444579/add-the-day-information-to-timestep-in-a-dataframe May 08, 2021 at 02:06PM