I have a CSV file and am loading it with the following code:

    val bank = spark.read.format("com.databricks.spark.csv").
      option("header", true).
      option("ignoreLeadingWhiteSpace", true).
      option("inferSchema", true).
      option("quote", "").
      option("delimiter", ";").
      load("bank_dataset.csv")

I am getting the following:
| "age | ""job"" | ""marital"" | ""income"" |
|---|---|---|---|
| "58 | ""tech"" | ""married"" | 58000 |
Oddly enough, the first column has only a single quote character at the start, while the other string columns are wrapped in doubled quotes. Apart from age, which picks up that leading quote, the numeric values have no quotes at all.
I need to process it so that it looks like this:
| age | job | marital | income |
|---|---|---|---|
| 58 | tech | married | 58000 |
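
One possible way to get there, offered as a rough sketch rather than a confirmed fix: read the file as plain text, strip every double-quote character from each line, and only then parse the cleaned lines as CSV. This assumes Spark 2.2 or later (where `DataFrameReader.csv` accepts a `Dataset[String]`), and that no field legitimately contains a quote or a `;` inside quoted text. The object name `CleanBankCsv`, the app name, and the `local[*]` master are placeholders for illustration.

```scala
import org.apache.spark.sql.SparkSession

object CleanBankCsv {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell the `spark` value already exists.
    val spark = SparkSession.builder()
      .appName("clean-bank-csv")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the String encoder needed by map

    // Read the raw lines and drop every double-quote character.
    // Assumption: quotes never carry meaning in this file.
    val cleanedLines = spark.read
      .textFile("bank_dataset.csv")
      .map(_.replace("\"", ""))

    // Parse the cleaned lines as semicolon-delimited CSV with a header row.
    val bank = spark.read
      .option("header", true)
      .option("ignoreLeadingWhiteSpace", true)
      .option("inferSchema", true)
      .option("delimiter", ";")
      .csv(cleanedLines)

    bank.show()
    spark.stop()
  }
}
```

Because the quotes are removed before parsing, `inferSchema` sees bare values such as `58`, so numeric columns like age would be expected to come out as numbers instead of strings.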