Friday, April 9, 2021

Snowflake null values quoted in CSV breaks PostgreSQL unload

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. In the table, commas can appear inside text values, so I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unloading option to quote the contents of the problematic cells. However, when that option is combined with null values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table for you to understand the issue. Here it is:

CREATE OR REPLACE TABLE PUBLIC.TEST (
    TEXT_FIELD VARCHAR,
    NUMERIC_FIELD INT
);

INSERT INTO PUBLIC.TEST VALUES
    ('A', 1),
    (NULL, 2),
    ('B', NULL),
    (NULL, NULL),
    ('Hello, world', NULL);

COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
    COMPRESSION = NONE,
    TYPE = CSV,
    FIELD_OPTIONALLY_ENCLOSED_BY = '"',
    NULL_IF = ''
)
OVERWRITE = TRUE;

From that, Snowflake will create the following CSV:

"A",1  "",2  "B",""  "",""  "Hello, world",""  

But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is,
even though the PostgreSQL documentation says the following about the NULL option:

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.  
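To illustrate the distinction the documentation is drawing, here is a sketch with Python's csv module (my assumption: its QUOTE_MINIMAL behaviour matches the shape PostgreSQL wants): a CSV where empty fields are left unquoted would read back as NULL, while only the cell containing a comma gets quoted.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
writer.writerow(['A', 1])
writer.writerow(['', 2])               # unquoted empty field -> NULL for PostgreSQL
writer.writerow(['Hello, world', ''])  # only the comma-bearing cell is quoted
print(buf.getvalue())
```

Note how this differs from Snowflake's output: the empty fields come out as nothing at all (`,2`) rather than as `"",2`.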

Not setting any options in the PostgreSQL COPY command results in a failed load: it won't work because we also have to specify the quote character with QUOTE, which here is QUOTE '"'.

Therefore, during the PostgreSQL import, using FORMAT csv, HEADER false, QUOTE '"' gives:

DataError: invalid input syntax for integer: ""
CONTEXT:  COPY test, line 3, column numeric_field: ""

And using FORMAT csv, HEADER false, NULL '""', QUOTE '"' gives:

NotSupportedError: CSV quote character must not appear in the NULL specification  

FYI, to test the import from S3 I use this command in PostgreSQL:

CREATE TABLE IF NOT EXISTS PUBLIC.TEST (
    TEXT_FIELD VARCHAR,
    NUMERIC_FIELD INT
);

CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

SELECT aws_s3.table_import_from_s3(
    'PUBLIC.TEST',
    '',
    '(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
    'bucket',
    'test_0_0_0.csv',
    'aws_region'
);

Thanks a lot for any ideas on how I could make this work. I would love to find a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think the issue is more on the Snowflake side, as it doesn't really make sense to quote null values, but PostgreSQL is not helping either.

https://stackoverflow.com/questions/67024682/snowflake-null-values-quoted-in-csv-breaks-postgresql-unload April 10, 2021 at 12:15AM
