I am trying to shift data from Snowflake to Postgresql and to do so I first load it into s3 in CSV format. In the table, comas in text could appear, I therefore use FIELD_OPTIONALLY_ENCLOSED_BY
snowflake unloading option to quote the content of the problematic cells. However when this happen + null values, I can't manage to have a valid CSV for PostgreSQL.
I created a simple table for you to understand the issue. Here it is :
CREATE OR REPLACE TABLE PUBLIC.TEST( TEXT_FIELD VARCHAR(), NUMERIC_FIELD INT ); INSERT INTO PUBLIC.TEST VALUES ('A', 1), (NULL, 2), ('B', NULL), (NULL, NULL), ('Hello, world', NULL) ; COPY INTO @STAGE/test FROM PUBLIC.TEST FILE_FORMAT = ( COMPRESSION = NONE, TYPE = CSV, FIELD_OPTIONALLY_ENCLOSED_BY = '"' NULL_IF = '' ) OVERWRITE = TRUE;
Snowflake will from that create the following CSV
"A",1 "",2 "B","" "","" "Hello, world",""
But after that, it is for me impossible to copy this CSV inside a PostgreSQL Table as it is.
Even thought from PostgreSQL documentation we have next to NULL option :
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting COPY Option in PostgreSQL COPY INTO will result in a failed unloading. Indeed it won't work as we also have to specify the quote used using QUOTE. Here it'll be QUOTE '"'
Therefore during POSTGRESQL unloading, using :
FORMAT csv, HEADER false, QUOTE '"'
will give :
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"'
will give :
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, To test the unloading in s3 I will use this command in PostgreSQL:
CREATE IF NOT EXISTS TABLE PUBLIC.TEST( TEXT_FIELD VARCHAR(), NUMERIC_FIELD INT ); CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE; SELECT aws_s3.table_import_from_s3( 'PUBLIC.TEST', '', '(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')', 'bucket', 'test_0_0_0.csv', 'aws_region' )
Thanks a lot for any ideas on what I could do to make it happen? I would love to find a solution that don't requires modifying the csv between snowflake and postgres. I think it is an issue more on the Snowflake side as it don't really make sense to quote null values. But PostgreSQL is not helping either.
https://stackoverflow.com/questions/67024682/snowflake-null-values-quoted-in-csv-breaks-postgresql-unload April 10, 2021 at 12:15AM
没有评论:
发表评论