有些事如何做: BigQuery Schema: Repeated Record vs. Record

I have the following data (sample below), represented by 3 schemas, that I want to store into bq (bigquery).

Schema1

    {       "id": 1,       "age": 64,       "tags": [          {"FirstName": "Tom"},          {"LastName": "Hanks"},          {"Country": "USA"}       ]      }

Schema2

    {       "id": 1,       "age": 64,       "tags": {          "FirstName": "Tom",          "LastName": "Hanks",          "Country": "USA"       }      }

Schema3

    {       "id": 1,       "age": 64,       "tags": [          {"key": "FirstName", "value": "Tom"},          {"key": "LastName", "value": "Hanks"},          {"key": "Country", "value": "USA"}       ]      }

My use case is that I want to be able to group/filter records by these tag values.

Schema1 and Schema3 will require UNNEST for any filtering/grouping on a dimension so Schema2 will be the fastest for such queries
Storage space is not a concern though it seems Schema2 might be the most efficient, followed by Schema1 and Schema3 respectively
Only Schema3 supports incremental updates to the tables with new keys.

My questions are:

If we know all the possible keys and define the Schema beforehand, seems Schema2 is the best choice. Would there be a reason to favour Schema1 over Schema2?
For dynamic tags, Schema3 seems the best choice? Are there alternative schemas, or factors I have not already considered above?

https://stackoverflow.com/questions/66757543/bigquery-schema-repeated-record-vs-record March 23, 2021 at 01:04PM

有些事如何做

2021年3月22日星期一

BigQuery Schema: Repeated Record vs. Record

没有评论:

发表评论