2021年3月22日星期一

BigQuery Schema: Repeated Record vs. Record

I have the following data (sample below), represented by 3 schemas, that I want to store into bq (bigquery).

  1. Schema1

        {       "id": 1,       "age": 64,       "tags": [          {"FirstName": "Tom"},          {"LastName": "Hanks"},          {"Country": "USA"}       ]      }  
  2. Schema2

        {       "id": 1,       "age": 64,       "tags": {          "FirstName": "Tom",          "LastName": "Hanks",          "Country": "USA"       }      }  
  3. Schema3

        {       "id": 1,       "age": 64,       "tags": [          {"key": "FirstName", "value": "Tom"},          {"key": "LastName", "value": "Hanks"},          {"key": "Country", "value": "USA"}       ]      }  

My use case is that I want to be able to group/filter records by these tag values.

  1. Schema1 and Schema3 will require UNNEST for any filtering/grouping on a dimension so Schema2 will be the fastest for such queries
  2. Storage space is not a concern though it seems Schema2 might be the most efficient, followed by Schema1 and Schema3 respectively
  3. Only Schema3 supports incremental updates to the tables with new keys.

My questions are:

  1. If we know all the possible keys and define the Schema beforehand, seems Schema2 is the best choice. Would there be a reason to favour Schema1 over Schema2?
  2. For dynamic tags, Schema3 seems the best choice? Are there alternative schemas, or factors I have not already considered above?
https://stackoverflow.com/questions/66757543/bigquery-schema-repeated-record-vs-record March 23, 2021 at 01:04PM

没有评论:

发表评论