I have the following data (sample below), represented by 3 schemas, that I want to store into bq (bigquery).
-
Schema1
{ "id": 1, "age": 64, "tags": [ {"FirstName": "Tom"}, {"LastName": "Hanks"}, {"Country": "USA"} ] }
-
Schema2
{ "id": 1, "age": 64, "tags": { "FirstName": "Tom", "LastName": "Hanks", "Country": "USA" } }
-
Schema3
{ "id": 1, "age": 64, "tags": [ {"key": "FirstName", "value": "Tom"}, {"key": "LastName", "value": "Hanks"}, {"key": "Country", "value": "USA"} ] }
My use case is that I want to be able to group/filter records by these tag values.
- Schema1 and Schema3 will require UNNEST for any filtering/grouping on a dimension so Schema2 will be the fastest for such queries
- Storage space is not a concern though it seems Schema2 might be the most efficient, followed by Schema1 and Schema3 respectively
- Only Schema3 supports incremental updates to the tables with new keys.
My questions are:
- If we know all the possible keys and define the Schema beforehand, seems Schema2 is the best choice. Would there be a reason to favour Schema1 over Schema2?
- For dynamic tags, Schema3 seems the best choice? Are there alternative schemas, or factors I have not already considered above?
没有评论:
发表评论