2021年1月5日星期二

MongoDB (M10 Atlas) extremely slow text search query with multiple words on a large dataset

The issue happens in a DB with 2.5m of documents, but only 900k of { status: "active" } ones.

For some reason the text search is being very slow for medium/long phrases and I have no idea how to speed it up.

Having the following document:

{    _id: "FOO123",    track_date: { $date: "2019-03-09T05:49:22.000Z" },    category_id: "foo",    parent_category_id: "foo",    title: "Disco Ssd Solido Kingston 480g Macbook Pro Air iMac Martinez",    status: "active",    site_id: "foo",    seller_id: 9999,    price: 9999,    permalink:      "https://example.com",    secure_thumbnail:      "https://example.com/img.jpg",    images: [      "https://example.com/img.jpg",    ],    fluctuation: 80,    last_update: { $date: "2020-12-11T14:00:24.715Z" },    original_price: null,  }  

Performing the following query:

db.articles.aggregate([    {      "$match": {        "status": "active",        "site_id": "foo",        "$text": {          "$search": "Disco Ssd solido kingston 480g macbook pro air",          "$caseSensitive": false        }      }    },    {      "$sort": {        "score": {          "$meta": "textScore"        }      }    },    {      "$skip": 0    },    {      "$limit": 10    }  ])  

Takes around 35s to yield results.


These are the indexes that I have set up for the active and title fields

{    v: 2,    key: { _fts: "text", _ftsx: 1 },    name: "title_text",    language_override: "language",    weights: { title: 1 },    default_language: "english",    ns: "foo.articles",    textIndexVersion: 3,  },  {    v: 2,    key: { status: 1 },    name: "status_1",    ns: "foo.articles",  }  

This is the explain()'ed result

{    stages: [      {        $cursor: {          query: {            status: "active",            site_id: "foo",            $text: {              $search: "Disco Ssd solido kingston 480g macbook pro air",              $caseSensitive: false,            },          },          fields: { $textScore: { $meta: "textScore" } },          queryPlanner: {            plannerVersion: 1,            namespace: "foo.articles",            indexFilterSet: false,            parsedQuery: {              $and: [                { site_id: { $eq: "foo" } },                { status: { $eq: "active" } },                {                  $text: {                    $search: "Disco Ssd solido kingston 480g macbook pro air",                    $language: "english",                    $caseSensitive: false,                    $diacriticSensitive: false,                  },                },              ],            },            queryHash: "36E4BA06",            planCacheKey: "4F01F59F",            winningPlan: {              stage: "PROJECTION_DEFAULT",              transformBy: { $textScore: { $meta: "textScore" } },              inputStage: {                stage: "FETCH",                filter: {                  $and: [                    { site_id: { $eq: "foo" } },                    { status: { $eq: "active" } },                  ],                },                inputStage: {                  stage: "TEXT",                  indexPrefix: {},                  indexName: "title_text",                  parsedTextQuery: {                    terms: [                      "480g",                      "air",                      "disco",                      "kingston",                      "macbook",                      "pro",                      "solido",                      "ssd",                    ],                    negatedTerms: [],                    phrases: [],                    negatedPhrases: [],                  },                  textIndexVersion: 3,                  inputStage: {                    stage: "TEXT_MATCH",                    inputStage: {                      stage: "TEXT_OR",                      inputStages: [                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                      ],                    },                  },                },              },            },            rejectedPlans: [],          },        },      },      { $sort: { sortKey: { $computed0: { $meta: "textScore" } }, limit: 10 } },    ],    serverInfo: {      host: "host",      port: "port",      version: "4.2.11",      gitVersion: "gitVersion",    },    ok: 1,    $clusterTime: {      clusterTime: Timestamp(1, 1609884874),      signature: {        hash: BinData(0, "ZxY3o2a/Dw/k5HtB2xe6q2OYSwo="),        keyId: NumberLong("6911938552179720195"),      },    },    operationTime: Timestamp(1, 1609884874),  }  

This is the result of explain("executionStats")

{    stages: [      {        $cursor: {          query: {            status: "active",            site_id: "foo",            $text: {              $search: "Disco Ssd solido kingston 480g macbook pro air",              $caseSensitive: false,            },          },          fields: { $textScore: { $meta: "textScore" } },          queryPlanner: {            plannerVersion: 1,            namespace: "foo.articles",            indexFilterSet: false,            parsedQuery: {              $and: [                { site_id: { $eq: "MLA" } },                { status: { $eq: "active" } },                {                  $text: {                    $search: "Disco Ssd solido kingston 480g macbook pro air",                    $language: "english",                    $caseSensitive: false,                    $diacriticSensitive: false,                  },                },              ],            },            queryHash: "36E4BA06",            planCacheKey: "7B58B045",            winningPlan: {              stage: "PROJECTION_DEFAULT",              transformBy: { $textScore: { $meta: "textScore" } },              inputStage: {                stage: "FETCH",                filter: {                  $and: [                    { site_id: { $eq: "foo" } },                    { status: { $eq: "active" } },                  ],                },                inputStage: {                  stage: "TEXT",                  indexPrefix: {},                  indexName: "title_text",                  parsedTextQuery: {                    terms: [                      "480g",                      "air",                      "disco",                      "kingston",                      "macbook",                      "pro",                      "solido",                      "ssd",                    ],                    negatedTerms: [],                    phrases: [],                    negatedPhrases: [],                  },                  textIndexVersion: 3,                  inputStage: {                    stage: "TEXT_MATCH",                    inputStage: {                      stage: "TEXT_OR",                      inputStages: [                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                        {                          stage: "IXSCAN",                          keyPattern: { _fts: "text", _ftsx: 1 },                          indexName: "title_text",                          isMultiKey: true,                          isUnique: false,                          isSparse: false,                          isPartial: false,                          indexVersion: 2,                          direction: "backward",                          indexBounds: {},                        },                      ],                    },                  },                },              },            },            rejectedPlans: [],          },        },      },      { $sort: { sortKey: { $computed0: { $meta: "textScore" } }, limit: 10 } },    ],    serverInfo: {      host: "example.com",      port: "port",      version: "4.2.11",      gitVersion: "ea38428f0c6742c7c2c7f677e73d79e17a2aab96",    },    ok: 1,    $clusterTime: {      clusterTime: Timestamp(46, 1609898585),      signature: {        hash: BinData(0, "0NNAqlUii1rcfx80y/hTrfTdF3o="),        keyId: NumberLong("6911938552179720195"),      },    },    operationTime: Timestamp(46, 1609898585),  }  

The issue is not related to the status field, as I've tried removing it and the issue persists. I don't really mind a bit slow query, but more than half a minute is too much for a search.

At first I was doing an AND query for the text search instead of an OR, but it was even slower for long phrases, so decided to give the score a try. I've also tried removing the sort pipeline, but it still takes 30s~

I'll take any tip anyone can give me at this point.

https://stackoverflow.com/questions/65587609/mongodb-m10-atlas-extremely-slow-text-search-query-with-multiple-words-on-a-la January 06, 2021 at 06:42AM

没有评论:

发表评论