The issue happens in a DB with 2.5m of documents, but only 900k of { status: "active" } ones.
For some reason the text search is being very slow for medium/long phrases and I have no idea how to speed it up.
Having the following document:
{ _id: "FOO123", track_date: { $date: "2019-03-09T05:49:22.000Z" }, category_id: "foo", parent_category_id: "foo", title: "Disco Ssd Solido Kingston 480g Macbook Pro Air iMac Martinez", status: "active", site_id: "foo", seller_id: 9999, price: 9999, permalink: "https://example.com", secure_thumbnail: "https://example.com/img.jpg", images: [ "https://example.com/img.jpg", ], fluctuation: 80, last_update: { $date: "2020-12-11T14:00:24.715Z" }, original_price: null, } Performing the following query:
db.articles.aggregate([ { "$match": { "status": "active", "site_id": "foo", "$text": { "$search": "Disco Ssd solido kingston 480g macbook pro air", "$caseSensitive": false } } }, { "$sort": { "score": { "$meta": "textScore" } } }, { "$skip": 0 }, { "$limit": 10 } ]) Takes around 35s to yield results.
These are the indexes that I have set up for the active and title fields
{ v: 2, key: { _fts: "text", _ftsx: 1 }, name: "title_text", language_override: "language", weights: { title: 1 }, default_language: "english", ns: "foo.articles", textIndexVersion: 3, }, { v: 2, key: { status: 1 }, name: "status_1", ns: "foo.articles", } This is the explain()'ed result
{ stages: [ { $cursor: { query: { status: "active", site_id: "foo", $text: { $search: "Disco Ssd solido kingston 480g macbook pro air", $caseSensitive: false, }, }, fields: { $textScore: { $meta: "textScore" } }, queryPlanner: { plannerVersion: 1, namespace: "foo.articles", indexFilterSet: false, parsedQuery: { $and: [ { site_id: { $eq: "foo" } }, { status: { $eq: "active" } }, { $text: { $search: "Disco Ssd solido kingston 480g macbook pro air", $language: "english", $caseSensitive: false, $diacriticSensitive: false, }, }, ], }, queryHash: "36E4BA06", planCacheKey: "4F01F59F", winningPlan: { stage: "PROJECTION_DEFAULT", transformBy: { $textScore: { $meta: "textScore" } }, inputStage: { stage: "FETCH", filter: { $and: [ { site_id: { $eq: "foo" } }, { status: { $eq: "active" } }, ], }, inputStage: { stage: "TEXT", indexPrefix: {}, indexName: "title_text", parsedTextQuery: { terms: [ "480g", "air", "disco", "kingston", "macbook", "pro", "solido", "ssd", ], negatedTerms: [], phrases: [], negatedPhrases: [], }, textIndexVersion: 3, inputStage: { stage: "TEXT_MATCH", inputStage: { stage: "TEXT_OR", inputStages: [ { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, ], }, }, }, }, }, rejectedPlans: [], }, }, }, { $sort: { sortKey: { $computed0: { $meta: "textScore" } }, limit: 10 } }, ], serverInfo: { host: "host", port: "port", version: "4.2.11", gitVersion: "gitVersion", }, ok: 1, $clusterTime: { clusterTime: Timestamp(1, 1609884874), signature: { hash: BinData(0, "ZxY3o2a/Dw/k5HtB2xe6q2OYSwo="), keyId: NumberLong("6911938552179720195"), }, }, operationTime: Timestamp(1, 1609884874), } This is the result of explain("executionStats")
{ stages: [ { $cursor: { query: { status: "active", site_id: "foo", $text: { $search: "Disco Ssd solido kingston 480g macbook pro air", $caseSensitive: false, }, }, fields: { $textScore: { $meta: "textScore" } }, queryPlanner: { plannerVersion: 1, namespace: "foo.articles", indexFilterSet: false, parsedQuery: { $and: [ { site_id: { $eq: "MLA" } }, { status: { $eq: "active" } }, { $text: { $search: "Disco Ssd solido kingston 480g macbook pro air", $language: "english", $caseSensitive: false, $diacriticSensitive: false, }, }, ], }, queryHash: "36E4BA06", planCacheKey: "7B58B045", winningPlan: { stage: "PROJECTION_DEFAULT", transformBy: { $textScore: { $meta: "textScore" } }, inputStage: { stage: "FETCH", filter: { $and: [ { site_id: { $eq: "foo" } }, { status: { $eq: "active" } }, ], }, inputStage: { stage: "TEXT", indexPrefix: {}, indexName: "title_text", parsedTextQuery: { terms: [ "480g", "air", "disco", "kingston", "macbook", "pro", "solido", "ssd", ], negatedTerms: [], phrases: [], negatedPhrases: [], }, textIndexVersion: 3, inputStage: { stage: "TEXT_MATCH", inputStage: { stage: "TEXT_OR", inputStages: [ { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, { stage: "IXSCAN", keyPattern: { _fts: "text", _ftsx: 1 }, indexName: "title_text", isMultiKey: true, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "backward", indexBounds: {}, }, ], }, }, }, }, }, rejectedPlans: [], }, }, }, { $sort: { sortKey: { $computed0: { $meta: "textScore" } }, limit: 10 } }, ], serverInfo: { host: "example.com", port: "port", version: "4.2.11", gitVersion: "ea38428f0c6742c7c2c7f677e73d79e17a2aab96", }, ok: 1, $clusterTime: { clusterTime: Timestamp(46, 1609898585), signature: { hash: BinData(0, "0NNAqlUii1rcfx80y/hTrfTdF3o="), keyId: NumberLong("6911938552179720195"), }, }, operationTime: Timestamp(46, 1609898585), } The issue is not related to the status field, as I've tried removing it and the issue persists. I don't really mind a bit slow query, but more than half a minute is too much for a search.
At first I was doing an AND query for the text search instead of an OR, but it was even slower for long phrases, so decided to give the score a try. I've also tried removing the sort pipeline, but it still takes 30s~
I'll take any tip anyone can give me at this point.
https://stackoverflow.com/questions/65587609/mongodb-m10-atlas-extremely-slow-text-search-query-with-multiple-words-on-a-la January 06, 2021 at 06:42AM
没有评论:
发表评论