Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization of Tags for Emails #816

Merged
merged 8 commits into from
Jul 24, 2024
Merged

Conversation

omar-bear
Copy link
Collaborator

Optimization of Tags for Emails

Problem

The loading of emails is slow because it requires fetching tags for each email in a workspace. This problem is particularly pronounced when the number of emails exceeds 15,000.

Proposed Solution

Approach

  1. Create a separate collection for tags: This collection will contain each tag with fields id, label, and companyId.
  2. Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
  3. Optimize queries using joins (lookups): Use joins to efficiently fetch emails corresponding to specified tags.

Collection Structure

Tag Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const TagSchema = new Schema({
  label: { type: String, required: true },
  companyId: { type: ObjectId, ref: 'Company', required: true },
}, { timestamps: true });

const Tag = mongoose.model('Tag', TagSchema);

module.exports = Tag;

Email Collection Schema

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const MailingSchema = new Schema({
  name: { type: String, required: true },
  _user: { type: ObjectId, ref: 'User', alias: 'userId' },
  author: { type: String, set: normalizeString, alias: 'userName' },
  _workspace: { type: ObjectId, ref: 'Workspace', alias: 'workspace' },
  _parentFolder: { type: ObjectId, ref: 'Folder' },
  _wireframe: { type: ObjectId, ref: 'Template', alias: 'templateId' },
  wireframe: { type: String, set: normalizeString, alias: 'templateName' },
  _company: { type: ObjectId, ref: 'Group', alias: 'group' },
  tags: [{ type: ObjectId, ref: 'Tag' }], // References to tags
  data: {},
  espIds: [String],
}, { timestamps: true });

const Mailing = mongoose.model('Mailing', MailingSchema);

module.exports = Mailing;

Query Optimization Using Aggregation and Joins

  1. Reference tags in emails: Emails will contain references (IDs) to the documents in the tags collection.
  2. Optimize queries using joins (lookups): Use joins to fetch emails corresponding to specified tags efficiently.

Update Tag Operations

  • Add Tags: Update the way tags are added during the creation of emails to reference the tags collection.
  • Delete Tags: Ensure that when a tag is deleted, references in the emails collection are updated accordingly.
  • Update Tags: When updating a tag, make sure all references in the emails collection are consistent.

Migration Script

Since we have more than 30,000 emails and more than 5,000 tags, we need to be careful with the migration script to update all the old data with the new method.

  1. Extract Unique Tags:

    • Use aggregation to identify all unique tags and their associated companyId.
  2. Insert Unique Tags into Tags Collection:

    • Ensure each unique tag is present in the tags collection.
    • Maintain a mapping of tag labels and companyId to tag _id.
  3. Update Mailings with Tag References:

    • Process mailings in manageable batches.
    • Replace tag labels with their corresponding tag _id using the mapping.
  4. Ensure Data Integrity:

    • run the script with mongosh
    • Validate updates after each batch to ensure correctness.

By following these instructions and carefully planning each step, you can ensure a smooth and efficient migration process while maintaining data integrity and minimizing downtime.

Advantages of This Approach

  1. Reduction in Number of Documents Scanned:

    • By using joins to match tags, the number of documents to scan when fetching emails is reduced.
    • This approach can be faster as it limits the filtering operations to a much smaller set of documents.
  2. Optimized Queries:

    • Queries are optimized as they do not need to scan all email documents to apply tag filters.
    • Join operations (lookups) can be minimized, which can improve performance.
  3. Better Scalability:

    • This approach is more scalable as it moves the filtering work to the tags collection, which is smaller and easier to manage.
    • Tag management becomes more centralized and consistent.

Conclusion

By reorganizing the data structure and using aggregation and joins techniques, you can significantly improve the performance of tag filters and avoid the slowdown caused by scanning each email individually.

Copy link

sonarcloud bot commented Jul 24, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
7.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@FlorianGille FlorianGille merged commit 14408d7 into develop Jul 24, 2024
4 of 5 checks passed
@FlorianGille FlorianGille deleted the feat-add-tags-collection branch July 24, 2024 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants