What is MongoDB?
MongoDB is a non-relational data store (it does not store data in tables) for JSON-like documents. MongoDB is schemaless, which means the structure of a document can be changed easily at any point in time. MongoDB has a query language, highly functional secondary indexes (including text search and geospatial), a powerful aggregation framework for data analysis, and more. MongoDB was designed with high availability and scalability in mind, and includes out-of-the-box replication and auto-sharding.
Comparison of MongoDB vs SQL:
MySQL | MongoDB |
---|---|
INSERT INTO users (user_id, age, status) VALUES ("bcd001", 45, "A") | db.users.insert({ user_id: "bcd001", age: 45, status: "A" }) |
SELECT * FROM users | db.users.find() |
UPDATE users SET status = "C" WHERE age > 25 | db.users.update( { age: { $gt: 25 } }, { $set: { status: "C" } }, { multi: true } ) |
MongoDB Schema Design
- Rich documents: A document can store arrays of items and nested documents.
- Pre-join/embed data: MongoDB gives you the flexibility to embed one document inside another. This gives faster access, because related data is read together.
- No Mongo join: MongoDB does not support SQL-style joins in ordinary queries, so keep in mind to embed documents wherever possible. (The aggregation pipeline’s $lookup stage, added in MongoDB 3.2, can perform a left outer join.)
- No constraints: If you are coming from the relational world, note that MongoDB has no concept of constraints such as foreign keys.
- Atomic operations: In an atomic transaction, a series of database operations either all occur or none occur. MongoDB write operations are atomic at the level of a single document: if you modify multiple subdocuments inside one document, the operation is still atomic. If you modify multiple documents, the operation as a whole is not atomic (multi-document transactions, added in MongoDB 4.0, are the exception).
- No declared schema : You can change schema anytime without
- Frequency of access: Do not embed a heavy document if it is accessed only rarely. It is better to keep it in a separate collection.
- Document size: There is a 16 MB limit per document.
- Atomicity of data: If atomicity is important for a set of data, it is recommended to embed that data in a single document.
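To make the embed-versus-separate-collection trade-off above concrete, here is a minimal sketch in plain JavaScript that runs without a database. The names `embeddedPost`, `posts`, `comments`, and `loadPostWithComments` are hypothetical, and the arrays only stand in for collections.

```javascript
// Embedded design: the comments live inside the post document,
// so a single read returns everything and a single write is atomic.
const embeddedPost = {
  _id: 1,
  title: "Schema design",
  comments: [
    { author: "ann", text: "Nice" },
    { author: "bob", text: "Thanks" },
  ],
};

// Referenced design: comments sit in a separate "collection" and point
// back to the post by its _id. Nothing enforces this link (no foreign keys).
const posts = [{ _id: 1, title: "Schema design" }];
const comments = [
  { _id: 10, post_id: 1, author: "ann", text: "Nice" },
  { _id: 11, post_id: 1, author: "bob", text: "Thanks" },
  { _id: 12, post_id: 2, author: "eve", text: "Orphan: post 2 does not exist" },
];

// Without a join, the application stitches the pieces together itself.
function loadPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId);
  if (!post) return null;
  return { ...post, comments: comments.filter((c) => c.post_id === postId) };
}

const joined = loadPostWithComments(1);
console.log(embeddedPost.comments.length, joined.comments.length);
```

The orphaned comment with `post_id: 2` is accepted without complaint, which is what "no constraints" means in practice: keeping references valid is the application's job.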
Performance:
- Storage engine: The component that manages how data is stored on disk and cached in memory. It can significantly increase performance if used correctly. There are two types of MongoDB storage engines:
  - WiredTiger: compresses data before storing it on disk (the default engine since MongoDB 3.2).
  - MMAPv1: the older memory-mapped-file engine (deprecated in MongoDB 4.0 and removed in 4.2).
- Indexes: A collection scan (the equivalent of a table scan) is the death of performance. An index is an ordered set of keys that lets the server find matching documents without examining every document.
  - Indexes are not free. They are stored as B-trees, so every write must also update the index: writes get slower, but reads become much faster. If you need to load a large amount of data, it is recommended to insert the data first (without indexes) and create the indexes afterwards.
  - db.students.explain().find(); // shows the query plan instead of running the query.
  - db.students.explain().find().limit(1); // the findOne() equivalent (the explain helper does not wrap findOne()); it can stop at the first match, so it is faster.
  - db.students.createIndex({student_id:1}); // creates an ascending index on student_id. Note that building an index on a large collection is slow and may take a few minutes.
  - db.students.getIndexes(); // returns the list of indexes that already exist on the collection.
  - db.students.createIndex({student_id:1}, {unique: true}); // creates a unique index. Throws an error if duplicate values already exist in the collection, in which case the unique index cannot be created.
  - db.students.dropIndex({student_id:1}); // drops the index from the collection.
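The difference between a collection scan and an index lookup can be sketched in plain JavaScript. This is only an illustration under simplifying assumptions: the "index" here is a sorted array searched with binary search, not the B-tree MongoDB actually maintains, and `collectionScan` and `indexLookup` are hypothetical helper names.

```javascript
// Hypothetical in-memory "collection": 1,000 students in shuffled order.
// 7919 is coprime with 1000, so every student_id from 0 to 999 appears once.
const students = [];
for (let i = 0; i < 1000; i++) {
  students.push({ student_id: (i * 7919) % 1000, name: "student" + i });
}

// Collection scan: examine documents one by one until a match is found.
function collectionScan(docs, id) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.student_id === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": the keys kept in sorted order, each pointing at its document.
const index = students
  .map((doc) => ({ key: doc.student_id, doc }))
  .sort((a, b) => a.key - b.key);

// Index lookup: binary search over the ordered keys.
function indexLookup(idx, id) {
  let lo = 0;
  let hi = idx.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (idx[mid].key === id) return { doc: idx[mid].doc, examined };
    if (idx[mid].key < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

const scan = collectionScan(students, 999);
const viaIndex = indexLookup(index, 999);
console.log("scan examined:", scan.examined, "index examined:", viaIndex.examined);
```

The cost of the faster read is visible in the setup: the sorted structure has to be built once and then kept in order on every insert, which is why indexes slow down writes.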
Scaling in MongoDB?
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
There are two methods for addressing system growth: vertical and horizontal scaling.
Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. Additionally, Cloud-based providers have hard ceilings based on available hardware configurations. As a result, there is a practical maximum for vertical scaling.
Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. The trade off is increased complexity in infrastructure and maintenance for the deployment.
MongoDB supports horizontal scaling through sharding.
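The idea of spreading one data set over several servers by a key can be sketched as follows. This is a deliberately simplified illustration: the three arrays stand in for shards, `toyHash` is a made-up hash (not the one MongoDB uses), and real sharding assigns ranges of key values to shards in chunks rather than taking a plain modulo.

```javascript
// Hypothetical cluster with three shards; each holds a subset of the data.
const shards = [[], [], []];

// Toy string hash, assumed for illustration only; it just needs to spread keys.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

// The shard key value alone decides which shard owns a document.
function shardFor(userId) {
  return toyHash(userId) % shards.length;
}

function insert(doc) {
  shards[shardFor(doc.user_id)].push(doc);
}

// A query that includes the shard key can be routed to a single shard.
function findByUserId(userId) {
  const target = shards[shardFor(userId)];
  return target.find((doc) => doc.user_id === userId) || null;
}

for (let i = 0; i < 300; i++) {
  insert({ user_id: "user" + i, age: 20 + (i % 50) });
}

console.log(shards.map((s) => s.length)); // number of documents per shard
console.log(findByUserId("user42"));
```

A query that does not include the shard key (for example, on `age` alone) would have to be sent to every shard, which is one reason the choice of shard key matters.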
MongoDB Query:
- Find: Returns the documents that match a query. Ex: db.people.find(); // returns all documents in the collection.
- _id: Every document requires a unique identifier, which acts as the primary key. If you don’t supply one, it is generated and inserted automatically.
- Insert: db.people.insert({"name": "hitesh"});
- BSON: The binary, JSON-like format that is the fundamental record type in MongoDB.
- CRUD: Create, Read, Update, Delete. In MongoDB terminology: Insert, Find, Update, Remove.
- FindOne: db.people.findOne(); // returns only one document.
- $gt, $lt, $gte, $lte: db.scores.find({score: {$gte: 95, $lte: 98}}); // returns the documents whose score is between 95 and 98, inclusive.
- Regex: You can query documents with a regular expression (similar to LIKE if you are coming from SQL). Ex: db.people.find({name: {$regex: "a"}}); // returns all documents whose name contains an "a". Regex queries tend not to be well optimized, because an unanchored pattern generally cannot make efficient use of an index.
- $or, $and: Prefix operators that take an array of conditions and combine them. Ex: db.people.find({$or: [{name: {$regex: "e$"}}, {age: {$exists: true}}]}); // matches documents whose name ends in "e" OR that have an age field.
- $in, $all: $in matches documents where the field holds at least one of the listed values; $all requires the array field to contain every listed value. Ex: db.accounts.find({favorites: {$all: ["pretzels", "beer"]}}); // returns the documents whose favorites include both pretzels and beer (in any order).
- Dot notation: Reaches inside an embedded document. Ex: db.users.find({"email.work": "hitesh@gmail.com"}); // matches documents whose embedded email document has a work field equal to "hitesh@gmail.com".
- Count: To count the number of documents that match a query, you can use: db.scores.count({type: "exam"}); // 1000
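- $push, $pop, $pull, $pushAll: Array update operators. $push appends an element, $pop removes the first or last element, $pull removes every element that matches a condition, and $pushAll appends several elements at once ($pushAll was removed in MongoDB 3.6; use $push with $each instead).
- Upsert: If no document matches the query, a new one is inserted; otherwise the matching document is updated (equivalent to "if it exists then update, else insert"). Ex: db.people.update({name: "George"}, {$set: {age: 40}}, {upsert: true});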
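The semantics of several operators in the list above can be made concrete with a small in-memory matcher. This is a sketch for illustration, not how the server evaluates queries: `getPath`, `operators`, `matches`, and `find` are hypothetical helpers, only a handful of operators are covered, and the sample documents are invented.

```javascript
// Read a possibly nested field using dot notation, e.g. "email.work".
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// The few operators this sketch supports; real MongoDB has many more.
const operators = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  // $in: at least one listed value is present.
  $in: (v, xs) => (Array.isArray(v) ? v.some((e) => xs.includes(e)) : xs.includes(v)),
  // $all: every listed value is present in the array field.
  $all: (v, xs) => Array.isArray(v) && xs.every((x) => v.includes(x)),
  $exists: (v, flag) => (v !== undefined) === flag,
};

// True when the document satisfies every condition in the query.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (field === "$or") return cond.some((q) => matches(doc, q));
    if (field === "$and") return cond.every((q) => matches(doc, q));
    const value = getPath(doc, field);
    if (cond === null || typeof cond !== "object") return value === cond; // plain equality
    return Object.entries(cond).every(([op, arg]) => operators[op](value, arg));
  });
}

// Invented sample data.
const people = [
  { name: "hitesh", score: 96, favorites: ["pretzels", "beer"], email: { work: "hitesh@gmail.com" } },
  { name: "alice", score: 80, favorites: ["beer"] },
];

const find = (query) => people.filter((doc) => matches(doc, query)).map((d) => d.name);

console.log(find({ score: { $gte: 95, $lte: 98 } }));
console.log(find({ favorites: { $all: ["pretzels", "beer"] } }));
console.log(find({ favorites: { $in: ["pretzels", "beer"] } }));
console.log(find({ "email.work": "hitesh@gmail.com" }));
console.log(find({ $or: [{ score: { $lt: 90 } }, { email: { $exists: true } }] }));
```

Comparing the second and third queries shows the difference between the two array operators: with the same list of values, `$all` keeps only the document that has both favorites, while `$in` also keeps the one that has just one of them.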
Mongoose:
Mongoose is an ODM (Object Data Modeling) tool for MongoDB. It takes away the boilerplate data validation code you would otherwise write by hand. It is built on top of the MongoDB Node.js driver.
What is Aggregation?
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.
Aggregation Pipeline
MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.
Other pipeline operations provide tools for grouping and sorting documents by specific field or fields as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators for tasks such as calculating the average or concatenating a string.
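The pipeline idea described above, where each stage transforms the documents it receives and hands the result to the next stage, can be sketched in plain JavaScript. The helpers `match`, `groupAvg`, `sortBy`, and `runPipeline` are hypothetical stand-ins written for this illustration, and the `scores` documents are invented.

```javascript
// Invented input documents.
const scores = [
  { student: "ann", type: "exam", score: 90 },
  { student: "ann", type: "quiz", score: 70 },
  { student: "bob", type: "exam", score: 60 },
  { student: "bob", type: "exam", score: 80 },
  { student: "ann", type: "exam", score: 100 },
];

// Each stage is a function from an array of documents to a new array.
const match = (pred) => (docs) => docs.filter(pred);

// Group documents by one field and compute the average of another per group.
const groupAvg = (keyField, valueField) => (docs) => {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc[keyField]) || { sum: 0, count: 0 };
    g.sum += doc[valueField];
    g.count += 1;
    groups.set(doc[keyField], g);
  }
  return [...groups].map(([key, g]) => ({ _id: key, avg: g.sum / g.count }));
};

// Sort a copy of the documents; direction is 1 (ascending) or -1 (descending).
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));

// Run the stages in order; the output of one stage is the input of the next.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(scores, [
  match((d) => d.type === "exam"), // similar in spirit to a $match stage
  groupAvg("student", "score"),    // similar in spirit to $group with $avg
  sortBy("avg", -1),               // similar in spirit to $sort, descending
]);

console.log(result);
```

In the shell, the corresponding pipeline would look like db.scores.aggregate([{ $match: { type: "exam" } }, { $group: { _id: "$student", avg: { $avg: "$score" } } }, { $sort: { avg: -1 } }]), assuming a scores collection with these fields.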
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
The aggregation pipeline can operate on a sharded collection.
The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase. See Pipeline Operators and Indexes and Aggregation Pipeline Optimization for details.
