MongoDB Fundamentals

What is MongoDB?

MongoDB is a non-relational data store (it does not store data in tables) for JSON-like documents. MongoDB is schemaless, which means altering a document's structure is easy at any point in time. MongoDB has a query language, highly functional secondary indexes (including text search and geospatial), a powerful aggregation framework for data analysis, and more. It was designed with high availability and scalability in mind, and includes out-of-the-box replication and auto-sharding.

Comparison of MongoDB vs SQL:

MySQL:
	INSERT INTO users (user_id, age, status)
	VALUES ("bcd001", 45, "A")
MongoDB:
	db.users.insert({
		user_id: "bcd001",
		age: 45,
		status: "A"
	})

MySQL:
	SELECT * FROM users
MongoDB:
	db.users.find()

MySQL:
	UPDATE users SET status = "C"
	WHERE age > 25
MongoDB:
	db.users.update(
		{ age: { $gt: 25 } },
		{ $set: { status: "C" } },
		{ multi: true }
	)
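What that update does can be sketched in plain JavaScript, with an array standing in for the collection (an in-memory illustration, not the real driver):

```javascript
// In-memory stand-in for:
// db.users.update({ age: { $gt: 25 } }, { $set: { status: "C" } }, { multi: true })
const users = [
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "bcd002", age: 20, status: "A" },
];

// Apply $set to every document matching the { age: { $gt: 25 } } filter.
for (const doc of users) {
  if (doc.age > 25) {
    doc.status = "C";
  }
}
```

Without `{ multi: true }`, the real `update` would modify only the first matching document.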

MongoDB Schema Design

  • Rich documents: a document can store arrays of items and embedded documents.
  • Pre-join/embed data: MongoDB gives you the flexibility to embed one document inside another, which gives faster access since related data comes back in a single read.
  • No joins: MongoDB does not support joins, so keep in mind to embed documents wherever possible.
  • No constraints: if you are coming from the relational world, note that MongoDB has no concept of foreign-key constraints.
  • Atomic operations: in an atomic transaction, a series of database operations either all occur or none occur. MongoDB write operations are atomic only at the level of a single document. If you modify multiple subdocuments inside one document, the operation is still atomic; if you modify multiple documents, it is not.
  • No declared schema: you can change the schema at any time without having to alter the database first.
  • Frequency of access: do not embed a heavy document if it is accessed rarely. It is better to keep it as a separate collection.
  • There is a 16 MB size limit per document.
  • Atomicity of data: if atomicity really matters for a set of fields, it is recommended to embed them in a single document, since single-document writes are atomic.
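A pre-joined, embedded document from the list above might look like the following (the field names are made up for illustration):

```javascript
// A "rich document": arrays and subdocuments embedded directly,
// so one read returns the user together with their posts -- no join needed.
const user = {
  _id: "bcd001",
  name: "Hitesh",
  posts: [                                   // embedded array of documents
    { title: "Intro to MongoDB", likes: 10 },
    { title: "Schema design",    likes: 25 },
  ],
  address: { city: "Pune", zip: "411001" },  // embedded subdocument
};
```

Updating both `posts` entries and `address` in one write on this document would be atomic, because it is a single document.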

 

Performance:

  • Storage engine: caches data from the hard disk in memory and can significantly improve performance when used correctly. MongoDB ships with two storage engines:
  1. WiredTiger: compresses data before writing it to disk (the default engine).
  2. MMAPv1
  • Indexes: a collection scan (table scan) is the death of performance. An index is an ordered set of keys.
  • Indexes are not free. They are represented as B-trees; each index slows down your writes, but reads become much faster. For a bulk load it is recommended to insert the data first (without indexes) and then create the indexes.
  • db.students.explain().find();
  • db.students.explain().findOne(); // faster
  • db.students.createIndex({student_id: 1}); // creates the index on student_id. Note that building an index is slow and may take a few minutes.
  • db.students.getIndexes(); // returns the list of indexes that already exist on the collection.
  • db.students.createIndex({student_id: 1}, {unique: true}); // creates a unique index; throws an error if duplicate values already exist in the collection.
  • db.students.dropIndex({student_id: 1}); // drops the index from the collection.
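Why an index speeds up reads: because the keys are kept in sorted order, a lookup can binary-search instead of scanning every document. A minimal in-memory sketch of the idea (not how WiredTiger actually stores B-trees on disk):

```javascript
// Sorted (key, position) pairs stand in for an index on student_id.
const index = [
  [1, 0], [3, 2], [5, 1], [8, 4], [9, 3],
];

// Binary search over the sorted keys: O(log n) instead of a full scan.
function indexLookup(index, key) {
  let lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] === key) return index[mid][1]; // position in collection
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1; // not found
}
```

Every write must also update this sorted structure, which is why each extra index slows inserts down.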

 

Scaling in MongoDB

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.

Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.

There are two methods for addressing system growth: vertical and horizontal scaling.

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. Additionally, Cloud-based providers have hard ceilings based on available hardware configurations. As a result, there is a practical maximum for vertical scaling.

Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. The trade off is increased complexity in infrastructure and maintenance for the deployment.

MongoDB supports horizontal scaling through sharding.
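The routing idea behind sharding can be sketched as a function from a shard key to a server. Here is a toy hash-based router; the real mongos routes by chunk ranges held on the config servers, and MongoDB's hashed sharding uses a different hash function, so this is illustrative only:

```javascript
// Toy router: pick a shard for a document based on a hash of its shard key.
const shards = ["shard0", "shard1", "shard2"];

function routeToShard(shardKeyValue) {
  // Simple deterministic string hash (illustrative, not MongoDB's).
  let h = 0;
  for (const ch of String(shardKeyValue)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return shards[h % shards.length];
}
```

Because the function is deterministic, every query that includes the shard key can be sent straight to the one shard that owns that document.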

Sharded Cluster

A MongoDB sharded cluster consists of the following components:

  • shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
  • config servers: Config servers store metadata and configuration settings for the cluster. As of MongoDB 3.2, config servers can be deployed as a replica set.

The following graphic describes the interaction of components within a sharded cluster:

[Diagram: a sample production sharded cluster containing exactly 3 config servers, 2 or more mongos query routers, and at least 2 shards; the shards are deployed as replica sets.]

MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.

 

MongoDB Queries:

  • find: finds records. Ex: db.people.find(); // returns all documents in the collection.
  • _id: the server requires a unique identifier, aka the PRIMARY key. If you don't pass one, it is created and inserted automatically.
  • insert: db.people.insert({"name": "hitesh"});
  • BSON: the fundamental record format of MongoDB (a binary encoding of JSON).
  • CRUD: Create, Read, Update, Delete. In MongoDB terminology: insert, find, update, remove.
  • findOne: db.people.findOne(); // returns only one document.
  • $gt, $lt, $gte, $lte: db.scores.find({score: {$gte: 95, $lte: 98}}); // filters documents with scores between 95 and 98.
  • Regex: you can query documents regex-style ("LIKE" if you are coming from SQL). Ex: db.people.find({name: {$regex: "a"}}); // returns all names that contain an "a". Regex queries tend not to be optimized.
  • $or, $and: prefix operators that combine conditions. Ex: db.people.find({$or: [{name: {$regex: "e$"}}, {age: {$exists: true}}]}); // matches either the name OR the age criterion.
  • $in, $all: $in matches a field equal to any value in the given list; $all matches arrays that contain every listed value (in any order). Ex: db.accounts.find({favorites: {$all: ["pretzels", "beer"]}}); // returns accounts whose favorites include both pretzels and beer.
  • Dot operator: reaches inside an embedded field. Ex: db.users.find({"email.work": "hitesh@gmail.com"}); // searches for documents whose embedded email.work field is "hitesh@gmail.com".
  • count: counts the matching documents. Ex: db.scores.count({type: "exam"}); // 1000
  • $push, $pop, $pull, $pushAll: array update operators. $push appends a value, $pop removes the first or last element, $pull removes all matching values, and $pushAll appends several values at once.
  • Upsert: if no document matches the filter, one is inserted (equivalent to "update if it exists, otherwise insert"). Ex: db.people.update({name: "George"}, {$set: {age: 40}}, {upsert: true});
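The upsert behaviour can be sketched in plain JavaScript, with an array standing in for the collection (an in-memory illustration of the semantics, not the driver):

```javascript
// In-memory sketch of:
// db.people.update({ name: "George" }, { $set: { age: 40 } }, { upsert: true })
const people = [{ name: "Hitesh", age: 30 }];

function upsert(coll, filter, setFields) {
  const doc = coll.find(d => d.name === filter.name); // match on the filter field
  if (doc) {
    Object.assign(doc, setFields);           // found: apply $set
  } else {
    coll.push({ ...filter, ...setFields });  // not found: insert filter + $set fields
  }
}

upsert(people, { name: "George" }, { age: 40 }); // no George yet -> inserts him
```

Running the same upsert a second time finds the inserted document and only updates it, so no duplicate appears.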

 

Mongoose:

Mongoose is an ODM (Object Data Modeling) tool for MongoDB. It takes away the boilerplate data-validation code and is built on top of the Node.js driver.

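A rough plain-JavaScript sketch of the kind of schema-based validation an ODM performs (this is not the actual Mongoose API, just the core idea):

```javascript
// Minimal sketch of schema-driven validation, the core service an ODM provides.
const personSchema = {
  name: { type: "string", required: true },
  age:  { type: "number", required: false },
};

// Check a document against the schema and collect human-readable errors.
function validate(schema, doc) {
  const errors = [];
  for (const [field, rule] of Object.entries(schema)) {
    const value = doc[field];
    if (value === undefined) {
      if (rule.required) errors.push(`${field} is required`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${field} must be a ${rule.type}`);
    }
  }
  return errors;
}
```

In real Mongoose you declare the schema once and the model runs checks like these for you on every save, instead of you hand-writing them per collection.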

 

What is Aggregation?

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.

 

Aggregation Pipeline

MongoDB’s aggregation framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.

The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.

Other pipeline operations provide tools for grouping and sorting documents by specific field or fields as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators for tasks such as calculating the average or concatenating a string.

The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

The aggregation pipeline can operate on a sharded collection.

The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase. See Pipeline Operators and Indexes and Aggregation Pipeline Optimization for details.

[Diagram: an annotated aggregation pipeline operation with two stages, $match and $group.]
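A two-stage $match then $group pipeline can be sketched over a plain array to show what each stage contributes (the real pipeline runs server-side; collection and field names here are illustrative):

```javascript
// In-memory sketch of:
// db.orders.aggregate([
//   { $match: { status: "A" } },
//   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
// ])
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Stage 1: $match -- a filter that operates like a query.
const matched = orders.filter(o => o.status === "A");

// Stage 2: $group -- sum the amount per cust_id.
const totals = {};
for (const o of matched) {
  totals[o.cust_id] = (totals[o.cust_id] || 0) + o.amount;
}
```

Each stage takes the documents the previous stage produced, which is why putting $match first lets the pipeline use an index and shrink the data before grouping.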

 
