NoSQL Databases: Cassandra and MongoDB
Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients. MongoDB is a free and open-source cross-platform document-oriented database. Classified as a NoSQL database, MongoDB avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.
Things to Remember
Cassandra:
Cassandra, originally developed at Facebook, combines ideas from Google's BigTable and Amazon's Dynamo.
- distributed
- high performance
- flexible partitioning
- replica placement
- linearly scalable
- no single point of failure
MongoDB features:
- Ad hoc queries
- Indexing
- Replication
- Load balancing
- File storage
- Aggregation
- Capped collections

Cassandra
Cassandra is a free, post-relational database with the following characteristics:
- distributed
- high performance (flexible partitioning, replica placement)
- linearly scalable
- fault tolerant (no single point of failure)
- flexible schema design
- data compression
- CQL (an SQL-like query language)
- data consistency
- no need for a separate caching layer or for special hardware or software
Cassandra can serve both as a real-time data store for online/transactional applications and as a read-intensive database for business intelligence systems. It is used as a backend storage system for multiple services at Facebook.
Cassandra combines ideas from Google's BigTable and Amazon's Dynamo. Netflix, Twitter, and Cisco are among its users.
Data Model
- Cassandra is a distributed key-value store.
- A table in Cassandra is a distributed multi-dimensional map indexed by a key. The value is a highly structured object.
- The row key in a table is a string with no size restrictions, although it is typically 16 to 36 bytes long.
- Every operation under a single row key is atomic per replica, no matter how many columns are being read or written.
- Columns are grouped together into sets called column families.
- Cassandra exposes two kinds of column families: Simple and Super column families. [2]
- Super column families (meta) can be visualized as a column family within a column family.
The Cassandra API consists of the following three simple methods:
- get(table, key, colName)
- insert(table, key, rowMutation)
- delete(table, key, colName)
colName can refer to a specific column within a column family, a column family, a super column family, or a column within a super column. [1]
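The three methods above can be sketched as a nested map in plain JavaScript. This is only an illustrative in-memory model, not Cassandra's actual implementation: the Table class, the "family:column" naming for colName, and the sample data are assumptions made for the example, and the table argument is dropped because each Table instance stands for one table.

```javascript
// Hypothetical in-memory model of a table: row key -> column family -> column -> value.
class Table {
  constructor() {
    this.rows = new Map();
  }

  // insert(key, rowMutation): rowMutation maps column family names to {column: value}.
  insert(key, rowMutation) {
    if (!this.rows.has(key)) this.rows.set(key, new Map());
    const row = this.rows.get(key);
    for (const [family, columns] of Object.entries(rowMutation)) {
      if (!row.has(family)) row.set(family, new Map());
      for (const [column, value] of Object.entries(columns)) {
        row.get(family).set(column, value);
      }
    }
  }

  // get(key, colName): colName is "family" or "family:column" (assumed convention).
  get(key, colName) {
    const row = this.rows.get(key);
    if (!row) return undefined;
    const [family, column] = colName.split(":");
    const columns = row.get(family);
    if (!columns) return undefined;
    return column === undefined ? Object.fromEntries(columns) : columns.get(column);
  }

  // delete(key, colName): removes a whole column family or a single column.
  delete(key, colName) {
    const row = this.rows.get(key);
    if (!row) return false;
    const [family, column] = colName.split(":");
    if (column === undefined) return row.delete(family);
    const columns = row.get(family);
    return columns ? columns.delete(column) : false;
  }
}

const customers = new Table();
customers.insert("cust-001", { profile: { name: "Raju", city: "Pokhara" } });
console.log(customers.get("cust-001", "profile:name")); // prints Raju
customers.delete("cust-001", "profile:city");
console.log(customers.get("cust-001", "profile")); // prints { name: 'Raju' }
```

Super column families would add one more level of nesting to the same map.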
Architecture
- The system is fault tolerant: it was designed on the assumption that system and hardware failures can and do occur.
- It is a peer-to-peer (P2P) distributed system.
- All nodes are the same, and data is partitioned among all nodes in a cluster. [3]
- Data replication is customizable to ensure fault tolerance.
- Read/write-anywhere design.
- Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second.
- A commit log on each node captures write activity, which assures data durability.
- Data is also written to an in-memory structure (a memtable) and then to disk (as an SSTable) once the memory structure is full.
- The schema used in Cassandra is modeled after Google BigTable. It is a row-oriented, column structure.
- A keyspace is akin to a database in the RDBMS world.
- A column family is similar to an RDBMS table but is more flexible/dynamic.
- A row in a column family is indexed by its key. Other columns may be indexed as well.
[Figure: a Portfolio keyspace containing a Customer column family with the columns ID, Name, SSN, DOB]
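The write path described in the list above (commit log, then memtable, then SSTable) can be sketched for a single node as follows. This is a simplified model under stated assumptions, not Cassandra's storage engine: the Node class, the key-count limit for the memtable, and clearing the commit log on every flush are all choices made for the illustration.

```javascript
// Hypothetical single-node write path: commit log -> memtable -> SSTable flush.
class Node {
  constructor(memtableLimit) {
    this.memtableLimit = memtableLimit; // assumed limit: number of keys held in memory
    this.commitLog = [];                // append-only record of every write
    this.memtable = new Map();          // in-memory structure
    this.sstables = [];                 // immutable sorted tables standing in for disk files
  }

  write(key, value) {
    this.commitLog.push({ key, value }); // log first, so the write survives a crash
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  flush() {
    // An SSTable keeps its entries sorted by key and is never modified afterwards.
    const sorted = [...this.memtable.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    this.sstables.push(Object.freeze(sorted));
    this.memtable = new Map();
    this.commitLog = []; // simplification: flushed writes no longer need replaying
  }

  read(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    // The newest SSTable wins, so scan from the most recent flush backwards.
    for (let i = this.sstables.length - 1; i >= 0; i--) {
      const hit = this.sstables[i].find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }
}

const node = new Node(2);
node.write("b", 1);
node.write("a", 2); // the memtable is now full, so it is flushed as one SSTable
node.write("a", 3); // the newer value stays in the memtable
console.log(node.sstables.length, node.read("a"), node.read("b")); // prints 1 3 1
```

Reads check the memtable before any SSTable, which is why the later write to "a" hides the flushed value.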
MongoDB
- 10gen began development in 2007, with the initial release in 2009
- Open source
- Cross-platform
- Written in C++
- Document-oriented database
- High performance and availability
- Works on the concept of collections and documents
- Master/slave replication (automatic failover with replica sets)
- Built-in sharding
- Queries are JavaScript expressions
- Can run arbitrary JavaScript functions server-side
- Uses memory-mapped files for data storage
- An empty database takes up 192 MB
- GridFS stores large files together with their metadata (it is not actually a file system)
Features
- Ad hoc queries: supports field and regular-expression queries, as well as queries that return a random sample of results of a given size.
- Indexing
- Replication
- Load balancing: data in a collection is split across shards based on the shard key (each shard is a master with one or more slaves).
- File storage: can be used as a file system through GridFS.
- Aggregation: mapReduce can be used for batch processing and aggregation operations.
- Server-side JavaScript execution
- Capped collections: fixed-size collections that maintain insertion order. [4]
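The behavior of a capped collection from the list above can be illustrated with a small in-memory sketch. It is a simplification and not MongoDB code: the CappedCollection class is hypothetical, and it limits the number of documents, whereas a real capped collection is sized in bytes.

```javascript
// Hypothetical in-memory capped collection limited by document count.
class CappedCollection {
  constructor(maxDocuments) {
    this.maxDocuments = maxDocuments;
    this.documents = [];
  }

  insert(document) {
    this.documents.push(document);
    // Once the collection is full, the oldest document is dropped to make room.
    if (this.documents.length > this.maxDocuments) this.documents.shift();
  }

  find() {
    return [...this.documents]; // always returned in insertion order
  }
}

const log = new CappedCollection(3);
for (const n of [1, 2, 3, 4, 5]) log.insert({ event: n });
console.log(log.find().map((d) => d.event)); // prints [ 3, 4, 5 ]
```

This makes capped collections a natural fit for logs, where only the most recent entries matter.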
Data model
- Using BSON (binary JSON), developers can easily map documents to modern object-oriented languages without a complicated ORM layer.
- BSON is a binary format in which zero or more key/value pairs are stored as a single entity.
- BSON is lightweight, traversable, and efficient.
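To make the "key/value pairs stored as a single entity" idea concrete, here is a minimal encoder sketch that follows the BSON layout for two value types only (UTF-8 strings and 32-bit integers). The encodeBson function is a hypothetical helper written for this example and assumes Node.js (for Buffer); real applications would use a driver's BSON library instead.

```javascript
// Minimal BSON encoder sketch: supports only string and 32-bit integer values.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // element name as a NUL-terminated string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // the length includes the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, text); // 0x02 = string element
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, num); // 0x10 = int32 element
    } else {
      throw new TypeError(`unsupported value for key "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(body.length + 5); // 4 length bytes + body + terminating NUL
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.toString("hex")); // prints 160000000268656c6c6f0006000000776f726c640000
```

The leading length prefix is what makes the format traversable: a reader can skip a whole document or string without parsing its contents.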

Advantages of MongoDB over RDBMS
- The structure of a single object is clear
- No complex joins
- Deep query-ability: supports dynamic queries on documents using a document-based query language that is nearly as powerful as SQL
- Schema-less: a collection may hold documents with different structures. The number of fields, the content, and the size can differ from one document to another.
- Tuning
- Ease of scale-out
- No conversion or mapping of application objects to database objects is needed
- Uses internal memory to store the (windowed) working set, enabling faster data access
Some basic commands in MongoDB
1. Start the MongoDB shell
->mongo
2. MongoDB help
->db.help()
3. MongoDB statistics
->db.stats()
4. Create a database
->use Database_name
switched to db Database_name
//the database is actually created once data is first stored in it
5. List databases
->show dbs
6. Drop the current database
->db.dropDatabase()
7. Create a collection
->db.createCollection(name, options)
where name = name of the collection
options = optional settings such as memory size and indexing
//options may be omitted
8. Drop a collection
->db.COLLECTION_NAME.drop()
9. Insert a document
->db.COLLECTION_NAME.insert(document)
Example:
->db.user.insert({user_name:"Raju",Age:"21",Address:"Pokhara"})
//WriteResult({ "nInserted" : 1 })
10. Query data from MongoDB
->db.user.find().pretty() //user = collection
{
"_id" : ObjectId("55e43a8cf585cca56cafc325"),
"user_name" : "Raju",
"Age" : "21",
"Address" : "Pokhara"
}
11. Update data in MongoDB
->db.user.update({'user_name':"Raju"},{$set:{"user_name":"Manoj"}},{multi:true}) //multi:true updates every matching document
->db.user.find().pretty()
{
"_id" : ObjectId("55e43a8cf585cca56cafc325"),
"user_name" : "Manoj",
"Age" : "21",
"Address" : "Pokhara"
}
12. Remove documents in MongoDB
->db.user.remove({'user_name':"Manoj"}) //deletes every document whose user_name is "Manoj"
->db.user.remove({}) //deletes all documents, like TRUNCATE in an RDBMS
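The semantics of the update and remove commands above can be tried without a server using a small in-memory stand-in. This is a sketch under stated assumptions, not the MongoDB implementation: the Collection class is hypothetical, it supports only equality filters and the $set operator, and its methods return plain counts rather than WriteResult objects.

```javascript
// Hypothetical stand-in for a collection; supports only equality filters and $set.
class Collection {
  constructor() {
    this.docs = [];
  }

  matches(doc, filter) {
    return Object.entries(filter).every(([field, value]) => doc[field] === value);
  }

  insert(doc) {
    this.docs.push({ ...doc });
  }

  find(filter = {}) {
    return this.docs.filter((doc) => this.matches(doc, filter));
  }

  // Without multi, only the first match is changed; with multi, all matches are.
  update(filter, change, { multi = false } = {}) {
    let modified = 0;
    for (const doc of this.docs) {
      if (!this.matches(doc, filter)) continue;
      Object.assign(doc, change.$set);
      modified++;
      if (!multi) break;
    }
    return modified;
  }

  // Removes every matching document; an empty filter removes them all.
  remove(filter) {
    const before = this.docs.length;
    this.docs = this.docs.filter((doc) => !this.matches(doc, filter));
    return before - this.docs.length;
  }
}

const user = new Collection();
user.insert({ user_name: "Raju", Age: "21", Address: "Pokhara" });
user.insert({ user_name: "Raju", Age: "30", Address: "Syangja" });
console.log(user.update({ user_name: "Raju" }, { $set: { user_name: "Manoj" } }, { multi: true })); // prints 2
console.log(user.find({ user_name: "Manoj" }).map((d) => d.Address)); // prints [ 'Pokhara', 'Syangja' ]
console.log(user.remove({ user_name: "Manoj" })); // prints 2
```

Note that $set changes only the named field, so every other field of a matched document keeps its previous value.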
References: