Designing one to many relations

We already discussed one-to-one relations in MongoDB, and the main conclusion was that you should design your collections according to your most frequent access pattern. With one-to-many relations this still holds, but other factors may come into play.

Let’s look at a simple problem: we are a shop and we want to store our customers’ information as well as their orders. Each customer can place several orders, so this is a one-to-many relation. With MySQL or any other relational database system, we would create two tables:

CREATE TABLE customer (
  customer_id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) NOT NULL DEFAULT '',
  zipcode varchar(10) DEFAULT NULL,
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB;
CREATE TABLE orders (
  order_id int(11) NOT NULL AUTO_INCREMENT,
  customer_id int(11) NOT NULL DEFAULT '0',
  price decimal(10,2) NOT NULL DEFAULT '0.00',
  status tinyint(4) NOT NULL,
  PRIMARY KEY (order_id)
) ENGINE=InnoDB;

(As in the previous post, I’m omitting foreign keys for clarity.)

In MongoDB, we could use the same design, but since MongoDB cannot do joins, it would not always work well. For instance, to find the name of the customer who placed the order with _id = 100, we would need two queries:

> db.orders.find({_id:100},{customer_id:1,_id:0}) // Would return { "customer_id" : 123 }

and then

> db.customer.find({_id:123},{name:1,_id:0}) // Would return { "name" : "Stephane" }

With MySQL, by contrast, this is easily done in a single query:

mysql> SELECT name FROM customer INNER JOIN orders USING(customer_id) WHERE order_id = 100;
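Without joins, the application has to stitch the two MongoDB queries together itself. The sketch below models that round trip in plain JavaScript (the mongo shell’s language), with in-memory arrays standing in for the `orders` and `customer` collections. The helper name `customerNameForOrder` and the sample data are hypothetical; a real application would send the two `find` calls through a MongoDB driver instead.

```javascript
// Hypothetical in-memory stand-ins for the normalized `orders` and `customer` collections.
const orders = [
  { _id: 100, customer_id: 123, price: 100, status: 2 },
  { _id: 234, customer_id: 123, price: 55, status: 1 },
];
const customer = [{ _id: 123, name: "Stephane", zipcode: "75000" }];

// Application-side join: the same two lookups as the two shell queries.
function customerNameForOrder(orderId) {
  // Query 1: db.orders.find({_id: orderId}, {customer_id: 1, _id: 0})
  const order = orders.find((o) => o._id === orderId);
  if (order === undefined) return null; // no such order
  // Query 2: db.customer.find({_id: order.customer_id}, {name: 1, _id: 0})
  const match = customer.find((c) => c._id === order.customer_id);
  return match === undefined ? null : match.name;
}

console.log(customerNameForOrder(100)); // Stephane
```

Against a real server, each lookup is a separate round trip, which is what the single MySQL query avoids.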

A good way to solve this problem with MongoDB is to embed the orders in the customer documents, for example:

> db.customers.findOne()
{
  "_id" : 123,
  "name" : "Stephane",
  "zipcode" : "75000",
  "orders" : [
    {
      "_id" : 100,
      "price" : 100,
      "status" : 2
    },
    {
      "_id" : 234,
      "price" : 55,
      "status" : 1
    },
    {
      "_id" : 499,
      "price" : 899,
      "status" : 1
    }
  ]
}

The query returning the name of the customer who placed the order with _id = 100 then becomes:

> db.customers.find({"orders._id":100},{name:1,_id:0})
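Matching on "orders._id" relies on how MongoDB treats a dotted path into an array: a document matches if any element of the orders array has that _id. The plain JavaScript sketch below reproduces that behavior for this simple equality case on an in-memory copy of the embedded document shown above. The function name is hypothetical, and this only approximates the server’s matching.

```javascript
// In-memory copy of the embedded customer document shown above (sample data only).
const customers = [
  {
    _id: 123,
    name: "Stephane",
    zipcode: "75000",
    orders: [
      { _id: 100, price: 100, status: 2 },
      { _id: 234, price: 55, status: 1 },
      { _id: 499, price: 899, status: 1 },
    ],
  },
];

// Approximates db.customers.find({"orders._id": orderId}, {name: 1, _id: 0}):
// a dotted path into an array matches when ANY element satisfies the condition.
function findCustomerNamesByOrderId(orderId) {
  return customers
    .filter((c) => (c.orders || []).some((o) => o._id === orderId))
    .map((c) => ({ name: c.name })); // projection: keep name, drop _id
}

console.log(findCustomerNamesByOrderId(100)); // [ { name: 'Stephane' } ]
console.log(findCustomerNamesByOrderId(1));   // []
```

In MongoDB itself, an index on orders._id would be a multikey index (one index entry per array element), which lets this query find the customer without scanning the whole collection.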

So far, so good. But here are a few questions about this design.

1. Would it still work if we needed to run queries on orders, for instance if we wanted to know the number of orders with status = 2?
Yes. This can be done with the aggregation framework, using a query such as:

> db.customers.aggregate([
      {$project:{"orders.status":1}},
      {$unwind:"$orders"},
      {$match:{"orders.status":2}},
      {$group:{_id:null,total:{$sum:1}}}
])
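To make the pipeline less opaque, here is a plain JavaScript sketch of what each stage does to the embedded documents. It is an in-memory approximation, not MongoDB’s implementation. The second customer is hypothetical sample data, added so that more than one document contributes to the count.

```javascript
// Sample data in the embedded layout of db.customers; customer 124 is hypothetical.
const customers = [
  {
    _id: 123, name: "Stephane", zipcode: "75000",
    orders: [
      { _id: 100, price: 100, status: 2 },
      { _id: 234, price: 55, status: 1 },
      { _id: 499, price: 899, status: 1 },
    ],
  },
  { _id: 124, name: "Customer B", zipcode: "69000", orders: [{ _id: 500, price: 20, status: 2 }] },
];

function countOrdersWithStatus(customers, status) {
  // $project: {"orders.status": 1} keeps _id and only the status of each order.
  const projected = customers.map((c) => ({
    _id: c._id,
    orders: (c.orders || []).map((o) => ({ status: o.status })),
  }));
  // $unwind: "$orders" emits one document per array element (empty arrays emit nothing).
  const unwound = projected.flatMap((c) => c.orders.map((o) => ({ _id: c._id, orders: o })));
  // $match: {"orders.status": status}
  const matched = unwound.filter((d) => d.orders.status === status);
  // $group: {_id: null, total: {$sum: 1}}; with no input documents, $group emits nothing.
  return matched.length === 0 ? [] : [{ _id: null, total: matched.length }];
}

console.log(countOrdersWithStatus(customers, 2)); // [ { _id: null, total: 2 } ]
```

The sketch makes the cost visible: every customer document is read and every order is unwound before the filter applies, which is why this query is less efficient than filtering a collection of orders directly.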

Of course, the query would have been much easier to write, and more efficient, if we had embedded customers into orders instead (in an order2 collection, for instance):

> db.order2.find({status:2}).count()

So, as always, you will have to choose the design that best fits your most frequent access pattern, and accept that the other access patterns may be slow. This is very different from a normalized schema, which works about equally well for nearly every access pattern.

Also note that embedding orders into customers does not duplicate data, because each order belongs to a single customer and is stored only once. Embedding customers into orders, however, would create a lot of data duplication: if a customer has 100 orders, the customer’s details would be repeated 100 times. This can create inconsistencies (for instance, a new zipcode written to some of the copies but not all of them) that the application code will have to handle correctly.
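The trade-off between the two layouts can be sketched in plain JavaScript. Assuming an order2 collection in which each order carries a copy of its customer under a `customer` field (the field name and helper functions are hypothetical), counting orders by status becomes a simple filter, but changing a customer’s details means rewriting every copy:

```javascript
// Sample data in the embedded layout of db.customers (orders inside customers).
const customers = [
  {
    _id: 123, name: "Stephane", zipcode: "75000",
    orders: [
      { _id: 100, price: 100, status: 2 },
      { _id: 234, price: 55, status: 1 },
      { _id: 499, price: 899, status: 1 },
    ],
  },
];

// Reshape into an order2-style layout: one document per order, each carrying
// its own copy of the customer's details.
function toOrder2(customers) {
  return customers.flatMap((c) =>
    c.orders.map((o) => ({ ...o, customer: { _id: c._id, name: c.name, zipcode: c.zipcode } }))
  );
}

// Counting by status is a plain filter, like db.order2.find({status: 2}).count().
const countByStatus = (docs, status) => docs.filter((o) => o.status === status).length;

// A change to the customer must be applied to every copy; missing one
// leaves the orders disagreeing about the customer's zipcode.
function updateZipcode(docs, customerId, zipcode) {
  let modified = 0;
  for (const o of docs) {
    if (o.customer._id === customerId) {
      o.customer.zipcode = zipcode;
      modified++;
    }
  }
  return modified;
}

const order2 = toOrder2(customers);
console.log(countByStatus(order2, 2));            // 1
console.log(updateZipcode(order2, 123, "69000")); // 3 (one per order)
```

In MongoDB, the update step would be a multi-document update on order2, where the embedded design needs only a single-document update on customers.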

2. Does embedding scale? In other words, what happens if a customer has hundreds of thousands of orders?
In my opinion, this is the main limitation of this design. First, a MongoDB document is limited to 16MB, so embedding a large number of objects in a single document may not even be possible. With customers and orders you are unlikely to hit this limit, but if you wanted to build a directory of people per city, creating one document per city and embedding every person’s information in it would be a bad design.

And even if you do not reach MongoDB’s hard limits, very large documents are bad for performance: reading or updating a very large document means moving far more bytes around, so you cannot expect good performance in this case. Your only choice then is to normalize your data, which will make your queries harder to write and less efficient.
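To get a feel for where the 16MB limit bites, the sketch below computes BSON sizes for a customer document like the one above, following the BSON encoding rules: a 4-byte length prefix and a trailing NUL per document, one type byte plus a NUL-terminated key per field, 8 bytes per double, and arrays stored as documents keyed "0", "1", and so on. It assumes every number is stored as a double (the mongo shell’s default) and handles only the value types used here. The helper names are hypothetical.

```javascript
// MongoDB's maximum BSON document size: 16MB (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

const encoder = new TextEncoder();
// Number of bytes a string occupies once UTF-8 encoded.
const utf8Length = (s) => encoder.encode(s).length;

// Size of one BSON value, excluding its type byte and key.
// Assumption: all numbers are stored as 8-byte doubles (the mongo shell default).
function valueSize(v) {
  if (typeof v === "number") return 8;
  if (typeof v === "string") return 4 + utf8Length(v) + 1; // int32 length + bytes + NUL
  if (typeof v === "boolean") return 1;
  if (v === null) return 0;
  if (typeof v === "object") return documentSize(v); // embedded document or array
  throw new TypeError(`unsupported type: ${typeof v}`);
}

// Size of a BSON document; arrays are encoded as documents keyed "0", "1", ...
function documentSize(doc) {
  let size = 4 + 1; // int32 total length + trailing NUL byte
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + utf8Length(key) + 1 + valueSize(value); // type byte + key cstring + value
  }
  return size;
}

// Hypothetical customer document with n embedded orders shaped like the example above.
function customerWithOrders(n) {
  const orders = [];
  for (let i = 0; i < n; i++) orders.push({ _id: i, price: 100, status: 1 });
  return { _id: 123, name: "Stephane", zipcode: "75000", orders };
}

// Largest number of such orders that still fits under `limit` bytes.
function maxEmbeddedOrders(limit = MAX_BSON_BYTES) {
  const orderSize = documentSize({ _id: 0, price: 100, status: 1 });
  let size = documentSize(customerWithOrders(0));
  let n = 0;
  for (;;) {
    // Appending order n adds: type byte + key String(n) as a cstring + the order document.
    const elementSize = 1 + utf8Length(String(n)) + 1 + orderSize;
    if (size + elementSize > limit) return n;
    size += elementSize;
    n++;
  }
}

console.log(documentSize({ _id: 100, price: 100, status: 2 })); // 49
console.log(maxEmbeddedOrders());
```

Under these assumptions, each embedded order takes 49 bytes plus a few bytes for its array index key, so a customer document reaches the limit at a few hundred thousand orders, which matches the scenario above. Orders with more fields would hit it sooner.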

Conclusion

In this article, we have covered several points to keep in mind when designing one-to-many relations in MongoDB:

Denormalizing by embedding objects (like embedding orders into a customer) is a common design pattern to deal with the lack of JOINs in MongoDB, and it applies well to this kind of relation.
Depending on which side you embed, embedding may create data duplication. It is of course better if you can avoid it.
Embedding works well when the one-to-many relation is actually a one-to-few relation. If the “many” side is large, you may have to use a normalized schema, whose main drawback is that some queries will be difficult to write and/or very slow.

So do not assume that, because MongoDB is schemaless, you will not have to care about schema design!

Do you want to learn more about MongoDB? Come to my tutorial at PLUK in November!

The post Designing one to many relations – MongoDB vs MySQL appeared first on MySQL Performance Blog.
