diff --git a/README.md b/README.md
index b225b9b2ee8f..1984cd956a0a 100644
--- a/README.md
+++ b/README.md
@@ -9,13 +9,15 @@ MXNet is a deep learning framework designed for both *efficiency* and *flexibili
 It allows you to mix the [flavours](http://mxnet.readthedocs.org/en/latest/program_model.html) of
 deep learning programs together to maximize the efficiency and your productivity.
 
+
 What's New
 ----------
 * [Note on Programming Models for Deep Learning](http://mxnet.readthedocs.org/en/latest/program_model.html)
 
 Contents
 --------
-* [Documentation](http://mxnet.readthedocs.org/en/latest/)
+* [Documentation and Tutorials](http://mxnet.readthedocs.org/en/latest/)
+* [Open Source Design Notes](http://mxnet.readthedocs.org/en/latest/#open-source-design-notes)
 * [Code Examples](example)
 * [Build Instruction](doc/build.md)
 * [Features](#features)
@@ -25,8 +27,8 @@ Features
 --------
 * To Mix and Maximize
   - Mix all flavours of programming models to maximize flexibility and efficiency.
-* Lightweight and scalable
-  - Minimum build dependency, scales to multi-GPU and ready toward distributed.
+* Lightweight, scalable and memory efficient
+  - Minimal build dependencies; scales to multiple GPUs with very low memory usage.
 * Auto parallelization
   - Write numpy-style ndarray GPU programs, which will be automatically parallelized.
 * Language agnostic
diff --git a/doc/developer-guide/index.md b/doc/developer-guide/index.md
index c85656d1ad45..7e23eac92459 100644
--- a/doc/developer-guide/index.md
+++ b/doc/developer-guide/index.md
@@ -1,10 +1,60 @@
 MXNet Developer Guide
 =====================
-This page contains links to all the developer related documents on mxnet.
+This page contains the resources you need to understand how mxnet works and how to work on the mxnet codebase.
+We believe it is important to make the system modular and understandable to a general audience.
+If you are interested in the general design, check out our [open source design notes](#open-source-design-notes)
+for deep learning.
 
 Overview of the Design
 ----------------------
-* [Execution Engine](engine.md)
+![System Overview](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/system/overview.png)
+
+The figure above shows the major modules of mxnet and how they interact with each other. The modules are:
+- Runtime Dependency Engine: schedules and executes operations according to their read/write dependencies.
+- Storage Allocator: efficiently allocates and recycles memory blocks for GPU and CPU.
+- Resource Manager: manages global resources such as the random number generator and temporary space.
+- NDArray: dynamic, asynchronous n-dimensional arrays, which provide flexible imperative programming for MXNet.
+- Symbolic Execution: static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
+- Operator: operators that define the static forward and gradient (backprop) calculations.
+- Symbol Construction: symbolic construction, which provides a way to build the computation graph (net configuration).
+- KVStore: key-value store interface for easy parameter synchronization.
+- Data Loading (IO): efficient distributed data loading and augmentation.
+
+How to Read the Code
+--------------------
+- All the module interfaces are listed in [include](../../include); these interfaces are heavily documented.
+- You can also read the [Doxygen version](https://mxnet.readthedocs.org/en/latest/doxygen) of the documentation.
+- Each module depends on other modules only through the header files in [include](../../include).
+- The implementation of a module lives in the [src](../../src) folder.
+- Each source file only sees the files within its own folder, plus [src/common](../../src/common) and [include](../../include).
+
+Most modules are largely self-contained, with interface dependencies only on the engine,
+so you are free to pick the part you are interested in and read just that part.
+
+### Analogy to CXXNet
+- The Symbolic Execution module can be viewed as neural net execution (forward, backprop) with more optimizations.
+- Operators can be viewed as Layers, except that weights and bias need to be passed in explicitly.
+  - They also offer additional (optional) interfaces to further optimize memory usage.
+- The Symbol Construction module is an advanced config file.
+- The Runtime Dependency Engine is like a thread pool,
+  - but it makes your life easier by handling dependency tracking for you.
+- KVStore adopts a simple parameter-server interface optimized for GPU synchronization.
+
+### Analogy to Minerva
+- The Runtime Dependency Engine is the DAGEngine in Minerva, except that it is enhanced to support mutations.
+- NDArray is the same as owl.NDArray, except that it supports mutation and can interact with Symbolic Execution.
+
+Documents of Each Module
+------------------------
+* [Runtime Dependency Engine](engine.md)
+* [Operators](operator.md)
+
+
+Open Source Design Notes
+------------------------
+* [Programming Models for Deep Learning](../program_model.md)
+  - Compares various programming models, which motivates the current design.
+
 
 List of Other Resources
 -----------------------
diff --git a/doc/index.md b/doc/index.md
index 866bab6c3e83..6a23fc69563c 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -13,14 +13,27 @@ User Guide
 * [Python Package Document](python/index.md)
 * [Frequently Asked Questions](faq.md)
 
+
 Developer Guide
 ---------------
-* [Programming Models for Deep Learning](program_model.md)
 * [Developer Documents](developer-guide/index.md)
 * [Environment Variables for MXNet](env_var.md)
 * [Contributor Guideline](contribute.md)
 * [Doxygen Version of C++ API](https://mxnet.readthedocs.org/en/latest/doxygen)
 
+
+Open Source Design Notes
+------------------------
+This section contains the design documents and notes we made for the mxnet system design and for deep learning
+libraries in general. We believe that open sourcing the system design notes, along with their motivations and choices,
+can benefit a general audience, both those who use deep learning and those who build deep learning systems.
+
+This section will be updated with self-contained design notes on various aspects of deep learning systems,
+in terms of abstraction, optimization, and trade-offs.
+
+* [Programming Models for Deep Learning](program_model.md)
+
+
 Indices and tables
 ------------------
diff --git a/doc/python/index.md b/doc/python/index.md
index 409f8e7d5d70..9914223b3192 100644
--- a/doc/python/index.md
+++ b/doc/python/index.md
@@ -11,6 +11,8 @@ There are three types of documents you can find about mxnet.
 Tutorials
 ---------
 * [Python Overview Tutorial](tutorial.md)
+* [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md)
+
 
 Python API Documents
 --------------------
diff --git a/doc/python/symbol.md b/doc/python/symbol.md
index ac1549906c3e..b153fdb32773 100644
--- a/doc/python/symbol.md
+++ b/doc/python/symbol.md
@@ -7,6 +7,9 @@ MXNet Python Symbolic API
 * [Symbol Object Document](#mxnet.symbol.Symbol) gives API reference to the Symbol Object
 * [Execution API Reference](#execution-api-reference) tell us on what executor can do.
 
+You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md)
+alongside this document.
+
 How to Compose Symbols
 ----------------------
 The symbolic API provides a way for you to configure the computation graphs.
diff --git a/doc/python/symbol_in_pictures.md b/doc/python/symbol_in_pictures.md
new file mode 100644
index 000000000000..41e2a1b54aba
--- /dev/null
+++ b/doc/python/symbol_in_pictures.md
@@ -0,0 +1,79 @@
+Symbolic Configuration and Execution in Pictures
+================================================
+This is a self-contained tutorial that explains symbolic construction and execution in pictures.
+You are encouraged to read it together with the [Symbolic API](symbol.md) document.
+
+Compose Symbols
+---------------
+Symbols are a description of the computation we want to perform. The symbolic construction API generates the
+computation graph that describes this computation. The following picture shows how we compose symbols to describe basic computations.
+
+![Symbol Compose](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/compose_basic.png)
+
+- The [mxnet.symbol.Variable](symbol.md#mxnet.symbol.Variable) function creates argument nodes that represent inputs to the computation.
+- The Symbol is overloaded with basic element-wise arithmetic operations.
+
+Configure Neural Nets
+---------------------
+Besides fine-grained operations, mxnet also provides coarse-grained operations that are analogous to layers in neural nets.
+We can use these operators to describe a neural net configuration.
+
+![Net Compose](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/compose_net.png)
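+
+To make the two composition styles above concrete, here is a minimal sketch. It assumes the package is imported
+as ```mx```; the variable names and layer parameters (such as ```num_hidden```) are illustrative only.
+
+```python
+import mxnet as mx
+
+# Fine-grained composition: argument nodes plus overloaded
+# element-wise arithmetic build a small computation graph.
+a = mx.symbol.Variable('a')
+b = mx.symbol.Variable('b')
+c = a + b
+
+# Coarse-grained composition: layer-like operators chained into a net.
+data = mx.symbol.Variable('data')
+net = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=128)
+net = mx.symbol.Activation(data=net, name='relu1', act_type='relu')
+net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=10)
+```
+
+Each layer-style operator takes the output symbol of the previous operator as its ```data``` argument, which is
+what chains the nodes into a graph.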
+
+
+Example of Multi-Input Net
+--------------------------
+The following is an example of configuring a neural net with multiple inputs.
+
+![Multi Input](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/compose_multi_in.png)
+
+
+Bind and Execute Symbol
+-----------------------
+When we need to execute a symbol graph, we call the bind function to bind ```NDArrays``` to the argument nodes,
+which gives us an ```Executor```. (A complete code sketch of this flow follows the Calculate Gradient section below.)
+
+![Bind](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/bind_basic.png)
+
+You can call ```Executor.forward``` to get the output results, given the bound NDArrays as input.
+
+![Forward](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/executor_forward.png)
+
+
+Bind Multiple Outputs
+---------------------
+You can use [mx.symbol.Group](symbol.md#mxnet.symbol.Group) to group symbols together, then bind the group to
+get the outputs of all of them.
+
+![MultiOut](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/executor_multi_out.png)
+
+But always remember: only bind what you need, so the system can do more optimizations for you.
+
+
+Calculate Gradient
+------------------
+You can specify gradient holder NDArrays when binding; calling ```Executor.backward``` after ```Executor.forward```
+will then fill these holders with the corresponding gradients.
+
+![Gradient](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/executor_backward.png)
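+
+Below is a minimal sketch of the bind-forward-backward flow from the sections above. The shapes and names are
+illustrative; the ```args``` and ```args_grad``` keyword names follow the [Symbolic API](symbol.md) reference.
+
+```python
+import mxnet as mx
+
+a = mx.symbol.Variable('a')
+b = mx.symbol.Variable('b')
+c = a + b
+
+# Bind input NDArrays (and gradient holders) to the argument nodes.
+args = {'a': mx.nd.ones((2, 3)), 'b': mx.nd.ones((2, 3)) * 2}
+grads = {'a': mx.nd.zeros((2, 3)), 'b': mx.nd.zeros((2, 3))}
+executor = c.bind(ctx=mx.cpu(), args=args, args_grad=grads)
+
+# Forward computes the outputs from the bound inputs.
+executor.forward()
+print(executor.outputs[0].asnumpy())
+
+# Backward takes the head gradient and fills the bound holders.
+executor.backward([mx.nd.ones((2, 3))])
+print(grads['a'].asnumpy())
+```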
+
+
+Simple Bind Interface for Neural Nets
+-------------------------------------
+Sometimes it is tedious to pass all the argument NDArrays to the bind function, especially when you are binding a
+big graph such as a neural net. [Symbol.simple_bind](symbol.md#mxnet.symbol.Symbol.simple_bind) simplifies
+the procedure: you only need to specify the input data shapes, and the function will allocate the arguments and
+bind an Executor for you.
+
+![SimpleBind](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/executor_simple_bind.png)
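+
+A minimal sketch of the simplified flow is shown below, assuming ```simple_bind``` accepts the input shapes as
+keyword arguments, as described in the [API reference](symbol.md#mxnet.symbol.Symbol.simple_bind); the shapes
+are illustrative only.
+
+```python
+import mxnet as mx
+
+data = mx.symbol.Variable('data')
+net = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=10)
+
+# One call infers the remaining shapes, allocates all argument and
+# gradient NDArrays, and binds an Executor; compare with the manual
+# dictionaries required by plain bind.
+executor = net.simple_bind(ctx=mx.cpu(), data=(100, 200))
+
+# Initialize the allocated argument arrays before real training,
+# then execute as usual.
+executor.forward()
+```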
+
+Auxiliary States
+----------------
+Auxiliary states are just like arguments, except that you cannot take their gradient. They are states that may
+not be part of the computation, but can be helpful to track. You can pass auxiliary states in the same way as arguments.
+
+![AuxState](https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/mxnet/symbol/executor_aux_state.png)
+
+More Information
+----------------
+Please refer to the [Symbolic API](symbol.md) and the [Python Documentation](index.md).
\ No newline at end of file
diff --git a/doc/python/tutorial.md b/doc/python/tutorial.md
index 14d92c2bb26a..b8e6ad12bd77 100644
--- a/doc/python/tutorial.md
+++ b/doc/python/tutorial.md
@@ -344,6 +344,9 @@ to get the gradient.
 ```
 
 The [model API](../../python/mxnet/model.py) is a thin wrapper around the symbolic executors to support neural net training.
+You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md),
+which provides a detailed explanation of the concepts in pictures.
+
 ### How Efficient is Symbolic API
 
 In short, they design to be very efficienct in both memory and runtime.
@@ -357,7 +360,7 @@ utilization.
 
 The coarse grained operators are equivalent to cxxnet layers, which are
 extremely efficient.  We also provide fine grained operators for more flexible
 composition. Because we are also doing more inplace memory allocation, mxnet can
-be ***more memory efficient*** than cxxnet/caffe, and gets to same runtime, with
+be ***more memory efficient*** than cxxnet, and achieves the same runtime, with
 greater flexiblity.
 
 ## Distributed Key-value Store