Syllabus for the Spring 2015 Data Engineering class at CU Boulder by Prof. Ken Anderson
This course will investigate the software engineering issues involved with creating data-intensive software systems. We will look at how to support the entire data life cycle, including collection, storage, analysis, reporting, and visualization, and at what tools and techniques are available for each of these stages. Students will help to create some of the content for this class by spiking on various technologies (either individually or in teams) and reporting back what they learned to the class as a whole. Students will work in teams to develop a prototype system that can support the entire data life cycle.
Students need to be willing to learn new frameworks quickly and to apply what they have learned to practical problems of data cleaning, manipulation, and analysis. I'm assuming that students have software engineering skills and are comfortable writing code in multiple programming languages. I expect that we'll be reading/writing code in at least Java, Ruby, Python, and JavaScript as we look at a variety of tools and frameworks in the "big data" space.
- Students should have a laptop with them for every class. Most class sessions will involve hands-on coding or editing of source code, Markdown files, wikis, etc.
  - If you don't have a laptop, you'll need to find someone to work with during the class period.
- A version of the Ruby programming language should be installed on your laptop. Any version of Ruby 2.x.x should work fine. (For instance, Prof. Anderson has 2.1.2 on his machine; 2.2.0 is the latest version of Ruby as of January 2015.)
- A version of node should also be installed on your machine. If you have a Mac, the easiest way to install node is via Homebrew. Simply install Homebrew and then enter the command `brew install node`. The latest version of node as of January 2015 is 0.10.35.
- A version of curl should be installed on your machine. curl is a very useful utility for interacting with web services. If you have a Mac, curl should already be installed in `/usr/bin`. For other platforms, head over to [the curl website](http://curl.haxx.se/download.html) to download and install the software.
- Finally, you should be comfortable invoking the developer tools for your favorite web browser. Within Chrome, simply open a window and then invoke `View->Developer->Developer Tools`.
- All of these instructions will be reviewed during the first day of lecture.
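
If you would like to confirm your setup before the first lecture, the following is a minimal sketch (not an official course script) that looks for the three command-line tools named above and prints the version each one reports. The names `REQUIRED_TOOLS`, `check_tool`, and `check_all` are made up for this illustration, and the sketch assumes Python 3.7 or later is available and that each tool accepts a `--version` flag.

```python
import shutil
import subprocess

# Tools named in the setup list above. The "--version" flag is an assumption
# about each tool's command line; adjust the table if a tool behaves differently.
REQUIRED_TOOLS = {
    "ruby": ["ruby", "--version"],
    "node": ["node", "--version"],
    "curl": ["curl", "--version"],
}


def check_tool(command):
    """Return the first line of the tool's version output, or None if missing."""
    # shutil.which searches PATH the same way the shell would.
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def check_all(tools=REQUIRED_TOOLS):
    """Map each tool name to its version line (None when the tool is not found)."""
    return {name: check_tool(command) for name, command in tools.items()}


if __name__ == "__main__":
    for name, version in check_all().items():
        status = version if version is not None else "NOT FOUND"
        print(f"{name:5} {status}")
```

Any tool reported as `NOT FOUND` is either not installed or not on your `PATH`; the install notes in the list above are the place to start.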
We will be using the Git version control system heavily this semester. I will be covering Git in class, but there are plenty of excellent resources on the web to learn more. In particular, the company that creates the application Tower has an excellent set of resources for learning Git.
We will also be making heavy use of GitHub, so I'll be covering it in class as well. As with Git, there are many resources available online to get up to speed with GitHub's features. Here are some pointers to get started: