Big Data

Fall Semester 2014

Abstract. One of the key challenges of the information society is to turn data into information, information into knowledge, and knowledge into value. Turning data into value in this way involves collecting large volumes of data, possibly from many diverse sources, processing the data quickly, and applying complex operations to it.

This combination of requirements is typically referred to as Big Data, and it has led to completely new ways of doing business (e.g., developing new products and business models) and doing science (sometimes referred to as data-driven science or the "fourth paradigm"). Unfortunately, data volumes grow faster than our ability to process them, so new architectures and approaches for processing Big Data are needed.

The goal of this course is to give an overview of Big Data technologies. It covers data formats and models, analysis techniques and tools, systems, and applications.

Course Catalogue Info

Course Overview

| Week | Date | Topic | Slides | Exercises |
|---|---|---|---|---|
| 1 | 09/16/2014 | Introduction | Overview, Introduction, Class Projects | |
| 2 | 09/23/2014 | Big Data Management: Data Warehousing | DataWarehousing | |
| 3 | 09/30/2014 | ... Big Data Stores I | Storage | Exercise 1, S1 |
| 4 | 10/07/2014 | ... Big Data Stores II | | Exercise 2, S2 |
| 5 | 10/14/2014 | Big Data Computation: Map Reduce | Computation | Tutorials: spark, elastic MR |
| 6 | 10/21/2014 | ... Spark & Hadoop | | Exercise 3, S3 |
| 7 | 10/28/2014 | ... Open Source Ecosystem, Analytics | | |
| 8 | 11/04/2014 | Streaming Data: Data Management, Programming | Streaming | Exercise 4, S4 |
| 9 | 11/11/2014 | ... Sampling | | |
| 10 | 11/18/2014 | ... Sketching | | Exercise 5, S5 |
| 11 | 11/25/2014 | Big Data in the Real World: Use cases | | |
| 12 | 12/02/2014 | Semistructured Data: Syntax, Data models & Querying | Syntax, Data Models | |
| 13 | 12/09/2014 | Document stores | Querying, Document stores | no exercise session |
| 14 | 12/16/2014 | Graph databases, Q&A, projects best-of | | |

Course Projects

| Milestone | Due on | Description |
|---|---|---|
| 1 | Sep. 24 | Choose a question |
| 2 | Oct. 28 | Small data proof of concept |
| 3 | Nov. 25 | Scalable implementation |
| 4 | Dec. 15 | Final report |

Schedule

| Day | Time | Room |
|---|---|---|
| Tuesday | 10 - 12 h | CAB G 51 |
| Wednesday | 13 - 14 h | ML F 36 |

Exercise groups (starting Sept. 24)

| Group | Day | Time | Room | Tutor |
|---|---|---|---|---|
| 1 | Wednesday | 14 - 15 h | ML F 36 | Besmira Nushi |
| 2 | Wednesday | 14 - 15 h | ML F 40 | Andrei Dan |
| 3 | Friday | 14 - 15 h | CHN D 46 | Martin Jaggi |

Contact

Professor: Thomas Hofmann
Guest Lecturers: Ghislain Fourny, Bart Samwel, Alex Hall, Kevin Mader
Organizing Assistant: Martin Jaggi
Teaching Assistants: Besmira Nushi, Andrei Dan

News

02/02/2015 corrected typos in lecture slides: slides 86 and 94 of computation, slides 64 and 95 of streaming
01/27/2015 example exam questions are available
01/22/2015 minor corrections in lecture slides: slide 58 of computation, slides 17 (Misra-Gries) and 99 of streaming
01/04/2015 solution slides for exercise 5 uploaded
12/29/2014 new slides on graph databases
12/11/2014 exercise sheet 5 published
12/09/2014 new slides on semistructured data
12/01/2014 project milestone 4 published
11/24/2014 new slides on computing with data streams
11/15/2014 added hint about proof technique in exercise sheet 4
11/11/2014 new slides on computing with data streams
11/05/2014 exercise sheet 4 published
11/05/2014 milestone 3 description published
11/04/2014 new slides on big data computation
10/28/2014 new slides on big data computation
10/23/2014 exercise sheet 3 updated
10/22/2014 exercise sheet 3 published
10/21/2014 new slides on big data computation
10/15/2014 practical tutorial slides on Spark and Elastic MapReduce on Amazon EC2
10/14/2014 new slides on big data computation
10/08/2014 updated slides on big data storage
10/08/2014 exercise sheet 2 published
10/07/2014 updated slides on big data storage
10/01/2014 exercise sheet 1 published

Literature

Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman, Jeff Ullman, available for download at mmds.org

Thanks

We thank Amazon for generously supporting us with an AWS in Education Grant award.