Project descriptions

Collaborative Filtering

A recommender system is concerned with presenting items (e.g. books on Amazon, movies at MovieLens or music at Last.fm) that are likely to interest the user. In collaborative filtering, we base our recommendations on the (known) preferences of the user for other items, and also take into account the preferences of other users.

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-collab-filtering-2017

Training Data

For this problem, we have acquired ratings of 10000 users for 1000 different items. All ratings are integer values between 1 and 5 stars.
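
As an illustration of how the known ratings of a user and the ratings of other users can be combined, here is a minimal sketch of a bias-based baseline predictor. It is not the course's reference solution. The function names `fit_baseline` and `predict`, and the assumption that ratings arrive as (user, item, stars) triples, are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Fit global mean, item offsets and user offsets.

    ratings: iterable of (user, item, stars) triples, stars in 1..5
    (an assumed input layout, not the official data format).
    """
    ratings = list(ratings)
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Item offset: average deviation of an item's ratings from the global mean.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, item, r in ratings:
        item_sum[item] += r - global_mean
        item_cnt[item] += 1
    item_bias = {i: item_sum[i] / item_cnt[i] for i in item_sum}

    # User offset: average residual once the item offset is removed.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for user, item, r in ratings:
        user_sum[user] += r - global_mean - item_bias[item]
        user_cnt[user] += 1
    user_bias = {u: user_sum[u] / user_cnt[u] for u in user_sum}

    return global_mean, user_bias, item_bias

def predict(model, user, item):
    """Predict a rating; unseen users or items fall back to a zero offset."""
    global_mean, user_bias, item_bias = model
    raw = global_mean + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)
    return min(5.0, max(1.0, raw))  # keep the prediction on the 1-5 star scale

# Toy usage with three users and two items.
model = fit_baseline([(0, 0, 5), (1, 0, 1), (2, 1, 3)])
print(predict(model, 0, 1))
```

A predictor of this kind ignores interactions between specific users and items, so it is usually treated as a starting point that more expressive models (for example matrix factorization) are compared against.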

Evaluation Metrics

Your collaborative filtering algorithm is evaluated according to the following weighted criteria:


Text Sentiment Classification

The use of microblogging and text messaging as a medium of communication has greatly increased over the past 10 years. Such large volumes of data amplify the need for automatic methods to understand the opinion conveyed in a text.

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-text-classification-2017

Training Data

For this problem, we have acquired 2.5M tweets classified as either positive or negative.

Evaluation Metrics

Your approach is evaluated according to the following criteria:


Road Segmentation

Segmenting an image consists of partitioning it into multiple segments (formally, one has to assign a class label to each pixel). A simple baseline is to partition an image into a set of patches and classify every patch according to some simple features (e.g. average intensity). Although this can produce reasonable results for simple images, natural images typically require more complex procedures that reason about the entire image or very large windows.
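
The patch-based baseline described above can be sketched as follows. This is only an illustration under stated assumptions: the image is taken to be a 2D list of grayscale intensities, and the function name `classify_patches`, the patch size and the fixed intensity threshold are hypothetical choices, not part of the provided resources.

```python
def classify_patches(image, patch_size, threshold):
    """Label each patch 1 if its mean intensity exceeds threshold, else 0.

    image: 2D list (rows of grayscale values); an assumed input layout.
    Returns a 2D list with one label per patch.
    """
    height, width = len(image), len(image[0])
    labels = []
    for top in range(0, height, patch_size):
        row = []
        for left in range(0, width, patch_size):
            # Collect the pixels of this patch (edge patches may be smaller).
            patch = [
                image[y][x]
                for y in range(top, min(top + patch_size, height))
                for x in range(left, min(left + patch_size, width))
            ]
            mean_intensity = sum(patch) / len(patch)
            row.append(1 if mean_intensity > threshold else 0)
        labels.append(row)
    return labels

# Toy 4x4 image: dark left half, bright right half.
toy = [[0, 0, 200, 200] for _ in range(4)]
print(classify_patches(toy, patch_size=2, threshold=100))
```

Every pixel inherits the label of the patch it belongs to, which is why this baseline cannot capture thin or irregular structures and tends to fail on natural images.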

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-road-segmentation-2017

Training Data

For this problem, we provide 100 aerial images acquired from Google Maps. We also provide ground-truth images where each pixel is labeled as {road, background}.

Evaluation Metrics

Your approach is evaluated according to the following criteria:

Galaxy Image Generation

The goal is to train a generative model that can generate images of galaxies observed by astronomical telescopes.

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-cosmology-2017

Training Data

For this problem, we provide 100 astronomy images. We also provide ground-truth labels where each image is labeled as {cosmology, corrupted, background}.

Evaluation Metrics

Your approach is evaluated according to the following criteria:



Computational infrastructure

See instructions here

Important information for Azure users: In any public cloud, you pay per minute of machine use. A GPU machine is therefore quite expensive, costing around $1.60 per hour, so if you don't shut down the machine in the portal, you will go over budget in less than a week (see for details). When you are not using a machine, turn it off in the Azure portal so that it does not cost anything.
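
To put the hourly price in perspective, the short calculation below estimates what a GPU machine costs if it is left running continuously. It assumes the approximate rate of 1.6 dollars per hour quoted above; the actual rate and your budget may differ, and the helper name `running_cost` is made up for this example.

```python
HOURLY_RATE_USD = 1.6  # approximate GPU price quoted in the text; an assumption

def running_cost(hours, hourly_rate=HOURLY_RATE_USD):
    """Cost in dollars of keeping one machine on for the given number of hours."""
    return hours * hourly_rate

print(f"one day:  ${running_cost(24):.2f}")      # one day:  $38.40
print(f"one week: ${running_cost(24 * 7):.2f}")  # one week: $268.80
```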