Machine Learning Week_8 K-means And PCA


Contents
  • 1 K-means
    • 1.1 Unsupervised Learning: Introduction
    • 1.2 K-Means Algorithm
      • 1.2.1 k-means algorithm
    • 1.3 Optimization Objective
      • 1.3.1 K-means optimization objective
    • 1.4 Random Initialization
      • 1.4.1 Random initialization
      • 1.4.2 Code
    • 1.5 Choosing the Number of Clusters
  • 2 PCA
    • 2.1 Motivation I: Data Compression
    • 2.2 Motivation II: Visualization
    • 2.3 Principal Component Analysis Problem Formulation
    • 2.4 Principal Component Analysis Algorithm
    • 2.5 Reconstruction from Compressed Representation
    • 2.6 Choosing the Number of Principal Components
    • 2.7 Advice for Applying PCA

1 K-means

1.1 Unsupervised Learning: Introduction

In this video, I'd like to start to talk about clustering.

This will be exciting, because this is our first unsupervised learning algorithm, where we learn from unlabeled data instead of from labeled data.

So, what is unsupervised learning? I briefly talked about unsupervised learning at the beginning of the class, but it's useful to contrast it with supervised learning. Here's a typical supervised learning problem, where we're given a labeled training set and the goal is to find the decision boundary that separates the positively labeled examples from the negatively labeled examples. So the supervised learning problem in this case is: given a set of labels, fit a hypothesis to it.

In contrast, in the unsupervised learning problem we're given data that does not have any labels associated with it. So we're given data that looks like this: a set of points with no labels, and our training set is written just \(x^{(1)}, x^{(2)}\), and so on up to \(x^{(m)}\); we don't get any labels \(y\). That's why the points plotted on the figure don't have any labels with them. In unsupervised learning, we give this sort of unlabeled training set to an algorithm and we just ask the algorithm to find some structure in the data for us. Given this data set, one type of structure we might have an algorithm find is that the points are grouped into two separate clusters.

So an algorithm that finds clusters like the ones I've just circled is called a clustering algorithm. And this is our first type of unsupervised learning, although there will be other types of unsupervised learning algorithms that we'll talk about later that find other types of structure, or other types of patterns in the data, besides clusters. We'll talk about those after we've talked about clustering.

So, what is clustering good for? Early in this class I already mentioned a few applications.

One is market segmentation, where you may have a database of customers and want to group them into different market segments so you can sell to them separately, or serve your different market segments better. Another is social network analysis. There are actually groups that have done things like looking at people's social networks, such as Facebook, Google+, or information about who you email most frequently and who the people they email most frequently are, in order to find coherent groups of people. So this would be another clustering application, where you want to find the coherent groups of friends in a social network. Here's something one of my friends actually worked on: using clustering to organize computer clusters, or to organize data centers, better. If you know which computers in the data center tend to work together, you can use that to reorganize your resources, how you lay out the network, and how you design your data center communications. And lastly, something another friend worked on: using clustering algorithms to understand galaxy formation, and using that to understand astronomical data.

So, that's clustering which is our first example of an unsupervised learning algorithm. In the next video we'll start to talk about a specific clustering algorithm.

1.2 K-Means Algorithm

In the clustering problem we are given an unlabeled data set, and we would like an algorithm to automatically group the data into coherent subsets, or coherent clusters, for us. The K-means algorithm is by far the most popular and most widely used clustering algorithm.

1.2.1 k-means algorithm

Input:

  • K (number of clusters)
  • Training set \(\{x^{(1)}, x^{(2)}, ..., x^{(m)}\}\)
  • \(x^{(i)} \in R^n\) (drop the \(x_0 = 1\) convention)

First step:
 Randomly initialize K cluster centroids \(\mu_1, \mu_2, ..., \mu_K\), each \(\mu_k \in R^n\)

Second step:
 repeat
 {
   // cluster assignment step:
   // \(c^{(i)} := \arg\min_k ||x^{(i)} - \mu_k||^2\)
   for i = 1 to m
    \(c^{(i)}\) := index (from 1 to K) of the cluster centroid closest to \(x^{(i)}\)

   // move centroid step
   // e.g. if cluster 2 was assigned \(x^{(1)}, x^{(2)}, x^{(5)}, x^{(6)}\), then \(\mu_2 = \frac{1}{4}[x^{(1)} + x^{(2)} + x^{(5)} + x^{(6)}]\)
   for k = 1 to K
    \(\mu_k\) := average (mean) of the points assigned to cluster k
 }
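
For concreteness, here is a minimal NumPy sketch of these two steps. The helper name run_kmeans, the fixed iteration count, and the empty-cluster guard are my own choices, not from the lecture:

import numpy as np

def run_kmeans(X, K, num_iters=10, rng=None):
    # X: (m, n) data matrix; K: number of clusters
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    # Random initialization: pick K distinct training examples as the centroids
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(num_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to x(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = np.argmin(dists, axis=1)
        # Move centroid step: mu_k = mean of the points assigned to cluster k
        for k in range(K):
            if np.any(c == k):  # guard against an empty cluster
                mu[k] = X[c == k].mean(axis=0)
    return c, mu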

1.3 Optimization Objective

Most of the supervised learning algorithms we've seen, things like linear regression, logistic regression, and so on, all of those algorithms have an optimization objective or some cost function that the algorithm was trying to minimize. It turns out that k-means also has an optimization objective or a cost function that it's trying to minimize. And in this video I'd like to tell you what that optimization objective is.

1.3.1 K-means optimization objective

Notation:
\(c^{(i)}\) = index of the cluster (1, 2, ..., K) to which example \(x^{(i)}\) is currently assigned
\(\mu_k\) = cluster centroid k (\(\mu_k \in R^n\))
\(\mu_{c^{(i)}}\) = cluster centroid of the cluster to which example \(x^{(i)}\) has been assigned

Optimization objective

\[J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}||x^{(i)}-\mu_{c^{(i)}}||^2 \]

\[\min_{c^{(1)},...,c^{(m)},\; \mu_1,...,\mu_K}\;J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) \]
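
Here is a small sketch of how J could be computed in NumPy, continuing the run_kmeans sketch from the algorithm section (X, c, mu as defined there):

import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

Note that the cluster assignment step minimizes J with respect to the \(c^{(i)}\) while holding the centroids fixed, and the move centroid step minimizes J with respect to the \(\mu_k\) while holding the assignments fixed, so J never increases from one iteration to the next.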

1.4 Random Initialization

1.4.1 Random initialization

Should have \(K < m\).

  1. Randomly pick \(K\) training examples.
  2. Set \(\mu_1, ..., \mu_K\) equal to these \(K\) examples.

Depending on the random initialization, K-means can end up at different solutions (it can get stuck in a local optimum).

What we can do is try multiple random initializations, and use them to make sure we get as good a solution as we can.

1.4.2 Code

for i = 1 to 100
{
  Randomly initialize K-means
  Run K-means. Get \(c^{(1)},...,c^{(m)},\mu_1 ,...,\mu_K\)
  Compute cost function (distortion)
  \(J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}||x^{(i)}-\mu_{c^{(i)}}||^2\)
}
Pick the clustering that gave the lowest cost \(J\).
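
A runnable sketch of this loop, reusing the run_kmeans and distortion helpers from the earlier sketches; the toy data and the choice K = 3 are illustrative assumptions:

import numpy as np

X = np.vstack([np.random.randn(50, 2) + off
               for off in ([0, 0], [5, 5], [0, 5])])  # toy 2-D data with 3 blobs

best_J, best = np.inf, None
for _ in range(100):
    c, mu = run_kmeans(X, K=3)  # each call starts from a fresh random initialization
    J = distortion(X, c, mu)    # cost (distortion) of this run's clustering
    if J < best_J:              # keep the clustering with the lowest cost J
        best_J, best = J, (c, mu)
c, mu = best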

It turns out that when the number of clusters is small (\(K \in \{2, 3, 4, ..., 10\}\)), running K-means many times like this helps a lot; once the number of centroids gets large, multiple runs will not improve the result much.

1.5 Choosing the Number of Clusters

In this video I'd like to talk about one last detail of K-means clustering, which is how to choose the number of clusters, or how to choose the value of the parameter K. To be honest, there actually isn't a great way of doing this automatically; by far the most common way of choosing the number of clusters is still to choose it manually, by looking at visualizations, by looking at the output of the clustering algorithm, or something else.

To choose K automatically, you run K-means for a number of different values of K and plot the corresponding cost curve (J versus K) to understand the data better. Sometimes, like the "Elbow" in the first figure, there is a very clear bend in the curve, and that K is a natural choice. But sometimes you cannot clearly observe this elbow, and that is the drawback of automatic selection.
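
One way to produce such a plot, again reusing the earlier helpers (matplotlib is assumed; in practice you would also want several random initializations per value of K, keeping the lowest J for each):

import matplotlib.pyplot as plt

Ks = list(range(1, 11))
costs = []
for K in Ks:
    c, mu = run_kmeans(X, K)            # one run of K-means for each candidate K
    costs.append(distortion(X, c, mu))  # record the resulting distortion

plt.plot(Ks, costs, marker='o')
plt.xlabel('K (number of clusters)')
plt.ylabel('Cost J (distortion)')
plt.show()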

Sometimes, you're running K-means to get clusters to use for some later/downstream purpose. In that case, evaluate K-means based on a metric for how well it performs for that later purpose.

2 PCA

2.1 Motivation I: Data Compression

2.2 Motivation II: Visualization

2.3 Principal Component Analysis Problem Formulation

2.4 Principal Component Analysis Algorithm

2.5 Reconstruction from Compressed Representation

2.6 Choosing the Number of Principal Components

2.7 Advice for Applying PCA
