Semi-Supervised Learning

Introduction

Supervised Learning

Given a labeled set of empirical observations:

Supervised Learning

Examples

Supervised Learning

Classification

Overview of Problems Addressed

Word Sense Disambiguation

Text Classification

Word Sense Disambiguation

Problem Characterization

Word Sense Disambiguation

Linguistic Ambiguity Subproblems:

Overview of Problems Addressed

Text Classification

Introducing Semi-Supervised Learning

Motivation behind Semi-Supervised Learning

Yarowsky Method

NLP Task: Word Sense Disambiguation

Motivating Observations

  1. Contextual cues (nearby words) provide helpful clues
    • Term: "Plant"
    • "...keep a manufacturing plant profitable..."
    • "...keep a perennial plant watered..."
  2. Uses tend to be consistent within a single discourse
    • "Special Report: Plants of the Midwest"
  3. Polysemes are still a problem; plant/N and plant/VT are obviously correlated
  4. When the two conflict, observation 1 should override observation 2.
  5. Observations 1 and 2 together typically over-determine the sense of a word.

Yarowsky Method

NLP Task: Word Sense Disambiguation

One-Sense-Per-Discourse Hypothesis

Word      Senses              Accuracy (%)   Applicability (%)
plant     living/factory      99.8           72.8
tank      vehicle/container   99.6           50.5
poach     steal/boil          100            44.4
palm      tree/hand           99.8           38.5
axes      grid/tools          100            35.5
sake      benefit/drink       100            33.7
bass      fish/music          100            58.8
space     volume/outer        99.2           67.7
motion    legal/physical      99.9           49.8
crane     bird/machine        100            49.1
Average                       99.8           50.1

Word Sense Disambiguation

One-Sense-Per-Collocation Hypothesis

Consider the following decision list:

For A. plant (species) vs. B. plant (manufacturing):

Can you classify the meanings?

  1. plant growth
  2. car (within +/-k words)
  3. plant height
  4. union (within +/-k words)
  5. equipment (within +/-k words)
  6. assembly plant
  7. nuclear plant
  8. flower (within +/-k words)
  9. job (within +/-k words)
  10. fruit (within +/-k words)
  11. plant species
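A decision list like the one above can be applied by scanning the rules in order and returning the sense of the first rule that fires. The sketch below is illustrative only: the sense attached to each cue is an assumed answer to the exercise (the slide does not give one), the `adjacent` and `window` rule kinds and the default `k=5` are hypothetical choices, and adjacency is checked on both sides of the target for simplicity.

```python
# Hypothetical decision list for "plant", in the order shown on the slide.
# Sense "A" = plant (species), sense "B" = plant (manufacturing).
# The sense assigned to each cue is an assumption made for illustration.
RULES = [
    ("adjacent", "growth", "A"),
    ("window", "car", "B"),
    ("adjacent", "height", "A"),
    ("window", "union", "B"),
    ("window", "equipment", "B"),
    ("adjacent", "assembly", "B"),
    ("adjacent", "nuclear", "B"),
    ("window", "flower", "A"),
    ("window", "job", "B"),
    ("window", "fruit", "A"),
    ("adjacent", "species", "A"),
]


def classify(tokens, target="plant", k=5, default=None):
    """Return the sense of the first rule that matches near any target token."""
    positions = [i for i, tok in enumerate(tokens) if tok == target]
    for kind, cue, sense in RULES:
        span = 1 if kind == "adjacent" else k
        for i in positions:
            # Words up to `span` positions to the left and right of the target.
            neighbors = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            if cue in neighbors:
                return sense
    return default


print(classify("keep the assembly plant profitable".split()))
print(classify("the flower on this plant is red".split()))
```

Because the first matching rule wins, the ordering of the list encodes which evidence is trusted most.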

Yarowsky Method

NLP Task: Word Sense Disambiguation

Algorithm

  1. Store contexts of all examples of a given polysemous word in an untagged training set.
  2. For each possible sense, select a small number of representative training examples (seeds). Those not selected are called the residual.
  3. Iterate:
    1. Train a supervised learner on the currently labeled subset.
    2. Apply the learned classifier to the sample set, and add to the labeled subset those points that are classified above a confidence threshold.
    3. Optionally apply the one-sense-per-discourse constraint to filter/augment.
    4. Repeat Step 3.
  4. Stop when the model converges (residual set stabilizes).
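The bootstrapping loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method's actual learner: the classifier is a stand-in that sums smoothed log-likelihood ratios of context words, the names `train`, `score`, and `bootstrap`, the smoothing constant `alpha`, and the `threshold` value are all hypothetical, and the optional one-sense-per-discourse step is omitted.

```python
import math
from collections import Counter


def train(labeled):
    """Count how often each context word co-occurs with each sense."""
    counts = {"A": Counter(), "B": Counter()}
    for words, sense in labeled:
        counts[sense].update(set(words))
    return counts


def score(counts, words, alpha=0.1):
    """Summed smoothed log-likelihood ratio: > 0 favors A, < 0 favors B."""
    return sum(
        math.log((counts["A"][w] + alpha) / (counts["B"][w] + alpha))
        for w in set(words)
    )


def bootstrap(contexts, seeds, threshold=2.0, max_rounds=20):
    """contexts: list of word lists; seeds: {index: sense}. Returns {index: sense}."""
    labels = dict(seeds)
    for _ in range(max_rounds):
        counts = train([(contexts[i], s) for i, s in labels.items()])
        new_labels = dict(seeds)  # seeds always keep their labels
        for i, words in enumerate(contexts):
            if i in seeds:
                continue
            s = score(counts, words)
            if abs(s) >= threshold:  # keep only confident classifications
                new_labels[i] = "A" if s > 0 else "B"
        if new_labels == labels:  # labeled set (hence residual) has stabilized
            break
        labels = new_labels
    return labels


contexts = [
    ["plant", "life", "flower"],         # seed for sense A
    ["plant", "manufacturing", "car"],   # seed for sense B
    ["flower", "plant", "garden"],
    ["car", "plant", "union"],
    ["garden", "plant", "soil"],
    ["union", "plant", "strike"],
    ["plant", "today"],
]
result = bootstrap(contexts, seeds={0: "A", 1: "B"})
print(result)
```

In this toy run, contexts 4 and 5 share no word with the seeds; they only become classifiable after contexts 2 and 3 have been absorbed, which is the point of iterating. Context 6 has no informative cue and stays in the residual.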

Yarowsky Method

NLP Task: Word Sense Disambiguation

Algorithm Progression

Blum & Mitchell Method

NLP Task: Text Classification

Motivating Observations

  1. Redundant representations/views provide multiple feature spaces
    • Lexical Content of a web page
    • Lexical Content of links to a web page
  2. Can we utilize multiple learners focusing on separate feature spaces of the same data to bootstrap one another?
  3. Algorithm: Repeat the following procedure
    1. Train two classifiers, one per view, on the training data
    2. Add strongly-classified unlabelled data from each classifier to the training data for the other classifier.
    3. Stop when both classify the same unlabelled data equally well (by some statistically significant stopping criterion)
  4. The idea is that the two classifiers will vary in the confidence with which they classify individual data points
  5. We can use this variation to make the classifiers bootstrap one another to grow the training data from unlabelled points.
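The procedure above can be sketched as follows. This is a simplified illustration, assuming a toy word-count classifier per view and a shared labeled pool (so each view's confident labels become training data for the other view). The names `co_train`, `train`, `score`, the `threshold`, and the stopping rule (stop when no view is confident about any remaining point) are assumptions, not the stopping criterion described on the slide.

```python
import math
from collections import Counter


def train(examples):
    """Per-class word counts for one view; labels are 1 or 0."""
    counts = {1: Counter(), 0: Counter()}
    for words, label in examples:
        counts[label].update(set(words))
    return counts


def score(counts, words, alpha=0.1):
    """Summed smoothed log-likelihood ratio: > 0 favors class 1."""
    return sum(
        math.log((counts[1][w] + alpha) / (counts[0][w] + alpha))
        for w in set(words)
    )


def co_train(labeled, unlabeled, threshold=2.0, max_rounds=10):
    """labeled: [((view0, view1), label)]; unlabeled: [(view0, view1)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        # One classifier per view, both trained on the shared labeled pool.
        models = [train([(x[v], y) for x, y in labeled]) for v in (0, 1)]
        confident, rest = [], []
        for x in pool:
            scores = [score(models[v], x[v]) for v in (0, 1)]
            best = max(scores, key=abs)  # the more confident view decides
            if abs(best) >= threshold:
                confident.append((x, 1 if best > 0 else 0))
            else:
                rest.append(x)
        if not confident:
            break
        labeled += confident
        pool = rest
    return labeled, pool


# View 0: words on the page; view 1: words in links pointing to the page.
labeled = [
    ((["course", "homework"], ["cs101"]), 1),
    ((["sale", "price"], ["shop"]), 0),
]
unlabeled = [
    (["syllabus", "exam"], ["cs101"]),        # only the link view is confident
    (["price", "discount"], ["store"]),       # only the page view is confident
    (["syllabus", "lecture"], ["teaching"]),  # confident only after round 1
    (["news"], ["misc"]),
]
grown, leftover = co_train(labeled, unlabeled)
print(len(grown), leftover)
```

The third unlabelled page is the interesting case: the page-view classifier can label it only because the link-view classifier first labeled a page containing "syllabus".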

Joachims Method

NLP Task: Text Classification

Transductive Learning: Slightly different learning paradigm

  1. Reasoning directly from observed training cases to test cases
  2. Cluster the unlabelled data (a la unsupervised)
  3. Use the clusters to reason about classes in labelled data
  4. Intermediate model of underlying process (function) skipped
  5. Motivation: Solve your specific problem, not a more general one.

Joachims Method

NLP Task: Text Classification

Understanding the generalized framework

  1. Leave-one-out error: Cross Validation using a single datum for model validation.
  2. Tells us how well the model generalizes across all possible known data.
  3. To devise a Semi-Supervised Scheme:
    • Train a learner on training data
    • Augment with unlabelled data that retains minimal leave-one-out error
    • Thus, we have a property/metric for deciding what unlabelled data to utilize
    • Should improve self-consistency of overall learner over time, without sacrificing
    • Underlying Assumption (Caveat): Training data is highly self-consistent and generalizes well. Highly sensitive to training data, since minutiae will be magnified.
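To make the leave-one-out criterion concrete, the sketch below computes the leave-one-out error of a classifier and uses it to choose a label for an unlabelled point. The 1-nearest-neighbour classifier is a stand-in chosen for brevity, not the learner used in the method, and the names `loo_error` and `best_label` are hypothetical.

```python
def nearest_label(train, x):
    """1-nearest-neighbour prediction under squared Euclidean distance."""
    closest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return closest[1]


def loo_error(data):
    """Fraction of points misclassified when each is held out in turn."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors += nearest_label(rest, x) != y
    return errors / len(data)


def best_label(data, x, labels=(0, 1)):
    """Label for x that keeps the leave-one-out error of data + [x] minimal."""
    return min(labels, key=lambda y: loo_error(data + [(x, y)]))


data = [
    ((0.0, 0.0), 0),
    ((0.1, 0.0), 0),
    ((1.0, 1.0), 1),
    ((1.1, 1.0), 1),
]
print(loo_error(data))
print(best_label(data, (0.05, 0.1)))
```

Labeling the new point 1 would make it disagree with its neighbours and raise the leave-one-out error, so the criterion picks 0.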

Joachims Method

Graph conceptualization of our data

  1. Nodes are data points
  2. Edges are similarities
  3. A is the adjacency matrix given by:
  4. Goal: Find the labeling y (each y_i = +1 or -1) that maximizes y^T A y
  5. At solution y*:
    • Let G- be the set of nodes where y_i = -1
    • Let G+ be the set of nodes where y_i = +1
    • cut(G+,G-) = Sum of all edge weights across the cut.
  6. Amounts to s-t mincut, where we are minimizing the sum of the edge weights cut through
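The link between maximizing y^T A y and minimizing the cut can be checked numerically. For a symmetric A, every within-group edge adds to y^T A y and every cross-group edge subtracts from it, so y^T A y equals the sum of all entries of A minus 4 times the cut. The sketch below verifies this on a small hypothetical graph and brute-forces the best labeling with node 0 fixed positive and node 3 fixed negative; the matrix and the fixed labels are made up for illustration.

```python
from itertools import product

# Hypothetical symmetric similarity matrix on 4 nodes.
A = [
    [0, 3, 1, 0],
    [3, 0, 0, 1],
    [1, 0, 0, 4],
    [0, 1, 4, 0],
]


def quad_form(A, y):
    """y^T A y for labels y_i in {+1, -1}."""
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


def cut_value(A, y):
    """Sum of edge weights between G+ and G- (each edge counted once)."""
    n = len(y)
    return sum(A[i][j] for i in range(n) for j in range(i + 1, n) if y[i] != y[j])


def best_labeling(A, fixed):
    """Brute-force maximizer of y^T A y with some labels fixed (tiny graphs only)."""
    n = len(A)
    free = [i for i in range(n) if i not in fixed]
    best = None
    for values in product((-1, 1), repeat=len(free)):
        y = [0] * n
        for i, v in fixed.items():
            y[i] = v
        for i, v in zip(free, values):
            y[i] = v
        if best is None or quad_form(A, y) > quad_form(A, best):
            best = y
    return best


total = sum(map(sum, A))
y_star = best_labeling(A, fixed={0: 1, 3: -1})
print(y_star, quad_form(A, y_star), cut_value(A, y_star))
```

Here the best labeling groups nodes 0 and 1 against nodes 2 and 3, cutting only the two weight-1 edges.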

Joachims Method

Spectral Graph Transducer (SGT)

  1. Compute the diagonal degree matrix B such that B_ii = sum_j A_ij
  2. Compute the Laplacian L = B - A (or the normalized Laplacian L = B^{-1}(B - A))
  3. Compute the 2nd through (d+1)-th smallest eigenvalues and eigenvectors of L; store them in D and V, respectively
  4. To normalize the spectrum of the graph, replace the eigenvalues in D with a monotonically increasing function (e.g., D_ii = i^2)
  5. For each new training set:
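Steps 1 to 4 above can be sketched with NumPy. This is a minimal illustration that assumes a symmetric adjacency matrix and uses only the unnormalized Laplacian; the function name `spectral_basis` and the example graph are hypothetical, and the per-training-set step is not covered.

```python
import numpy as np


def spectral_basis(A, d):
    """Return (D, V, eigvals) for the unnormalized Laplacian of a symmetric A."""
    A = np.asarray(A, dtype=float)
    B = np.diag(A.sum(axis=1))            # step 1: degree matrix, B_ii = sum_j A_ij
    L = B - A                             # step 2: unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    V = eigvecs[:, 1:d + 1]               # step 3: eigenvectors 2 .. d+1
    # Step 4: replace the eigenvalues with the increasing sequence i^2.
    D = np.diag(np.arange(1, d + 1, dtype=float) ** 2)
    return D, V, eigvals[1:d + 1]


# Hypothetical path graph on 4 nodes: 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
D, V, original_eigvals = spectral_basis(A, d=2)
print(D)
print(V.shape)
```

The smallest eigenvalue of the Laplacian is 0, with a constant eigenvector that carries no information about the graph structure, which is why the basis starts at the second eigenvector.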

Joachims Method

Evaluation

Task         SGT     kNN     TSVM    SVM
Optdigits    83.4    62.4    61.5    61.6
Reuters      57      46.7    51.5    49.1
Isolet       55.4    47.4    -       46.3
Adult        54.5    46      47.3    46.2
Ionosphere   79.6    76.7    80.6    80.6
Alejandro J. C De Baca