org.datamanager.clustering.matrix
Class DocumentTermMatrixImpl

java.lang.Object
  |
  +--org.datamanager.clustering.matrix.DocumentTermMatrixImpl
All Implemented Interfaces:
DocumentTermMatrix

public class DocumentTermMatrixImpl
extends Object
implements DocumentTermMatrix

A basic DocumentTermMatrix implementation.


Field Summary
static int COLUMN_GROWTH_MULTIPLIER
          How fast we grow columns when the matrix fills up.
static int MAXIMUM_TERMS_INDEXED
          Maximum number of terms we will index.
static int MINIMUM_WORD_LENGTH
          How long a word must be to be added to the matrix.
static int ROW_GROWTH_MULTIPLIER
          How fast we grow rows when the matrix fills up.
 
Method Summary
 void addWords(Entity entity, WordFrequencyMapEntityValue newWords)
          Adds words from the provided WordFrequencyMapEntityValue to the document term matrix.
static DocumentTermMatrixImpl getDocumentTermMatrix()
          Gets the singleton instance of this DocumentTermMatrix.
protected static DoubleMatrix2D getMatrix()
          Gets the document-by-term matrix.
 double getSimilarityBetween(Entity entityOne, Entity entityTwo)
          Gets the similarity between two entities by doing an LSA lookup in the document-by-document clustering matrix.
 void makeDocumentByDocumentMatrix()
          Creates a document-by-document matrix representing similarity distances by computing the Singular Value Decomposition of the document-by-term matrix.
 ClusterableMatrix toClusterableMatrix()
          Returns a ClusterableMatrix of the documents in this matrix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MINIMUM_WORD_LENGTH

public static final int MINIMUM_WORD_LENGTH
How long a word must be to be added to the matrix.

See Also:
Constant Field Values
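The length check can be sketched as follows. The real value of MINIMUM_WORD_LENGTH is listed under Constant Field Values and is not reproduced on this page, so the value 3 below is an assumption, as is treating the minimum as inclusive. The class WordLengthFilterSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real constant's value is not shown on this page.
class WordLengthFilterSketch {
    // Assumed value, for illustration only.
    static final int MINIMUM_WORD_LENGTH = 3;

    // True if the word is long enough to be added to the matrix
    // (assumes the minimum length is inclusive).
    static boolean isIndexable(String word) {
        return word != null && word.length() >= MINIMUM_WORD_LENGTH;
    }

    // Keeps only the words that pass the length check, preserving order.
    static List<String> filter(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (isIndexable(w)) {
                kept.add(w);
            }
        }
        return kept;
    }
}
```

With the assumed value, filtering ["a", "of", "the", "matrix"] keeps only "the" and "matrix".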

ROW_GROWTH_MULTIPLIER

public static final int ROW_GROWTH_MULTIPLIER
How fast we grow rows when the matrix fills up.

See Also:
Constant Field Values

COLUMN_GROWTH_MULTIPLIER

public static final int COLUMN_GROWTH_MULTIPLIER
How fast we grow columns when the matrix fills up.

See Also:
Constant Field Values
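This page does not say how ROW_GROWTH_MULTIPLIER and COLUMN_GROWTH_MULTIPLIER are applied. A minimal sketch, assuming each dimension's capacity is multiplied by its multiplier until the requested cell fits, with existing values copied into the larger array. A plain double[][] stands in for the Colt DoubleMatrix2D the class uses, and the multiplier values and the class GrowableMatrixSketch are hypothetical.

```java
// Hypothetical sketch of growing a dense matrix by fixed multipliers.
class GrowableMatrixSketch {
    // Assumed values; both must be > 1 or ensureCapacity would never finish.
    static final int ROW_GROWTH_MULTIPLIER = 2;
    static final int COLUMN_GROWTH_MULTIPLIER = 2;

    private double[][] cells;

    GrowableMatrixSketch(int rows, int columns) {
        cells = new double[rows][columns];
    }

    int rows() { return cells.length; }

    int columns() { return cells.length == 0 ? 0 : cells[0].length; }

    // Makes (row, column) addressable by multiplying each dimension's
    // capacity (repeatedly, if needed) and copying the existing values.
    void ensureCapacity(int row, int column) {
        int newRows = Math.max(rows(), 1);
        int newCols = Math.max(columns(), 1);
        while (row >= newRows) newRows *= ROW_GROWTH_MULTIPLIER;
        while (column >= newCols) newCols *= COLUMN_GROWTH_MULTIPLIER;
        if (newRows == rows() && newCols == columns()) return;
        double[][] grown = new double[newRows][newCols];
        for (int r = 0; r < rows(); r++) {
            System.arraycopy(cells[r], 0, grown[r], 0, columns());
        }
        cells = grown;
    }

    void set(int row, int column, double value) {
        ensureCapacity(row, column);
        cells[row][column] = value;
    }

    double get(int row, int column) { return cells[row][column]; }
}
```

Multiplicative growth keeps the amortized cost of adding documents and terms constant, at the price of some unused capacity.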

MAXIMUM_TERMS_INDEXED

public static final int MAXIMUM_TERMS_INDEXED
Maximum number of terms we will index.

See Also:
Constant Field Values
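The page states only that MAXIMUM_TERMS_INDEXED caps the number of indexed terms. It does not say what happens to terms beyond the cap. A minimal sketch, assuming each distinct term is given the next free column and new terms are ignored once the cap is reached (signalled here by -1). The cap of 3 and the class TermIndexSketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: assigns each distinct term a column, up to a cap.
class TermIndexSketch {
    static final int MAXIMUM_TERMS_INDEXED = 3; // assumed, tiny for illustration

    private final Map<String, Integer> columnByTerm = new HashMap<>();

    // Returns the term's column, assigning the next free column to a new term.
    // Returns -1 when the term is new and the cap has already been reached.
    int columnFor(String term) {
        Integer existing = columnByTerm.get(term);
        if (existing != null) return existing;
        if (columnByTerm.size() >= MAXIMUM_TERMS_INDEXED) return -1;
        int column = columnByTerm.size();
        columnByTerm.put(term, column);
        return column;
    }

    int size() { return columnByTerm.size(); }
}
```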

Method Detail

toClusterableMatrix

public ClusterableMatrix toClusterableMatrix()
Description copied from interface: DocumentTermMatrix
Returns a ClusterableMatrix of the documents in this matrix.

Specified by:
toClusterableMatrix in interface DocumentTermMatrix

addWords

public void addWords(Entity entity,
                     WordFrequencyMapEntityValue newWords)
Adds words from the provided WordFrequencyMapEntityValue to the document term matrix.

Specified by:
addWords in interface DocumentTermMatrix
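The structure of Entity and WordFrequencyMapEntityValue is not documented here. A minimal sketch of what addWords does, assuming a String id stands in for Entity, a Map of term to count stands in for WordFrequencyMapEntityValue, and counts accumulate across repeated calls for the same document. The sparse map-of-maps layout and the class AddWordsSketch are hypothetical; the real class stores a Colt DoubleMatrix2D.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of addWords with simplified stand-in types.
class AddWordsSketch {
    static final int MINIMUM_WORD_LENGTH = 3; // assumed value

    // document id -> (term -> frequency); a sparse stand-in for the real matrix
    private final Map<String, Map<String, Integer>> rows = new HashMap<>();

    void addWords(String documentId, Map<String, Integer> newWords) {
        Map<String, Integer> row = rows.computeIfAbsent(documentId, id -> new HashMap<>());
        for (Map.Entry<String, Integer> e : newWords.entrySet()) {
            if (e.getKey().length() < MINIMUM_WORD_LENGTH) continue; // too short to index
            // Accumulate, so repeated calls for one document add up (assumption).
            row.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    int frequency(String documentId, String term) {
        return rows.getOrDefault(documentId, Collections.emptyMap()).getOrDefault(term, 0);
    }
}
```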

makeDocumentByDocumentMatrix

public void makeDocumentByDocumentMatrix()
Creates a document-by-document matrix representing similarity distances by computing the Singular Value Decomposition of the document-by-term matrix. FIXME: this is incomplete. See http://ella.slis.indiana.edu/~katy/L697/code/lsa.html
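For reference, the standard LSA construction, written for a document-by-term matrix A with m documents as rows and n terms as columns. The rank k is a free parameter this page does not specify, and whether the class compares documents by cosine or by the raw entries of D_k is not stated; the cosine form below is an assumption.

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad A \in \mathbb{R}^{m \times n} \\
A_k &= U_k \Sigma_k V_k^{\top} \qquad \text{(keep the } k \text{ largest singular values)} \\
D_k &= A_k A_k^{\top} = U_k \Sigma_k V_k^{\top} V_k \Sigma_k U_k^{\top} = U_k \Sigma_k^{2} U_k^{\top} \in \mathbb{R}^{m \times m} \\
\operatorname{sim}(i, j) &= \frac{(U_k \Sigma_k)_{i\cdot} \cdot (U_k \Sigma_k)_{j\cdot}}
  {\lVert (U_k \Sigma_k)_{i\cdot} \rVert \, \lVert (U_k \Sigma_k)_{j\cdot} \rVert}
\end{aligned}
```

The step from A_k A_k^T to U_k Σ_k² U_k^T uses V_k^T V_k = I, so entry (i, j) of D_k is the dot product of rows i and j of U_k Σ_k, and the cosine simply normalizes it.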


getDocumentTermMatrix

public static DocumentTermMatrixImpl getDocumentTermMatrix()
Gets the singleton instance of this DocumentTermMatrix.
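The page does not document how the singleton is created or whether access is thread-safe. A minimal sketch of a lazily created, synchronized singleton accessor; the class SingletonMatrixSketch is hypothetical and only the method name mirrors the real one.

```java
// Hypothetical sketch of a lazily created singleton; the real class's
// initialisation and thread-safety strategy are not documented here.
final class SingletonMatrixSketch {
    private static SingletonMatrixSketch instance;

    private SingletonMatrixSketch() { }

    // synchronized so two threads cannot each create an instance
    static synchronized SingletonMatrixSketch getDocumentTermMatrix() {
        if (instance == null) {
            instance = new SingletonMatrixSketch();
        }
        return instance;
    }
}
```

Because every caller shares one instance, words added by addWords from different parts of an application land in the same matrix.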


getMatrix

protected static DoubleMatrix2D getMatrix()
Gets the document-by-term matrix.


getSimilarityBetween

public double getSimilarityBetween(Entity entityOne,
                                   Entity entityTwo)
Gets the similarity between two entities by doing an LSA lookup in the document-by-document clustering matrix.

Specified by:
getSimilarityBetween in interface DocumentTermMatrix
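The similarity measure is not named on this page. A minimal sketch, assuming cosine similarity between two document vectors (for example, two rows of the reduced LSA matrix) and assuming a similarity of 0 when either document has no terms. The class CosineSimilaritySketch is hypothetical, and plain arrays stand in for Entity lookups.

```java
// Hypothetical sketch: cosine similarity between two document vectors.
final class CosineSimilaritySketch {
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat similarity with an all-zero (empty) document as 0 (assumption).
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Cosine ignores vector length, so a long and a short document about the same topic can still score close to 1.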


See the Helium Website