Module feast

The FEAST module provides a Python interface to the FEAST C library
for information-theoretic feature selection.

References: 
1) G. Brown, A. Pocock, M.-J. Zhao, and M. Lujan, "Conditional
    likelihood maximization: A unifying framework for information
    theoretic feature selection," Journal of Machine Learning 
    Research, vol. 13, pp. 27-66, 2012.


Version: 0.2.0

Author: Calvin Morrison

Copyright: Copyright 2013, EESI Laboratory

License: GPL
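Every criterion below is built from the discrete mutual information between features and labels. As background, here is a minimal pure-Python sketch of that underlying quantity; it is illustrative only and is not code from this module, which delegates the computation to the C implementation in libFSToolbox:

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Discrete mutual information I(X;Y) in bits between two
    equal-length sequences of hashable values."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

labels = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], labels))  # identical to labels: 1.0 bit
print(mutual_information([5, 5, 5, 5], labels))  # constant feature: 0.0 bits
```

A feature that determines the labels attains the label entropy H(Y), while a feature independent of the labels scores zero; this is the ranking signal MIM uses directly and that the other criteria extend with redundancy terms.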

Functions

list
BetaGamma(data, labels, n_select, beta=1.0, gamma=1.0)
Implements conditional mutual information feature selection; beta and gamma control the weights attached to the redundant mutual information and the conditional mutual information, respectively.
list
CIFE(data, labels, n_select)
Implements the conditional infomax feature extraction (CIFE) algorithm.
list
CMIM(data, labels, n_select)
Implements the conditional mutual information maximization feature selection algorithm.
list
CondMI(data, labels, n_select)
Implements the conditional mutual information (CondMI) feature selection algorithm.
list
Condred(data, labels, n_select)
Implements the Condred feature selection algorithm.
list
DISR(data, labels, n_select)
Implements the double input symmetrical relevance feature selection algorithm.
list
ICAP(data, labels, n_select)
Implements the interaction capping feature selection algorithm.
list
JMI(data, labels, n_select)
Implements the joint mutual information feature selection algorithm.
list
MIFS(data, labels, n_select)
Implements the mutual information feature selection (MIFS) algorithm.
list
MIM(data, labels, n_select)
Implements the mutual information maximization (MIM) algorithm.
list
mRMR(data, labels, n_select)
Implements the max-relevance min-redundancy (mRMR) feature selection algorithm.
tuple
check_data(data, labels)
Check the dimensions of the data and the labels.
Variables
  __credits__ = ['Calvin Morrison', 'Gregory Ditzler']
  __maintainer__ = 'Calvin Morrison'
  __email__ = 'mutantturkey@gmail.com'
  __status__ = 'Release'
  libFSToolbox = <CDLL 'libFSToolbox.so', handle 2be1240 at 2b4b...
  __package__ = None
Function Details

BetaGamma(data, labels, n_select, beta=1.0, gamma=1.0)

This algorithm implements conditional mutual information feature selection; beta and gamma control the weights attached to the redundant mutual information and the conditional mutual information, respectively.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features (REQUIRED)
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations. (REQUIRED)
  • n_select (integer) - number of features to select. (REQUIRED)
  • beta (float between 0 and 1.0) - penalty attached to the redundant mutual information I(X_j;X_k)
  • gamma (float between 0 and 1.0) - positive weight attached to the conditional redundancy term I(X_k;X_j|Y)
Returns: list
features in the order they were selected.
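To make the roles of beta and gamma concrete, here is a toy pure-Python sketch of the greedy scoring rule J(X_k) = I(X_k;Y) - beta * sum_j I(X_k;X_j) + gamma * sum_j I(X_k;X_j|Y), where j ranges over the features already selected. This is an illustrative reimplementation for small discrete data, not the module's actual code path, which calls libFSToolbox:

```python
from collections import Counter
from math import log2

def mi(x, y):
    """Discrete mutual information I(X;Y) in bits."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def cond_mi(x, y, z):
    """Conditional mutual information I(X;Y|Z) = sum_z p(z) I(X;Y | Z=z)."""
    n = len(z)
    total = 0.0
    for zv, cz in Counter(z).items():
        idx = [i for i in range(n) if z[i] == zv]
        total += (cz / n) * mi([x[i] for i in idx], [y[i] for i in idx])
    return total

def beta_gamma(data, labels, n_select, beta=1.0, gamma=1.0):
    """Greedily select n_select feature indices by the score
    J(k) = I(X_k;Y) - beta*sum_j I(X_k;X_j) + gamma*sum_j I(X_k;X_j|Y),
    where j ranges over the features selected so far."""
    cols = list(map(list, zip(*data)))       # column-major view of the data
    selected = []
    for _ in range(n_select):
        best, best_score = None, float("-inf")
        for k in range(len(cols)):
            if k in selected:
                continue
            score = (mi(cols[k], labels)
                     - beta * sum(mi(cols[k], cols[j]) for j in selected)
                     + gamma * sum(cond_mi(cols[k], cols[j], labels)
                                   for j in selected))
            if score > best_score:
                best, best_score = k, score
        selected.append(best)
    return selected

# Feature 1 duplicates feature 0; feature 2 is irrelevant but non-redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = list(zip([0, 0, 0, 1, 1, 1, 1, 1],   # feature 0
                [0, 0, 0, 1, 1, 1, 1, 1],   # feature 1 (duplicate)
                [0, 0, 1, 1, 0, 0, 1, 1]))  # feature 2
print(beta_gamma(data, labels, 2, beta=0.0, gamma=0.0))  # [0, 1]: pure relevance
print(beta_gamma(data, labels, 2, beta=1.0, gamma=0.0))  # [0, 2]: duplicate penalized
```

With beta = 0 the duplicate feature is picked second on relevance alone; raising beta penalizes its redundancy with the first pick and the independent feature wins instead.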

CIFE(data, labels, n_select)

This function implements the conditional infomax feature extraction (CIFE) algorithm, i.e. BetaGamma with beta = 1 and gamma = 1.

Parameters:
  • data (ndarray) - A Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select.
Returns: list
the features in the order they were selected.

CMIM(data, labels, n_select)

This function implements the conditional mutual information maximization feature selection algorithm. Note that this implementation does not allow the redundancy terms to be weighted, as BetaGamma does.

Parameters:
  • data (ndarray) - A Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations as the number of elements. That is len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select.
Returns: list
features in the order that they were selected.

CondMI(data, labels, n_select)

This function implements the conditional mutual information (CondMI) feature selection algorithm.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select.
Returns: list
features in the order they were selected.

Condred(data, labels, n_select)

This function implements the Condred feature selection algorithm, i.e. BetaGamma with beta = 0 and gamma = 1.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select.
Returns: list
the features in the order they were selected.

DISR(data, labels, n_select)

This function implements the double input symmetrical relevance feature selection algorithm.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.

ICAP(data, labels, n_select)

This function implements the interaction capping feature selection algorithm.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.

JMI(data, labels, n_select)

This function implements the joint mutual information feature selection algorithm.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.

MIFS(data, labels, n_select)

This function implements the mutual information feature selection (MIFS) algorithm, i.e. BetaGamma with beta = 1 and gamma = 0.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.

MIM(data, labels, n_select)

This function implements the mutual information maximization (MIM) algorithm, i.e. BetaGamma with beta = 0 and gamma = 0.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.

mRMR(data, labels, n_select)

This function implements the max-relevance min-redundancy (mRMR) feature selection algorithm.

Parameters:
  • data (ndarray) - data in a Numpy array such that len(data) = n_observations, and len(data.transpose()) = n_features
  • labels (ndarray) - labels represented in a numpy array with n_observations elements; that is, len(labels) = len(data) = n_observations.
  • n_select (integer) - number of features to select. (REQUIRED)
Returns: list
the features in the order they were selected.
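The mRMR criterion differs from a fixed beta by averaging the redundancy over the selected set: J(X_k) = I(X_k;Y) - (1/|S|) * sum over j in S of I(X_k;X_j). Here is a toy pure-Python sketch of that score on discrete data; it is illustrative only, not the module's C-backed implementation:

```python
from collections import Counter
from math import log2

def mi(x, y):
    """Discrete mutual information I(X;Y) in bits."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_score(candidate, selected_cols, labels):
    """mRMR score of one candidate column: relevance I(X_k;Y) minus the
    average redundancy with the already-selected feature columns."""
    relevance = mi(candidate, labels)
    if not selected_cols:
        return relevance
    redundancy = sum(mi(candidate, s) for s in selected_cols) / len(selected_cols)
    return relevance - redundancy

labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0  = [0, 0, 0, 1, 1, 1, 1, 1]   # relevant feature, assumed already selected
dup = list(f0)                   # exact duplicate of f0
f2  = [0, 0, 1, 1, 0, 0, 1, 1]   # irrelevant but not redundant
# Once f0 is selected, the duplicate scores below the non-redundant feature:
print(mrmr_score(dup, [f0], labels) < mrmr_score(f2, [f0], labels))  # True
```

Averaging keeps the redundancy penalty on the same scale as the relevance term no matter how many features have been selected, which is the main practical difference from BetaGamma with a fixed beta.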

check_data(data, labels)

Check the dimensions of the data and the labels. Raises an exception if there is a problem.

Data and labels are automatically cast as doubles before being passed to the feature selection functions.

Parameters:
  • data (ndarray) - data array with n_observations rows and n_features columns
  • labels (ndarray) - label array with n_observations elements
Returns: tuple
the data and labels, cast as doubles.
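A hypothetical sketch of this kind of validation is shown below; the shape checks and error messages are illustrative assumptions, not the module's actual implementation:

```python
import numpy as np

def check_data(data, labels):
    """Illustrative validation: verify that data and labels agree on
    n_observations, then cast both to doubles as described above."""
    data = np.asarray(data, dtype=np.float64)      # cast as doubles
    labels = np.asarray(labels, dtype=np.float64)
    if data.ndim != 2:
        raise ValueError("data must be 2-D (n_observations x n_features)")
    if labels.ndim != 1:
        raise ValueError("labels must be 1-D (n_observations,)")
    if len(data) != len(labels):
        raise ValueError("len(data) must equal len(labels)")
    return data, labels
```

The returned (data, labels) tuple is what the selection routines would then consume.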

Variables Details

libFSToolbox

Value:
<CDLL 'libFSToolbox.so', handle 2be1240 at 2b4bc10>