Ensemble Clustering on different data transformations #16

@Kosisochi

From the documentation, I can only see support for building a clustering ensemble from different values of k, different algorithms, and so on.
What I want to do is cluster different "views" of the same data as an ensemble. Example: Bag of Words, TFIDF, and word-embedding representations of the same documents.

I tried creating a separate data object for each representation:

dataObj = oe.data(df2, list_x)
dataObj1 = oe.data(df3, list_xj)

dataObj.D["parent1"] = dataObj1
c = oe.cluster(dataObj)
K = 5
numIterations = 2
c_MV_arr = []
source_names = ['parent', 'parent1']
output_names = ["BOW", "TFIDF"]
for i in range(1,numIterations):
    name = 'kmeans_' + output_names[i]  
    c.cluster(source_names[i], 'kmeans', name, K, init = 'random', n_init = 1) 
    c_MV_arr.append(c.finish_majority_vote(threshold=0.5))

but I get this error:

TypeError: float() argument must be a string or a number, not 'data'
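The message suggests that an object of class `data` reached a routine that expects numbers. Here is a minimal sketch of that failure mode in plain Python. It uses a hypothetical stand-in class, not OpenEnsembles itself, and it assumes that `dataObj.D` is meant to map source names to numeric matrices. Whether OpenEnsembles accepts a second matrix with a different number of columns under a new key is not confirmed here.

```python
class data:
    """Hypothetical stand-in for a container object; NOT the OpenEnsembles class."""

    def __init__(self, matrix):
        # Assumption: D maps a source name to a numeric matrix.
        self.D = {"parent": matrix}


view_a = data([[1.0, 0.0], [0.0, 1.0]])
view_b = data([[0.2, 0.8, 0.1], [0.9, 0.1, 0.4]])

# Store the whole container under a new key, as in the snippet above.
view_a.D["parent1"] = view_b

# A numeric routine that tries to convert the stored value fails,
# because the value is a container rather than a number or matrix.
try:
    float(view_a.D["parent1"])
    error = None
except TypeError as exc:
    error = exc

# Storing the underlying matrix instead gives values that do convert.
view_a.D["parent1"] = view_b.D["parent"]
converted = [[float(v) for v in row] for row in view_a.D["parent1"]]
```

If the assumption holds, the `TypeError` would come from the assignment `dataObj.D["parent1"] = dataObj1`, which stores the object rather than its matrix.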

I also tried creating one cluster object per data object:

c = oe.cluster(dataObj)
c1 = oe.cluster(dataObj1)

but then I can't compute c.finish_majority_vote(threshold=0.5) across both, because it only takes the solutions in c into account and not those in c1.

Is it possible to use OpenEnsembles to cluster ensembles of different data?

Note: df2 and df3 have different feature dimensions, so I cannot do the transformation into BOW or TFIDF features inside the for loop. The second argument to oe.data(df, x) depends on the number of features (the length of df2.columns or df3.columns), and it is initialized outside the for loop.
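To make the goal concrete, here is a library-agnostic sketch of a majority vote across views. It is a generic co-occurrence vote, not OpenEnsembles' `finish_majority_vote` implementation. The function names and the label lists are hypothetical. It only needs one label per sample from each view, so the views may have different feature dimensions as long as the samples are in the same order.

```python
def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("every view must label the same samples in the same order")
    m = len(labelings)
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def majority_vote(labelings, threshold=0.5):
    """Link pairs whose co-association is strictly above threshold,
    then return the connected components as consensus labels."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded co-association graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical labels for six documents, one list per view.
# Label ids do not need to match across views.
bow_labels = [0, 0, 0, 1, 1, 1]
tfidf_labels = [1, 1, 0, 0, 0, 0]
embed_labels = [2, 2, 2, 5, 5, 5]

consensus = majority_vote([bow_labels, tfidf_labels, embed_labels], threshold=0.5)
```

In this toy input, two of the three views put the third document with the first two, so the vote keeps them together even though the TFIDF view disagrees.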
