trouble with Statsample::Bivariate#correlation_matrix #17

@akchan

Hi, I'm having trouble using statsample to run PCA on a large data set. Does anyone have any ideas?

I want to run PCA on a very large data set (3000 variables, 50 samples),
so I wrote this code:

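require 'statsample'
require 'pry'

# Read whitespace-separated rows, dropping the first (header) line
data_raw = IO.readlines('data1.txt').map{|v| v.split }[1..-1]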

hash_tmp = {}

data_raw[1..3000].each do |ary|
  hash_tmp[ary[0]] = ary[1..-1].map(&:to_i).to_scale
end

ds = hash_tmp.to_dataset

puts "Input data done!"

cor_matrix=Statsample::Bivariate.correlation_matrix(ds)

puts "cor_matrix was prepared."

pca=Statsample::Factor::PCA.new(cor_matrix)

binding.pry

But Ruby on my Mac never prints "cor_matrix was prepared.".
I wrote some more code to investigate the cause:

# Reopening the module to find where the bottleneck is
module Statsample
  module Bivariate
    class << self
      def covariance_matrix_optimized(ds)
        x=ds.to_gsl
        n=x.row_size
        m=x.column_size
        puts "calculating means..."
        means=((1/n.to_f)*GSL::Matrix.ones(1,n)*x).row(0)
        puts "centering matrix..."
        centered=x-(GSL::Matrix.ones(n,m)*GSL::Matrix.diag(means))
        puts "calculating covariance matrix..."
        ss=centered.transpose*centered
        puts "calculating n..."
        s=((1/(n-1).to_f))*ss
        puts "done!"              #<= This line is executed
        s
      end



      def correlation_matrix(ds)
        vars,cases=ds.fields.size,ds.cases
        if !ds.has_missing_data? and Statsample.has_gsl? and prediction_optimized(vars,cases) < prediction_pairwise(vars,cases)
          binding.pry
          cm=correlation_matrix_optimized(ds)
          binding.pry             #<= This line is never reached. :(
        else
          cm=correlation_matrix_pairwise(ds)
        end
        binding.pry
        cm.extend(Statsample::CovariateMatrix)
        binding.pry
        cm.fields=ds.fields
        binding.pry
        cm
      end
    end
  end
end

With this, Ruby prints everything up to "done!" but never returns from the Statsample::Bivariate#covariance_matrix_optimized method.
I have never seen a Ruby method that doesn't return.
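
For a sense of scale, the sizes involved can be worked out with plain arithmetic. The sketch below is not statsample code: the helper name size_estimate is hypothetical, and the figure of 8 bytes per entry assumes double-precision storage. It only counts matrix entries and multiply-adds for the shapes used in covariance_matrix_optimized above (n = 50 cases, m = 3000 variables).

```ruby
# Hypothetical helper: rough size and work estimates for a correlation
# matrix over `vars` variables measured on `cases` samples.
def size_estimate(vars, cases)
  entries = vars * vars
  {
    matrix_entries: entries,                  # cells in the vars x vars result
    matrix_bytes: entries * 8,                # assuming 8-byte doubles
    pair_count: vars * (vars - 1) / 2,        # distinct variable pairs
    # centered.transpose * centered: (vars x cases) * (cases x vars)
    gram_multiply_adds: vars * vars * cases,
    # ones(n, m) * diag(means): (cases x vars) * (vars x vars)
    centering_multiply_adds: cases * vars * vars
  }
end

estimate = size_estimate(3000, 50)
estimate.each { |name, value| puts "#{name}: #{value}" }
```

With these numbers the result matrix has 9,000,000 entries (about 72 MB as doubles), and each of the two matrix products costs about 450 million multiply-adds, so a long pause is plausible even when nothing is actually stuck.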

If anyone knows how to solve this problem or how to investigate the cause more deeply, please tell me.
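
One way to narrow this down might be to time each stage instead of stopping in the debugger, so that a slow step can be told apart from one that never finishes. The following is a minimal sketch using only core Ruby; timed_stage is a hypothetical helper, not part of statsample, and the workload in the example is a stand-in.

```ruby
# Hypothetical helper: logs when a stage starts and how long it took.
# Output is flushed immediately so the "[start]" line is visible even if
# the stage itself never completes.
def timed_stage(label, io = $stderr)
  io.puts "[start] #{label}"
  io.flush
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  io.puts format("[done]  %s (%.3fs)", label, elapsed)
  io.flush
  result                                      # pass the block's value through
end

# Stand-in workload; in the script above the block would wrap a call such as
# correlation_matrix_optimized(ds).
total = timed_stage("sum of squares") { (1..1000).sum { |i| i * i } }
puts total
```

Running the same script on a smaller slice of the data first (say 100 or 500 variables) and comparing the timings would show how the cost grows before trying all 3000.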
