Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/12953958/

Time: 2020-10-31 10:57:18  Source: igfitidea

How to create an ARFF file from an array in Java?

Tags: java, weka, regression, arff

Asked by abhishek

I want to get the coefficients of a weighted linear regression of x-y pairs represented by two arrays in Java. I have zeroed in on Weka, but its 'LinearRegression' class requires an 'Instances' object. To create an 'Instances' object, an ARFF file containing the data is needed. I have come across solutions that use the FastVector class, but that class is deprecated in the latest Weka version. How do I create an ARFF file for the x-y pairs and the corresponding weights, all of which are represented by arrays in Java?
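
For reference, the coefficients asked about here have a closed form when there is a single x variable, so they can be cross-checked without any library. The sketch below is not Weka's implementation; it is plain weighted least squares, and the names WeightedLeastSquares and fit are made up for illustration.

```java
final class WeightedLeastSquares {
    /**
     * Returns {slope, intercept} minimizing
     * sum of w[i] * (y[i] - slope * x[i] - intercept)^2.
     */
    static double[] fit(double[] x, double[] y, double[] w) {
        if (x.length != y.length || x.length != w.length) {
            throw new IllegalArgumentException("x, y and w must have the same length");
        }
        double sumW = 0.0, sumWX = 0.0, sumWY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumW += w[i];
            sumWX += w[i] * x[i];
            sumWY += w[i] * y[i];
        }
        if (sumW <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        // Weighted means of x and y.
        double meanX = sumWX / sumW;
        double meanY = sumWY / sumW;

        // Weighted sums of squares and cross products around the means.
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxx += w[i] * dx * dx;
            sxy += w[i] * dx * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values carry no variation");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }
}
```

Calling WeightedLeastSquares.fit(x, y, weights) returns a two-element array holding the slope and then the intercept. Weka's LinearRegression applies its own defaults, such as a small ridge term, so its numbers may differ slightly.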

Here's my code based on Baz's answer. It throws an exception on the last line, "lr.buildClassifier(newDataset)":

Thread [main] (Suspended (exception UnassignedClassException))
Capabilities.testWithFail(Instances) line: 1302

Here's the code:

public static void test() throws Exception
{
    double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};

    int numInstances = data[0].length;

    ArrayList<Attribute> atts = new ArrayList<Attribute>();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < 2; dim++)
    {
        Attribute current = new Attribute("Attribute" + dim, dim);

        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numInstances));
            }
        }

        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
            //instances.get(obj).setWeight(weights[obj]);
        }
        atts.add(current);
    }

    Instances newDataset = new Instances("Dataset", atts, instances.size());

    for(Instance inst : instances)
        newDataset.add(inst);

    LinearRegression lr = new LinearRegression();

    lr.buildClassifier(newDataset);             
}
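
An UnassignedClassException raised from Capabilities.testWithFail usually means the dataset has no class attribute set, and the code above never calls setClassIndex. A minimal sketch of one way to address it follows. It assumes Weka 3.7 or later on the classpath, that y (the second attribute) is the regression target, and that a weights array exists; the class name, method names, and the attribute names x and y are hypothetical. It also builds each row as a DenseInstance with one value per attribute, whereas the code above sizes each SparseInstance by numInstances rather than by the number of attributes.

```java
import java.util.ArrayList;

import weka.classifiers.functions.LinearRegression;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

final class WeightedRegressionSketch {
    /** Builds a two-attribute dataset (x, y) with one weight per row. */
    static Instances buildDataset(double[] x, double[] y, double[] weights) {
        if (x.length != y.length || x.length != weights.length) {
            throw new IllegalArgumentException("x, y and weights must have the same length");
        }
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        atts.add(new Attribute("x")); // numeric attribute
        atts.add(new Attribute("y")); // numeric attribute, used as the target

        Instances dataset = new Instances("Dataset", atts, x.length);
        for (int i = 0; i < x.length; i++) {
            // DenseInstance(weight, values): one value per attribute.
            dataset.add(new DenseInstance(weights[i], new double[] {x[i], y[i]}));
        }
        // Without a class index, buildClassifier fails with UnassignedClassException.
        dataset.setClassIndex(dataset.numAttributes() - 1);
        return dataset;
    }

    /** Fits Weka's LinearRegression with its default options. */
    static double[] fit(Instances dataset) throws Exception {
        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(dataset);
        // Check the Javadoc of your Weka version for the layout of this array.
        return lr.coefficients();
    }
}
```

In the question's own code, the smallest change would be to call newDataset.setClassIndex(newDataset.numAttributes() - 1) before lr.buildClassifier(newDataset).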

Answered by Baz

I think this might help you:

FastVector atts = new FastVector();
List<Instance> instances = new ArrayList<Instance>();
for(int dim = 0; dim < numDimensions; dim++)
{
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }

    // Fill the value of dimension "dim" into each object
    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }

    // Add attribute to total attributes
    atts.addElement(current);
}

// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());

// Fill in data objects
for(Instance inst : instances)
    newDataset.add(inst);

Afterwards, the Instances object (newDataset) is your dataset.

Note: The current version (3.6.8) of Weka did not complain, even though I used FastVector.

However, for the developer version (3.7.7), use this:

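// Assumes data[dim][obj], numDimensions and numInstances are defined elsewhere.
ArrayList<Attribute> atts = new ArrayList<Attribute>();
List<Instance> instances = new ArrayList<Instance>();
for(int dim = 0; dim < numDimensions; dim++)
{
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }

    // Fill the value of dimension "dim" into each object
    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }

    // Add attribute to total attributes
    atts.add(current);
}

// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());

// Fill in data objects
for(Instance inst : instances)
    newDataset.add(inst);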
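
Both snippets assume that data, numDimensions, and numInstances already exist. A minimal sketch of how they could be derived from separate x and y arrays follows; the helper name ColumnData and its methods are hypothetical, and the data[dim][obj] layout is inferred from the indexing in the loops above.

```java
final class ColumnData {
    /** Packs x and y into the data[dim][obj] layout: row 0 holds x, row 1 holds y. */
    static double[][] fromPairs(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        // Copy so later changes to the caller's arrays do not alter the packed data.
        return new double[][] {x.clone(), y.clone()};
    }

    /** Number of attributes, i.e. the outer array length. */
    static int numDimensions(double[][] data) {
        return data.length;
    }

    /** Number of data objects, i.e. the length of each attribute row. */
    static int numInstances(double[][] data) {
        return data.length == 0 ? 0 : data[0].length;
    }
}
```

With data = ColumnData.fromPairs(x, y), the loops above would run with numDimensions equal to 2 and numInstances equal to the length of x.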

Answered by The Cat

You want to construct an Instances object; that class overrides toString() to output in ARFF format. If FastVector is deprecated in your Weka version, you could use the collection type that version's Instances constructor accepts instead (an ArrayList<Attribute> in 3.7.7, as in Baz's answer).
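
To show what the ARFF text itself looks like, the sketch below builds it by hand for x-y pairs with one weight per row, without using Weka. It assumes a plain relation name that needs no quoting and relies on the instance-weight syntax described in Weka's ARFF documentation (the weight in curly braces at the end of a data row); the names ArffSketch, toArff, and write are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ArffSketch {
    /** Builds ARFF text with two numeric attributes (x, y) and a weight per row. */
    static String toArff(String relation, double[] x, double[] y, double[] weights) {
        if (x.length != y.length || x.length != weights.length) {
            throw new IllegalArgumentException("x, y and weights must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        // Header: relation name and attribute declarations.
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute x numeric\n");
        sb.append("@attribute y numeric\n\n");
        // Data section: one row per pair, weight appended in curly braces.
        sb.append("@data\n");
        for (int i = 0; i < x.length; i++) {
            sb.append(x[i]).append(',').append(y[i]);
            sb.append(",{").append(weights[i]).append("}\n");
        }
        return sb.toString();
    }

    /** Writes the ARFF text to a file as UTF-8. */
    static void write(Path file, String arff) throws IOException {
        Files.write(file, arff.getBytes(StandardCharsets.UTF_8));
    }
}
```

A file written this way can then be loaded through Weka's usual ARFF loading path, or the same information can be kept in memory by building the Instances object directly as in the answers above.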