java: How to merge two sets of Weka instances together

Question

Asked by fodon

Currently, I'm copying one instance at a time from one dataset to the other. Is there a way to do this so that string mappings remain intact? The mergeInstances works horizontally, is there an equivalent vertical merge?

This is one step of a loop I use to read datasets of the same structure from multiple arff files into one large dataset. There has got to be a simpler way.

Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
for (int i = 0; i < iNew.numInstances(); i++) {
    Instance nInst = iNew.instance(i);
    inst.add(nInst);
}

Answer 1

Accepted answer by kaz

Why not make a new ARFF file which has the data from both of the originals? A simple

cat 1.arff > tmp.arff
tail -n+20 2.arff >> tmp.arff

where 20 is replaced by the line number of the first data row in 2.arff, that is, one more than the number of lines in its arff header (tail -n+20 starts printing at line 20). This produces a new arff file with all of the desired instances, which you can read with your existing code:

Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
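
If counting header lines by hand is inconvenient, the same concatenation could be sketched in plain Java by looking for the @data marker instead. ArffAppend and its methods are hypothetical names, not part of Weka. The sketch assumes both files declare identical headers and that @data sits on a line of its own; it does not verify either assumption.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not part of Weka: appends the data rows of one ARFF file to another. */
final class ArffAppend {

    /** Returns the index of the first line after the "@data" marker, or -1 if there is none. */
    static int firstDataLine(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            // ARFF keywords are case-insensitive, so "@DATA" is accepted too.
            if (lines.get(i).trim().equalsIgnoreCase("@data")) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Writes all of 'first', then only the data rows of 'second', to 'out'. */
    static void append(Path first, Path second, Path out) throws IOException {
        List<String> a = Files.readAllLines(first, StandardCharsets.UTF_8);
        List<String> b = Files.readAllLines(second, StandardCharsets.UTF_8);
        int start = firstDataLine(b);
        if (start < 0) {
            throw new IOException("No @data line found in " + second);
        }
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String line : a) {
                w.write(line);
                w.newLine();
            }
            // Skip the header of the second file and copy only its instances.
            for (String line : b.subList(start, b.size())) {
                w.write(line);
                w.newLine();
            }
        }
    }
}
```

Calling ArffAppend.append(Paths.get("1.arff"), Paths.get("2.arff"), Paths.get("tmp.arff")) would play the role of the cat and tail commands above. Like them, it copies rows as text, so nominal and string values in the second file must already match the declarations in the first.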

You could also invoke weka on the command line using this documentation: http://old.nabble.com/how-to-merge-two-data-file-a.arff-and-b.arff-into-one-data-list--td22890856.html

java weka.core.Instances append filename1 filename2 > output-file

However, there is no function in the documentation (http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main%28java.lang.String) which will allow you to append multiple arff files natively within your Java code. As of Weka 3.7.6, the code that appends two arff files is this:

// read two files, append them and print result to stdout
else if ((args.length == 3) && (args[0].toLowerCase().equals("append"))) {
    DataSource source1 = new DataSource(args[1]);
    DataSource source2 = new DataSource(args[2]);
    String msg = source1.getStructure().equalHeadersMsg(source2.getStructure());
    if (msg != null)
        throw new Exception("The two datasets have different headers:\n" + msg);
    Instances structure = source1.getStructure();
    System.out.println(source1.getStructure());
    while (source1.hasMoreElements(structure))
        System.out.println(source1.nextElement(structure));
    structure = source2.getStructure();
    while (source2.hasMoreElements(structure))
        System.out.println(source2.nextElement(structure));
}

Thus it looks like Weka itself simply iterates through all of the instances in a data set and prints them, the same process your code uses.

Answer 2

Answered by mountrix

If you want a fully automated method that also copies string and nominal attributes properly, you can use the following function:

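import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public static Instances merge(Instances data1, Instances data2)
    throws Exception
{
    // Record which attributes are string or nominal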
    int asize = data1.numAttributes();
    boolean strings_pos[] = new boolean[asize];
    for(int i=0; i<asize; i++)
    {
        Attribute att = data1.attribute(i);
        strings_pos[i] = ((att.type() == Attribute.STRING) ||
                          (att.type() == Attribute.NOMINAL));
    }

    // Create a new dataset
    Instances dest = new Instances(data1);
    dest.setRelationName(data1.relationName() + "+" + data2.relationName());

    DataSource source = new DataSource(data2);
    Instances instances = source.getStructure();
    Instance instance = null;
    while (source.hasMoreElements(instances)) {
        instance = source.nextElement(instances);
        dest.add(instance);

        // Copy string attributes
        for(int i=0; i<asize; i++) {
            if(strings_pos[i]) {
                dest.instance(dest.numInstances()-1)
                    .setValue(i,instance.stringValue(i));
            }
        }
    }

    return dest;
}

Please note that the following conditions should hold (they are not checked in the function):

Datasets must have the same attributes structure (number of attributes, type of attributes)
Class index has to be the same
Nominal values have to exactly correspond
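
Since the function does not check these conditions, a rough pre-check on the ARFF files themselves could be sketched as below. ArffHeaderCheck and its methods are hypothetical names, not part of Weka. The comparison is purely textual: it treats only the @attribute and @data keywords as case-insensitive, expects each declaration on one line, and does not look at the class index, which is not stored in the header. For datasets that are already loaded, the equalHeadersMsg call quoted in the accepted answer serves a similar purpose.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: rough textual comparison of ARFF headers. */
final class ArffHeaderCheck {

    private static final String KEYWORD = "@attribute ";

    /** Collects the normalized attribute declarations that appear before "@data". */
    static List<String> attributeDeclarations(List<String> arffLines) {
        List<String> result = new ArrayList<>();
        for (String raw : arffLines) {
            // Collapse runs of whitespace so that spacing differences are ignored.
            String line = raw.trim().replaceAll("\\s+", " ");
            if (line.equalsIgnoreCase("@data")) {
                break; // the header ends here
            }
            if (line.regionMatches(true, 0, KEYWORD, 0, KEYWORD.length())) {
                // Keep name, type and nominal values; drop the keyword itself.
                result.add(line.substring(KEYWORD.length()));
            }
        }
        return result;
    }

    /** True when both headers declare the same attributes in the same order. */
    static boolean sameAttributes(List<String> first, List<String> second) {
        return attributeDeclarations(first).equals(attributeDeclarations(second));
    }
}
```

Because the nominal value list is part of each declaration, a file declaring {red,green} and another declaring {red,blue} for the same attribute are reported as different, which corresponds to the third condition above.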

To rename the nominal attribute values of data2 on the fly so that they match those of data1, you can use:

data2.renameAttributeValue(
    data2.attribute("att_name_in_data2"),
    "att_value_in_data2",
    "att_value_in_data1");

Answer 3

Answered by user2402105

Another possible solution is to use addAll from java.util.AbstractCollection, since Instances implements it.

instances1.addAll(instances2);

Answer 4

Answered by btaranta

I've just shared an extended weka.core.Instances class with methods like innerJoin, leftJoin, fullJoin, update and union.

table1.makeIndex(table1.attribute("Continent_ID"));
table2.makeIndex(table2.attribute("Continent_ID"));
Instances result = table1.leftJoin(table2);

The Instances can have different numbers of attributes; the levels of NOMINAL and STRING variables are merged together if necessary.

Sources and some examples are here on GitHub: weka.join.
