Original source: http://stackoverflow.com/questions/5155764/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Store metadata into Jackrabbit repository
Asked by lisak
Can anybody explain to me how to proceed in the following scenario?
1. receiving documents (MS docs, ODS, PDF)
2. Dublin Core metadata extraction via Apache Tika + content extraction via jackrabbit-content-extractors
3. using Jackrabbit to store documents (content) into the repository together with their metadata
4. retrieving documents + metadata
I'm interested in points 3 and 4 ...
DETAILS: The application processes documents interactively (some analysis such as language detection and word count, plus gathering as many details as possible: Dublin Core, content parsing, event handling) and returns the results of the processing to the user. It then stores the extracted content and the metadata (both extracted and custom user metadata) in a JCR repository.
I'd appreciate any help, thank you.
Answered by Randall Hauch
Uploading files is basically the same for JCR 2.0 as it is for JCR 1.0. However, JCR 2.0 adds a few additional built-in property definitions that are useful.
The "nt:file" node type is intended to represent a file and has two built-in property definitions in JCR 2.0 (both of which are auto-created by the repository when nodes are created):
- jcr:created (DATE)
- jcr:createdBy (STRING)
and defines a single child named "jcr:content". This "jcr:content" node can be of any node type, but generally speaking all information pertaining to the content itself is stored on this child node. The de facto standard is to use the "nt:resource" node type, which has these properties defined:
- jcr:data (BINARY) mandatory
- jcr:lastModified (DATE) autocreated
- jcr:lastModifiedBy (STRING) autocreated
- jcr:mimeType (STRING) protected?
- jcr:encoding (STRING) protected?
Note that "jcr:mimeType" and "jcr:encoding" were added in JCR 2.0.
In particular, the purpose of the "jcr:mimeType" property was to do exactly what you're asking for: capture the "type" of the content. However, the "jcr:mimeType" and "jcr:encoding" property definitions can be defined (by the JCR implementation) as protected (meaning the JCR implementation automatically sets them); if this is the case, you would not be allowed to set these properties manually. I believe that Jackrabbit and ModeShape do not treat these as protected.
Here is some code that shows how to upload a file into a JCR 2.0 repository using these built-in node types:
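// Get an input stream for the file ...
File file = ...
InputStream stream = new BufferedInputStream(new FileInputStream(file));

// Create the nt:file node and its jcr:content child under an existing folder node
// (the node variable is named fileNode so it does not clash with the File above)
Node folder = session.getNode("/absolute/path/to/folder/node");
Node fileNode = folder.addNode("Article.pdf", "nt:file");
Node content = fileNode.addNode("jcr:content", "nt:resource");

// Wrap the stream in a JCR 2.0 Binary value and store it as the file content
Binary binary = session.getValueFactory().createBinary(stream);
content.setProperty("jcr:data", binary);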
And if the JCR implementation does not treat the "jcr:mimeType" property as protected (e.g., Jackrabbit and ModeShape), you have to set this property manually:
content.setProperty("jcr:mimeType","application/pdf");
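When the application has to supply the MIME type itself, it is usually better to derive it from the file than to hard-code it. The following is only a minimal sketch, assuming a small hand-written lookup table keyed by file extension. The class name MimeTypes and the method forFileName are hypothetical, and the table covers just a few of the formats mentioned in the question. Since the question already uses Apache Tika, its type detection would likely be a more robust source in a real application.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a file name to a MIME type by its extension.
final class MimeTypes {
    private static final String FALLBACK = "application/octet-stream";

    // Only a few formats from the question; extend as needed.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "ods", "application/vnd.oasis.opendocument.spreadsheet",
            "txt", "text/plain");

    private MimeTypes() {
    }

    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no usable extension
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(MimeTypes.forFileName("Article.pdf"));
    }
}
```

The returned string could then be passed as the value in content.setProperty("jcr:mimeType", ...) instead of a literal.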
Metadata can very easily be stored on the "nt:file" and "jcr:content" nodes, but out of the box the "nt:file" and "nt:resource" node types don't allow extra properties. So before you can add other properties, you first need to add a mixin (or multiple mixins) that has property definitions for the kinds of properties you want to store. You can even define a mixin that allows any property. Here is a CND file defining such a mixin:
<custom = 'http://example.com/mydomain'>
[custom:extensible] mixin
- * (undefined) multiple
- * (undefined)
After registering this node type definition, you can then use this on your nodes:
content.addMixin("custom:extensible");
content.setProperty("anyProp","some value");
content.setProperty("custom:otherProp","some other value");
You could also define and use a mixin that allowed for any Dublin Core element:
<dc = 'http://purl.org/dc/elements/1.1/'>
[dc:metadata] mixin
- dc:contributor (STRING)
- dc:coverage (STRING)
- dc:creator (STRING)
- dc:date (DATE)
- dc:description (STRING)
- dc:format (STRING)
- dc:identifier (STRING)
- dc:language (STRING)
- dc:publisher (STRING)
- dc:relation (STRING)
- dc:rights (STRING)
- dc:source (STRING)
- dc:subject (STRING)
- dc:title (STRING)
- dc:type (STRING)
All of these properties are optional, and unlike "custom:extensible" this mixin doesn't allow properties with arbitrary names or types. With this "dc:metadata" mixin I've also not really addressed the fact that some of these elements are already represented by built-in properties (e.g., "jcr:createdBy", "jcr:lastModifiedBy", "jcr:created", "jcr:lastModified", "jcr:mimeType"), or that some of them may be more related to the content while others are more related to the file.
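Because "dc:metadata" only accepts the Dublin Core properties it lists, metadata produced by an extractor such as Tika has to be narrowed down to those names before it is written to a node carrying the mixin. The following is a minimal sketch of that filtering step, assuming the extracted metadata arrives as a plain map of strings whose keys are already in "dc:" form. The class DublinCoreFilter and the method keepDublinCore are hypothetical names, not part of JCR or Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps only entries named after Dublin Core 1.1 elements.
final class DublinCoreFilter {
    // The fifteen Dublin Core 1.1 element names with a "dc:" prefix.
    private static final Set<String> ALLOWED = Set.of(
            "dc:contributor", "dc:coverage", "dc:creator", "dc:date",
            "dc:description", "dc:format", "dc:identifier", "dc:language",
            "dc:publisher", "dc:relation", "dc:rights", "dc:source",
            "dc:subject", "dc:title", "dc:type");

    private DublinCoreFilter() {
    }

    static Map<String, String> keepDublinCore(Map<String, String> extracted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String value = entry.getValue();
            // Skip unknown names as well as missing or empty values
            if (ALLOWED.contains(entry.getKey()) && value != null && !value.isEmpty()) {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("dc:title", "Article");
        extracted.put("dc:creator", "Example Author");
        extracted.put("wordCount", "1200"); // made-up key, not Dublin Core, so it is dropped
        System.out.println(keepDublinCore(extracted));
    }
}
```

Each remaining entry could then be written with content.setProperty(key, value) after content.addMixin("dc:metadata"). Note that "dc:date" is declared as DATE, so a string value for it has to be in a form the repository can convert to a date.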
You could of course define other mixins that better suit your metadata needs, using inheritance where needed. But be careful using inheritance with mixins: since JCR allows a node to have multiple mixins, it's often best to design your mixins to be tightly scoped and facet-oriented (e.g., "ex:taggable", "ex:describable", etc.) and then simply apply the appropriate mixins to a node as needed.
(It's even possible, though much more complicated, to define a mixin that allows more children under the "nt:file" nodes, and to store some metadata there.)
Mixins are fantastic and give a tremendous amount of flexibility and power to your JCR content.
Oh, and when you've created all of the nodes you want, be sure to save the session:
session.save();
Answered by rancidfishbreath
I am a bit rusty with JCR and I have never used 2.0 but this should get you started.
See this link. You'll want to open up the second comment.
You just store the file in a node and add additional metadata to the node. Here is how to store the file:
Node folder = session.getRootNode().getNode("path/to/file/uploads");
Node file = folder.addNode(fileName, "nt:file");
Node fileContent = file.addNode("jcr:content", "nt:resource");
fileContent.setProperty("jcr:data", fileStream);
// Add other metadata
session.save();
How you store meta-data is up to you. A simple way is to just store key value pairs:
fileContent.setProperty(key, value, PropertyType.STRING);
To read the data you just call getProperty().
fileStream = fileContent.getProperty("jcr:data").getBinary().getStream();
value = fileContent.getProperty(key).getString();
Answered by Abhishek Shah
I am new to Jackrabbit and am working with 2.4.2. As for your solution, you can check the file type using core Java logic and add cases for any variation in how each type should be handled.
You won't need to worry about saving the contents of different file types such as .txt or .pdf, since their content is converted to binary and saved. Here is a small sample in which I uploaded a PDF file to, and downloaded it from, a Jackrabbit repo.
// Import the pdf file unless already imported
// This program is for sample purpose only so everything is hard coded.
if (!root.hasNode("Alfresco_E0_Training.pdf"))
{
System.out.print("Importing PDF... ");
// Create an nt:file node for the PDF
Node file = root.addNode("Alfresco_E0_Training.pdf","nt:file");
// Import the file "Alfresco_E0_Training.pdf" under the created node
FileInputStream stream = new FileInputStream("<path of file>\\Alfresco_E0_Training.pdf");
Node content = file.addNode("jcr:content","nt:resource");
Binary binary = session.getValueFactory().createBinary(stream);
content.setProperty("jcr:data",binary);
stream.close();
session.save();
//System.out.println("done.");
System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
System.out.println("File Node Name : "+file.getName());
System.out.println("File Node Identifier : "+file.getIdentifier());
System.out.println("File Node child : "+file.JCR_CHILD_NODE_DEFINITION);
System.out.println("Content Node Name : "+content.getName());
System.out.println("Content Node Identifier : "+content.getIdentifier());
System.out.println("Content Node Content : "+content.getProperty("jcr:data"));
System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
}else
{
session.save();
Node file = root.getNode("Alfresco_E0_Training.pdf");
Node content = file.getNode("jcr:content");
String path = content.getPath();
Binary bin = session.getNode(path).getProperty("jcr:data").getBinary();
InputStream stream = bin.getStream();
File f=new File("C:<path of the output file>\\Alfresco_E0_Training.pdf");
OutputStream out=new FileOutputStream(f);
byte buf[]=new byte[1024];
int len;
while((len=stream.read(buf))>0)
out.write(buf,0,len);
out.close();
stream.close();
System.out.println("\nFile is created...................................");
System.out.println("done.");
System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
System.out.println("File Node Name : "+file.getName());
System.out.println("File Node Identifier : "+file.getIdentifier());
//System.out.println("File Node child : "+file.JCR_CHILD_NODE_DEFINITION);
System.out.println("Content Node Name : "+content.getName());
System.out.println("Content Node Identifier : "+content.getIdentifier());
System.out.println("Content Node Content : "+content.getProperty("jcr:data"));
System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
}
//output the repository content
}
catch (IOException e){
System.out.println("Exception: "+e);
}
finally {
session.logout();
}
}
}
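The download branch above closes its streams by hand, so they stay open if the copy loop throws. The following is a minimal sketch of the same copy using try-with-resources and java.nio.file.Files.copy. The class StreamToFile and the method save are hypothetical names, and the InputStream is assumed to be the one obtained from bin.getStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: writes a stream to a file and always closes the stream.
final class StreamToFile {
    private StreamToFile() {
    }

    // Returns the number of bytes written to the target file.
    static long save(InputStream source, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by bin.getStream()
        InputStream stream = new ByteArrayInputStream("sample".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("download", ".pdf");
        long written = save(stream, target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

In JCR 2.0 the Binary value also has a dispose() method that can be called once its stream has been consumed.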
Hope this helps