Changeset be1a361


Timestamp:
26/04/2012 09:48:43
Author:
Eric van der Vlist <vdv@dyomedea.com>
Branches:
master
Children:
6f64c7f
Parents:
307b6d2
git-author:
Eric van der Vlist <vdv@dyomedea.com> (26/04/2012 09:48:43)
git-committer:
Eric van der Vlist <vdv@dyomedea.com> (26/04/2012 09:48:43)
Message:

Implementing yet another WARC parser (the Heritrix one didn't work well with Orbeon due to HTTP client library conflicts).

Location:
archiver
Files:
10 added
1 edited

  • archiver/pipelines/actions/package-heritrix-warc.xpl

    --- archiver/pipelines/actions/package-heritrix-warc.xpl (r22c3028)
    +++ archiver/pipelines/actions/package-heritrix-warc.xpl (rbe1a361)
    @@ -8,5 +8,5 @@
     <p:config xmlns:p="http://www.orbeon.com/oxf/pipeline" xmlns:oxf="http://www.orbeon.com/oxf/processors" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xforms="http://www.w3.org/2002/xforms"
         xmlns:xxforms="http://orbeon.org/oxf/xml/xforms" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:exist="http://exist.sourceforge.net/NS/exist" xmlns:saxon="http://saxon.sf.net/"
    -    xmlns:pipeline="java:org.orbeon.oxf.processor.pipeline.PipelineFunctionLibrary">
    +    xmlns:pipeline="java:org.orbeon.oxf.processor.pipeline.PipelineFunctionLibrary" xmlns:owk="http://owark.org/orbeon/processors">
     
         <p:param name="data" type="input"/>
    @@ -33,4 +33,13 @@
             </p:input>
             <p:output name="data" id="warc"/>
    +    </p:processor>
    +
    +<p:processor name="owk:from-warc-converter">
    +<p:input name="data" href="#warc"/>
    +<p:output name="data" id="warc-xml" debug="warc-xml"/>
    +</p:processor>
    +
    +    <p:processor name="oxf:null-serializer">
    +        <p:input name="data" href="#warc-xml"/>
         </p:processor>
     