XML parser plugin for Embulk

Parser plugin for Embulk.

Reads input data as XML and emits each matching entry as an output record.

Overview

  • Plugin type: parser
  • Load all or nothing: yes
  • Resume supported: no

Types

  • xml: finds rows with a SAX parser.
  • xpath: finds rows with XPath, so you can select XML by more complex conditions than the xml type allows.

Configuration

XML

parser:
  type: xml
  root: data/students/student
  schema:
    - {name: name, type: string}
    - {name: age, type: long}
  • type: set to xml to use this plugin.
  • root: path to the node at which each entry starts, written in path/to/node style. Required.
  • schema: the output columns and their data types. Required.

To parse a column as a timestamp, a schema entry accepts two additional parameters:

schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
  • format: the timestamp format to parse. Required.
  • timezone: the timezone in which the timestamp is parsed; "+0900" is used by default.
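
The effect of timezone can be illustrated with a short Python sketch. This is not the plugin's code: it uses Python's standard strptime, which happens to accept the same %Y-%m-%d directives, and parse_in_zone is a hypothetical helper written only for this illustration. It shows that the same text denotes different instants depending on the configured offset.

```python
from datetime import datetime, timedelta, timezone


def parse_in_zone(text: str, fmt: str, offset: str) -> datetime:
    """Parse text with fmt, treating it as local time at offset (e.g. "+0900").

    Hypothetical helper for illustration; not part of the plugin.
    """
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime.strptime(text, fmt).replace(tzinfo=tz)


# Same text and format, two different timezone settings.
utc_reading = parse_in_zone("2015-06-01", "%Y-%m-%d", "+0000")
jst_reading = parse_in_zone("2015-06-01", "%Y-%m-%d", "+0900")

# Normalised to UTC, the two readings are nine hours apart.
print(utc_reading.astimezone(timezone.utc).isoformat())
print(jst_reading.astimezone(timezone.utc).isoformat())
```

If the source data carries no offset of its own, setting timezone explicitly avoids silently relying on the "+0900" default.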

Xpath

parser:
  type: xpath
  root: //data/students/student
  schema:
    - {path: name, type: string, name: name}
    - {path: age, type: long, name: age}
    - {path: hobbies/hobby, type: json, name: hobbies}
  • type: set to xpath to use this plugin.
  • root: XPath expression for the node at which each entry starts; '/' is used by default.
  • schema: the output columns and their data types. Required.
  • namespaces: XML namespaces.

To parse a column as a timestamp, a schema entry accepts two additional parameters:

schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
  • format: the timestamp format to parse. Required.
  • timezone: the timezone in which the timestamp is parsed; "+0900" is used by default.

Here is an example XML document:

<data>
  <result>true</result>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies>
        <hobby>music</hobby>
        <hobby>movie</hobby>
      </hobbies>
    </student>
    <student>
      <name>Paul</name>
      <age>16</age>
      <hobbies>
        <hobby>game</hobby>
      </hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
    <student>
      <name>Ringo</name>
      <age>18</age>
    </student>
  </students>
</data>
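
To make the path semantics concrete, the following Python sketch selects the same nodes from the sample document using the standard library's ElementTree. It is only an approximation for illustration, not the plugin's implementation: ElementTree supports a subset of XPath, its paths are relative to the <data> root element, and returning an empty list for a student without hobbies is a choice made in this sketch, not a statement about how the plugin represents missing values.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<data>
  <result>true</result>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies>
        <hobby>music</hobby>
        <hobby>movie</hobby>
      </hobbies>
    </student>
    <student>
      <name>Paul</name>
      <age>16</age>
      <hobbies>
        <hobby>game</hobby>
      </hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
    <student>
      <name>Ringo</name>
      <age>18</age>
    </student>
  </students>
</data>
"""

root = ET.fromstring(SAMPLE)  # root is the <data> element

rows = []
# "./students/student" stands in for the config's root of data/students/student.
for student in root.findall("./students/student"):
    rows.append({
        "name": student.findtext("name"),                               # path: name
        "age": int(student.findtext("age")),                            # path: age, type long
        "hobbies": [h.text for h in student.findall("hobbies/hobby")],  # path: hobbies/hobby
    })

for row in rows:
    print(row)
```

Each student element becomes one row, and the result element is ignored because it lies outside the root path.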