feat: csv support lazy quotes #256

Merged 1 commit on Dec 16, 2022
2 changes: 2 additions & 0 deletions README.md
@@ -156,6 +156,7 @@ files:
withHeader: false
withLabel: false
delimiter: ","
lazyQuotes: false
```

#### CSV data files
@@ -172,6 +173,7 @@ One CSV file can only store one type of vertex or edge. Vertices and edges of th
* `withHeader`: The default value is false. The format of the header is described in the following section.
* `withLabel`: The default value is false. The format of the label is described in the following section.
* `delimiter`: **Optional**. Specify the delimiter for the CSV files. The default value is `","`. Only single-character delimiters are supported.
* `lazyQuotes`: **Optional**. The default value is `false`. If `lazyQuotes` is `true`, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.

#### `schema`

2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -131,6 +131,7 @@ files:
withHeader: false
withLabel: false
delimiter: ","
lazyQuotes: false
```

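#### Data files
@@ -146,6 +147,7 @@ files:
- `withHeader`: The default value is `false`. The header format is described later.
- `withLabel`: The default value is `false`. The label format is also described later.
- `delimiter`: **Optional**. Specifies the delimiter of the CSV file. The default value is `","`. Currently only single-character delimiters are supported.
- `lazyQuotes`: **Optional**. If `lazyQuotes` is set to `true`, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.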

#### `schema`

7 changes: 7 additions & 0 deletions examples/v2/course-lazy-quotes.csv
@@ -0,0 +1,7 @@
00testLazyQuotes0,a "word",4,No1
00testLazyQuotes1,a"1"2",4,No1
00testLazyQuotes2,a",4,No1
00testLazyQuotes3,a"b,4,No1
00testLazyQuotes4,a"b,4,No1
00testLazyQuotes5,a""b,4,No1
00testLazyQuotes6,"a"b",4,No1
24 changes: 24 additions & 0 deletions examples/v2/example.yaml
@@ -77,6 +77,30 @@ files:
              - name: name
                type: string

  - path: ./course-lazy-quotes.csv
    failDataPath: ./err/course-lazy-quotes
    batchSize: 2
    inOrder: true
    type: csv
    csv:
      withHeader: false
      withLabel: false
      lazyQuotes: true
    schema:
      type: vertex
      vertex:
        tags:
          - name: course
            props:
              - name: name
                type: string
              - name: credits
                type: int
          - name: building
            props:
              - name: name
                type: string

  - path: ./course.csv
    failDataPath: ./err/course-concat
    batchSize: 2
1 change: 1 addition & 0 deletions pkg/config/config.go
@@ -100,6 +100,7 @@ type CSVConfig struct {
	WithHeader *bool   `json:"withHeader" yaml:"withHeader"`
	WithLabel  *bool   `json:"withLabel" yaml:"withLabel"`
	Delimiter  *string `json:"delimiter" yaml:"delimiter"`
	LazyQuotes *bool   `json:"lazyQuotes" yaml:"lazyQuotes"`
}
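
The pointer type is what lets the importer tell "not configured" apart from an explicit `false`. The sketch below illustrates this with a local stand-in struct, `csvConfig`, that copies the fields above. It decodes JSON through the struct's `json` tags so that it only needs the standard library; the importer's own YAML loading is not shown here, and `decode` and `describe` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csvConfig is a local stand-in that mirrors the fields of CSVConfig
// shown in the diff (json tags only).
type csvConfig struct {
	WithHeader *bool   `json:"withHeader"`
	WithLabel  *bool   `json:"withLabel"`
	Delimiter  *string `json:"delimiter"`
	LazyQuotes *bool   `json:"lazyQuotes"`
}

// decode (hypothetical helper) parses a JSON document into csvConfig.
func decode(doc string) (csvConfig, error) {
	var c csvConfig
	err := json.Unmarshal([]byte(doc), &c)
	return c, err
}

// describe (hypothetical helper) reports whether lazyQuotes was omitted
// or which value it was explicitly given.
func describe(c csvConfig) string {
	if c.LazyQuotes == nil {
		return "unset"
	}
	return fmt.Sprintf("set to %t", *c.LazyQuotes)
}

func main() {
	docs := []string{`{}`, `{"lazyQuotes": false}`, `{"lazyQuotes": true}`}
	for _, doc := range docs {
		c, err := decode(doc)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(doc, "->", describe(c))
	}
}
```

With a plain `bool` field the first two documents would be indistinguishable, which is presumably why the other optional CSV settings use pointers as well.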

type File struct {
3 changes: 3 additions & 0 deletions pkg/csv/reader.go
@@ -49,6 +49,9 @@ func (r *CSVReader) InitReader(file *os.File, runnerLogger *logger.RunnerLogger)
			logger.Log.Infof("The delimiter of %s is %#U", file.Name(), r.reader.Comma)
		}
	}
	if r.CSVConfig.LazyQuotes != nil {
		r.reader.LazyQuotes = *r.CSVConfig.LazyQuotes
	}
	stat, err := file.Stat()
	if err != nil {
		logger.Log.Infof("The stat of %s is wrong, %s", file.Name(), err)