Proposal for a generic parser interface

Currently, the library supports CSV and TSV data, while external spreadsheet services (Google Sheets and Excel) deliver JSON data. To parse the JSON data, several `TransformStream`s are chained to decode the raw bytes to text, transform the JSON format to CSV, and encode the text back to raw bytes. The transformed data are then passed to the parser, which handles them with the same CSV/TSV parsing logic. This is a roundabout approach, since the data are decoded and re-encoded only to fit the CSV code path, and it could be improved.
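For comparison, the current JSON workaround roughly corresponds to the chain below. This is a minimal sketch rather than the library's actual code: `rowsToCsv`, `jsonToCsv`, and `jsonBytesToCsvBytes` are hypothetical helpers, and the JSON is assumed to be a plain array of rows such as `[["a","b"],["1","2"]]`.

```ts
// Hypothetical sketch of the JSON workaround: bytes -> text -> CSV text -> bytes.
// Assumption: the service delivers a JSON array of rows, e.g. [["a","b"],["1","2"]].

/** Quotes a CSV field if it contains a comma, a quote, or a line break. */
function quoteField(field: string): string {
    return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

/** Converts parsed JSON rows to CSV text, one line per row. */
function rowsToCsv(rows: string[][]): string {
    return rows.map((row) => row.map(quoteField).join(",")).join("\n") + "\n";
}

/** Buffers the whole JSON text, because a JSON document is only parseable once complete. */
function jsonToCsv(): TransformStream<string, string> {
    let json = "";
    return new TransformStream<string, string>({
        transform(chunk) {
            json += chunk;
        },
        flush(controller) {
            controller.enqueue(rowsToCsv(JSON.parse(json) as string[][]));
        },
    });
}

/** The roundabout chain: the CSV text is re-encoded only because the CSV parser consumes bytes. */
function jsonBytesToCsvBytes(input: ReadableStream<Uint8Array>): ReadableStream<Uint8Array> {
    return input
        .pipeThrough(new TextDecoderStream()) // bytes -> text
        .pipeThrough(jsonToCsv()) // JSON text -> CSV text
        .pipeThrough(new TextEncoderStream()); // text -> bytes again
}
```

Because a JSON document can only be parsed once it is complete, `jsonToCsv` has to buffer the whole input, and the decode/encode steps around it exist only so the result fits the byte-oriented CSV path.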

### Proposal

All logic related to parsing a specific format could be abstracted into a generic `Parser` interface. I think this would affect four functions that are currently unrelated: `parse`, `splitLine`, `splitLines`, and possibly `parseLine`.

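```ts
// `Position` and `DataType` refer to types defined elsewhere in the library.
interface Parser {
    parse(chunks: ArrayBufferLike[], start: Position, end: Position): string[];
    parseLine(line: string[], types: DataType[]): unknown[];
    splitLine(line: string, delimiter: string): string[];
    // Interfaces cannot declare default values, so `remainder` is optional here;
    // implementations can still default it to ''.
    splitLines(chunk: string, lines: string[], remainder?: string): string;
}
```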
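To make the split between format-specific pieces more concrete, here is a sketch of how CSV/TSV line splitting could sit behind such an interface. It is not the library's current implementation: to stay self-contained it implements only the two text-level methods, re-declared as a hypothetical `LineSplitter` interface, and it assumes `splitLines` appends complete lines to `lines` and returns the unterminated rest of the chunk.

```ts
// Hypothetical: only the two text-level methods of the proposed `Parser`
// interface; `parse` and `parseLine` are omitted to keep the sketch self-contained.
interface LineSplitter {
    splitLine(line: string, delimiter: string): string[];
    splitLines(chunk: string, lines: string[], remainder?: string): string;
}

/** Sketch of CSV/TSV splitting behind the interface (not the library's actual code). */
class CsvParser implements LineSplitter {
    /** Splits one line into fields; double-quoted fields may contain the delimiter and "" escapes. */
    splitLine(line: string, delimiter: string): string[] {
        if (delimiter.length === 0) throw new Error("delimiter must not be empty");
        const fields: string[] = [];
        let field = "";
        let quoted = false;
        for (let i = 0; i < line.length; i++) {
            const char = line.charAt(i);
            if (quoted) {
                if (char !== '"') {
                    field += char;
                } else if (line.charAt(i + 1) === '"') {
                    field += '"'; // escaped quote inside a quoted field
                    i++;
                } else {
                    quoted = false; // closing quote
                }
            } else if (char === '"') {
                quoted = true; // opening quote
            } else if (line.startsWith(delimiter, i)) {
                fields.push(field);
                field = "";
                i += delimiter.length - 1; // skip the rest of a multi-character delimiter
            } else {
                field += char;
            }
        }
        fields.push(field);
        return fields;
    }

    /** Appends all complete lines of `remainder + chunk` to `lines` and returns the unterminated rest. */
    splitLines(chunk: string, lines: string[], remainder = ""): string {
        const parts = (remainder + chunk).split(/\r?\n/);
        const rest = parts.pop() ?? ""; // text after the last line break
        for (const part of parts) lines.push(part);
        return rest;
    }
}
```

A quoted field containing a line break would be cut apart by this `splitLines`; a full implementation would need to track quote state across lines.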

This would allow defining different parsers for different data sources: `CsvParser implements Parser`, `JsonParser implements Parser`, etc.
Additionally, we could expose the `Parser` interface so that users could define their own parser for tabular data in a custom format, e.g. a table in a Markdown file. A possible usage could be:

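```ts
import type { Parser } from "@lukaswagner/csv-parser";

class MyCustomParser implements Parser {
    // Implementations of parse, parseLine, splitLine, and splitLines.
}

const myCustomParser = new MyCustomParser();

// ...

// `parser`: the existing parser object (creation omitted); the `parser` option is the proposed addition.
await parser.open("[custom-file]", { parser: myCustomParser });
```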
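For the Markdown case mentioned above, a user-defined parser could look roughly like this. It only covers the text-level methods, re-declared as a hypothetical `LineSplitter` interface so the block is self-contained; the class name `MarkdownTableParser` is illustrative. It assumes GitHub-style pipe tables and does not handle escaped pipes (`\|`) inside cells.

```ts
// Hypothetical: the two text-level methods of the proposed `Parser` interface,
// re-declared here so the sketch is self-contained.
interface LineSplitter {
    splitLine(line: string, delimiter: string): string[];
    splitLines(chunk: string, lines: string[], remainder?: string): string;
}

/** Sketch of a user-defined parser for GitHub-style Markdown pipe tables. */
class MarkdownTableParser implements LineSplitter {
    // Cells are always separated by "|", so the delimiter argument is ignored.
    // Escaped pipes ("\|") inside cells are not handled.
    splitLine(line: string, _delimiter: string): string[] {
        let row = line.trim();
        if (row.startsWith("|")) row = row.slice(1); // optional leading pipe
        if (row.endsWith("|")) row = row.slice(0, -1); // optional trailing pipe
        return row.split("|").map((cell) => cell.trim());
    }

    // Appends complete table rows to `lines`, skipping blank lines and the header
    // separator row (e.g. "| --- | :-: |"), and returns the unterminated rest.
    splitLines(chunk: string, lines: string[], remainder = ""): string {
        const parts = (remainder + chunk).split(/\r?\n/);
        const rest = parts.pop() ?? "";
        for (const part of parts) {
            const isBlank = part.trim() === "";
            const isSeparator = /^[\s|:-]+$/.test(part) && part.includes("-");
            if (!isBlank && !isSeparator) lines.push(part);
        }
        return rest;
    }
}
```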

The custom parser could be passed either as an instance or as a class. If the parser is stateless, its methods could also be defined as `static`, so the class itself could be passed. But I'm open to opinions on which API would be best.
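To compare the options, the following sketch shows how `open` could accept both forms. `ParserOption`, `isParserInstance`, and `resolveParser` are hypothetical names, and `Parser` is reduced to a single method. One pitfall: a class with `static` methods is itself a function, so a plain `typeof option === "function"` check would wrongly try to instantiate it; the sketch checks for the method instead.

```ts
// Hypothetical: a one-method stand-in for the proposed `Parser` interface.
interface Parser {
    splitLine(line: string, delimiter: string): string[];
}

// Either something that already provides the methods (an instance, or a class
// with static methods) or a class that still has to be instantiated.
type ParserOption = Parser | (new () => Parser);

function isParserInstance(option: ParserOption): option is Parser {
    // Checks for the method instead of `typeof option === "function"`,
    // because a class with static methods is a function, too.
    return "splitLine" in option;
}

function resolveParser(option: ParserOption): Parser {
    return isParserInstance(option) ? option : new option();
}

// Variant 1: regular class, passed as an instance or as the class itself.
class InstanceParser implements Parser {
    splitLine(line: string, delimiter: string): string[] {
        return line.split(delimiter);
    }
}

// Variant 2: stateless class with static methods, passed as the class itself.
class StaticParser {
    static splitLine(line: string, delimiter: string): string[] {
        return line.split(delimiter);
    }
}

const parsers: Parser[] = [
    resolveParser(new InstanceParser()),
    resolveParser(InstanceParser),
    resolveParser(StaticParser),
];
```

Supporting all three variants keeps the API flexible, at the cost of this extra runtime check; accepting only instances would avoid it.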

### Considerations

While this abstraction would enable more use cases, the library would take on more responsibilities than the name "CSV parser" suggests. It might be worth splitting the library into multiple packages, e.g. a core library `data-parser` plus `csv-parser`, `json-parser`, etc. that are integrated as plugins. Since we're already using a monorepo, adding packages should not be a big deal. The bigger problem is that the `csv-parser` package currently contains the actual library and would become a plugin, so we would have to communicate to users that they need to switch libraries. But maybe you have other thoughts or better ideas on how to handle this.
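The plugin idea could look roughly like the sketch below: the core package keeps a registry, and each format package registers a parser factory under a format name. `ParserRegistry`, `register`, `create`, and `tsvPlugin` are hypothetical names, and `Parser` is again reduced to one method.

```ts
// Hypothetical: a one-method stand-in for the proposed `Parser` interface.
interface Parser {
    splitLine(line: string, delimiter: string): string[];
}

type ParserFactory = () => Parser;

/** Sketch of a registry that a core package could own. */
class ParserRegistry {
    private readonly factories = new Map<string, ParserFactory>();

    /** Called by a format plugin to make its parser available under `format`. */
    register(format: string, factory: ParserFactory): void {
        if (this.factories.has(format)) {
            throw new Error(`A parser for "${format}" is already registered`);
        }
        this.factories.set(format, factory);
    }

    /** Called by the core when a source of the given format is opened. */
    create(format: string): Parser {
        const factory = this.factories.get(format);
        if (factory === undefined) {
            throw new Error(`No parser registered for "${format}"; is the plugin installed?`);
        }
        return factory();
    }
}

// What a hypothetical TSV plugin package might export.
const tsvPlugin = (registry: ParserRegistry): void =>
    registry.register("tsv", () => ({
        splitLine: (line, delimiter) => line.split(delimiter),
    }));
```

With a single lookup point, a missing plugin would surface as an explicit error naming the format, which might also help when communicating the package split to users.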
