A toolkit to process data files (CSV and Parquet) from the command line, inspired by [csvkit](https://github.com/wireservice/csvkit), with blazing speed, and powered by Rust.
You may wonder why I'm implementing this, since there's already [xsv](https://github.com/BurntSushi/xsv). There are two reasons for that:
@ -39,6 +39,50 @@ shape: (3, 2)
└──────┴───────────┘
```
## Howto
The `dr` command offers a set of subcommands, each with its own functionality. You can list the available subcommands with:
```bash
$ dr --help
Command-line data file processing in Rust

Usage: dr [COMMAND]

Commands:
  sql    Runs a sql statement on the file
  print  Pretty prints the table
  rpq    Read parquet file
  help   Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help information
  -V, --version  Print version information
```
Subcommands can be pipelined unless they read from a file, write to a file, or pretty-print data. What goes through the pipeline is plain-text comma-separated values with a header. While this may not be the best choice in terms of performance, it allows `dr` subcommands to be combined with the usual Unix-style command-line tools like `cat`, `head`, `grep`, `awk` and `sed`.
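As a rough sketch of what such a pipeline looks like, the snippet below uses only standard Unix tools on CSV text with a header, which is the format described above. The sample data and the variable names `csv_data` and `filtered` are made up for illustration, and no `dr` arguments are shown because they depend on the subcommand (check `dr help <subcommand>`).

```shell
# Hypothetical sample data: comma-separated values with a header row.
csv_data='name,score
alice,3
bob,5
carol,4'

# Keep the header (NR == 1) plus every row whose second column exceeds 3.
# A `dr` subcommand that reads standard input could be added as a further stage.
filtered=$(printf '%s\n' "$csv_data" | awk -F, 'NR == 1 || $2 > 3')
printf '%s\n' "$filtered"
```

Because the header travels with the data, any filter in the chain has to pass the first line through untouched, as the `NR == 1` condition does here.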
Note that when `dr` loads CSV data, it also tries to guess the data type of each field.
## Performance
`dr` is implemented in Rust with the goal of achieving the highest possible performance. Take for instance a simple read, groupby, and aggregate operation with ~30MB of data: