Improved documentation
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
parent 9f7b3605f4
commit c76150948d
.gitignore (vendored): 6 lines changed
@@ -19,8 +19,12 @@ Cargo.lock
# Added by cargo

/target


.vscode
.ipynb_checkpoints

/data

/vendor

.cargo
README.md: 48 lines changed
@@ -2,7 +2,7 @@

[![status-badge](https://ci.guillemborrell.es/api/badges/guillem/dr/status.svg)](https://ci.guillemborrell.es/guillem/dr) | [Download](https://git.guillemborrell.es/guillem/-/packages/generic/dr)

-A toolkit to process data files (csv and parquet) using the command line, inspired by [csvkit](https://github.com/wireservice/csvkit), with blazing speed, and powered by Rust.
+A toolkit to process data files (csv and parquet) using the command line, inspired by [csvkit](https://github.com/wireservice/csvkit), with blazing speed, and powered by Rust.

You may wonder why I'm implementing this, since there's already [xsv](https://github.com/BurntSushi/xsv). There are two reasons for that:

@@ -39,6 +39,50 @@ shape: (3, 2)
└──────┴───────────┘
```

## Howto

The `dr` command offers a set of subcommands, each with its own functionality. List the available subcommands with:

```bash
$ dr --help
Command-line data file processing in Rust

Usage: dr [COMMAND]

Commands:
  sql    Runs a sql statement on the file
  print  Pretty prints the table
  rpq    Read parquet file
  help   Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help information
  -V, --version  Print version information
```

Subcommands can be chained in a pipeline unless they read from a file, write to a file, or pretty-print data. What goes through the pipeline is plain-text comma-separated values with a header. While this may not be the best choice in terms of performance, it allows `dr` subcommands to be combined with the usual Unix-style command-line tools like `cat`, `head`, `grep`, `awk` and `sed`:

```bash
$ cat wine.csv | head -n 5 | dr print
shape: (4, 14)
┌──────┬─────────┬────────────┬──────┬─────┬───────────┬──────┬──────┬─────────┐
│ Wine ┆ Alcohol ┆ Malic.acid ┆ Ash  ┆ ... ┆ Color.int ┆ Hue  ┆ OD   ┆ Proline │
│ ---  ┆ ---     ┆ ---        ┆ ---  ┆     ┆ ---       ┆ ---  ┆ ---  ┆ ---     │
│ i64  ┆ f64     ┆ f64        ┆ f64  ┆     ┆ f64       ┆ f64  ┆ f64  ┆ i64     │
╞══════╪═════════╪════════════╪══════╪═════╪═══════════╪══════╪══════╪═════════╡
│ 1    ┆ 14.23   ┆ 1.71       ┆ 2.43 ┆ ... ┆ 5.64      ┆ 1.04 ┆ 3.92 ┆ 1065    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 13.2    ┆ 1.78       ┆ 2.14 ┆ ... ┆ 4.38      ┆ 1.05 ┆ 3.4  ┆ 1050    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 13.16   ┆ 2.36       ┆ 2.67 ┆ ... ┆ 5.68      ┆ 1.03 ┆ 3.17 ┆ 1185    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 14.37   ┆ 1.95       ┆ 2.5  ┆ ... ┆ 7.8       ┆ 0.86 ┆ 3.45 ┆ 1480    │
└──────┴─────────┴────────────┴──────┴─────┴───────────┴──────┴──────┴─────────┘
```

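
Because the stream between stages is plain CSV with a header row, a small script can sit in the pipeline as well. The following is a minimal Python sketch under stated assumptions: `filter_rows` is a hypothetical helper, not part of `dr`, and the three-column sample is a cut-down copy of the wine rows shown above.

```python
import csv
import io


def filter_rows(src, dst, column, predicate):
    """Copy CSV from src to dst, keeping the header and every row for which
    predicate(row[column]) is true. Hypothetical helper, not part of dr."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None:  # empty input: nothing to forward
        return 0
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if predicate(row[column]):
            writer.writerow(row)
            kept += 1
    return kept


# In-memory demo; the values are copied from the wine table above.
sample = io.StringIO(
    "Wine,Alcohol,Hue\n"
    "1,14.23,1.04\n"
    "1,13.2,1.05\n"
    "1,13.16,1.03\n"
    "1,14.37,0.86\n"
)
out = io.StringIO()
kept = filter_rows(sample, out, "Alcohol", lambda v: float(v) > 14.0)
print(out.getvalue(), end="")
```

Wired to `sys.stdin` and `sys.stdout` instead of the in-memory buffers, a script like this could be placed between `cat wine.csv` and `dr print`, in the same way as `grep` or `awk`.
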
Note that when `dr` loads csv data, it also tries to guess the data type of each field.


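
To make the idea of type guessing concrete, here is a rough Python sketch of one possible rule: a column is an integer if every value parses as one, a float if every value parses as a float, and a string otherwise. This is an assumption for illustration only, not how `dr` actually infers types; `guess_type` and `guess_schema` are hypothetical names, and the `i64`/`f64` labels are borrowed from the table header above.

```python
import csv
import io


def guess_type(values):
    """Return "i64", "f64" or "str" for a column of raw CSV strings.
    Illustrative rule only; not dr's actual inference."""

    def all_parse(cast):
        # True when every value in the column is accepted by `cast`.
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "i64"
    if all_parse(float):
        return "f64"
    return "str"


def guess_schema(text):
    """Map each header name to the guessed type of its column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = zip(*body)  # transpose rows into columns
    return {name: guess_type(col) for name, col in zip(header, columns)}


schema = guess_schema("Wine,Alcohol,Proline\n1,14.23,1065\n1,13.2,1050\n")
print(schema)
```

Python's `int` and `float` are more permissive than most CSV parsers (they accept surrounding whitespace and values like `nan`), so a real implementation would likely need stricter rules.
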
## Performance

`dr` is implemented in Rust with the goal of achieving the highest possible performance. Take for instance a simple read, groupby, and aggregate operation with ~30MB of data:
@@ -88,7 +132,7 @@ print(df.groupby("Dept", sort=False, as_index=False).Weekly_Sales.mean())
```

```bash
-$ time cat data/walmart_train.csv | ./python/group.py
+$ time cat data/walmart_train.csv | ./python/group.py
Dept Weekly_Sales
0 1 19213.485088
1 2 43607.020113