Merge branch 'main' of https://git.guillemborrell.es/guillem/dr
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/tag/woodpecker Pipeline was successful

Guillem Borrell 2023-01-18 09:50:30 +00:00
commit cbf318690c
2 changed files with 39 additions and 1 deletion

@@ -1,7 +1,7 @@
 [package]
 name = "dr"
 description = "Command-line data file processing in Rust"
-version = "0.6.0"
+version = "0.6.1"
 edition = "2021"
 include = [
 "**/*.rs",

@@ -111,6 +111,44 @@ $ dr rpq data/yellow_tripdata_2014-01.parquet \
└─────────┴─────────────────┘
```
### Operate with SQL databases
How many times have you had to load a CSV file (sometimes one larger than memory) into a database? Tens of times? Hundreds? You've probably used Pandas for that, since it can infer the table's data types. So a simple data operation turns into a Python script with Pandas and a PostgreSQL driver as dependencies.
Now dr can generate the table creation statements from just the first few lines of the file:
```
$ head wine.csv | dr schema -i -p -n wine
CREATE TABLE IF NOT EXISTS "wine" ( );
ALTER TABLE "wine" ADD COLUMN "Wine" integer;
ALTER TABLE "wine" ADD COLUMN "Alcohol" real;
ALTER TABLE "wine" ADD COLUMN "Malic.acid" real;
ALTER TABLE "wine" ADD COLUMN "Ash" real;
ALTER TABLE "wine" ADD COLUMN "Acl" real;
ALTER TABLE "wine" ADD COLUMN "Mg" integer;
ALTER TABLE "wine" ADD COLUMN "Phenols" real;
ALTER TABLE "wine" ADD COLUMN "Flavanoids" real;
ALTER TABLE "wine" ADD COLUMN "Nonflavanoid.phenols" real;
ALTER TABLE "wine" ADD COLUMN "Proanth" real;
ALTER TABLE "wine" ADD COLUMN "Color.int" real;
ALTER TABLE "wine" ADD COLUMN "Hue" real;
ALTER TABLE "wine" ADD COLUMN "OD" real;
ALTER TABLE "wine" ADD COLUMN "Proline" integer;
```
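To make the type inference concrete, here is a minimal Rust sketch of one way to derive statements like these from a header plus a few sample rows. It is not dr's implementation. `infer_schema`, `classify`, `widen` and `quote_ident` are hypothetical helpers. The CSV splitting is naive and assumes no commas or newlines inside quoted fields. The rule is simply to pick the narrowest of `integer`, `real` and `text` that fits every non-empty sample value. Identifiers are double-quoted because names such as `Malic.acid` are not valid unquoted PostgreSQL identifiers, and quoting also preserves mixed case.

```rust
use std::io::{self, BufRead};

// Hypothetical sketch of sample-based type inference; not dr's actual code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SqlType {
    Integer,
    Real,
    Text,
}

impl SqlType {
    /// PostgreSQL type name used in the generated DDL.
    fn name(self) -> &'static str {
        match self {
            SqlType::Integer => "integer",
            SqlType::Real => "real",
            SqlType::Text => "text",
        }
    }
}

/// Narrowest type that can represent a single non-empty value.
fn classify(value: &str) -> SqlType {
    if value.parse::<i64>().is_ok() {
        SqlType::Integer
    } else if value.parse::<f64>().is_ok() {
        SqlType::Real
    } else {
        SqlType::Text
    }
}

/// Widen two guesses to a type that holds both (integer < real < text).
fn widen(a: SqlType, b: SqlType) -> SqlType {
    match (a, b) {
        (SqlType::Text, _) | (_, SqlType::Text) => SqlType::Text,
        (SqlType::Real, _) | (_, SqlType::Real) => SqlType::Real,
        _ => SqlType::Integer,
    }
}

/// Quote an identifier for PostgreSQL, doubling any embedded double quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Naive CSV split: trims whitespace and surrounding quotes, no quoted commas.
fn split_naive(line: &str) -> Vec<String> {
    line.split(',').map(|f| f.trim().trim_matches('"').to_string()).collect()
}

/// Build CREATE/ALTER statements from a header line followed by sample rows.
fn infer_schema(table: &str, sample: &[&str]) -> Vec<String> {
    let mut rows = sample.iter().map(|line| split_naive(line));
    let header = match rows.next() { Some(h) => h, None => return Vec::new() };
    // None means "no non-empty value seen yet" for that column.
    let mut types: Vec<Option<SqlType>> = vec![None; header.len()];
    for row in rows {
        for (slot, value) in types.iter_mut().zip(row.iter()) {
            if value.is_empty() {
                continue; // empty cells carry no type information
            }
            let guess = classify(value);
            *slot = Some(slot.map_or(guess, |prev| widen(prev, guess)));
        }
    }
    let t = quote_ident(table);
    let mut stmts = vec![format!("CREATE TABLE IF NOT EXISTS {} ( );", t)];
    for (name, ty) in header.iter().zip(types) {
        let ty = ty.unwrap_or(SqlType::Text); // all-empty columns fall back to text
        stmts.push(format!("ALTER TABLE {} ADD COLUMN {} {};", t, quote_ident(name), ty.name()));
    }
    stmts
}

fn main() {
    // Table name from the first argument; "data" is an arbitrary placeholder default.
    let table = std::env::args().nth(1).unwrap_or_else(|| "data".to_string());
    let lines: Vec<String> = io::stdin().lock().lines().map_while(Result::ok).collect();
    let refs: Vec<&str> = lines.iter().map(String::as_str).collect();
    for stmt in infer_schema(&table, &refs) {
        println!("{}", stmt);
    }
}
```

Piping a sample into the compiled sketch (for example `head wine.csv | ./schema_sketch wine`, with a hypothetical binary name) prints statements of the same shape as the output above. Any sample-based approach shares the same limitation: a later row that does not fit the guessed type will only fail at insert time.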
If you're happy with dr's choices, you can then create the table and load the file, skipping its header line:
```
$ head wine.csv | dr schema -i -p -n wine | psql
$ tail -n +2 wine.csv | psql -c "\copy wine from stdin with (FORMAT 'csv')"
```
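The `tail -n +2` step is needed because the table already exists, so the header line must not reach `COPY` as a data row. Here is a minimal Rust sketch of that step. `skip_lines` is a hypothetical helper, not part of dr. Like `tail`, it streams one line at a time, so memory use stays flat even for files larger than memory. Like `tail`, it also assumes the header is a single physical line. As an alternative, PostgreSQL's `COPY` accepts a `HEADER` option in CSV mode, which discards the first input line.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copy `input` to `output`, dropping the first `skip`
/// lines (skip = 1 behaves like `tail -n +2`). Lines are copied byte-for-byte
/// one at a time, so memory is bounded by the longest line, not the file size.
fn skip_lines<R: BufRead, W: Write>(mut input: R, mut output: W, skip: usize) -> io::Result<u64> {
    let mut buf = Vec::new();
    let mut seen = 0usize;
    let mut copied = 0u64;
    loop {
        buf.clear();
        // read_until keeps the trailing '\n', so line endings pass through unchanged.
        if input.read_until(b'\n', &mut buf)? == 0 {
            break; // EOF
        }
        if seen >= skip {
            output.write_all(&buf)?;
            copied += 1;
        }
        seen += 1;
    }
    output.flush()?;
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Drop the CSV header and stream the remaining rows to stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    skip_lines(stdin.lock(), stdout.lock(), 1)?;
    Ok(())
}
```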
Since most databases can both ingest and emit CSV, dr can extend some simple operations, such as storing the results of a query in a Parquet file:
```
$ psql -c "copy (select * from wine) to stdout with (FORMAT 'csv', HEADER)" | dr csv -i -P wine.pq
```
## Reference
Some commands that generate raw output in ipc format.