data

Goin' to Carolina in my mind (or on my hard drive)

Out-of-memory processing of North Carolina's voter file with DuckDB and Apache Arrow

Oh, I'm sure it's probably nothing

How we do (or don't) think about null values and why the polyglot push makes it all the more important

Update: grouped data quality check PR merged to dbt-utils

After a prior post on the merits of grouped data quality checks, I demo my newly merged implementation for dbt

Using databases with Shiny

Key issues when adding persistent storage to a Shiny application, featuring {golem} app development and DigitalOcean serving

Make grouping a first-class citizen in data quality checks

Which of these numbers doesn’t belong? -1, 0, 1, NA. You can't judge data quality without data context, so our tools should enable as much context as possible.

Update: column-name contracts with dbtplyr

Following up on 'Embedding Column-Name Contracts... with dbt' with a demo of my new dbtplyr package, which further streamlines the process

A lightweight data validation ecosystem with R, GitHub, and Slack

A right-sized solution to automated data monitoring, alerting, and reporting using R (`pointblank`, `projmgr`), GitHub (Actions, Pages, issues), and Slack

97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

Contributed six chapters on topics ranging from data design and development to validation and democratization

Embedding column-name contracts in data pipelines with dbt

dbt supercharges SQL with Jinja templating, macros, and testing -- all of which can be customized to enforce controlled vocabularies and their implied contracts on a data model

Causal design patterns for data analysts

An informal primer to causal analysis designs and data structures