Finding and deleting duplicates: Revision history

14 January 2025

17 December 2024

  • 03:49, 17 December 2024 Solomon.pidoke 3,651 bytes (−52) No edit summary
  • 03:48, 17 December 2024 Solomon.pidoke 3,703 bytes (+3,703) Created page with "==Background== Indici shares delta data on a daily basis in parquet format. This contains a large number of apparent duplicates. While all incoming data is just stashed as-is in the <code>indici_staging</code> schema, a separate process is required to deduplicate so it can be reliably used for reporting. This page is meant to detail exactly how that process should be built/executed. ==Deduplicating staging data== ===Catching up already stored data in staging sch..."