Columnar storage, the article argues, can be seen as a form of extreme normalization within the relational model: turning a wide, row-oriented table into separate column tables keyed by an implied ordinal primary key. Using simple pet/name examples, the author contrasts row-oriented layouts (fast row writes and reads) with columnar layouts (fast column scans and analytics but costlier row reconstruction and updates). Framing columnarization as normalization unifies concepts like projections and joins with storage encoding and clarifies why SQL engines can treat both representations equivalently apart from performance. The piece highlights the trade-offs and the conceptual bridge between physical storage formats and relational abstractions.
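The layout contrast described above can be sketched in a few lines of Python. This is a minimal illustration, not the essay's own code: the pet records, the field names, and the helpers `to_columns` and `reconstruct_row` are all hypothetical. List position stands in for the implied ordinal key.

```python
# Hypothetical pet data; names and fields are illustrative, not taken from the essay.
rows = [
    {"name": "Smudge", "species": "cat", "age": 3},
    {"name": "Rex", "species": "dog", "age": 5},
    {"name": "Pip", "species": "cat", "age": 1},
]


def to_columns(records):
    """Split row records into one list per attribute; list position is the implied key."""
    if not records:
        return {}
    return {field: [record[field] for record in records] for field in records[0]}


def reconstruct_row(columns, ordinal):
    """Rebuild one row by reading the same position from every column list."""
    return {field: values[ordinal] for field, values in columns.items()}


columns = to_columns(rows)

# A column scan touches a single list and ignores the other attributes.
ages = columns["age"]
average_age = sum(ages) / len(ages)

# Rebuilding a row has to visit every column, which is the costlier direction.
second_pet = reconstruct_row(columns, 1)
```

In this sketch the aggregate reads only the `age` list, while `reconstruct_row` must touch every column, which mirrors the trade-off the summary describes.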
Source: https://buttondown.com/jaffray/archive/columnar-storage-is-normalization/ (submitted to r/programming by /u/SpecialistLady; comments: https://www.reddit.com/r/programming/comments/1ssm5qa/columnar_storage_is_normalization/)
The author argues that columnar storage is simply a form of extreme normalization within the relational model: if a table is represented as separate single-column tables (or arrays) keyed by an implied ordinal, a columnar format can be viewed as a set of normalized tables that are rejoined to form rows. Using a simple example of pets and their names, the piece contrasts row-oriented designs (fast row writes and retrievals) with column-oriented designs (efficient reads of a single column, but slower writes and row reconstruction). It suggests that treating each column as a one-attribute table with an implicit id unifies concepts like projections and joins and clarifies the performance trade-offs for SQL engines.
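The relational reading of that idea can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions rather than anything from the essay: the table names `pet_name` and `pet_species` and the sample pets are hypothetical, and the `id` column is stored explicitly here because SQL tables carry no row order, whereas a real columnar format leaves the ordinal implicit in position.

```python
import sqlite3

# Hypothetical sample data; the essay's actual examples may differ.
pets = [("Smudge", "cat"), ("Rex", "dog"), ("Pip", "cat")]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pet_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet_species (id INTEGER PRIMARY KEY, species TEXT);
    """
)

# Each attribute goes into its own one-attribute table, keyed by the row ordinal.
for ordinal, (name, species) in enumerate(pets):
    conn.execute("INSERT INTO pet_name VALUES (?, ?)", (ordinal, name))
    conn.execute("INSERT INTO pet_species VALUES (?, ?)", (ordinal, species))

# Projection: reading one column means reading one table.
names = [row[0] for row in conn.execute("SELECT name FROM pet_name ORDER BY id")]

# Row reconstruction: join the single-column tables on the shared key.
joined = conn.execute(
    """
    SELECT n.name, s.species
    FROM pet_name AS n
    JOIN pet_species AS s ON s.id = n.id
    ORDER BY n.id
    """
).fetchall()

conn.close()
```

Here the projection never touches `pet_species`, and the join yields the original wide rows, which is the equivalence the summary points to.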
A recent Hacker News post linked to an essay arguing that columnar storage is effectively a form of normalization, sparking debate about the analogy. Commenters praised the teaching value but criticized the comparison for conflating logical database design (normalization) with physical storage formats (columnar systems). The discussion highlights key distinctions between theoretical schema decomposition and practical storage/access optimizations, and why equating them can mislead database architects, developers, and analytics engineers. This matters because understanding whether optimizations belong at the logical or physical layer affects schema design, query planning, performance tuning, and tool choice for analytics and OLAP workloads.