Add --select, --rename, and --cast flags for load-time column transformation #172

@vmvarela

Description

Add flags to select, rename, and cast columns during loading, before data reaches SQLite.

# Select specific columns only
sql-pipe data.csv --select 'name,amount' 'SELECT * FROM t'

# Rename columns during load
sql-pipe data.csv --rename 'old_name:new_name,amount:revenue' 'SELECT new_name, revenue FROM t'

# Cast column types (override inference)
sql-pipe data.csv --cast 'zip:TEXT,amount:REAL' 'SELECT * FROM t'

# Combine all three
sql-pipe data.csv --select 'name,zip,amount' --rename 'amount:revenue' --cast 'zip:TEXT' 'SELECT * FROM t'

Motivation

Type inference is imperfect: ZIP codes like "01234" get detected as INTEGER, which drops the leading zero. Column renaming is needed when source data has column names with spaces, special characters, or other forms that are awkward to reference in SQL. Currently users must write CAST() in every SQL query, which is verbose and repetitive.
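The leading-zero problem can be reproduced with SQLite column affinity alone. The following is a minimal sketch using Python's standard `sqlite3` module; the table names `inferred` and `overridden` are made up for illustration, and it assumes values arrive as text and are converted by the column's declared type, which this issue does not confirm is how sql-pipe binds values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one as type inference would declare it,
# one as a --cast 'zip:TEXT' override would declare it.
conn.execute("CREATE TABLE inferred (zip INTEGER)")
conn.execute("CREATE TABLE overridden (zip TEXT)")

# The same text value is inserted into both tables.
conn.execute("INSERT INTO inferred VALUES (?)", ("01234",))
conn.execute("INSERT INTO overridden VALUES (?)", ("01234",))

inferred_zip = conn.execute("SELECT zip FROM inferred").fetchone()[0]
overridden_zip = conn.execute("SELECT zip FROM overridden").fetchone()[0]

print(repr(inferred_zip))    # 1234 (INTEGER affinity converted the text)
print(repr(overridden_zip))  # '01234' (TEXT affinity kept it verbatim)
```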

Acceptance Criteria

  • --select <cols> flag: load only specified columns (comma-separated)
  • --rename <mapping> flag: rename columns during load (old:new pairs, comma-separated)
  • --cast <mapping> flag: override inferred types (col:TYPE pairs, comma-separated)
  • Flags can be combined in a single invocation
  • --select preserves column order as specified
  • --cast supports: INTEGER, REAL, TEXT, DATE
  • Error on unknown column names in --rename and --cast
  • Integration tests for each flag and combinations
  • Help text updated
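The flag formats and error rules above could be parsed roughly as follows. This is only a sketch in Python, not the project's code: the helper names (`parse_list`, `parse_pairs`, `parse_rename`, `parse_cast`) are hypothetical, type names are assumed to be case-insensitive, and column names containing `,` or `:` are not handled.

```python
SUPPORTED_TYPES = {"INTEGER", "REAL", "TEXT", "DATE"}


def parse_list(spec):
    """Split a comma-separated flag value, preserving the given order."""
    return [item.strip() for item in spec.split(",") if item.strip()]


def parse_pairs(spec):
    """Parse 'a:b,c:d' into {'a': 'b', 'c': 'd'}."""
    pairs = {}
    for item in parse_list(spec):
        key, sep, value = item.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"expected 'name:value', got {item!r}")
        pairs[key] = value
    return pairs


def check_columns(flag, names, header):
    """Raise if any referenced column is missing from the parsed header."""
    unknown = [name for name in names if name not in header]
    if unknown:
        raise ValueError(f"{flag}: unknown column(s): {', '.join(unknown)}")


def parse_rename(spec, header):
    renames = parse_pairs(spec)
    check_columns("--rename", renames, header)
    return renames


def parse_cast(spec, header):
    # Assumption: type names are accepted in any case and normalized.
    casts = {col: typ.upper() for col, typ in parse_pairs(spec).items()}
    check_columns("--cast", casts, header)
    unsupported = sorted(set(casts.values()) - SUPPORTED_TYPES)
    if unsupported:
        raise ValueError(f"--cast: unsupported type(s): {', '.join(unsupported)}")
    return casts


header = ["name", "zip", "amount"]
selected = parse_list("name,amount")
renames = parse_rename("amount:revenue", header)
casts = parse_cast("zip:TEXT,amount:REAL", header)
```

Validating against the parsed header keeps the "unknown column" error at load time, before any table exists.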

Implementation Notes

  • Intervene between header parsing and table creation in the loader pipeline
  • --select: filter column names before type inference and INSERT binding
  • --rename: apply name mapping after header parsing
  • --cast: override type inference results before CREATE TABLE
  • The PRAGMA table_info approach (already in getTableColumns) provides the column list for validation
  • Multi-character delimiters make index-based column skipping in the CSV parser more complex; consider post-filtering the record array instead
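One way to read the notes above is as a small planning step between header parsing and table creation, with each fully parsed record post-filtered by index. The Python sketch below illustrates that idea under assumptions: `plan_columns`, `project`, and `create_table_sql` are hypothetical names, and `--select` and `--cast` are assumed to refer to the original (pre-rename) column names, which the issue does not specify.

```python
def plan_columns(header, inferred, select=None, rename=None, cast=None):
    """Decide which source columns to keep, their final names and types.

    `inferred` maps original column names to inferred types; it only
    needs entries for the columns that are kept.
    """
    rename = rename or {}
    cast = cast or {}
    kept = list(select) if select is not None else list(header)
    # header.index raises ValueError for a name missing from the header.
    indices = [header.index(name) for name in kept]
    names = [rename.get(name, name) for name in kept]
    types = [cast.get(name, inferred[name]) for name in kept]
    return indices, names, types


def project(record, indices):
    """Post-filter one fully parsed record instead of skipping fields in the parser."""
    return [record[i] for i in indices]


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, names, types):
    columns = ", ".join(f"{quote_ident(n)} {t}" for n, t in zip(names, types))
    return f"CREATE TABLE {quote_ident(table)} ({columns})"


header = ["id", "name", "zip", "amount"]
inferred = {"name": "TEXT", "zip": "INTEGER", "amount": "REAL"}

indices, names, types = plan_columns(
    header,
    inferred,
    select=["name", "zip", "amount"],
    rename={"amount": "revenue"},
    cast={"zip": "TEXT"},
)
row = project(["7", "Ann", "01234", "9.5"], indices)
ddl = create_table_sql("t", names, types)
```

Because filtering happens on the already-split record, the CSV parser itself does not need to know which columns are dropped, regardless of delimiter length.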

Metadata

Assignees

No one assigned

Labels

  • priority:medium (Should be done soon)
  • size:m (Medium, 4 to 8 hours)
  • status:ready (Refined and ready for sprint selection)
  • type:feature (New functionality)
