Compress MS (compressms)

visco.compress_ms.write_ms_to_zarr(ms_path: str, zarr_path: str, consolidated: bool, chunk_size_row: int, overwrite: bool, compressor: str, level: int)

Convert a Measurement Set to a Zarr store.

Parameters:
  • ms_path (str) – The path to the Measurement Set.

  • zarr_path (str) – The path to the Zarr store.

  • consolidated (bool, optional) – Whether to use a consolidated Zarr store.

  • chunk_size_row (int, optional) – The chunk size for the rows.

  • overwrite (bool, optional) – Whether to overwrite the Zarr store if it exists.

  • compressor (str, optional) – The name of the compressor to use.

  • level (int, optional) – The compression level to use.
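
A call might look like the following minimal sketch. The paths and every argument value are placeholder assumptions, not documented defaults; in particular, "zstd" is only a guess at an accepted compressor name. The import is guarded so the snippet still runs where visco is not installed.

```python
import os

# Placeholder values: none of these are documented defaults.
write_kwargs = {
    "ms_path": "observation.ms",      # hypothetical input Measurement Set
    "zarr_path": "observation.zarr",  # hypothetical output store
    "consolidated": True,
    "chunk_size_row": 10_000,
    "overwrite": False,
    "compressor": "zstd",             # assumed name; check what your install accepts
    "level": 4,
}

try:
    from visco.compress_ms import write_ms_to_zarr
except ImportError:
    write_ms_to_zarr = None  # visco is not available in this environment

# Only attempt the conversion when both the library and the input exist.
if write_ms_to_zarr is not None and os.path.isdir(write_kwargs["ms_path"]):
    write_ms_to_zarr(**write_kwargs)
```

Passing the arguments by keyword keeps the call readable, since the signature takes seven parameters of only three distinct types.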

visco.compress_ms.compress_visdata(zarr_output_path: str, compressor: str, level: int, correlation: str, correlation_optimized: bool, fieldid: int, ddid: int, scan: int, column: str, outcolumn: str, batch_size: int, flag_estimate: bool, use_model_data: bool, model_data: Optional[Array] = None, decorrelation: Optional[float] = None, compressionrank: Optional[int] = None, flagvalue: Optional[int] = None, antennas: Optional[list] = None)

Compress visibility data using SVD with batched processing.

Parameters:
  • zarr_output_path (str) – Path to the Zarr store.

  • compressor (str) – Name of the compressor to use.

  • level (int) – Compression level to use.

  • correlation (str) – Comma-separated list of correlation types to process (e.g., 'XX,YY,XY,YX').

  • correlation_optimized (bool) – Whether to use optimized correlation processing (XX/YY and XY/YX together).

  • fieldid (int) – FIELD_ID to filter on.

  • ddid (int) – DATA_DESC_ID to filter on.

  • scan (int) – SCAN_NUMBER to filter on.

  • column (str) – Column in the MAIN table containing the visibility data to compress.

  • outcolumn (str) – Column name to store the compressed data.

  • batch_size (int) – Number of baselines to process in each batch.

  • flag_estimate (bool) – Whether to estimate flagged data using interpolation.

  • use_model_data (bool) – Whether to replace flagged data with model data.

  • model_data (str, optional) – Column name for model data if use_model_data is True.

  • decorrelation (float, optional) – Desired decorrelation level (0 to 1).

  • compressionrank (int, optional) – Number of singular values to keep.

  • flagvalue (int, optional) – Value used to replace flagged data, if given.

  • antennas (list, optional) – List of antenna names to restrict processing to specific baselines.
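
To make the role of compressionrank concrete, the sketch below applies a truncated SVD to a synthetic time-by-channel matrix for a single baseline using NumPy. It illustrates the general technique only and is not visco's implementation: the helper truncate_svd, the synthetic data, and the array layout are all assumptions made for this example.

```python
import numpy as np

def truncate_svd(vis, rank):
    """Best rank-`rank` approximation of a 2-D array in the least-squares sense."""
    u, s, vh = np.linalg.svd(vis, full_matrices=False)
    k = min(rank, s.size)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vh[:k, :]

# Synthetic data: two smooth fringe-like terms (rank 2) plus weak noise.
rng = np.random.default_rng(0)
n_time, n_chan = 64, 32
t = np.linspace(0.0, 1.0, n_time)[:, None]
f = np.linspace(0.0, 1.0, n_chan)[None, :]
signal = (np.exp(2j * np.pi * 3 * t) * np.exp(2j * np.pi * f)
          + 0.5 * np.exp(-2j * np.pi * 5 * t) * np.exp(2j * np.pi * 2 * f))
noise = 1e-3 * (rng.standard_normal(signal.shape)
                + 1j * rng.standard_normal(signal.shape))
vis = signal + noise

def rel_error(approx):
    """Frobenius-norm error of an approximation relative to the noise-free signal."""
    return np.linalg.norm(approx - signal) / np.linalg.norm(signal)

err_rank1 = rel_error(truncate_svd(vis, 1))
err_rank2 = rel_error(truncate_svd(vis, 2))

# Values to store: k * (n_time + n_chan + 1) for the factors,
# versus n_time * n_chan for the raw matrix.
stored_rank2 = 2 * (n_time + n_chan + 1)
raw_size = n_time * n_chan
```

Keeping one singular value discards the weaker of the two terms, while keeping two recovers the signal to within the noise and stores far fewer numbers than the raw matrix. This is the trade-off that a rank setting controls.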

visco.compress_ms.compress_full_ms(ms_path: str, zarr_path: str, consolidated: bool, chunk_size_row: int, overwrite: bool, compressor: str, level: int, nworkers: int, nthreads: int, memory_limit: str, direct_to_workers: bool, correlation: str, correlation_optimized: bool, fieldid: int, ddid: int, scan: int, column: str, outcolumn: str, batch_size: int, dashboard_addr: Optional[str] = None, host_addr: Optional[str] = None, use_model_data: bool = False, model_data: Optional[str] = None, flag_estimate: bool = False, decorrelation: Optional[float] = None, compressionrank: Optional[int] = None, flagvalue: Optional[int] = None, antennas: Optional[list] = None)

Compress a Measurement Set using SVD with batched processing.

Parameters:
  • ms_path (str) – Path to the Measurement Set.

  • zarr_path (str) – Path to the Zarr store.

  • consolidated (bool) – Whether to use a consolidated Zarr store.

  • chunk_size_row (int) – Chunk size for the rows.

  • overwrite (bool) – Whether to overwrite the Zarr store if it exists.

  • compressor (str) – Name of the compressor to use.

  • level (int) – Compression level to use.

  • nworkers (int) – Number of Dask workers.

  • nthreads (int) – Number of threads per worker.

  • memory_limit (str) – Memory limit per worker (e.g., '4GB').

  • direct_to_workers (bool) – Whether to send tasks directly to workers.

  • correlation (str) – Comma-separated list of correlation types to process (e.g., 'XX,YY,XY,YX').

  • correlation_optimized (bool) – Whether to use optimized correlation processing (XX/YY and XY/YX together).

  • fieldid (int) – FIELD_ID to filter on.

  • ddid (int) – DATA_DESC_ID to filter on.

  • scan (int) – SCAN_NUMBER to filter on.

  • column (str) – Column in the MAIN table containing the visibility data to compress.

  • outcolumn (str) – Column name to store the compressed data.

  • batch_size (int) – Number of baselines to process in each batch.

  • dashboard_addr (str, optional) – Address for the Dask dashboard.

  • host_addr (str, optional) – Host address for the Dask scheduler.

  • use_model_data (bool, optional) – Whether to replace flagged data with model data.

  • model_data (str, optional) – Column name for model data if use_model_data is True.

  • flag_estimate (bool, optional) – Whether to estimate flagged data using interpolation.

  • decorrelation (float, optional) – Desired decorrelation level (0 to 1).

  • compressionrank (int, optional) – Number of singular values to keep.

  • flagvalue (int, optional) – Value used to replace flagged data, if given.

  • antennas (list, optional) – List of antenna names to restrict processing to specific baselines.
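
The correlation, antennas, and batch_size arguments interact: the first is a single comma-separated string, and the other two together determine how baselines are grouped for processing. The sketch below shows one plausible reading, for illustration only. Both helper names are hypothetical, the antenna names are made up, and it assumes that only cross-correlation baselines (distinct antenna pairs) are formed; visco's internal handling may differ.

```python
from itertools import combinations

def parse_correlations(correlation):
    """Split a comma-separated correlation string into a clean list of labels."""
    return [c.strip().upper() for c in correlation.split(",") if c.strip()]

def baseline_batches(antennas, batch_size):
    """Yield lists of at most `batch_size` antenna pairs (cross-correlations only)."""
    baselines = list(combinations(antennas, 2))
    for start in range(0, len(baselines), batch_size):
        yield baselines[start:start + batch_size]

corrs = parse_correlations("XX, YY,XY,YX")

# Four hypothetical antennas give 4 * 3 / 2 = 6 baselines.
antenna_names = ["m000", "m001", "m002", "m003"]
batches = list(baseline_batches(antenna_names, batch_size=4))

for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} baselines")
```

With N antennas there are N(N-1)/2 such baselines, so restricting antennas shrinks the workload quadratically, while batch_size bounds how many baselines are held in memory at once.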