Single-cell RNA-seq

Single-cell metadata file: scrnaseq_metadata_table

  • Objective of the file: contains cell-level metadata and cell type annotations

  • Example format: scrnaseq_metadata_table.parquet

  • Requirements:

      • Named scrnaseq_metadata_table

      • In parquet format

      • cell_id as primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| cell_id | str | TRUE | Cell identifier |
| patient_id | str | TRUE | Maps to the linked clinical data file. There can be multiple sample_id values for one patient_id. |
| sample_id | str | FALSE | Sample identifier |
| cell_type_level_1_major | str | TRUE | Major cell type classification |
| cell_type_level_2_mid | str | TRUE | Mid-level cell type classification |
| cell_type_level_3_granular | str | TRUE | Granular cell type classification |
| UMAP_1 | | FALSE | |
| UMAP_2 | | FALSE | |
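
The requirements for scrnaseq_metadata_table can be checked before the file is written. The following is a minimal sketch in plain Python, not a reference validator: the function name `validate_metadata` and the rows-as-dicts input are hypothetical, and it assumes that "mandatory" also means the value must not be null.

```python
# Column names come from the metadata table; the function and its
# rows-as-dicts input are hypothetical, for illustration only.
MANDATORY_COLUMNS = (
    "cell_id",
    "patient_id",
    "cell_type_level_1_major",
    "cell_type_level_2_mid",
    "cell_type_level_3_granular",
)


def validate_metadata(rows):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    seen_cell_ids = set()
    for index, row in enumerate(rows):
        # Assumption: a mandatory column must be present and non-null.
        for column in MANDATORY_COLUMNS:
            if row.get(column) is None:
                problems.append(f"row {index}: missing value for {column}")
        # cell_id is the primary key, so it must be unique.
        cell_id = row.get("cell_id")
        if cell_id is not None:
            if cell_id in seen_cell_ids:
                problems.append(f"row {index}: duplicate cell_id {cell_id!r}")
            seen_cell_ids.add(cell_id)
    return problems


# Made-up example rows: the second row repeats cell_id "c1" and has a
# null mid-level annotation, so two problems are expected.
example_rows = [
    {"cell_id": "c1", "patient_id": "p1", "sample_id": "s1",
     "cell_type_level_1_major": "Immune",
     "cell_type_level_2_mid": "T cell",
     "cell_type_level_3_granular": "CD8 T cell"},
    {"cell_id": "c1", "patient_id": "p1",
     "cell_type_level_1_major": "Immune",
     "cell_type_level_2_mid": None,
     "cell_type_level_3_granular": "B cell"},
]
problems = validate_metadata(example_rows)
for problem in problems:
    print(problem)
```

The optional columns (sample_id, UMAP_1, UMAP_2) are deliberately not checked here.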

Single-cell counts file: scrnaseq_count_table

  • Objective of the file: contains raw and log-normalized counts for each cell and gene

  • Example format: scrnaseq_count_table.parquet

  • Requirements:

      • Named scrnaseq_count_table

      • In parquet format

      • (cell_id, gene_name) as composite primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| cell_id | str | TRUE | Cell identifier |
| gene_name | str | TRUE | Gene name fitting HGNC ontology |
| count | pl.Float32 | TRUE | Raw count value |
| count_lognorm | pl.Float32 | TRUE | Log-normalized count value |
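
Because cell_id and gene_name together form the key, the count table is in long format: one row per cell and gene. The sketch below shows one way to flatten two cells-by-genes matrices into such rows and to check key uniqueness. The helper names `to_long_format` and `has_unique_keys` are hypothetical, the log-normalized values are placeholders (the normalization method is not specified here), and the sketch keeps zero entries because the specification does not say whether they may be omitted.

```python
def to_long_format(cell_ids, gene_names, raw, lognorm):
    """Flatten two cells-by-genes matrices into one row per (cell_id, gene_name)."""
    rows = []
    for i, cell_id in enumerate(cell_ids):
        for j, gene_name in enumerate(gene_names):
            rows.append({
                "cell_id": cell_id,
                "gene_name": gene_name,
                # Stored as floats; casting to 32-bit is left to the writer.
                "count": float(raw[i][j]),
                "count_lognorm": float(lognorm[i][j]),
            })
    return rows


def has_unique_keys(rows):
    """True if no (cell_id, gene_name) pair occurs more than once."""
    keys = [(row["cell_id"], row["gene_name"]) for row in rows]
    return len(keys) == len(set(keys))


# Made-up example: 2 cells x 2 genes. The lognorm values are placeholders.
cell_ids = ["c1", "c2"]
gene_names = ["CD3E", "MS4A1"]
raw = [[3, 0], [0, 5]]
lognorm = [[1.39, 0.0], [0.0, 1.79]]

rows = to_long_format(cell_ids, gene_names, raw, lognorm)
print(len(rows), has_unique_keys(rows))
```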

Single-cell sample file: scrnaseq_sample_level_table

  • Objective of the file: contains cell type-specific expression metrics per sample and gene

  • Example format: scrnaseq_sample_level_table.parquet

  • Requirements:

      • Named scrnaseq_sample_level_table

      • In parquet format

      • (sample_id, cell_type_level_2_mid, gene_name) as composite primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| sample_id | str | TRUE | Sample identifier |
| cell_type_level_2_mid | str | TRUE | Mid-level cell type classification |
| gene_name | str | TRUE | Gene name fitting HGNC ontology |
| percentage_expressing | pl.Float32 | TRUE | Percentage of cells expressing the gene |
| avg_log_normalized_expression_expressing | pl.Float32 | TRUE | Average log-normalized expression in expressing cells |
| avg_log_normalized_expression_all | pl.Float32 | TRUE | Average log-normalized expression in all cells |
| total_cells | pl.Int32 | TRUE | Total number of cells |
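
To make the metric definitions concrete, here is a hedged sketch of how the sample-level columns could be derived from the metadata and count tables. It is not the pipeline that produces the file. The function name `sample_level_metrics` is hypothetical, and the sketch rests on several assumptions that the specification does not state: a cell "expresses" a gene when its raw count is greater than 0, percentage_expressing is on a 0 to 100 scale, total_cells is the number of cells in the (sample_id, cell_type_level_2_mid) group, cells without a count row for a gene contribute zero, and the average over expressing cells is set to 0.0 when no cell expresses the gene.

```python
from collections import defaultdict


def sample_level_metrics(metadata_rows, count_rows):
    """Aggregate per (sample_id, cell_type_level_2_mid, gene_name)."""
    # Map each cell to its group and count the cells in each group.
    # Assumption: sample_id is filled in, although it is optional in the metadata.
    group_of_cell = {}
    cells_per_group = defaultdict(int)
    for meta in metadata_rows:
        group = (meta["sample_id"], meta["cell_type_level_2_mid"])
        group_of_cell[meta["cell_id"]] = group
        cells_per_group[group] += 1

    n_expressing = defaultdict(int)
    lognorm_sum_expressing = defaultdict(float)
    lognorm_sum_all = defaultdict(float)
    for row in count_rows:
        # Assumption: every cell_id in the counts also appears in the metadata.
        key = group_of_cell[row["cell_id"]] + (row["gene_name"],)
        lognorm_sum_all[key] += row["count_lognorm"]
        if row["count"] > 0:  # assumption: "expressing" means raw count > 0
            n_expressing[key] += 1
            lognorm_sum_expressing[key] += row["count_lognorm"]

    metrics = []
    for key in sorted(lognorm_sum_all):
        sample_id, cell_type, gene_name = key
        total = cells_per_group[(sample_id, cell_type)]
        expressing = n_expressing[key]
        metrics.append({
            "sample_id": sample_id,
            "cell_type_level_2_mid": cell_type,
            "gene_name": gene_name,
            "percentage_expressing": 100.0 * expressing / total,
            # Assumption: 0.0 when no cell in the group expresses the gene.
            "avg_log_normalized_expression_expressing": (
                lognorm_sum_expressing[key] / expressing if expressing else 0.0
            ),
            "avg_log_normalized_expression_all": lognorm_sum_all[key] / total,
            "total_cells": total,
        })
    return metrics


# Made-up example: three T cells from one sample, one gene.
metadata_rows = [
    {"cell_id": "c1", "sample_id": "s1", "cell_type_level_2_mid": "T cell"},
    {"cell_id": "c2", "sample_id": "s1", "cell_type_level_2_mid": "T cell"},
    {"cell_id": "c3", "sample_id": "s1", "cell_type_level_2_mid": "T cell"},
]
count_rows = [
    {"cell_id": "c1", "gene_name": "CD3E", "count": 2.0, "count_lognorm": 1.0},
    {"cell_id": "c2", "gene_name": "CD3E", "count": 0.0, "count_lognorm": 0.0},
    {"cell_id": "c3", "gene_name": "CD3E", "count": 4.0, "count_lognorm": 2.0},
]
metrics = sample_level_metrics(metadata_rows, count_rows)
print(metrics[0])
```

In this example two of the three cells express CD3E, so the averages over expressing cells and over all cells differ (1.5 versus 1.0).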
