Single-cell RNA-seq
Single-cell metadata file: scrnaseq_metadata_table
Objective of the file: provides cell-level metadata and cell type annotations for each cell
Example format: scrnaseq_metadata_table.parquet
Requirements:
Named scrnaseq_metadata_table
In parquet format
cell_id as primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
cell_id | str | TRUE | Cell identifier
patient_id | str | TRUE | Maps to the linked clinical data file. There can be multiple bk_sample_id for one patient_id.
sample_id | str | FALSE | Sample identifier
cell_type_level_1_major | str | TRUE | Major cell type classification
cell_type_level_2_mid | str | TRUE | Mid-level cell type classification
cell_type_level_3_granular | str | TRUE | Granular cell type classification
UMAP_1 | - | FALSE | First UMAP embedding coordinate
UMAP_2 | - | FALSE | Second UMAP embedding coordinate
Single-cell counts file: scrnaseq_count_table
Objective of the file: contains raw and log-normalized counts for each cell and gene
Example format: scrnaseq_count_table.parquet
Requirements:
Named scrnaseq_count_table
In parquet format
(cell_id, gene_name) as composite primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
cell_id | str | TRUE | Cell identifier
gene_name | str | TRUE | Gene name fitting HGNC ontology
count | pl.Float32 | TRUE | Raw count value
count_lognorm | pl.Float32 | TRUE | Log-normalized count value
Single-cell sample file: scrnaseq_sample_level_table
Objective of the file: contains cell type-specific expression metrics aggregated per sample, cell type, and gene
Example format: scrnaseq_sample_level_table.parquet
Requirements:
Named scrnaseq_sample_level_table
In parquet format
(sample_id, cell_type_level_2_mid, gene_name) as composite primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
sample_id | str | TRUE | Sample identifier
cell_type_level_2_mid | str | TRUE | Mid-level cell type classification
gene_name | str | TRUE | Gene name fitting HGNC ontology
percentage_expressing | pl.Float32 | TRUE | Percentage of cells expressing the gene
avg_log_normalized_expression_expressing | pl.Float32 | TRUE | Average log-normalized expression in expressing cells
avg_log_normalized_expression_all | pl.Float32 | TRUE | Average log-normalized expression in all cells
total_cells | pl.Int32 | TRUE | Total number of cells