Clinical data
These files are potentially common to all modalities.
Patient-level clinical data
Objective of the file: contains clinical information (demographics, staging, survival).
Requirements:
Named clinical_data_table
In parquet format
patient_id as primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
patient_id | str | Yes | Patient identifier. There can be multiple bk_sample_id for one patient_id.
indication | str | No |
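The naming, format and primary-key requirements above can be checked before a file is ingested. The following is a minimal sketch, assuming the table has already been loaded into a list of row dictionaries with plain Python values (for example via `pandas.read_parquet(path).to_dict("records")`, not shown). The helper names `check_file_name`, `validate_clinical_table`, `MANDATORY_COLUMNS` and `OPTIONAL_COLUMNS` are hypothetical and not part of any existing tooling.

```python
from pathlib import Path

# Hypothetical schemas derived from the column table above:
# mandatory columns must be present and non-null, optional ones may be absent.
MANDATORY_COLUMNS = {"patient_id": str}
OPTIONAL_COLUMNS = {"indication": str}


def check_file_name(path: str, table_name: str) -> list[str]:
    """Check the naming and format requirements (<table_name>.parquet)."""
    p = Path(path)
    errors = []
    if p.stem != table_name:
        errors.append(f"file must be named {table_name}, got {p.stem}")
    if p.suffix != ".parquet":
        errors.append(f"file must be in parquet format, got suffix {p.suffix!r}")
    return errors


def validate_clinical_table(rows: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the rows pass the checks."""
    errors = []
    seen_patients = set()
    for i, row in enumerate(rows):
        for col, expected in MANDATORY_COLUMNS.items():
            value = row.get(col)
            if value is None:
                errors.append(f"row {i}: missing mandatory column {col}")
            elif not isinstance(value, expected):
                errors.append(f"row {i}: {col} should be {expected.__name__}")
        for col, expected in OPTIONAL_COLUMNS.items():
            value = row.get(col)
            if value is not None and not isinstance(value, expected):
                errors.append(f"row {i}: {col} should be {expected.__name__}")
        # patient_id is the primary key, so it must be unique across rows.
        patient_id = row.get("patient_id")
        if patient_id is not None:
            if patient_id in seen_patients:
                errors.append(f"row {i}: duplicate patient_id {patient_id!r}")
            seen_patients.add(patient_id)
    return errors


if __name__ == "__main__":
    print(check_file_name("data/clinical_data_table.parquet", "clinical_data_table"))
    # Extra clinical columns such as pfs_years are allowed alongside the mandatory ones.
    rows = [{"patient_id": "P1", "pfs_years": 1.5}, {"patient_id": "P1"}]
    print(validate_clinical_table(rows))
```

Running it prints `[]` for the file name check and flags the second row as a duplicate `patient_id`. Returning a list of messages rather than raising on the first problem lets all issues in a file be reported at once.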
Examples of clinical columns currently used by K:
'alcohol_intake',
'smoking_status',
'pfs_years',
'pfs_event',
'cancer_stage',
'beta_2_microglobulin_level',
'cd10_ihc_positive',
'largest_diameter_of_the_primary_tumor',
'ldh_level',
'menopause_status',
'metastatic_status',
'treatment_path',
'has_received_chemotherapy',
'chemotherapy_best_response',
'chemotherapy_treatment_line',
'chemotherapy_treatment_setting'
Sample-level metadata: sample_metadata_table
Objective of the file: contains the sample-to-patient mapping and the tissue type of each sample.
Example format: sample_metadata_table.parquet
Requirements:
Named sample_metadata_table
In parquet format
sample_id as primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
sample_id | str | Yes | Sample identifier.
patient_id | str | Yes | Patient identifier. Maps to the linked clinical data file. There can be multiple bk_sample_id for one patient_id.
sample_tissue | str | No | Example values: 'Testis', 'Colon', 'Kidney'.
histo_sample_type | str | No | Example values: 'recurrent tumor', 'primary tumor', 'tumor'.
sample_label | str | No | Example values: 'Category_1', 'Category_2', 'Category_3', 'Category_4', 'Category_5'.
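Because patient_id in sample_metadata_table must map to the clinical data file, a referential-integrity check is a natural companion to the sample_id primary-key check. This is a minimal sketch, assuming both tables are available as lists of row dictionaries with plain Python values; `check_sample_table` and `samples_by_patient` are hypothetical helper names.

```python
from collections import defaultdict


def check_sample_table(sample_rows: list[dict], clinical_rows: list[dict]) -> list[str]:
    """Check the sample_id primary key and the patient_id link to clinical_data_table."""
    errors = []
    known_patients = {row.get("patient_id") for row in clinical_rows}
    seen_samples = set()
    for i, row in enumerate(sample_rows):
        sample_id = row.get("sample_id")
        patient_id = row.get("patient_id")
        if sample_id is None or patient_id is None:
            errors.append(f"row {i}: sample_id and patient_id are mandatory")
            continue
        if sample_id in seen_samples:
            errors.append(f"row {i}: duplicate sample_id {sample_id!r}")
        seen_samples.add(sample_id)
        # Referential integrity: every sample must belong to a known patient.
        if patient_id not in known_patients:
            errors.append(f"row {i}: patient_id {patient_id!r} not found in clinical_data_table")
    return errors


def samples_by_patient(sample_rows: list[dict]) -> dict[str, list[str]]:
    """Group sample identifiers by patient; one patient may have several samples."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in sample_rows:
        groups[row["patient_id"]].append(row["sample_id"])
    return dict(groups)


if __name__ == "__main__":
    clinical = [{"patient_id": "P1"}, {"patient_id": "P2"}]
    samples = [
        {"sample_id": "S1", "patient_id": "P1", "histo_sample_type": "primary tumor"},
        {"sample_id": "S2", "patient_id": "P1", "histo_sample_type": "recurrent tumor"},
        {"sample_id": "S3", "patient_id": "P3"},
    ]
    print(check_sample_table(samples, clinical))
    print(samples_by_patient(samples))
```

In this example the third sample is reported because P3 has no row in the clinical table, and the grouping shows the one-patient-to-many-samples relationship (P1 maps to S1 and S2).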
Mutation metadata
Objective of the file: contains the variant classification of each gene for each patient.
Example format: mutation_table.parquet
Requirements:
Named mutation_table
In parquet format
patient_id and gene_name as a composite primary key
Contains the mandatory columns in the table below
File columns & description:
Column | Type | Mandatory | Description
patient_id | str | Yes | Patient identifier. Maps to the linked clinical data file. There can be multiple bk_sample_id for one patient_id.
gene_name | str | Yes | Gene name.
is_mutated | bool | Yes | Whether the gene is mutated for this patient.
HGVSp_Short | str | TBD | HGVS protein sequence name (short form), mapped as categories.
HGVSc | str | TBD | HGVS coding sequence name, mapped as categories.
Variant_Classification | str | TBD |
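Since the primary key of mutation_table is the pair (patient_id, gene_name), uniqueness has to be checked on the pair rather than on each column separately. The sketch below assumes rows are plain Python dictionaries; `check_mutation_table` and `mutated_genes_by_patient` are hypothetical names, and the HGVS and Variant_Classification columns are deliberately not validated because their mandatory status is still undecided.

```python
def check_mutation_table(rows: list[dict]) -> list[str]:
    """Check the (patient_id, gene_name) composite key and the is_mutated type."""
    errors = []
    seen_keys = set()
    for i, row in enumerate(rows):
        key = (row.get("patient_id"), row.get("gene_name"))
        if None in key:
            errors.append(f"row {i}: patient_id and gene_name are mandatory")
            continue
        # The pair, not each column alone, must be unique.
        if key in seen_keys:
            errors.append(f"row {i}: duplicate key {key}")
        seen_keys.add(key)
        # Assumes plain Python values; NumPy booleans would need converting first.
        if not isinstance(row.get("is_mutated"), bool):
            errors.append(f"row {i}: is_mutated must be a bool")
    return errors


def mutated_genes_by_patient(rows: list[dict]) -> dict[str, set[str]]:
    """Collect, for each patient, the genes flagged with is_mutated=True."""
    result: dict[str, set[str]] = {}
    for row in rows:
        genes = result.setdefault(row["patient_id"], set())
        if row["is_mutated"]:
            genes.add(row["gene_name"])
    return result


if __name__ == "__main__":
    rows = [
        {"patient_id": "P1", "gene_name": "TP53", "is_mutated": True},
        {"patient_id": "P1", "gene_name": "KRAS", "is_mutated": False},
        {"patient_id": "P2", "gene_name": "TP53", "is_mutated": False},
    ]
    print(check_mutation_table(rows))
    print(mutated_genes_by_patient(rows))
```

The same gene_name may appear for several patients (TP53 above) without violating the key, while a second (P1, TP53) row would be reported as a duplicate.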