Clinical data

These files are potentially common to all modalities.

Patient-level clinical data

  • Objective of the file: Contains patient-level clinical information (demographics, staging, survival)

  • Requirements:

      • Named clinical_data_table

      • In parquet format

      • patient_id as primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| patient_id | str | TRUE | Patient identifier. There can be multiple bk_sample_id values for one patient_id. |
| indication | str | FALSE | |

Examples of clinical columns currently used by K:

'alcohol_intake',
 'smoking_status',
 'pfs_years',
 'pfs_event',
 'cancer_stage',
 'beta_2_microglobulin_level',
 'cd10_ihc_positive',
 'largest_diameter_of_the_primary_tumor',
 'ldh_level',
 'menopause_status',
 'metastatic_status',
 'treatment_path',
 'has_received_chemotherapy',
 'chemotherapy_best_response',
 'chemotherapy_treatment_line',
 'chemotherapy_treatment_setting',
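
The requirements above (mandatory columns present, patient_id unique) can be checked before a file is delivered. The following is a minimal sketch, not an official validator: it assumes the table has already been loaded into a list of row dictionaries (for example with a parquet reader of your choice), and the function name, the constant and the sample values are hypothetical.

```python
from collections import Counter

# Mandatory columns, taken from the column table above.
CLINICAL_MANDATORY = ("patient_id",)


def validate_clinical_rows(rows):
    """Return a list of problems; an empty list means the rows pass."""
    problems = []
    # Every row must carry a non-empty value for each mandatory column.
    for i, row in enumerate(rows):
        for col in CLINICAL_MANDATORY:
            if row.get(col) in (None, ""):
                problems.append(f"row {i}: missing mandatory column '{col}'")
    # patient_id is the primary key, so each value may appear only once.
    counts = Counter(row.get("patient_id") for row in rows)
    for pid, n in counts.items():
        if pid not in (None, "") and n > 1:
            problems.append(f"patient_id '{pid}' appears {n} times (must be unique)")
    return problems


# Made-up rows for illustration only.
rows = [
    {"patient_id": "P001", "indication": "example_indication", "pfs_years": 2.5},
    {"patient_id": "P002", "indication": None, "pfs_years": 1.0},
    {"patient_id": "P001", "indication": "example_indication", "pfs_years": 0.4},
]
problems = validate_clinical_rows(rows)
print(problems)
```

Optional columns such as indication may be empty without triggering a problem; only the mandatory columns and the primary key are checked here.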

Sample-level metadata: sample_metadata_table

  • Objective of the file: Contains sample-to-patient mapping, tissue type

  • Example format: sample_metadata_table.parquet

  • Requirements:

      • Named sample_metadata_table

      • In parquet format

      • sample_id as primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| sample_id | str | TRUE | Sample identifier |
| patient_id | str | TRUE | Patient identifier. Maps to the linked clinical data file. There can be multiple bk_sample_id values for one patient_id. |
| sample_tissue | str | FALSE | Example values: 'Testis', 'Colon', 'Kidney' |
| histo_sample_type | str | FALSE | Example values: 'recurrent tumor', 'primary tumor', 'tumor' |
| sample_label | str | FALSE | Example values: 'Category_1', 'Category_2', 'Category_3', 'Category_4', 'Category_5' |
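
Because this table links samples to patients, two checks are useful: sample_id must be unique, and every patient_id should exist in the clinical data file. The sketch below illustrates both under the assumption that the rows are available as dictionaries. The function name and all identifiers in the sample data are hypothetical.

```python
from collections import defaultdict


def check_sample_metadata(sample_rows, clinical_patient_ids):
    """Return (duplicate sample_ids, samples with unknown patient, samples per patient)."""
    seen = set()
    duplicates = set()
    orphans = []
    samples_by_patient = defaultdict(list)
    for row in sample_rows:
        sid, pid = row["sample_id"], row["patient_id"]
        # sample_id is the primary key: a second occurrence is an error.
        if sid in seen:
            duplicates.add(sid)
        seen.add(sid)
        # patient_id must map to a row of the clinical data file.
        if pid not in clinical_patient_ids:
            orphans.append(sid)
        # One patient may legitimately own several samples.
        samples_by_patient[pid].append(sid)
    return sorted(duplicates), orphans, dict(samples_by_patient)


# Made-up rows for illustration only.
sample_rows = [
    {"sample_id": "S1", "patient_id": "P001", "sample_tissue": "Colon"},
    {"sample_id": "S2", "patient_id": "P001", "sample_tissue": "Colon"},
    {"sample_id": "S3", "patient_id": "P003", "sample_tissue": "Kidney"},
    {"sample_id": "S1", "patient_id": "P002", "sample_tissue": "Testis"},
]
duplicates, orphans, samples_by_patient = check_sample_metadata(
    sample_rows, clinical_patient_ids={"P001", "P002"}
)
print(duplicates, orphans, samples_by_patient)
```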

Mutation metadata

  • Objective of the file: Contains variant classification for genes per patient

  • Example format: mutation_table.parquet

  • Requirements:

      • Named mutation_table

      • In parquet format

      • patient_id and gene_name as composite primary key

      • Contains the mandatory columns in the table below

  • File columns & description:

| Name | Type | Mandatory | Description |
|---|---|---|---|
| patient_id | str | TRUE | Patient identifier. Maps to the linked clinical data file. There can be multiple bk_sample_id values for one patient_id. |
| gene_name | str | TRUE | Name of the gene the row refers to. |
| is_mutated | bool | TRUE | |
| HGVSp_Short | str | ? | The HGVS protein sequence name, mapped as categories |
| HGVSc | str | ? | The HGVS coding sequence name, mapped as categories |
| Variant_Classification | str | ? | |
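
The mutation table is keyed on the pair (patient_id, gene_name), so uniqueness has to be checked on the combination rather than on either column alone. The following sketch shows one way to do that and to summarise mutated genes per patient. It assumes rows are available as dictionaries; the function names and the sample rows are hypothetical.

```python
def find_duplicate_keys(rows, key_columns=("patient_id", "gene_name")):
    """Return each composite key that occurs more than once, in first-seen order."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def mutated_genes_by_patient(rows):
    """Map every patient_id to the sorted list of genes flagged is_mutated."""
    genes = {}
    for row in rows:
        # setdefault keeps patients with no mutated gene in the result.
        patient_genes = genes.setdefault(row["patient_id"], set())
        if row["is_mutated"]:
            patient_genes.add(row["gene_name"])
    return {pid: sorted(g) for pid, g in genes.items()}


# Made-up rows for illustration only.
mutation_rows = [
    {"patient_id": "P001", "gene_name": "TP53", "is_mutated": True},
    {"patient_id": "P001", "gene_name": "KRAS", "is_mutated": False},
    {"patient_id": "P002", "gene_name": "TP53", "is_mutated": False},
    {"patient_id": "P001", "gene_name": "TP53", "is_mutated": True},
]
duplicate_keys = find_duplicate_keys(mutation_rows)
summary = mutated_genes_by_patient(mutation_rows)
print(duplicate_keys, summary)
```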
