File Data Model

A File represents a single data unit within a dataset, such as a document, spreadsheet, or image. In research and data management, files are essential for storing and organizing raw or processed data. They provide the foundation for data analysis, sharing, and compliance with storage policies. Properly formatted file entries ensure data can be retrieved, validated, and referenced consistently across projects.

The attributes in this model describe key metadata about each file, including its format, assay type, species, and associated dataset. These details help maintain file traceability and usability within data repositories.

Why You Should Contribute File Entries

Contributing file entries ensures that critical data files are accessible, organized, and easily retrievable. By documenting your files with key metadata, you enhance their usability in research workflows and support data sharing and collaboration. Accurate and detailed file entries also help prevent data loss, streamline future analyses, and facilitate compliance with research data management policies.

Who Should Be Contributing File Entries?

Researchers – Share and organize key datasets and processed results to improve reproducibility and collaboration across projects.
Data Managers – Maintain an organized structure for large-scale data repositories by documenting file attributes like formats, assays, and species.
Project Leads – Ensure data generated by your research projects is categorized and described, making it easier for teams to access and reuse.
Bioinformaticians and Data Analysts – Provide detailed metadata to streamline data integration, analysis pipelines, and compatibility with downstream tools.
Collaborative Consortia Members – Contribute shared resources to foster data transparency, enabling broader collaboration and multi-institution research efforts.

Download Template

Use the file entry template to streamline your data entry process. The template contains pre-defined required fields.

Example Data Entry

The table below includes sample values to demonstrate proper attribute usage.

Example Data Entry (Biology-Focused)

Attribute	Example Value
File Description	CSV file containing gene expression data for breast cancer samples
File Design	Gene expression values derived from RNA sequencing of tumor and normal tissue samples
File Url	https://www.example.com/files/breast_cancer_expression_data.csv
File Assay	RNA Sequencing
File Level	Level 3: Processed summary data, like gene expression counts or coverage statistics (e.g., CSV files)
File Species	Human
File Tumor Type	Breast Carcinoma
File Tissue	Breast
File View	List View
FileView_id	FileView_789012
File Format	CSV
File Alias	Breast_Cancer_Gene_Expression.csv
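
The example entry above can also be assembled programmatically. The following is a minimal sketch, assuming the manifest is a CSV file whose column headers are the attribute names; that layout, the `entries_to_csv` helper, and the shortened `File Level` value are illustrative assumptions, not part of the official template.

```python
import csv
import io

# Values adapted from the example table; the column layout is an assumption.
EXAMPLE_ENTRY = {
    "File Description": "CSV file containing gene expression data for breast cancer samples",
    "File Url": "https://www.example.com/files/breast_cancer_expression_data.csv",
    "File Assay": "RNA Sequencing",
    "File Level": "Level 3",
    "File Species": "Human",
    "File Tumor Type": "Breast Carcinoma",
    "File Tissue": "Breast",
    "FileView_id": "FileView_789012",
    "File Format": "CSV",
    "File Alias": "Breast_Cancer_Gene_Expression.csv",
}


def entries_to_csv(entries):
    """Serialize a list of entry dicts to CSV text, one row per file."""
    # Collect column names in first-seen order so the header is stable.
    columns = []
    for entry in entries:
        for name in entry:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    # restval="" leaves optional attributes blank when an entry omits them.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


manifest_text = entries_to_csv([EXAMPLE_ENTRY])
print(manifest_text)
```

Using `csv.DictWriter` rather than manual string joins means values that contain commas, such as multi-value assay or species lists, are quoted correctly.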

Full Field Reference

Below is the full field reference table with attributes and their descriptions.

Attribute	Description	Required	Validation Rules	Examples
File Description	Description of the file.	False	str	"CSV file containing gene expression data for breast cancer"
File Design	The overall design of the dataset or file.	False	str	"Gene expression values derived from RNA sequencing of tumor and normal tissue samples, processed using STAR aligner and featureCounts. Samples collected from 100 patients at diagnosis."
File Url	The URL where the file is stored.	True	url	https://www.example.com/files/breast_cancer_expression_data.csv
File Assay	The assay(s) the file is representative of. Multiple values permitted, comma separated.	True	list like	RNA Sequencing
File Level	The processing level the file can be mapped to.	True	None	Level 3
File Species	The species the data was collected from. Multiple values permitted, comma separated.	True	list like	Human
File Tumor Type	The tumor type(s), if applicable, of the data collected. Multiple values permitted, comma separated.	False	list like	Breast Carcinoma
File Tissue	Tissue type(s) associated with the file. Multiple values permitted, comma separated.	False	list like	Breast
File View	The denormalized manifest for file submission.	False	None	List View
FileView_id	A unique primary key that enables record updates using schematic.	True	unique	"SynapseID_123456"
File Longitudinal Group	A label that can be used to identify groups of files from the same longitudinal/time-resolved experiment.	False	str	Patient Cohort A - Baseline
File Longitudinal Event Type	The type of event to which File Longitudinal Total Time Elapsed is related.	False	str	Baseline
File Longitudinal Sequence Identifier	The order in which this file was collected with respect to the longitudinal experiment (e.g., 1, 2, etc.). Integer.	False	int	2
File Longitudinal Time Elapsed Unit	The unit of time associated with Sequential and Total Time Elapsed attributes.	False	str	Seconds
File Longitudinal Sequential Time Elapsed	The time elapsed between collecting the current and previous files in this longitudinal group. Numeric; give the unit in File Longitudinal Time Elapsed Unit.	False	num	9900
File Longitudinal Total Time Elapsed	The total time elapsed between the first and current files contained in this longitudinal group. Numeric; give the unit in File Longitudinal Time Elapsed Unit.	False	num	990000
File Format	The format of the file described by this entry.	True	None	CSV
File Alias	A string identifier associated with the file. Must be unique. Can be the repository accession number (e.g., Synapse ID, GEO identifier such as GSE12345). No Greek letters or DOIs.	True	unique	"SynapseID_123456"
File Anatomic Site	The anatomic site associated with the data contained in this file.	True	list like	Brain stem, Cervix uteri
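
The Required and Validation Rules columns above can be checked before submission. The sketch below is a hypothetical pre-submission check, assuming entries are dictionaries of strings keyed by attribute name (for example, rows read from a CSV manifest). The `validate_entries` function and its error messages are illustrative only; this is not how the repository's own tooling performs validation.

```python
from urllib.parse import urlparse

# Taken from the "Required" and "Validation Rules" columns of the table above.
REQUIRED = [
    "File Url", "File Assay", "File Level", "File Species",
    "FileView_id", "File Format", "File Alias", "File Anatomic Site",
]
UNIQUE = ["FileView_id", "File Alias"]
INT_FIELDS = ["File Longitudinal Sequence Identifier"]
NUM_FIELDS = [
    "File Longitudinal Sequential Time Elapsed",
    "File Longitudinal Total Time Elapsed",
]


def is_url(value):
    """Accept only http(s) URLs that include a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parses_as(cast, value):
    """Return True if cast(value) succeeds, e.g. cast=int or cast=float."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def validate_entries(entries):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    seen = {name: set() for name in UNIQUE}
    for index, entry in enumerate(entries):
        for name in REQUIRED:
            if not str(entry.get(name, "")).strip():
                errors.append(f"row {index}: missing required field '{name}'")
        url = str(entry.get("File Url", "")).strip()
        if url and not is_url(url):
            errors.append(f"row {index}: 'File Url' is not a valid URL")
        for name in INT_FIELDS:
            value = str(entry.get(name, "")).strip()
            if value and not parses_as(int, value):
                errors.append(f"row {index}: '{name}' must be an integer")
        for name in NUM_FIELDS:
            value = str(entry.get(name, "")).strip()
            if value and not parses_as(float, value):
                errors.append(f"row {index}: '{name}' must be numeric")
        for name in UNIQUE:
            value = str(entry.get(name, "")).strip()
            if value:
                if value in seen[name]:
                    errors.append(f"row {index}: duplicate value for '{name}'")
                seen[name].add(value)
    return errors


entry = {
    "File Url": "https://www.example.com/files/breast_cancer_expression_data.csv",
    "File Assay": "RNA Sequencing",
    "File Level": "Level 3",
    "File Species": "Human",
    "FileView_id": "SynapseID_123456",
    "File Format": "CSV",
    "File Alias": "SynapseID_123456",
    "File Anatomic Site": "Brain stem, Cervix uteri",
}
print(validate_entries([entry]))
```

Collecting every problem in a list, rather than raising on the first one, lets a contributor fix an entire manifest in a single pass.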