File Data Model
A File represents a single data unit within a dataset, such as a document, spreadsheet, or image. In research and data management, files are essential for storing and organizing raw or processed data. They provide the foundation for data analysis, sharing, and compliance with storage policies. Properly formatted file entries ensure data can be retrieved, validated, and referenced consistently across projects.
The attributes in this model describe key metadata about each file, including its format, assay type, species, and associated dataset. These details help maintain file traceability and usability within data repositories.
Why You Should Contribute File Entries
Contributing file entries ensures that critical data files are accessible, organized, and easily retrievable. By documenting your files with key metadata, you enhance their usability in research workflows and support data sharing and collaboration. Accurate and detailed file entries also help prevent data loss, streamline future analyses, and facilitate compliance with research data management policies.
Who Should Be Contributing File Entries?
- Researchers – Share and organize key datasets and processed results to improve reproducibility and collaboration across projects.
- Data Managers – Maintain an organized structure for large-scale data repositories by documenting file attributes like formats, assays, and species.
- Project Leads – Ensure data generated by your research projects is categorized and described, making it easier for teams to access and reuse.
- Bioinformaticians and Data Analysts – Provide detailed metadata to streamline data integration, analysis pipelines, and compatibility with downstream tools.
- Collaborative Consortia Members – Contribute shared resources to foster data transparency, enabling broader collaboration and multi-institution research efforts.
Download Template
Use the file entry template to streamline your data entry process. The template contains pre-defined required fields.
Example Data Entry
The table below includes sample values to demonstrate proper attribute usage.
Example Data Entry (Biology-Focused)
Attribute | Example Value |
---|---|
File Description | CSV file containing gene expression data for breast cancer samples |
File Design | Gene expression values derived from RNA sequencing of tumor and normal tissue samples |
File Url | https://www.example.com/files/breast_cancer_expression_data.csv |
File Assay | RNA Sequencing |
File Level | Level 3: Processed summary data, like gene expression counts or coverage statistics (e.g., CSV files) |
File Species | Human |
File Tumor Type | Breast Carcinoma |
File Tissue | Breast |
File View | List View |
FileView_id | FileView_789012 |
File Format | CSV |
File Alias | Breast_Cancer_Gene_Expression.csv |
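As a rough illustration of how such an entry might look outside the table, the sketch below writes a subset of the example values as one CSV row. It assumes the template is a CSV whose column headers match the attribute names. That layout, the column order, and the helper name `entry_to_csv` are assumptions made for illustration and are not taken from the template itself.

```python
import csv
import io

# A subset of the example values from the table above.
# Assumption: template columns are named exactly like the attributes.
example_entry = {
    "File Url": "https://www.example.com/files/breast_cancer_expression_data.csv",
    "File Assay": "RNA Sequencing",
    "File Level": (
        "Level 3: Processed summary data, like gene expression counts "
        "or coverage statistics (e.g., CSV files)"
    ),
    "File Species": "Human",
    "File Tumor Type": "Breast Carcinoma",
    "File Tissue": "Breast",
    "FileView_id": "FileView_789012",
    "File Format": "CSV",
    "File Alias": "Breast_Cancer_Gene_Expression.csv",
}


def entry_to_csv(entry: dict[str, str]) -> str:
    """Serialize one file entry as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(entry))
    writer.writeheader()
    writer.writerow(entry)  # values containing commas are quoted automatically
    return buffer.getvalue()


csv_text = entry_to_csv(example_entry)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand matters here because values such as the File Level text contain commas and must be quoted to stay in a single column.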
Full Field Reference
Below is the full field reference table with attributes and their descriptions.
Attribute | Description | Required | Validation Rules | Examples |
---|---|---|---|---|
File Description | Description of the file. | False | str | "CSV file containing gene expression data for breast cancer" |
File Design | The overall design of the dataset or file. | False | str | "Gene expression values derived from RNA sequencing of tumor and normal tissue samples, processed using STAR aligner and featureCounts. Samples collected from 100 patients at diagnosis." |
File Url | The URL where the file is stored. | True | url | https://www.example.com/files/breast_cancer_expression_data.csv |
File Assay | The assay the file is representative of. | True | None | RNA Sequencing |
File Level | The processing level the file can be mapped to. | True | None | Level 3 |
File Species | The species the data was collected from. Multiple values permitted, comma separated. | True | list like | Human |
File Tumor Type | The tumor type(s), if applicable, of the data collected. Multiple values permitted, comma separated. | False | list like | Breast Carcinoma |
File Tissue | Tissue type(s) associated with the file. Multiple values permitted, comma separated. | False | list like | Breast |
File View | The denormalized manifest for file submission. | False | None | List View |
FileView_id | A unique primary key that enables record updates using schematic. | True | unique | "SynapseID_123456" |
File Longitudinal Group | A label that can be used to identify groups of files from the same longitudinal/time-resolved experiment. | False | str | Patient Cohort A - Baseline |
File Longitudinal Event Type | The type of event associated with collection of the data contained in the file (e.g., time increment, treatment time elapsed). | False | str | Baseline |
File Longitudinal Sequence Identifier | The order in which this file was collected with respect to the longitudinal experiment (e.g., 1, 2, etc.). Integer. | False | int | 1 |
File Longitudinal Time Elapsed Unit | The unit of time associated with Sequential and Total Time Elapsed attributes. | False | str | Seconds |
File Longitudinal Sequential Time Elapsed | The time elapsed between collecting the current and previous files in this longitudinal group. Numeric; the unit is given in File Longitudinal Time Elapsed Unit. | False | num | 9900 |
File Longitudinal Total Time Elapsed | The total time elapsed between the first and current files contained in this longitudinal group. Numeric; the unit is given in File Longitudinal Time Elapsed Unit. | False | num | 990000 |
File Format | The format of the file described by this entry. | True | None | CSV |
File Alias | A string identifier associated with the file. Must be unique. Can be the repository accession number (e.g., Synapse ID, GEO identifier such as GSE12345). No Greek letters or DOIs. | True | unique | "SynapseID_123456" |
File Anatomic Site | The anatomic site associated with the data contained in this file. | True | list like | Brain stem, Cervix uteri |
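The Required and Validation Rules columns above can be approximated with a small pre-submission check. The following is a minimal standalone sketch and not how schematic itself validates entries. The function names (`split_list`, `is_url`, `validate_entry`, `find_duplicates`) are hypothetical, and the interpretation of each rule (for example, that `url` means a value with a scheme and host, and that `num` means anything Python can parse as a float) is an assumption.

```python
from urllib.parse import urlparse

# Field groups transcribed from the reference table above.
REQUIRED = [
    "File Url", "File Assay", "File Level", "File Species",
    "FileView_id", "File Format", "File Alias", "File Anatomic Site",
]
INT_FIELDS = ["File Longitudinal Sequence Identifier"]
NUM_FIELDS = [
    "File Longitudinal Sequential Time Elapsed",
    "File Longitudinal Total Time Elapsed",
]


def split_list(value: str) -> list[str]:
    """Split a comma-separated 'list like' value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_url(value: str) -> bool:
    """Assumed reading of the 'url' rule: a scheme and a host are present."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc)


def validate_entry(entry: dict[str, str]) -> list[str]:
    """Return human-readable problems found in one entry (empty if none)."""
    errors = []
    for field in REQUIRED:
        if not entry.get(field, "").strip():
            errors.append(f"{field}: required value is missing")

    url = entry.get("File Url", "").strip()
    if url and not is_url(url):
        errors.append("File Url: not a valid URL")

    # Optional typed fields are only checked when a value is present.
    for fields, cast, label in ((INT_FIELDS, int, "an integer"),
                                (NUM_FIELDS, float, "a number")):
        for field in fields:
            value = entry.get(field, "").strip()
            if not value:
                continue
            try:
                cast(value)
            except ValueError:
                errors.append(f"{field}: {value!r} is not {label}")
    return errors


def find_duplicates(entries: list[dict[str, str]], field: str) -> set[str]:
    """Return values of a 'unique' field that occur in more than one entry."""
    seen, duplicates = set(), set()
    for entry in entries:
        value = entry.get(field, "").strip()
        if not value:
            continue
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates
```

A typical use would be to run `validate_entry` on each row of a filled-in template and `find_duplicates` on the `File Alias` and `FileView_id` columns before submitting, so that problems such as a unit suffix in a numeric field are caught early.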