File Data Model

A File represents a single data unit within a dataset, such as a document, spreadsheet, or image. In research and data management, files are essential for storing and organizing raw or processed data. They provide the foundation for data analysis, sharing, and compliance with storage policies. Properly formatted file entries ensure data can be retrieved, validated, and referenced consistently across projects.

The attributes in this model describe key metadata about each file, including its format, assay type, species, and associated dataset. These details help maintain file traceability and usability within data repositories.

Why You Should Contribute File Entries

Contributing file entries ensures that critical data files are accessible, organized, and easily retrievable. By documenting your files with key metadata, you enhance their usability in research workflows and support data sharing and collaboration. Accurate and detailed file entries also help prevent data loss, streamline future analyses, and facilitate compliance with research data management policies.

Who Should Be Contributing File Entries?

  1. Researchers – Share and organize key datasets and processed results to improve reproducibility and collaboration across projects.
  2. Data Managers – Maintain an organized structure for large-scale data repositories by documenting file attributes like formats, assays, and species.
  3. Project Leads – Ensure data generated by your research projects is categorized and described, making it easier for teams to access and reuse.
  4. Bioinformaticians and Data Analysts – Provide detailed metadata to streamline data integration, analysis pipelines, and compatibility with downstream tools.
  5. Collaborative Consortia Members – Contribute shared resources to foster data transparency, enabling broader collaboration and multi-institution research efforts.

Download Template

Use the file entry template to streamline your data entry process. The template contains pre-defined required fields.

Example Data Entry

The table below includes sample values to demonstrate proper attribute usage.

Example Data Entry (Biology-Focused)

| Attribute | Example Value |
| --- | --- |
| File Description | CSV file containing gene expression data for breast cancer samples |
| File Design | Gene expression values derived from RNA sequencing of tumor and normal tissue samples, processed using STAR aligner and featureCounts. Samples collected from 100 patients at diagnosis. |
| File Url | https://www.example.com/files/breast_cancer_expression_data.csv |
| File Assay | RNA Sequencing |
| File Level | Level 3: Processed summary data, like gene expression counts or coverage statistics (e.g., CSV files) |
| File Species | Human |
| File Tumor Type | Breast Carcinoma |
| File Tissue | Breast |
| File View | List View |
| FileView_id | FileView_789012 |
| File Format | CSV |
| File Alias | Breast_Cancer_Gene_Expression.csv |
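
As a minimal sketch, a subset of the example entry above could be written out as a one-row CSV manifest using Python's standard library. Two things here are assumptions made for illustration: that the manifest is a CSV file, and that the attribute names serve as column headers. The actual template layout may differ. The helper name `entry_to_csv` is hypothetical.

```python
import csv
import io

# Example values copied from the table above. Using the attribute names as
# CSV column headers is an assumption, not a confirmed template layout.
entry = {
    "File Description": "CSV file containing gene expression data for breast cancer samples",
    "File Url": "https://www.example.com/files/breast_cancer_expression_data.csv",
    "File Assay": "RNA Sequencing",
    "File Species": "Human",
    "File Tumor Type": "Breast Carcinoma",
    "File Tissue": "Breast",
    "FileView_id": "FileView_789012",
    "File Format": "CSV",
    "File Alias": "Breast_Cancer_Gene_Expression.csv",
}


def entry_to_csv(row: dict) -> str:
    """Serialize one file entry as CSV text: a header line plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


manifest_text = entry_to_csv(entry)
print(manifest_text)
```

Building the row from a dictionary keeps each value tied to its attribute name, so reordering or adding columns does not shift values into the wrong field.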

Full Field Reference

Below is the full field reference table with attributes and their descriptions.

| Attribute | Description | Required | Validation Rules | Examples |
| --- | --- | --- | --- | --- |
| File Description | Description of the file. | False | str | "CSV file containing gene expression data for breast cancer" |
| File Design | The overall design of the dataset or file. | False | str | "Gene expression values derived from RNA sequencing of tumor and normal tissue samples, processed using STAR aligner and featureCounts. Samples collected from 100 patients at diagnosis." |
| File Url | The URL where the file is stored. | True | url | https://www.example.com/files/breast_cancer_expression_data.csv |
| File Assay | The assay that the file represents. | True | None | RNA Sequencing |
| File Level | The processing level the file can be mapped to. | True | None | Level 3 |
| File Species | The species the data was collected from. Multiple values permitted, comma separated. | True | list like | Human |
| File Tumor Type | The tumor type(s), if applicable, of the data collected. Multiple values permitted, comma separated. | False | list like | Breast Carcinoma |
| File Tissue | Tissue type(s) associated with the file. Multiple values permitted, comma separated. | False | list like | Breast |
| File View | The denormalized manifest for file submission. | False | None | List View |
| FileView_id | A unique primary key that enables record updates using schematic. | True | unique | "SynapseID_123456" |
| File Longitudinal Group | A label that can be used to identify groups of files from the same longitudinal/time-resolved experiment. | False | str | Patient Cohort A - Baseline |
| File Longitudinal Event Type | The type of event associated with collection of the data contained in the file (e.g., time increment, treatment time elapsed). | False | str | Baseline |
| File Longitudinal Sequence Identifier | The order in which this file was collected with respect to the longitudinal experiment (e.g., 1, 2, etc.). Integer. | False | int | 1 |
| File Longitudinal Time Elapsed Unit | The unit of time associated with the Sequential and Total Time Elapsed attributes. | False | str | Seconds |
| File Longitudinal Sequential Time Elapsed | The time elapsed between collecting the current and previous files in this longitudinal group. | False | num | 9900 |
| File Longitudinal Total Time Elapsed | The total time elapsed between the first and current files contained in this longitudinal group. | False | num | 990000 |
| File Format | The format of the file described by this entry. | True | None | CSV |
| File Alias | A string identifier associated with the file. Must be unique. Can be the repository accession number (e.g., Synapse ID, GEO identifier such as GSE12345). No Greek letters or DOIs. | True | unique | "SynapseID_123456" |
| File Anatomic Site | The anatomic site associated with the data contained in this file. | True | list like | Brain stem, Cervix uteri |
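
The Required and Validation Rules columns above can be checked before submission. The sketch below is a hypothetical pre-check in plain Python, not the validation that schematic itself performs. The `RULES` mapping, the `validate_entries` function, and the way each rule (`url`, `int`, `num`, `unique`) is interpreted are all assumptions for illustration, and only a few attributes are covered.

```python
from urllib.parse import urlparse

# (required, rule) per attribute, transcribed from the table above.
# Only a subset is listed; "list like" and None rules are not checked here.
RULES = {
    "File Url": (True, "url"),
    "File Assay": (True, None),
    "File Alias": (True, "unique"),
    "FileView_id": (True, "unique"),
    "File Longitudinal Sequence Identifier": (False, "int"),
    "File Longitudinal Sequential Time Elapsed": (False, "num"),
}


def _is_url(value: str) -> bool:
    # Assumption: a "url" is an http(s) address with a host part.
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_num(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


CHECKS = {"url": _is_url, "int": _is_int, "num": _is_num}


def validate_entries(entries: list[dict]) -> list[str]:
    """Return human-readable problems found across all entries."""
    problems = []
    seen = {}  # attribute -> values already used, for "unique" rules
    for i, entry in enumerate(entries):
        for attr, (required, rule) in RULES.items():
            value = str(entry.get(attr, "")).strip()
            if not value:
                if required:
                    problems.append(f"entry {i}: missing required '{attr}'")
                continue
            if rule in CHECKS and not CHECKS[rule](value):
                problems.append(f"entry {i}: '{attr}' fails {rule} check: {value!r}")
            if rule == "unique":
                if value in seen.setdefault(attr, set()):
                    problems.append(f"entry {i}: duplicate '{attr}': {value!r}")
                seen[attr].add(value)
    return problems


# Usage: the second entry has a malformed URL, a reused alias,
# and an elapsed time that carries a unit suffix.
entries = [
    {"File Url": "https://www.example.com/files/a.csv", "File Assay": "RNA Sequencing",
     "File Alias": "SynapseID_123456", "FileView_id": "FileView_1",
     "File Longitudinal Sequence Identifier": "1"},
    {"File Url": "not a url", "File Assay": "RNA Sequencing",
     "File Alias": "SynapseID_123456", "FileView_id": "FileView_2",
     "File Longitudinal Sequential Time Elapsed": "9900s"},
]
for problem in validate_entries(entries):
    print(problem)
```

Under these assumptions the first entry passes and the second yields three problems. A value such as "9900s" fails the numeric check, which suggests recording the unit separately in File Longitudinal Time Elapsed Unit.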