How to Download Files

Show files

Begin by inspecting the contents of the data package to identify its files.

Example data package file

The examples here use the file JAXDP00006X.zip from the page on How to Download a Data Package.

jbq read package JAXDP00006X --show file

Output

┌──────────────────────────────────────────────────┬────────┬──────────────────┬──────────────────┬───────────────┬─────────────┬─────────────┐
│ File Name                                        │ Id     │ URL              │ File Data        │ Investigation │ Study       │ Assay       │
├──────────────────────────────────────────────────┼────────┼──────────────────┼──────────────────┼───────────────┼─────────────┼─────────────┤
│ 202111_CUBE_Islet_Discovery_Proteomics_Data.xlsx │ 154221 │ 202111_CUBE_Isl… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002H │
│ 2021_CUBE_Adipose_C18negative_All-Data.xlsx      │ 154417 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002O │
│ 2021_CUBE_Adipose_C18negative_metadata.csv       │ 154414 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002O │
│ 2021_CUBE_Adipose_C18positive_All-Data.xlsx      │ 154416 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002P │
│ 2021_CUBE_Adipose_C18positive_metadata.csv       │ 154415 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002P │
│ 2021_CUBE_Adipose_Discovery_Proteomics_Data.xlsx │ 154467 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002K │
│ 2021_CUBE_Adipose_HILICnegative_All-Data.xlsx    │ 154419 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002L │
│ 2021_CUBE_Adipose_HILICnegative_metadata.csv     │ 154222 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002L │
│ 2021_CUBE_Adipose_HILICpositive_All-Data.xlsx    │ 154418 │ 2021_CUBE_Adipo… │ https://bioconn… │ JAXIN000001   │ JAXST00001R │ JAXAS00002N │
......

Download files

Once you have identified the file names within the data package, use the "--get-files" option followed by the file names to initiate the download:

jbq read package JAXDP00006X --get-files islet_metadata.csv,liver_metadata.csv

If there are multiple files, separate their names with commas. The files will be saved in the data package directory.

File names are case-insensitive, so you can use either upper or lower case.

Output

Retrieve file information for the package "JAXDP00006X"

 2 Files, 8.5 KB to download
├── 1. islet_metadata.csv 4.7 KB
│   https://bioconnect-ui-sqa.azurewebsites.net/search/metadata-search-browse/detail-view/data/154223
│
└── 2. liver_metadata.csv 3.9 KB
    https://bioconnect-ui-sqa.azurewebsites.net/search/metadata-search-browse/detail-view/data/154411

The downloaded files are saved in a folder named after the data package (here, JAXDP00006X).
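When the file names are already held in a shell array, the comma-separated argument for --get-files can be assembled rather than typed by hand. The following is a minimal sketch for bash; join_by_comma is a hypothetical helper, not part of jbq, and the script only prints the resulting command so that it can be checked before running it.

```shell
#!/bin/bash
# join_by_comma is a hypothetical helper (not part of jbq): it joins its
# arguments with commas, the separator that --get-files expects.
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# File names taken from the example above.
files=(islet_metadata.csv liver_metadata.csv)
file_list=$(join_by_comma "${files[@]}")

# Print the command instead of running it, so it can be reviewed first.
echo "jbq read package JAXDP00006X --get-files $file_list"
```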

Download all the files in a package

jbq read package JAXDP00006X --get-files all

Filter files

The wildcard character (*) is accepted in file names and can appear at the beginning of the pattern, at the end, or at both ends.

Beginning: download all CSV files

jbq read package JAXDP00006X --get-files *.csv

End: download all files whose names begin with "liver"
```
jbq read package JAXDP00006X --get-files liver*
```

Both: download all files whose names contain "liver"

jbq read package JAXDP00006X --get-files *liver*
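One caveat when typing wildcard patterns in bash or a similar shell: an unquoted pattern is expanded by the shell itself whenever a file in the current directory happens to match it, so the command may receive local file names instead of the pattern. The sketch below illustrates this general shell behavior with show_args, a hypothetical stand-in command; it does not call jbq, and whether a given jbq invocation is affected depends on the contents of the working directory.

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for any command; it prints each
# argument it receives on its own line.
show_args() { printf '%s\n' "$@"; }

# Work in a scratch directory that contains one file matching liver*.
tmp=$(mktemp -d)
touch "$tmp/liver_notes.txt"

# Unquoted: the shell replaces the pattern with the matching local file name.
unquoted=$(cd "$tmp" && show_args liver*)
# Quoted: the pattern reaches the command unchanged.
quoted=$(cd "$tmp" && show_args 'liver*')

echo "unquoted argument: $unquoted"
echo "quoted argument:   $quoted"
rm -rf "$tmp"
```

Under that assumption, quoting the pattern, as in --get-files 'liver*', is the safer habit on a Unix-like shell.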

Slurm Download on sumner2

A Slurm job array can significantly speed up the download of multiple large files by running the downloads in parallel. Allocate at least as many array tasks as there are files to download, so that each task downloads one file.

Example Slurm file: jbq_slurm.sh

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --array=1-10 # for 10 files  # index base is 1
#SBATCH --output=job.%J.out
#SBATCH --error=job.%J.err
#SBATCH --job-name="jbq"

cd /flashscratch/liangh/BioConnect_Data && \
jbq read package JAXDP00006X --get-files *liver* --task-id=$SLURM_ARRAY_TASK_ID

Make sure the number of array tasks is equal to or greater than the number of files.
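The array size can be derived from the file list instead of being counted by hand. The sketch below is illustrative only: count_matches is a hypothetical helper, the file names are a small subset copied from this page, and the pattern *[Ll]iver* is an assumption that mimics the case-insensitive matching described earlier. It prints an sbatch command whose --array option, given on the command line, takes precedence over the #SBATCH --array line in the script.

```shell
#!/bin/bash
# count_matches is a hypothetical helper (not part of jbq): it counts how
# many of the given names match a shell pattern.
count_matches() {
  local pattern=$1; shift
  local n=0 name
  for name in "$@"; do
    case "$name" in
      $pattern) n=$((n + 1)) ;;   # unquoted so it is treated as a pattern
    esac
  done
  echo "$n"
}

# A subset of file names from the package, as listed on this page.
files=(
  2021_CUBE_Liver_C18negative_All-Data.xlsx
  2021_CUBE_Liver_C18negative_metadata.csv
  liver_metadata.csv
  islet_metadata.csv
)

n=$(count_matches '*[Ll]iver*' "${files[@]}")

# Print the submission command for review rather than submitting directly.
echo "sbatch --array=1-$n jbq_slurm.sh"
```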

On sumner2, load Singularity and submit the job:

module load singularity
sbatch jbq_slurm.sh

Sample output

# working directory
[liangh@sumner017 BioConnect_Data]$ pwd
/flashscratch/liangh/BioConnect_Data

# files
[liangh@sumner017 BioConnect_Data]$ ll
total 28600
drwxr-xr-x 3 liangh jaxuser      711 May 22 13:19 JAXDP00006X
-rw-r--r-- 1 liangh jaxuser    39299 May 16 11:59 JAXDP00006X.zip
-rwxr-xr-x 1 liangh jaxuser 25601888 May 22 13:07 jbq
-rw-r--r-- 1 liangh jaxuser      267 May 22 13:09 jbq_slurm.sh

# submit job
[liangh@sumner017 BioConnect_Data]$ sbatch jbq_slurm.sh
Submitted batch job 892831

# check the status of the jobs
[liangh@sumner017 BioConnect_Data]$ squeue -j 892831
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         892831_19   compute      jbq   liangh CG       0:06      1 sumner020
          892831_3   compute      jbq   liangh CG       0:06      1 sumner072
         892831_18   compute      jbq   liangh  R       0:06      1 sumner017
         892831_15   compute      jbq   liangh  R       0:06      1 sumner106
         892831_13   compute      jbq   liangh  R       0:06      1 sumner101
         892831_12   compute      jbq   liangh  R       0:06      1 sumner100
         892831_10   compute      jbq   liangh  R       0:06      1 sumner095
          892831_9   compute      jbq   liangh  R       0:06      1 sumner088
          892831_8   compute      jbq   liangh  R       0:06      1 sumner087
          892831_7   compute      jbq   liangh  R       0:06      1 sumner085
          892831_6   compute      jbq   liangh  R       0:06      1 sumner078
          892831_5   compute      jbq   liangh  R       0:06      1 sumner077
          892831_1   compute      jbq   liangh  R       0:06      1 sumner058

[liangh@sumner017 BioConnect_Data]$ squeue -j 892831
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

# downloaded files
[liangh@sumner017 BioConnect_Data]$ ll JAXDP00006X
total 144296
-rw-r--r-- 1 liangh jaxuser 21720769 May 22 13:19 2021_CUBE_Liver_C18negative_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser     4012 May 22 13:19 2021_CUBE_Liver_C18negative_metadata.csv
-rw-r--r-- 1 liangh jaxuser 30354790 May 22 13:19 2021_CUBE_Liver_C18positive_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser     4012 May 22 13:19 2021_CUBE_Liver_C18positive_metadata.csv
-rw-r--r-- 1 liangh jaxuser 61694943 May 22 13:19 2021_CUBE_Liver_Discovery_Proteomics_Data.xlsx
-rw-r--r-- 1 liangh jaxuser  5778668 May 22 13:19 2021_CUBE_Liver_HILICnegative_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser     4012 May 22 13:19 2021_CUBE_Liver_HILICnegative_metadata.csv
-rw-r--r-- 1 liangh jaxuser 10414219 May 22 13:19 2021_CUBE_Liver_HILICpositive_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser     4012 May 22 13:18 2021_CUBE_Liver_HILICpositive_metadata.csv
drwxr-xr-x 2 liangh jaxuser      138 May 22 13:15 file_types
-rw-r--r-- 1 liangh jaxuser     3953 May 22 13:19 liver_metadata.csv
-rw-r--r-- 1 liangh jaxuser      664 May 22 13:15 README.txt
-rw-r--r-- 1 liangh jaxuser  2159080 May 22 13:15 ro-crate-metadata.json
-rw-r--r-- 1 liangh jaxuser    34132 May 22 13:15 ro-crate-preview.html
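
After the array tasks finish, it can be worth confirming that every expected file arrived and is not empty. The following is a minimal sketch; missing_files is a hypothetical helper, not part of jbq, and the directory and file names are the ones used in the examples on this page.

```shell
#!/bin/bash
# missing_files is a hypothetical helper (not part of jbq): it prints each
# expected file that is missing or empty in the given directory.
missing_files() {
  local dir=$1; shift
  local name
  for name in "$@"; do
    [ -s "$dir/$name" ] || echo "$name"
  done
}

# Package directory and file names taken from the examples on this page.
missing=$(missing_files JAXDP00006X islet_metadata.csv liver_metadata.csv)
if [ -n "$missing" ]; then
  echo "Missing or empty files:"
  echo "$missing"
else
  echo "All expected files are present."
fi
```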