How to Download Files
Show files
Begin by inspecting the contents of the data package to identify its files.
Example data package file
The examples here use the file JAXDP00006X.zip from the page on How to Download a Data Package.
jbq read package JAXDP00006X --show file
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ File Name ┃ URL ┃ Investigation ┃ Study ┃ Assay ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 202111_CUBE_Islet_Discovery_Proteomics_Data.xlsx │ https://thejacksonlaboratory.ent.box.… │ JAXIN000001 │ JAXST00001R │ JAXAS00002H │
│ 2021_CUBE_Adipose_C18negative_All-Data.xlsx │ https://thejacksonlaboratory.ent.box.… │ JAXIN000001 │ JAXST00001R │ JAXAS00002O │
│ 2021_CUBE_Adipose_C18negative_metadata.csv │ https://thejacksonlaboratory.ent.box.… │ JAXIN000001 │ JAXST00001R │ JAXAS00002O │
│ 2021_CUBE_Adipose_C18positive_All-Data.xlsx │ https://thejacksonlaboratory.ent.box.… │ JAXIN000001 │ JAXST00001R │ JAXAS00002P │
│ 2021_CUBE_Adipose_C18positive_metadata.csv │ https://thejacksonlaboratory.ent.box.… │ JAXIN000001 │ JAXST00001R │ JAXAS00002P │
......
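If the file names are needed in a script, for example to count the files before sizing a Slurm array, the File Name column can be pulled out of the table. The sketch below is a rough aid rather than a jbq feature: `extract_file_names` is a hypothetical helper, and it assumes that the table keeps the box-drawing layout shown above when piped and that names are not wrapped or truncated.

```shell
#!/bin/bash
# Hypothetical helper: print the first column (File Name) of the table shown
# above, one name per line. Assumes data rows are delimited by the light
# vertical bar "│"; header and border rows use other characters and are skipped.
extract_file_names() {
    awk -F'│' 'NF > 2 {
        name = $2
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the padding around the cell
        if (name != "") print name
    }'
}

# Usage (requires jbq on PATH):
#   jbq read package JAXDP00006X --show file | extract_file_names
```

Piping the result through `wc -l` gives a file count, provided the layout assumption holds.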
Download files
Once you have identified the file names in the data package, pass them to the "--get-files" option as a comma-separated list to start the download:
jbq read package JAXDP00006X --get-files adipose_metadata.csv,liver_metadata.csv
File names are case-insensitive, so you can use either upper or lower case.
Output
Retrieve file information for the package "JAXDP00006X"
Files to be downloaded
├── liver_metadata.csv:
└── adipose_metadata.csv:
1 saved: C:\Users\liangh\Desktop\BioConnect_Data\JAXDP00006X\liver_metadata.csv
2 saved: C:\Users\liangh\Desktop\BioConnect_Data\JAXDP00006X\adipose_metadata.csv
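After a download it can be worth confirming that every requested file is present and non-empty. The following is a minimal sketch, assuming the files land in a folder named after the package as in the output above; `check_downloaded` is a hypothetical helper, not a jbq command.

```shell
#!/bin/bash
# Hypothetical helper: report whether each expected file exists and is non-empty.
# Usage: check_downloaded <package_dir> <file>...
check_downloaded() {
    local dir=$1; shift
    local missing=0 f
    for f in "$@"; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]   # non-zero exit status if anything is missing
}

# Example:
#   check_downloaded JAXDP00006X liver_metadata.csv adipose_metadata.csv
```

The exit status makes the check usable in scripts, for instance to retry a download only when something is missing.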
Download all the files in a package
jbq read package JAXDP00006X --get-files all
Filter files
A wildcard (*) is accepted in the file name and can appear at the beginning, at the end, or at both ends.
- Beginning: download all CSV files
jbq read package JAXDP00006X --get-files *.csv
- End: download all files whose names begin with "liver"
jbq read package JAXDP00006X --get-files liver*
- Both: download all files whose names contain "liver"
jbq read package JAXDP00006X --get-files *liver*
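In bash and similar shells, an unquoted wildcard is expanded by the shell itself whenever a file in the current directory matches it, so a command may receive local file names instead of the pattern. Quoting the pattern avoids this. The sketch below shows the effect with a stand-in function, `show_args` (hypothetical, used only to print what a command would receive); it does not call jbq.

```shell
#!/bin/bash
# Stand-in for any command: prints each argument it receives on its own line.
show_args() { printf '%s\n' "$@"; }

workdir=$(mktemp -d)
touch "$workdir/local_notes.csv"   # a local file that happens to match *.csv

# Run both variants from inside the directory that contains the matching file.
unquoted=$(cd "$workdir" && show_args *.csv)     # the shell expands the pattern first
quoted=$(cd "$workdir" && show_args '*.csv')     # the pattern is passed through unchanged

echo "unquoted: $unquoted"   # unquoted: local_notes.csv
echo "quoted:   $quoted"     # quoted:   *.csv
```

On such shells, a quoted form like `jbq read package JAXDP00006X --get-files '*.csv'` passes the pattern to jbq unchanged regardless of what is in the working directory.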
Slurm Download on sumner2
A Slurm job array can speed up the download of multiple large files by running the downloads in parallel. Allocate as many array tasks as there are files to download, so that each task downloads one file.
Example Slurm file: jbq_slurm.sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --array=1-10 # one task per file (10 files); task IDs start at 1
#SBATCH --output=job.%J.out
#SBATCH --error=job.%J.err
#SBATCH --job-name="jbq"
cd /flashscratch/liangh/BioConnect_Data && \
jbq read package JAXDP00006X --get-files "*liver*" --task-id="$SLURM_ARRAY_TASK_ID"
Submit the job on sumner2:
module load singularity
sbatch jbq_slurm.sh
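The one-file-per-task idea can be illustrated without jbq. The sketch below is not jbq's implementation; it only shows how a 1-based task ID such as SLURM_ARRAY_TASK_ID can select a single entry from a list of file names. The function `file_for_task` and the file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration: pick the N-th file name (1-based) for array task N.
# Usage: file_for_task <task_id> <file>...
file_for_task() {
    local task_id=$1; shift
    if [ "$task_id" -lt 1 ] || [ "$task_id" -gt "$#" ]; then
        return 1                 # this task ID has no matching file
    fi
    shift $((task_id - 1))       # drop the entries before the selected one
    printf '%s\n' "$1"
}

# In a real array job Slurm sets SLURM_ARRAY_TASK_ID; default to 1 here.
task=${SLURM_ARRAY_TASK_ID:-1}
file_for_task "$task" liver_a.csv liver_b.csv liver_c.csv
```

Under this scheme a task ID larger than the number of files has nothing to do, which is why the array range should match the file count.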
Sample output
# working directory
[liangh@sumner017 BioConnect_Data]$ pwd
/flashscratch/liangh/BioConnect_Data
# files
[liangh@sumner017 BioConnect_Data]$ ll
total 28600
drwxr-xr-x 3 liangh jaxuser 711 May 22 13:19 JAXDP00006X
-rw-r--r-- 1 liangh jaxuser 39299 May 16 11:59 JAXDP00006X.zip
-rwxr-xr-x 1 liangh jaxuser 25601888 May 22 13:07 jbq
-rw-r--r-- 1 liangh jaxuser 267 May 22 13:09 jbq_slurm.sh
# submit job
[liangh@sumner017 BioConnect_Data]$ sbatch jbq_slurm.sh
Submitted batch job 892831
# check the status of the jobs
[liangh@sumner017 BioConnect_Data]$ squeue -j 892831
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
892831_19 compute jbq liangh CG 0:06 1 sumner020
892831_3 compute jbq liangh CG 0:06 1 sumner072
892831_18 compute jbq liangh R 0:06 1 sumner017
892831_15 compute jbq liangh R 0:06 1 sumner106
892831_13 compute jbq liangh R 0:06 1 sumner101
892831_12 compute jbq liangh R 0:06 1 sumner100
892831_10 compute jbq liangh R 0:06 1 sumner095
892831_9 compute jbq liangh R 0:06 1 sumner088
892831_8 compute jbq liangh R 0:06 1 sumner087
892831_7 compute jbq liangh R 0:06 1 sumner085
892831_6 compute jbq liangh R 0:06 1 sumner078
892831_5 compute jbq liangh R 0:06 1 sumner077
892831_1 compute jbq liangh R 0:06 1 sumner058
[liangh@sumner017 BioConnect_Data]$ squeue -j 892831
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
# downloaded files
[liangh@sumner017 BioConnect_Data]$ ll JAXDP00006X
total 144296
-rw-r--r-- 1 liangh jaxuser 21720769 May 22 13:19 2021_CUBE_Liver_C18negative_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser 4012 May 22 13:19 2021_CUBE_Liver_C18negative_metadata.csv
-rw-r--r-- 1 liangh jaxuser 30354790 May 22 13:19 2021_CUBE_Liver_C18positive_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser 4012 May 22 13:19 2021_CUBE_Liver_C18positive_metadata.csv
-rw-r--r-- 1 liangh jaxuser 61694943 May 22 13:19 2021_CUBE_Liver_Discovery_Proteomics_Data.xlsx
-rw-r--r-- 1 liangh jaxuser 5778668 May 22 13:19 2021_CUBE_Liver_HILICnegative_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser 4012 May 22 13:19 2021_CUBE_Liver_HILICnegative_metadata.csv
-rw-r--r-- 1 liangh jaxuser 10414219 May 22 13:19 2021_CUBE_Liver_HILICpositive_All-Data.xlsx
-rw-r--r-- 1 liangh jaxuser 4012 May 22 13:18 2021_CUBE_Liver_HILICpositive_metadata.csv
drwxr-xr-x 2 liangh jaxuser 138 May 22 13:15 file_types
-rw-r--r-- 1 liangh jaxuser 3953 May 22 13:19 liver_metadata.csv
-rw-r--r-- 1 liangh jaxuser 664 May 22 13:15 README.txt
-rw-r--r-- 1 liangh jaxuser 2159080 May 22 13:15 ro-crate-metadata.json
-rw-r--r-- 1 liangh jaxuser 34132 May 22 13:15 ro-crate-preview.html