A number of third-party clients are available for transferring data into and out of Object Storage. What makes s5cmd useful is its clear syntax and its speed: it can be run in a parallel mode to provide better throughput. This article provides the basics for getting started with the s5cmd tool.
Pre-requisites
- s5cmd, available from the GitHub project - https://github.com/peak/s5cmd
- the AWS CLI, which must already be installed and configured. It is available from Amazon at https://aws.amazon.com/cli/
- a Zadara Object Instance
- a user account in the Object Instance, with relevant privileges to the storage (the user's S3 access key and secret key will be used in the AWS profile to authorise access to the instance)
Overview
s5cmd, like the similar s3cmd, requires that you have configured an AWS named profile (or several profiles, if using multiple accounts or object stores). This article does not cover profile configuration.
You must have an accessible Zadara Object instance and the URL details for the instance.
A suitable user account is required, along with any container/bucket name, and privileges to create and remove folders and data objects.
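The requirements above can be checked from the shell before going further. The following is a minimal sketch and not part of s5cmd itself: the helper names have_tool and profile_exists are hypothetical, and it assumes the profile lives in the default ~/.aws/credentials file under a section header such as [myprofile] (a profile defined only in ~/.aws/config would not be found by this check).

```shell
# have_tool NAME (hypothetical helper)
# Succeeds if NAME can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# profile_exists NAME [CREDENTIALS_FILE] (hypothetical helper)
# Succeeds if a "[NAME]" section header exists in the credentials file.
# Assumption: the file defaults to ~/.aws/credentials.
profile_exists() {
    name=$1
    file=${2:-$HOME/.aws/credentials}
    [ -f "$file" ] && grep -qFx "[$name]" "$file"
}

# Example use (not executed here):
#   have_tool s5cmd || echo "s5cmd is not installed"
#   have_tool aws || echo "the AWS CLI is not installed"
#   profile_exists myprofile || echo "profile myprofile not found"
```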
Usage
The s5cmd tool has its own help option to get you started.
$ s5cmd help
Example commands
list bucket & sub-folder content;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
ls s3://mybucket/folder01/
make a bucket;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
mb s3://mybucket
copy a single file into a folder;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
cp myfile.txt s3://mybucket/folder01/
copy multiple files using a wildcard;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
cp "*.txt" s3://mybucket/folder02/
Note: When using wildcards, you may find it useful to enclose the pattern in double quotes "" as shown above, or in single quotes ' ', so that the shell passes the pattern to s5cmd unexpanded.
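Since every example repeats the same --profile and --endpoint-url options, a small shell function can save some typing. This is only a sketch: the function name zs5 and the variables S5_PROFILE, S5_ENDPOINT and S5_DRY_RUN are hypothetical names chosen for illustration, and the profile and endpoint defaults are the placeholder values used throughout this article.

```shell
# Hypothetical defaults; replace with your own profile and endpoint URL.
S5_PROFILE="${S5_PROFILE:-myprofile}"
S5_ENDPOINT="${S5_ENDPOINT:-https://vsa-00000123-public-my-cloud-01.zadarazios.com}"

# zs5 ARGS... (hypothetical wrapper)
# Runs s5cmd with the common options filled in.
# With S5_DRY_RUN=1 the command line is printed instead of executed.
zs5() {
    if [ "${S5_DRY_RUN:-0}" = "1" ]; then
        echo "s5cmd --profile $S5_PROFILE --endpoint-url $S5_ENDPOINT $*"
    else
        s5cmd --profile "$S5_PROFILE" --endpoint-url "$S5_ENDPOINT" "$@"
    fi
}

# Example use (not executed here):
#   zs5 ls s3://mybucket/folder01/
#   zs5 cp "*.txt" s3://mybucket/folder02/
```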
We can also automate a number of tasks by placing multiple commands in a file;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
run backup_files.cmd
Where the file "backup_files.cmd" contains;
mb s3://mybackup
cp *.bak s3://mybackup/
Concurrency / parallel jobs are controlled by setting the numworkers count. The default is 256, which may need tuning depending on the size and workload of your Object instance;
s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
--numworkers 15 \
cp '*2023.rpt' s3://mybucket/reports-2023/
Counterintuitively, when working with files over 1GB, or with larger files generally, it helps to set 'numworkers' lower, e.g. 15, 20 or 30.
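The rule of thumb above can be turned into a small helper. This is only a sketch under stated assumptions: pick_numworkers and LARGE_FILE_BYTES are hypothetical names, and the 1GB threshold and the value of 20 are simply the starting points suggested in this article, so the right figure for your Object instance may differ.

```shell
# pick_numworkers FILE... (hypothetical helper)
# Prints 20 if any of the given files is at or above the size threshold,
# otherwise prints 256 (the s5cmd default).
# Assumption: LARGE_FILE_BYTES overrides the threshold; default is 1GB.
pick_numworkers() {
    limit=${LARGE_FILE_BYTES:-1073741824}
    for f in "$@"; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc adds.
        size=$(($(wc -c < "$f")))
        if [ "$size" -ge "$limit" ]; then
            echo 20
            return 0
        fi
    done
    echo 256
}

# Example use (not executed here):
#   s5cmd --profile myprofile \
#   --endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
#   --numworkers "$(pick_numworkers *2023.rpt)" \
#   cp '*2023.rpt' s3://mybucket/reports-2023/
```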
Using with a Proxy
If using s5cmd with a proxy, set the HTTP_PROXY and HTTPS_PROXY environment variables and disable SSL verification;
HTTP_PROXY="http://localhost:8080" \
HTTPS_PROXY="https://localhost:8080" \
s5cmd --no-verify-ssl --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
ls s3://mybucket/reports-2024/
File Sizes Larger than 1GB
As mentioned earlier, the default number of workers is 256, which is good when working with many small files. However, to get the best transfer rate with large files of 1GB or more, you will want to reduce the number of client-side workers by setting --numworkers <value> to a more reasonable value, e.g. 15 or 20. Some tuning may be required (see above for an example).
Checking Folder Statistics
Use the du command to check the size and number of objects at either bucket or folder level. Wildcards and patterns can be used.
$ s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
du --humanize 's3://mybucket/2023/*'
121.8M bytes in 17 objects: s3://mybucket/2023/*
========================================================
$ s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
du -H "s3://mybucket05/*"
5.2G bytes in 417 objects: s3://mybucket05/*
========================================================
$ s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
du -H "s3://mybucket06/test6/*"
1.7G bytes in 130 objects: s3://mybucket06/test6/*
========================================================
$ s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
du -H "s3://mybucket12/test6/*"
51.0M bytes in 7 objects: s3://mybucket12/test6/*
========================================================
$ s5cmd --profile myprofile \
--endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
du -H "s3://mybucket03/reports80/file_10*"
100.0M bytes in 10 objects: s3://mybucket03/reports80/file_10*
Tracing, Logging, Debugging with s5cmd
You can also gauge whether you have the right work rate by adding debug or trace logging to your command line and reviewing the output;
s5cmd --log debug --numworkers 128 --profile myprofile \
--endpoint-url=https://vsa-00000123-public-my-cloud-01.zadarazios.com:443 \
cp "*.fle" s3://mybucket/files1/
Debug output like the following would indicate possible problems with the workers setting;
DEBUG retryable error: RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period.
status code: 400, request id: txb64bfc947a4b40ebbe374-0068ee4b58, host id: txb64bfc947a4b40ebbe374-0068ee4b58
DEBUG retryable error: RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period.
status code: 400, request id: tx6ab377b66d2848539ae4d-0068ee4b58, host id: tx6ab377b66d2848539ae4d-0068ee4b58
In this example, reducing --numworkers to 80 eliminated the timeout (400) responses completely!
The --log option can be any one of; error | debug | info | trace
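When comparing different --numworkers values, it can help to count the timeout lines rather than read through the whole log. The following is a minimal sketch: count_retryable is a hypothetical helper name, and it assumes the s5cmd output was saved to a file (capturing both stdout and stderr) and that the timeouts appear in the "retryable error: RequestTimeout" form shown above.

```shell
# count_retryable LOGFILE (hypothetical helper)
# Prints the number of lines reporting a retryable RequestTimeout.
count_retryable() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is ignored here.
    grep -c 'retryable error: RequestTimeout' "$1" || true
}

# Example use (not executed here):
#   s5cmd --log debug --numworkers 128 --profile myprofile \
#   --endpoint-url https://vsa-00000123-public-my-cloud-01.zadarazios.com \
#   cp "*.fle" s3://mybucket/files1/ > s5cmd-128.log 2>&1
#   count_retryable s5cmd-128.log
```

A count that drops to zero after lowering --numworkers suggests the new setting suits the instance better.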
_________________________________________________________
The above is a brief overview of what this versatile command-line tool can offer. More information is available online at https://github.com/peak/s5cmd
Alternatives to s5cmd include s3cmd, s4cmd and the original AWS CLI. Each tool has its own merits and differences, making it more or less suitable for different situations.