Calculating Checksums
Overview
It is good practice to compare checksums whenever you transfer a file, including when you download an executable from the internet. A matching checksum confirms that an upload succeeded, or that the file you downloaded is identical to the one on the server.
Any modification to a file, whether from a transmission error or from deliberate tampering with a commonly downloaded file, changes its checksum. For this reason, file creators often calculate a checksum (commonly an MD5 hash, as produced by md5sum) and publish it alongside the download.
For more information, and to learn how to generate a checksum, see any of the following:
https://www.tutorialspoint.com/unix_commands/md5sum.htm
https://www.quickprogrammingtips.com/python/how-to-calculate-md5-hash-of-a-file-in-python.html
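As a minimal sketch of what those pages describe, the following Python function computes the MD5 checksum of a local file using only the standard library. The function name md5_of_file and the 1 MiB chunk size are illustrative choices, not anything prescribed by the pages above.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large files do not have to fit
    in memory. `chunk_size` is an arbitrary illustrative default.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("foo.txt") returns the same 32-character hex string that md5sum prints for that file.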
Checksums and Transfer Family
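When a file is uploaded to S3, including over SFTP as described above, AWS computes a checksum and stores it as the object's S3 "ETag" attribute. For an object uploaded in a single part and either unencrypted or encrypted with SSE-S3 (AES256), as in the example below, the ETag is the MD5 digest of the file contents. For multipart uploads, or objects encrypted with SSE-KMS or SSE-C, the ETag is not a plain MD5 digest.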
Example:
> aws s3api head-object --bucket mybucket --key Boston/jca-test/foo.txt | cat
{
    "AcceptRanges": "bytes",
    "LastModified": "2021-09-22T18:44:18+00:00",
    "ContentLength": 4,
    "ETag": "\"d3b07384d113edec49eaa6238ad5ff00\"",
    "VersionId": "52fyw9M_r1OA1N3wwSx6koi8SPegV0kO",
    "ContentType": "text/plain",
    "ServerSideEncryption": "AES256",
    "Metadata": {
        "user-agent": "AWSTransfer",
        "user-agent-id": "jabraham@s-1caef8d95eaf414bb"
    }
}
If you download this file and compute its MD5 checksum locally (shown here with the macOS md5 command; on Linux, use md5sum), the result matches the ETag above:
> md5 foo.txt
MD5 (foo.txt) = d3b07384d113edec49eaa6238ad5ff00
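The comparison can also be scripted. The sketch below takes the JSON text printed by aws s3api head-object and a locally computed MD5 hex digest, and reports whether they agree. The function name etag_matches_md5 is hypothetical. The sketch assumes the ETag is a plain MD5 digest; an ETag containing a hyphen indicates a multipart upload, which this simple check cannot verify, so it raises an error instead of guessing.

```python
import json


def etag_matches_md5(head_object_json, local_md5):
    """Return True if the ETag in head-object JSON equals `local_md5`.

    `head_object_json` is the JSON text printed by
    `aws s3api head-object`; `local_md5` is a hex MD5 digest.
    """
    # The ETag value itself is wrapped in double quotes, so strip them.
    etag = json.loads(head_object_json)["ETag"].strip('"')
    if "-" in etag:
        # Multipart-upload ETags look like "<hex>-<part count>" and are
        # not the MD5 of the whole file.
        raise ValueError("ETag %r is not a plain MD5 digest" % etag)
    return etag.lower() == local_md5.lower()
```

With the head-object output shown above and the digest d3b07384d113edec49eaa6238ad5ff00, the function returns True.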
Verifying Checksums with Transfer Family
The moral of the story: you don't need to compute checksums on the files once they are uploaded -- S3 does that for you. To verify an upload, compare the MD5 checksum of your local copy with the object's ETag.
If you generate and upload large numbers of files and want to verify that they were all uploaded successfully, we have written a utility for this, available on GitHub.
Please see the README.md for complete instructions.
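For illustration only, the general idea of bulk verification can be sketched as follows. This is not the utility mentioned above; the function name find_mismatched_uploads, its parameters, and the assumption that local relative paths map directly onto S3 keys under a prefix are all hypothetical. The s3_client argument is expected to behave like a boto3 S3 client, whose head_object(Bucket=..., Key=...) call returns a dictionary containing "ETag". The sketch also assumes every ETag is a plain MD5 digest and reads each file fully into memory.

```python
import hashlib
from pathlib import Path


def find_mismatched_uploads(s3_client, bucket, local_dir, key_prefix=""):
    """Return the S3 keys whose ETag differs from the local file's MD5.

    Assumes each file under `local_dir` was uploaded to
    `key_prefix` + its relative path (a hypothetical layout).
    """
    root = Path(local_dir)
    mismatched = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = key_prefix + path.relative_to(root).as_posix()
        # Reads the whole file at once; fine for a sketch, not for huge files.
        local_md5 = hashlib.md5(path.read_bytes()).hexdigest()
        response = s3_client.head_object(Bucket=bucket, Key=key)
        etag = response["ETag"].strip('"')
        if etag != local_md5:
            mismatched.append(key)
    return mismatched
```

With boto3 installed, a call might look like find_mismatched_uploads(boto3.client("s3"), "mybucket", "outgoing", "Boston/jca-test/"), where "outgoing" is a placeholder for a local directory. An empty list means every file matched; a key that does not exist in the bucket causes the client to raise an error rather than being reported as a mismatch.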