    Support multihosted/threaded/ranged S3 operations · efeae974
    Earle F. Philhower, III authored
    Major rewrite to turn the plugin into a multithreaded, high-performance
    consumer of S3 services and features from Amazon and third-party
    object storage.
    
    Enable multithreaded, multipart/multirange operation on S3 objects:
      Enables parallel upload and download of portions of large objects,
      which can significantly improve performance on both Amazon S3 and
      other object stores.
    
      The number of parallel threads used can be configured with the parameter
        S3_MPU_THREADS=##  (default 10 threads)
    
      The size in MB of each part of the object is configured with
        S3_MPU_CHUNK=##  (default 64, units in MB)
    
      For storing objects, multipart PUT is used.  For retrieving objects from
      the archive, multi-ranged HTTP GETs are performed.
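
      A minimal sketch of how the ranged download can be split into
      S3_MPU_CHUNK-sized pieces and spread across S3_MPU_THREADS workers
      (fetch_range() and the surrounding names are illustrative placeholders,
      not the plugin's actual functions):

        // Sketch: divide an object into chunk_mb-sized ranges and fetch them
        // in batches of mpu_threads parallel workers.
        #include <algorithm>
        #include <cstdint>
        #include <cstdio>
        #include <thread>
        #include <vector>

        // Hypothetical stand-in for the plugin's ranged-GET callback; a real
        // implementation would issue "Range: bytes=offset-(offset+length-1)".
        static void fetch_range(const char* key, uint64_t offset, uint64_t length) {
            std::printf("GET %s bytes=%llu-%llu\n", key,
                        (unsigned long long)offset,
                        (unsigned long long)(offset + length - 1));
        }

        static void parallel_get(const char* key, uint64_t object_size,
                                 unsigned mpu_threads, uint64_t chunk_mb) {
            const uint64_t part_size = chunk_mb * 1024ULL * 1024ULL;
            uint64_t offset = 0;
            while (offset < object_size) {
                std::vector<std::thread> workers;
                for (unsigned i = 0; i < mpu_threads && offset < object_size; ++i) {
                    uint64_t len = std::min(part_size, object_size - offset);
                    workers.emplace_back(fetch_range, key, offset, len);
                    offset += len;
                }
                for (auto& w : workers) w.join();   // wait for this batch of ranges
            }
        }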
    
    Add S3_DEFAULT_HOSTNAME multiple host endpoint support:
      Use a comma between host:port pairs to specify multiple endpoints to
      connect to in sequence (and in parallel when doing multipart uploads).
      These hosts are iterated over when connecting to the S3 service.
        S3_DEFAULT_HOSTNAME=ip1:port1,ip2:port2,ip3:port3;S3_AUTH_FILE=...
    
      Error log entries include the S3 host in use at the time of the failed
      operation, to help track down connectivity or other issues.
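
      A rough sketch of the multi-host handling, assuming the plugin splits
      the comma-separated S3_DEFAULT_HOSTNAME value and rotates through the
      endpoints on connection failure (the names below are illustrative):

        // Sketch: parse "ip1:port1,ip2:port2,..." into a host list and pick
        // the next endpoint on each retry, wrapping around the list.
        #include <cstddef>
        #include <sstream>
        #include <string>
        #include <vector>

        static std::vector<std::string> parse_hosts(const std::string& value) {
            std::vector<std::string> hosts;
            std::stringstream ss(value);
            std::string host;
            while (std::getline(ss, host, ',')) {
                if (!host.empty()) hosts.push_back(host);   // e.g. "ip1:port1"
            }
            return hosts;
        }

        static const std::string& next_host(const std::vector<std::string>& hosts,
                                            std::size_t attempt) {
            return hosts[attempt % hosts.size()];   // round-robin over endpoints
        }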
    
    Add S3_SERVER_ENCRYPT=[0|1] to enable at-rest S3 encryption:
      This sets a flag requesting that S3 store the object data in encrypted
      form on disk.  This is NOT the same thing as HTTPS and only affects the
      data at rest.  If you need encrypted communication to S3 as well as
      encryption of the object data once it is there, be sure to specify
        ...;S3_PROTO=HTTPS;S3_SERVER_ENCRYPT=1;...
      on the resource definition line.
    
      Note also that this is not supported by some local S3 appliances or
      software.  When unsupported, the S3 server will return errors for all
      upload operations, logged in rodsLog.
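
      For illustration, the at-rest request maps onto the standard
      x-amz-server-side-encryption header on each upload; the helper and types
      below are assumptions for the sketch, not the plugin's actual code:

        // Sketch: when S3_SERVER_ENCRYPT=1 is set in the resource context,
        // ask S3 for at-rest encryption with S3-managed keys by adding the
        // x-amz-server-side-encryption header to each upload request.
        #include <string>
        #include <utility>
        #include <vector>

        using headers_t = std::vector<std::pair<std::string, std::string>>;

        static void apply_server_encrypt(bool server_encrypt, headers_t& headers) {
            if (server_encrypt) {
                // Some local S3 appliances reject this; the resulting upload
                // errors are then logged in rodsLog.
                headers.emplace_back("x-amz-server-side-encryption", "AES256");
            }
        }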
    
    Add S3_ENABLE_MD5 option to enable upload checksums on all PUTs:
      Enables MD5 checksum calculation on all S3 PUT commands, setting the
      Content-MD5 header on each request.
    
      Note that this requires reading each file effectively twice:  the first
      pass reads the file to calculate the MD5 (the digest must be known before
      the request headers are sent, so it cannot be computed on the fly), and
      the second pass actually sends the file over the network to S3.
    
      An MD5 mismatch is reported as an error by S3 and logged in rodsLog,
      and the iRODS system is informed that the archive operation failed.
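
      A minimal sketch of the two-pass flow, assuming an OpenSSL-based digest
      helper (put_object() is a hypothetical stand-in for the plugin's actual
      upload path):

        // Pass 1: read the whole file once just to compute the MD5 digest,
        // returned base64-encoded as required by the Content-MD5 header.
        #include <openssl/evp.h>
        #include <fstream>
        #include <string>
        #include <vector>

        static std::string md5_base64_of_file(const std::string& path) {
            std::ifstream in(path, std::ios::binary);
            EVP_MD_CTX* ctx = EVP_MD_CTX_new();
            EVP_DigestInit_ex(ctx, EVP_md5(), nullptr);
            std::vector<char> buf(1 << 20);
            while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
                EVP_DigestUpdate(ctx, buf.data(), static_cast<size_t>(in.gcount()));
            }
            unsigned char digest[EVP_MAX_MD_SIZE];
            unsigned int len = 0;
            EVP_DigestFinal_ex(ctx, digest, &len);
            EVP_MD_CTX_free(ctx);
            unsigned char b64[64];
            int n = EVP_EncodeBlock(b64, digest, static_cast<int>(len));
            return std::string(reinterpret_cast<char*>(b64), n);
        }

        // Pass 2 (hypothetical): re-read the file and send it with Content-MD5 set.
        static int put_object(const std::string& /*path*/, const std::string& /*md5*/) {
            return 0;   // placeholder for the actual upload
        }

        static int upload_with_checksum(const std::string& path, bool enable_md5) {
            std::string content_md5;
            if (enable_md5) {
                content_md5 = md5_base64_of_file(path);   // first read of the file
            }
            return put_object(path, content_md5);         // second read + network send
        }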
    
    Testing parameters to support out-of-CI testing:
      Add S3BUCKET and S3PARAMS environment overrides for the resource test.
      By default the hardcoded values from the prior version are used for the
      bucket name and the parameters (authentication, threads, proto, etc.).
      If the environment variable S3BUCKET is set, it overrides the bucket
      used, and if S3PARAMS is set, it overrides the entire resource
      configuration string.
      For example:
       S3BUCKET=user \
       S3PARAMS='S3_DEFAULT_HOSTNAME=192.168.122.128:443;S3_AUTH_FILE=/tmp/s3;S3_PROTO=HTTP;S3_RETRY_COUNT=15;S3_WAIT_TIME_SEC=1;S3_MPU_THREADS=10;S3_MPU_CHUNK=16' \
       python run_tests.py --run_specific_test test_irods_resource_plugin_s3
    
    Add ERROR_INJECT testmode:
      Adds a -DERROR_INJECT compile-time option which will cause the specified
      call to pread or pwrite in a callback to fail.  This allows testing the
      retry/recovery mechanisms.
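
      A small sketch of the idea (the counter scheme below is illustrative,
      not the plugin's exact injection point):

        // Sketch: with -DERROR_INJECT, deliberately fail one pread() call in
        // the transfer callback so the retry/recovery path can be exercised.
        #include <cerrno>
        #include <unistd.h>

        static ssize_t read_chunk(int fd, void* buf, size_t len, off_t off) {
        #ifdef ERROR_INJECT
            static int calls = 0;
            if (++calls == 3) {     // fail the 3rd call once to force a retry
                errno = EIO;
                return -1;
            }
        #endif
            return pread(fd, buf, len, off);
        }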
    
    Remove dead/obsolete code, fix GCC -pedantic -Wall warnings:
      Cleans up most global variables, dead variables and functions, and
      GCC warnings in -pedantic mode.
    
    Rework of error/retry handling, obey requested number of retries for all ops:
      All operations now retry on transient S3 errors, honoring the specified
      delay time and maximum number of retries.
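
      A minimal sketch of the retry wrapper, assuming S3_RETRY_COUNT and
      S3_WAIT_TIME_SEC have already been read from the resource context
      (do_s3_op() and is_transient() are illustrative placeholders):

        // Sketch: retry an S3 operation on transient errors, waiting
        // wait_time_sec between attempts, up to retry_count retries.
        #include <cerrno>
        #include <chrono>
        #include <functional>
        #include <thread>

        static bool is_transient(int err) {
            return err == ETIMEDOUT;   // placeholder classification of errors
        }

        static int do_s3_op(const std::function<int()>& op,
                            int retry_count, int wait_time_sec) {
            int err = 0;
            for (int attempt = 0; attempt <= retry_count; ++attempt) {
                err = op();
                if (err == 0 || !is_transient(err)) break;   // success or fatal error
                std::this_thread::sleep_for(std::chrono::seconds(wait_time_sec));
            }
            return err;
        }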