Accelerate & automate secure data transfers at scale with AWS DataSync (STG204)

DataSync: Overcoming Challenges of Large-Scale Data Transfers

Overview of AWS DataSync

  • DataSync is a service designed to help customers move their data quickly, securely, and reliably.
  • It addresses common challenges with large-scale data transfers, such as security, data verification, error recovery, and performance.
  • Key use cases include data migrations, data replication/archiving, and supporting AI/ML workflows.

Deep Dive into DataSync

  • DataSync supports copying data from a variety of locations: on-premises, other clouds, and within AWS.
  • It preserves metadata and translates between different storage types (e.g., object store to file system).
  • Deployment involves a DataSync agent that communicates with the service in AWS over public, FIPS, or VPC endpoints.
  • The network path involves three legs: agent to on-premises storage, agent to AWS service, and AWS service to target AWS storage.

Setting up and Using DataSync

  1. Deploy DataSync agents (on-premises or in EC2).
  2. Create "locations" to define storage connections.
  3. Create "tasks" to copy data from source to destination, with various options for data verification, scheduling, and reporting.
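
The location and task steps above can be sketched with the DataSync API as exposed by boto3. This is a minimal sketch under stated assumptions, not the presenters' setup: it assumes the agent from step 1 is already deployed and activated, that the source is an NFS export and the destination an S3 bucket, and that the task name, schedule, and verification mode shown are placeholders. The function takes the client as a parameter so it can be exercised without AWS access.

```python
from typing import Any


def set_up_transfer(
    datasync: Any,
    agent_arn: str,
    nfs_host: str,
    nfs_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Create a source location, a destination location, and a task.

    `datasync` is expected to behave like boto3.client("datasync").
    Returns the ARN of the new task.
    """
    # Step 2a: source location, an on-premises NFS export reached via the agent.
    source = datasync.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Step 2b: destination location, an S3 bucket accessed through an IAM role.
    destination = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # Step 3: the task ties source to destination and carries the options.
    # The name, verification mode, and schedule are illustrative choices.
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="lab-share-to-s3",
        Options={"VerifyMode": "ONLY_FILES_TRANSFERRED"},
        Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},
    )
    return task["TaskArn"]
```

With boto3 installed and credentials configured, `datasync` would be `boto3.client("datasync")`, and calling `start_task_execution(TaskArn=...)` on the same client would start a run outside the schedule.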

Scaling DataSync

  • Customers with high network bandwidth can deploy multiple agents and partition the source data.
  • For datasets with many small files, using multiple agents per task can increase throughput.
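
One way to partition source data across agents is to split top-level folders into groups of roughly equal size, then give each group its own task and agent. The sketch below is an illustration only: it assumes folder sizes are already known, and the function name and greedy balancing strategy are hypothetical choices, not something the session prescribes.

```python
from __future__ import annotations

import heapq


def partition_by_size(folder_sizes: dict[str, int], num_partitions: int) -> list[list[str]]:
    """Greedily assign folders to partitions so total bytes stay roughly balanced."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Min-heap of (bytes assigned so far, partition index).
    heap = [(0, i) for i in range(num_partitions)]
    partitions: list[list[str]] = [[] for _ in range(num_partitions)]
    # Place the largest folders first; each goes to the currently lightest partition.
    for folder, size in sorted(folder_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        assigned, index = heapq.heappop(heap)
        partitions[index].append(folder)
        heapq.heappush(heap, (assigned + size, index))
    return partitions


if __name__ == "__main__":
    sizes = {"/a": 500, "/b": 300, "/c": 200, "/d": 100}
    print(partition_by_size(sizes, 2))  # [['/a', '/d'], ['/b', '/c']]
```

Each resulting group could then be mapped to a separate task, for example through per-folder source locations or include filters, so the agents work in parallel without overlapping.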

Resilience's Use Case

  • Resilience is a biomanufacturing company using the "Foundry" model to provide rapid process design and manufacturing capabilities to clients.
  • They built a data management platform using DataSync to:
    • Automatically ingest data from 300+ lab instruments across 11 sites.
    • Provide centralized, secure, and versioned data storage in AWS.
    • Enable self-service data access and analysis for research teams.

New DataSync Features

  • Detailed task reports for auditing and chain of custody.
  • Manifest-based transfers, which copy only an explicitly listed set of files or objects, to optimize for datasets that are mostly unchanged.
  • Enhanced Mode tasks to overcome scalability limits and improve performance.
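
For a manifest-based transfer, the list of files has to be produced somewhere upstream. The sketch below assumes the manifest is a CSV file with one path per line and that the list of changed paths comes from another system; the function name is hypothetical, and the exact manifest rules should be checked against the DataSync documentation.

```python
from __future__ import annotations

import csv
import io


def build_manifest(paths: list[str]) -> str:
    """Return manifest text with one unique path per line, in sorted order."""
    buffer = io.StringIO()
    # csv.writer quotes any path that contains a comma or a quote character.
    writer = csv.writer(buffer, lineterminator="\n")
    for path in sorted(set(paths)):
        writer.writerow([path])
    return buffer.getvalue()


if __name__ == "__main__":
    changed = ["run42/plate2.csv", "run42/plate1.csv", "run42/plate2.csv"]
    print(build_manifest(changed), end="")
```

Deduplicating and sorting keeps the manifest stable between runs, which makes it easier to compare against a task report afterwards.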

Conclusion

  • Key takeaways: DataSync can help with large-scale data migrations, replication, and AI/ML workflows.
  • Additional resources: AWS website, chalk talks, AWS storage solutions.
