Moving Data at Scale & Speed
How To Do It Right
Zettar Inc. (Zettar) is a Silicon Valley (California, U.S.)-based software company building and delivering scalable, robust, and high-performance data mover software for distributed data-intensive engineering and science workloads, such as genomics, life sciences, Oil & Gas, AI and machine learning, data transport for large-scale IoT deployments, autonomous vehicle fleets, smart cities, EDA, Media & Entertainment studio post-production, light sources (large lasers), accelerators, and large telescopes.
The Zettar team has rich first-hand experience helping tier-1 customers in the biopharmaceutical, Oil & Gas, hyperscaler, Media & Entertainment, and supercomputing sectors in different countries. As a result, even as a software company, the Zettar engineering team has a deep and comprehensive understanding of the entire infrastructure: storage, computing, and networking (including network security).
Zettar has been focused on moving data at scale and speed since 2014. The company is supported by its revenue; a few engineering initiatives are funded by the U.S. National Science Foundation (NSF) and the U.S. Department of Energy, Office of Science.
Since early 2015, Zettar has been engaged to support the highly ambitious data transfer requirements of Linac Coherent Light Source II (LCLS-II), a premier U.S. DOE Exascale Computing Initiative (ECI) preparation project hosted at the SLAC National Accelerator Laboratory in Menlo Park, California. As a result, all engineering members have gained extensive experience applying the U.S. DOE ECI’s “co-design” principle: integrated consideration of storage, computing, networking, and concurrent software for optimal performance. Working with Zettar will help your business gain such valuable experience as well.
Foremost, we strongly recommend getting all infrastructure stacks ready: storage, computing (physical and/or virtual servers), and networking (including network security, e.g. firewalls). Moving data at scale and speed is neither a software-only nor a network-only endeavor.
Please Contact Sales. Zettar’s Tech Sales team will be in touch and provide you with the necessary information.
zx is designed for moving machine-generated data at scale and speed. If your workflows involve mostly API-enabled, automated data movements, then zx is an excellent fit. Although it comes with an easy-to-use built-in WebUI, it is a data mover application designed foremost for highly automated data moving tasks.
Once you reach this point, please Contact Sales. Our Tech Sales and Solution Architect teams will be in touch and help you further with your planning and decision-making. Of course, you are also invited to review the rest of this documentation; the information should be helpful as well.
Zettar has developed an easy-to-deploy, software-based, scale-out data mover solution that provides top performance, simplicity, manageability, scalability, and breakthrough economics. The software runs on standard server architectures and delivers 10x or more the performance of what legacy Managed File Transfer (MFT) architectures are capable of.
The Zettar zx has some very important core attributes. It is:
- a software engine purpose-built for moving data at scale and speed – distributed, scalable, and running as a daemon (i.e. it is not a CLI utility)
- a hardware-agnostic solution
- a solution that supports both POSIX-compliant file storage and AWS S3 (including true compatibles) object storage
A key software design principle is that hardware technology changes over time; a software data mover solution should accommodate such changes, which means that it must be able to run on any standard hardware platform. zx was designed to run on any standard Intel x86_64/AMD64-based server hardware. It works with conventional HDDs or the latest SSDs. zx also runs natively in a public compute cloud or works directly with a public storage cloud. zx eliminates the cost overhead of expensive specialized hardware and allows you to benefit from advances in technology without suffering the pain of forklift upgrades to next-generation architecture.
zx is ideal for transporting massive amounts of data for the following:
- Life Science research data replication: Next Generation Sequencing (NGS), bio-imaging, cancer research, molecular diagnostics, structural biology, and bioinformatics
- Oil & Gas exploration data transportation, among facilities or between on-prem and cloud
- Large scale NAS tech refresh; large production file system data replication and migration; in-cloud data migration; file to object storage migration
- HPC: off-site processing of light source and nuclear accelerator detector data; off-site processing and storage of large telescope camera data; climate change simulation, computational physics, earthquake studies, space research, simulation, intelligence
- In-vehicle data collection for fleets of autonomous vehicles, and transporting the collected data to data centers and/or the cloud
- AI, Machine Learning, and workflows involving GPU acceleration, both on-prem and in the cloud
- Media and Entertainment: pre- and post-production workflow acceleration, content delivery, digital asset replication and disaster recovery
zx is an easy-to-deploy data mover solution that is configurable to fit your environment, giving you complete deployment flexibility.
- Hyperconverged deployments leverage your existing compute infrastructure while eliminating your data transfer setup footprint and reducing power and cooling costs.
- Pooled storage deployments are ideal when you want to use separate storage and compute infrastructure for application isolation, performance, or scalability.
- Public cloud deployments allow you to realize the promise of truly elastic computing by running zx on public cloud server instances.
The software runs well in physical servers, virtual machines, and containers. Which one to choose depends on the use case.
- zx provides flexibility, ease of deployment, and resiliency, whether on-premises, in a hybrid configuration, or entirely in the cloud for on-demand scalability.
- zx is a single, integrated data mover solution that provides the freedom to choose the environment best suited for your application based on performance, scale, and economics.
- It targets Red Hat Enterprise Linux 7.x and 8.x or a free rebuild like CentOS.
As a data movement software engine for moving data at scale and speed, Zettar zx has three key strengths:
- Simplicity – from the simple installation, configuration, and operation to its truly integrated functionality. As an “engine”, Zettar Engineering has striven to make it as simple as possible to use.
- Scalability – among all data mover software applications, it is one of only a few (all created with the U.S. DOE Office of Science’s support) that are truly scale-out capable without resorting to a “cluster workload manager”. This fact again contributes to Zettar zx’s simplicity.
- Efficiency – we have not run into any other data mover, free or commercial, that exhibits the same level of efficiency as Zettar zx. These are not empty words; we have ample published results to prove it (see About Zettar, Publications).
On a per instance basis:
- An FTP server application typically runs on a single computer. Even when confined to run in this manner (i.e. no scale-out), zx is usually 10x or more faster. rsync, scp, sftp, rclone, and robocopy are all end-user-oriented CLI tools. Even when they are used by experienced users, the same range of speed-up has been observed in real-world usage. Once the scale-out capability of zx is leveraged, such CLI tools are left far behind. Even if a threaded and cluster-capable FTP application is used (e.g. GridFTP), zx still holds both efficiency and performance advantages by a wide margin (> 50%). FYI, at the world’s highest-level data mover competition, the Supercomputing Asia 2019 Data Mover Challenge, Zettar beat out the Globus author team of GridFTP (slide 17).
- Zettar zx supports both file and AWS S3 (including true compatibles) object storage. None of those CLI tools supports both, either at all or fully.
- Most importantly, once zx is set up, it offers operational simplicity and manageability these CLI tools cannot match. For example, zx provides both a built-in Web UI and an API (with a Python SDK provided in source), which enables simple yet powerful automation. Furthermore, it offers advanced workflow management features such as checkpoint/restart, sophisticated bandwidth throttling, and multi-level parallelism that works with storage, computing, and network resources. A critical benefit is that it is proven to be insensitive to network latency. None of those popular CLI tools can offer all such benefits.
s3cmd is an end-user-oriented CLI tool running on a single computer. Even when confined to a single computer, zx is usually 10x or more faster per instance. Once the scale-out capability of zx is leveraged, s3cmd is left far behind. The benefits over other CLI tools, including the ease-of-use aspects mentioned previously, apply here as well.
Such applications, as a rule, are not scale-out capable. Some may claim patented data transfer protocols, but their results do not jibe with the facts. Also, even when confined to a single computer, zx is usually 10x or more faster per instance. Once the scale-out capability of zx is leveraged, such tools cannot keep up. Most importantly, once zx is set up, it is much easier to use. See the capability videos above.
Almost all well-established data movers were introduced around 2000 – that is 20 years ago. They did their jobs. But the world’s exponential data growth started in 2016, per Intel DCG EVP Navin Shenoy. Zettar zx is designed from the ground up to address this challenge. The problems those well-known data movers were created to solve and the problems we address demand very different approaches and architectures. You are invited to review the 1-minute-48-second video, Learn the water transport analogy, on this page.
Zettar started off in 2015 by supporting a premier U.S. DOE Exascale Computing project, Linac Coherent Light Source II, which has highly ambitious data transfer requirements. The Zettar Product Brief on this page has some details. Striving to meet such demanding requirements ever since, zx had by 2018 attained excellent outcomes in the following production trials (not demos!) and international competition:
- In September 2018, using a modest test bed, under a hard 80Gbps bandwidth cap, with full encryption and checksumming, zx transferred 1PB in 29 hours over a 5000-mile loop provided by the U.S. DOE Energy Sciences Network (ESnet). 94% average bandwidth utilization was achieved.
- In March 2019, the 2-person Zettar team competed in and became the overall winner of the grueling two-month-long SCA19 DMC – the Olympics of data mover software. The six other participants were elite national teams (slide 17). Over two successive SCA DMCs (2019 and 2020), Zettar remains the only Overall Winner, its record unbroken.
- In October and early November 2019, working with the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) – University of Warsaw (Poland) and the A*STAR Computational Resource Centre (A*CRC, Singapore), the joint effort achieved a historic first data transfer production trial over the then brand-new Collaboration Asia Europe-1 (CAE-1) 100Gbps network across a vast distance of 12,375 miles.
Please note that all of the above results were published by government agencies and supercomputing centers.
Yes, zx has the following capabilities built in (though different licenses may be needed to enable them) for highly simplified deployment and efficient operation:
- Bulk transfer/stream data over digital connections of any speed and across any distance, for both files and AWS S3-style objects
- Replicate data file sets incrementally, even those with hundreds of millions of small files
- Accelerate data loading/unloading for data shipping devices (AWS Snowball and the like)
It is the most modern scale-out capable data mover software funded in 2019 by the U.S. DOE Office of Science.
zx employs unconditional end-to-end checksum.
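The principle behind end-to-end checksumming can be sketched in a few lines of Python (a generic illustration, not zx's internal implementation): hash the data at the source, hash it again at the destination, and treat the transfer as complete only when the digests match.

```python
import hashlib
import os
import shutil
import tempfile

def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Stream the file through a hash so arbitrarily large files fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_end_to_end(src_path, dst_path):
    """A transfer is complete only if source and destination digests agree."""
    return file_digest(src_path) == file_digest(dst_path)

# Demo: a local copy stands in for a network transfer.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1 << 20))
    shutil.copyfile(src, dst)
    transfer_ok = verify_end_to_end(src, dst)   # digests match
    with open(dst, "ab") as f:
        f.write(b"corruption")                  # simulate silent damage
    corrupted_ok = verify_end_to_end(src, dst)  # mismatch is detected
```

Because the digests are computed from the actual bytes read back at each end, this catches corruption introduced anywhere along the path, not just on the wire.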
zx provides linear scalability – throughput increases linearly with the number of zx instances, assuming the storage throughput available to each instance is the same, the available network bandwidth can accommodate the collective throughput, sufficient computing power is available to each zx instance, and firewalls do not hinder the desired data rates.
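The linear-scaling conditions can be captured in a toy throughput model (an idealized sketch under the stated assumptions; the numbers below are illustrative, and real deployments should be sized from measurements):

```python
def aggregate_throughput_gbps(instances, per_instance_gbps, network_gbps):
    """Idealized model: throughput grows linearly with the number of
    instances until a shared resource (here, the network) becomes the
    bottleneck."""
    return min(instances * per_instance_gbps, network_gbps)

# Suppose each instance can move 30 Gbps from storage over a 100 Gbps link.
# Scaling is linear up to three instances, then the network caps it.
one = aggregate_throughput_gbps(1, 30, 100)    # 30
two = aggregate_throughput_gbps(2, 30, 100)    # 60
five = aggregate_throughput_gbps(5, 30, 100)   # 100 (network-bound)
```

The same `min()` logic applies whichever resource saturates first – storage, CPU, or network.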
zx supports both POSIX and AWS S3 (including true compatibles).
This is extremely easy and only takes a few minutes. Please see the following:
- zx is a data mover. It runs on a compute node (aka data transfer node, or DTN).
- The compute node should be configured as an NFS client.
- Note that you can import multiple NFS shares and zx can work with all of them concurrently if they are mounted under the same root. This is true both for reading and writing.
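The "same root" requirement can be illustrated with a small sketch (the share names and the `/data` root below are hypothetical; the point is simply that all shares must be reachable under one mount root):

```python
import os

def under_common_root(mount_points, root):
    """True if every NFS mount point lives under the given root, so a
    data mover pointed at `root` can reach all shares concurrently."""
    root = os.path.abspath(root)
    return all(
        os.path.commonpath([root, os.path.abspath(m)]) == root
        for m in mount_points
    )

# Hypothetical layout: three NFS shares imported under /data on the DTN.
shares = ["/data/project-a", "/data/project-b", "/data/scratch"]
layout_ok = under_common_root(shares, "/data")                     # True
split_layout = under_common_root(["/data/a", "/srv/b"], "/data")   # False
```

In the second case one share sits outside the root, so a single data mover root would not cover both.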
zx can use available bandwidth anywhere from Mbps to hundreds of Gbps. Typically, with modern hardware – assuming sufficient storage throughput, a decent CPU model, fully populated CPU memory channels, enough available network bandwidth, and normal average file/object sizes (>= 4MB) – a single data transfer node running zx can comfortably push/pull 70+Gbps or beyond. At the SCA19 DMC, even with a sub-optimal setup and a 2TB dataset consisting of mixed-size files, Zettar was still able to reach this level. See the official announcement. So, using two nodes at each end not only has the potential to go beyond 100Gbps but also provides high availability (HA) at the same time.
zx is insensitive to network latency. This has been proven:
- In numerous production trials over a 5000-mile 100Gbps “loop” provided by the U.S. DOE Energy Sciences Network; for example, please see this September 2018 production trial.
- At the highest-level international data mover competition: the SCA19 DMC
- In the historic first Poland–Singapore data transfer production trial over the CAE-1 100Gbps network across 12,375 miles.
Zettar Engineering has also done many transfers at much lower bandwidth internationally, e.g. 80Mbps from Europe to N. America.
zx compares two file data sets and transfers only the differences (i.e. the delta). Thus, in addition to bulk transferring/streaming data, zx can be used for efficient incremental replication.
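The delta-detection idea can be illustrated with a simple sketch that indexes each tree by file size and modification time (a generic technique shown for illustration; zx's actual comparison logic is not published here):

```python
import os
import tempfile

def snapshot(root):
    """Map each file's path (relative to root) to its (size, mtime)."""
    index = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return index

def delta(src_index, dst_index):
    """Files that are new or changed at the source, i.e. what to transfer."""
    return sorted(
        rel for rel, meta in src_index.items() if dst_index.get(rel) != meta
    )

# Demo: the destination already holds an identical a.txt, so only b.txt
# remains in the delta.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, data in (("a.txt", b"one"), ("b.txt", b"two")):
        with open(os.path.join(src, name), "wb") as f:
            f.write(data)
    with open(os.path.join(dst, "a.txt"), "wb") as f:
        f.write(b"one")
    st = os.stat(os.path.join(src, "a.txt"))
    os.utime(os.path.join(dst, "a.txt"), (st.st_atime, st.st_mtime))
    to_transfer = delta(snapshot(src), snapshot(dst))   # ["b.txt"]
```

Only the files in the delta need to cross the network, which is what makes incremental replication of data sets with hundreds of millions of files tractable.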
zx can be used for large-scale replication tasks, such as the tech refresh of many NAS systems or large parallel file system storage pools.
zx offers standards-based TLS encryption with various ciphers available to fit your requirements.
zx uses a key-based mechanism over a TLS-encrypted channel to achieve secure authentication.
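In Python, a standards-based TLS setup with a restricted cipher list looks roughly like this (a generic `ssl` module sketch, not zx's actual configuration mechanism; the certificate paths in the comment are hypothetical placeholders):

```python
import ssl

def make_client_context(ciphers="ECDHE+AESGCM:ECDHE+CHACHA20"):
    """TLS client context: a modern protocol floor plus a restricted
    cipher list. Key-based authentication would additionally load a
    client certificate/key pair (hypothetical paths in the comment)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy TLS
    ctx.set_ciphers(ciphers)
    # ctx.load_cert_chain(certfile="client.crt", keyfile="client.key")
    return ctx

ctx = make_client_context()
```

The cipher string can be tightened or relaxed to match site policy; the default context already enforces certificate verification and hostname checking.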
The Zettar software package (a single RPM) includes everything you need for high-performance, scale-out data transport. There are no additional license fees for standard enterprise file-based features.
Zettar supports an annual subscription, a perpetual license with white-glove support, and a flexible consumption-based model. The second model requires you to purchase an annual Software Maintenance and Support Agreement, which is mandatory for the first year. The software is licensed on a per-node basis. Given its proven efficiency, even a large site usually needs just a few licenses.
As long as there is an active Software Maintenance and Support Agreement in place, Zettar license holders are entitled to all software fixes and product enhancements as part of the base product (zx-File) license. Please note that from time to time, Zettar may introduce new products (e.g. zx-Object, zx-Single-Site-Mode, zx-Append-Streaming) that are integrated into zx and available for purchase. Such new products must be activated via additional licenses and are not included in the base product license entitlement.
Zettar understands that your usage patterns and needs may change from time to time. So you can let your license expire if it’s of the annual subscription type.
The License tab of the About page in zx’s built-in WebUI shows the expiry date of each license you hold. If your license is of the annual subscription type, zx will stop working after the expiry date. With a perpetual license, zx always works, but we encourage you to keep the Software Maintenance and Support Agreement renewed in a timely manner so that your base software stays current and you benefit from our professional support.
You can reinstate the software license, the Software Maintenance and Support Agreement, or both at any time after expiration by simply contacting us via the online form or email. Written requests are necessary for both sides’ records.