Parallel Versus Distributed Data Access for Gigapixel-Resolution Histology Images: Challenges and Opportunities

Abstract

Recent advances in digital pathology have significantly improved both the quality and the resolution of the resulting images, which now often exceed several gigabytes each. Today, several leading institutions across the country use whole-slide imaging (WSI) as part of their routine workflow. Whole-slide images (WSIs) are useful in a wide range of diagnostic and investigative pathology applications. However, these images are large (about 30 GB when uncompressed) and are generated in nonstandard proprietary formats, which has limited wider adoption of these technologies and makes accessing, processing, and analyzing them in a high-throughput fashion extremely challenging. The common approach for such data analytic applications is to preprocess the large whole-slide images into smaller files and store them in a generic format. This approach forfeits the advantages that could be realized if scalability levels and data unit sizes were changed dynamically according to the specifications of the task at hand and the architectural limits of the infrastructure (e.g., node memory size). It also adds processing time to the workflow. To address these challenges, we present novel scalable access methods for parallel file systems and distributed file/object storage systems. Our experimental results show that these methods provide opportunities not realizable with traditional approaches. We demonstrate tangible scalability and high-throughput advantages using a Lustre parallel file system and the AWS S3 distributed storage system.
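
To make the idea of choosing the data unit size from the node memory limit concrete, the following is a minimal sketch and not the access method presented in the paper. It assumes uncompressed 3-byte RGB pixels and square tiles; the function names `tile_side_for_budget` and `iter_tiles` are hypothetical and exist only for this illustration.

```python
import math
from typing import Iterator, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_side_for_budget(memory_budget_bytes: int, bytes_per_pixel: int = 3) -> int:
    """Return the largest square tile side whose uncompressed size fits the budget.

    Assumption: one tile is held in memory at a time and each pixel takes
    `bytes_per_pixel` bytes (3 for uncompressed 8-bit RGB).
    """
    if bytes_per_pixel <= 0:
        raise ValueError("bytes_per_pixel must be positive")
    if memory_budget_bytes < bytes_per_pixel:
        raise ValueError("budget too small for a single pixel")
    return math.isqrt(memory_budget_bytes // bytes_per_pixel)


def iter_tiles(width: int, height: int, side: int) -> Iterator[Region]:
    """Yield non-overlapping regions that cover a width x height image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    for y in range(0, height, side):
        for x in range(0, width, side):
            yield x, y, min(side, width - x), min(side, height - y)


if __name__ == "__main__":
    # Hypothetical slide: 100,000 x 100,000 RGB pixels is 30 GB uncompressed.
    side = tile_side_for_budget(256 * 1024 * 1024)  # 256 MiB per-tile budget
    regions = list(iter_tiles(100_000, 100_000, side))
    print(f"tile side: {side} px, tiles: {len(regions)}")
```

Because the regions are computed at read time rather than written out in a preprocessing step, the same slide can be re-partitioned with a different budget for a different task or node type.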
