ERStruct: a fast Python package for inferring the number of top principal components from whole genome sequencing data - BMC Bioinformatics

  • 📰 BioMedCentral


An article published in BMC Bioinformatics presents ERStruct, an efficient and user-friendly tool for estimating the number of top informative principal components that capture population structure in whole genome sequencing data.
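To make the idea of "top informative principal components" concrete, the sketch below computes the eigenvalues of a centered genotype matrix and picks the position of the sharpest drop between consecutive eigenvalues. This is a naive heuristic for illustration only, not the ERStruct algorithm or its API; the function names `sorted_eigenvalues` and `largest_ratio_drop` and the 0/1/2 genotype coding are assumptions.

```python
import numpy as np


def sorted_eigenvalues(genotypes):
    """Eigenvalues of the sample Gram matrix, largest first.

    genotypes: (n_individuals, n_markers) array, assumed coded 0/1/2.
    """
    # Center each marker (column) so the Gram matrix reflects covariance.
    centered = genotypes - genotypes.mean(axis=0)
    gram = centered @ centered.T / centered.shape[1]
    # eigvalsh returns ascending eigenvalues for a symmetric matrix.
    return np.linalg.eigvalsh(gram)[::-1]


def largest_ratio_drop(eigenvalues, max_k):
    """Naive estimate: the k (1 <= k <= max_k) where the ratio
    eigenvalue[k] / eigenvalue[k-1] (0-indexed) is smallest."""
    ratios = eigenvalues[1:max_k + 1] / eigenvalues[:max_k]
    return int(np.argmin(ratios)) + 1
```

A formal method would compare such ratios against a reference distribution rather than simply taking the minimum; the sketch only shows which quantities are involved.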

The GPU-based Python implementation runs much faster than the CPU-based Python and MATLAB implementations. Its maximum memory usage was only 0.27 times that of the CPU-based Python implementation and 0.31 times that of the MATLAB implementation, owing to the data splitting procedure.
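The summary does not describe how the data splitting works. One common way to bound peak memory, shown here as a sketch under that assumption, is to accumulate the n-by-n Gram matrix over blocks of markers so that only one block needs to be held at a time. The function name `gram_in_blocks` and the `block_size` parameter are hypothetical.

```python
import numpy as np


def gram_in_blocks(X, block_size):
    """Accumulate X @ X.T over column blocks of X.

    X: (n_individuals, n_markers) array. In a real pipeline each block
    would be loaded from disk; here X is in memory for illustration.
    """
    n, p = X.shape
    gram = np.zeros((n, n))
    for start in range(0, p, block_size):
        block = X[:, start:start + block_size]
        # Each block contributes an n-by-n partial sum.
        gram += block @ block.T
    return gram
```

Because the sum of the block products equals the full product, the result matches `X @ X.T` up to floating-point rounding, while the working set is limited to one block plus the n-by-n accumulator.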

To evaluate the accuracy of our ERStruct algorithm as implemented in both the Python package and the MATLAB toolbox, we conducted 30 identical experiments on CPU and on GPU. We used the 1000 Genomes data set with MAF greater than 0.001, and set the number of replications to. These results indicate that the two implementations of the ERStruct algorithm produce the same output.
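The MAF (minor allele frequency) threshold mentioned above can be illustrated with a short filter. This is a minimal sketch, assuming genotypes are coded as 0/1/2 copies of one allele with no missing values; the function name `maf_filter` is hypothetical and is not part of the ERStruct package.

```python
import numpy as np


def maf_filter(genotypes, threshold=0.001):
    """Keep markers whose minor allele frequency exceeds `threshold`.

    genotypes: (n_individuals, n_markers) array, assumed coded 0/1/2.
    Returns the filtered matrix and the boolean mask of kept markers.
    """
    # Each diploid individual carries two alleles per marker.
    freq = genotypes.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    keep = maf > threshold
    return genotypes[:, keep], keep
```

Raising the threshold removes more rare variants and shrinks the marker set, which is why running time and memory are reported per threshold.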

Table 1 Running time and maximum memory usage comparisons of the ERStruct algorithm, using MATLAB, Python CPU, and Python GPU implementations on the 1000 Genomes Project data with different MAF filtering thresholds

In this paper, we developed a Python package that employs the ERStruct algorithm to determine the optimal number of top informative PCs in WGS data.

Our results demonstrate a significant improvement in computation speed with the ERStruct Python package.

 


Similar News: You can also read news stories similar to this one that we have collected from other news sources.

SVcnn: an accurate deep learning-based method for detecting structural variation based on long-read data - BMC Bioinformatics

Background: Structural variations (SVs) are variations in an organism's chromosome structure that exceed 50 base pairs in length. They play a significant role in genetic diseases and evolutionary mechanisms. While long-read sequencing technology has led to the development of numerous SV callers, their performance has been suboptimal. Researchers have observed that current SV callers often miss true SVs and generate many false SVs, especially in repetitive regions and areas with multi-allelic SVs. These errors are due to the messy alignments of long-read data, which are affected by the high error rate of the reads. Therefore, there is a need for a more accurate SV caller.

Result: We propose a new method, SVcnn, a more accurate deep learning-based method for detecting SVs using long-read sequencing data. We ran SVcnn and other SV callers on three real datasets and found that SVcnn improves the F1-score by 2–8% compared with the second-best method when the read depth is greater than 5×. More importantly, SVcnn performs better at detecting multi-allelic SVs.

Conclusions: SVcnn is an accurate deep learning-based method to detect SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn .

Source: BioMedCentral