Abstract:
The solitary bee Osmia excavata (Hymenoptera: Megachilidae) is a key pollinator managed on a large scale. It has been widely used for commercial pollination of fruit trees, vegetables, and other crops in the Northern Hemisphere, where it efficiently increases crop seeding rate, yield, and seed quality. Here, a high-quality chromosome-level genome of O. excavata was generated using PacBio sequencing along with Hi-C technology. The assembled genome was 207.02 Mb, with a contig N50 of 9,485 kb, and 90.25% of the assembled sequence was anchored to 16 chromosomes. Approximately 186.83 Mb, accounting for 27.93% of the genome, was identified as repeat sequences. The genome comprises 12,259 protein-coding genes, 96.24% of which were functionally annotated. Comparative genomics analysis suggested that the common ancestor of O. excavata and Osmia bicornis (Hymenoptera: Megachilidae) lived 8.54 million years ago. Furthermore, the cytochrome P450 family might be involved in the response of O. excavata to low-temperature stress. Taken together, the chromosome-level genome assembly of O. excavata provides in-depth knowledge and will be a helpful resource for pollination biology research.
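The contig N50 reported above is a standard assembly statistic: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kb (not taken from the O. excavata assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 is half of the 300 kb total
```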
Corresponding institution:
[Yuan, ZM ; Zhang, HY ] H;Hunan Agr Univ, Coll Informat & Intelligence, Changsha 410128, Peoples R China.;Hunan Agr Univ, Coll Plant Protect, Hunan Engn & Technol Res Ctr Agr Big Data Anal & D, Changsha 410128, Peoples R China.
Keywords:
Plant;lncRNA-miRNA interaction;Graph neural network;Heterogeneous network;Counterfactual link
Abstract:
Identifying interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) provides a new perspective for understanding regulatory relationships in plant life processes. Recently, computational methods based on graph neural networks (GNNs) have been widely employed to predict lncRNA-miRNA interactions (LMIs), compensating for the limitations of biological experiments. However, the limited semantics and the noise of the graph restrict the performance of existing GNN-based methods. In this paper, we develop a novel Counterfactual Heterogeneous Graph Attention Network (CFHAN) to improve robustness against noise and the prediction of plant LMIs. First, we construct a lncRNA-miRNA (L-M) heterogeneous network based on real-world data. Second, CFHAN uses node-level attention, semantic-level attention, and counterfactual links to enhance the learning of node embeddings. Finally, these embeddings are used as inputs to a multilayer perceptron (MLP) to predict the interactions between lncRNAs and miRNAs. Evaluated on a benchmark dataset of plant LMIs, CFHAN outperforms five state-of-the-art methods and achieves an average AUC of 0.9953 and an average ACC of 0.9733. This demonstrates CFHAN's ability to predict plant LMIs and its promising cross-species prediction ability, offering valuable insights for experimental LMI research.
Authors:
Cai, X. H.;Chen, T.;Wang, R. Y.;Fan, Y. J.;Li, Y.;...
Journal:
Theoretical and Applied Climatology, 2019, 137(3-4):2139-2149 ISSN:0177-798X
Corresponding authors:
Zhou, W.;Zhou, Q. M.
Author affiliations:
[Wang, R. Y.; Fan, Y. J.; Hu, S. N.; Yuan, Z. M.; Li, Y.; Cai, X. H.; Zhou, W.] Hunan Agr Univ, Hunan Prov Engn & Technol Res Ctr Agr Big Data An, Changsha 410128, Hunan, Peoples R China.;[Wang, R. Y.; Fan, Y. J.; Hu, S. N.; Yuan, Z. M.; Li, Y.; Cai, X. H.; Zhou, W.] Hunan Agr Univ, Hunan Prov Key Lab Biol & Control Plant Dis & Ins, Changsha 410128, Hunan, Peoples R China.;[Wang, R. Y.; Fan, Y. J.; Hu, S. N.; Yuan, Z. M.; Li, Y.; Cai, X. H.; Zhou, W.] Hunan Agr Univ, Hunan Prov Engn & Technol Res Ctr Biopesticide &, Changsha 410128, Hunan, Peoples R China.;[Zhou, Q. M.; Chen, T.] Hunan Agr Univ, Coll Agr, Changsha 410128, Hunan, Peoples R China.;[Li, H. G.; Li, X. Y.] Hunan Tobacco Co, Chenzhou Co, Chenzhou 423000, Peoples R China.
Corresponding institution:
[Zhou, W.; Zhou, Q. M.] H;[Zhou, W.] T;Hunan Agr Univ, Hunan Prov Engn & Technol Res Ctr Agr Big Data An, Changsha 410128, Hunan, Peoples R China.;Hunan Agr Univ, Hunan Prov Key Lab Biol & Control Plant Dis & Ins, Changsha 410128, Hunan, Peoples R China.;Hunan Agr Univ, Hunan Prov Engn & Technol Res Ctr Biopesticide &, Changsha 410128, Hunan, Peoples R China.
Abstract:
Tobacco wildfire disease is common globally, and climate change may increase the risk of outbreaks. Therefore, there is an urgent need to establish an effective climate model to forecast the occurrence of wildfire disease. To design such a model, we collected data for 40 wildfire disease indices via tobacco field surveys and data for 15 climate factors of Guiyang County in China from 2012 to 2016. First, we built multiple linear regression (MLR), stepwise linear regression (SLR) and support vector regression (SVR) models using three climate features (precipitation, mean daily temperature and sunshine duration), but none of them proved effective. Second, we built three corresponding models using an expanded set of 15 climate features and an in-house WDEM method (the worst descriptor elimination multi-roundly), and the independent test results showed that the best SVR model had not only higher predictive accuracy ($Q_{ext}^2$ = 0.94) but also better stability. Finally, we evaluated the biological significance of the retained climate features and the single-factor effects of the best model through interpretability analysis. Our results indicated that (1) three climate factors (minimum value of wind velocity, daily range of temperature and daily pressure) strongly affected the occurrence of wildfire disease; (2) the ranges of relative humidity and sunshine hours were negatively correlated with the occurrence of wildfire disease, while daily mean vapour pressure was positively correlated with the occurrence of the disease. Our work enables a useful theoretical prediction for wildfire disease, especially in terms of climate-related predictions.
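The abstract reports an external predictive squared correlation coefficient without stating which of the several published definitions it uses. As a hedged illustration only, the sketch below computes one common form, 1 - PRESS / SS, where the sum of squares is taken around the training-set mean. The function name `q2_ext` and the toy values are assumptions and are not taken from the paper.

```python
def q2_ext(y_test, y_pred, y_train_mean):
    """External Q^2 under one common definition (an assumption here).

    Q^2_ext = 1 - sum((y_i - yhat_i)^2) / sum((y_i - ybar_train)^2),
    evaluated over the independent test samples.
    """
    if len(y_test) != len(y_pred):
        raise ValueError("observed and predicted values differ in length")
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    ss = sum((obs - y_train_mean) ** 2 for obs in y_test)
    if ss == 0:
        raise ValueError("test values show no spread around the training mean")
    return 1.0 - press / ss


# Hypothetical disease indices for illustration only.
observed = [12.0, 30.0, 45.0, 22.0]
predicted = [14.0, 28.0, 43.0, 25.0]
print(round(q2_ext(observed, predicted, y_train_mean=27.0), 3))
```

A value close to 1 means the prediction errors are small relative to the spread of the test observations around the training mean.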
Abstract:
Splice site prediction is a long-standing problem in bioinformatics. Although many computational approaches to splice site prediction have achieved satisfactory accuracy, further improvement in predictive accuracy remains important because it allows gene structure to be predicted more accurately. A proper window size must be determined before prediction. An overly long window may introduce irrelevant features that reduce predictive accuracy, while a short window that retains maximum information may perform better in terms of both predictive accuracy and time cost. Furthermore, because the number of false splice sites following the GT–AG rule far exceeds that of true splice sites, accurate and rapid prediction of splice sites from large imbalanced samples has always been a challenge. Therefore, based on a short window size and large imbalanced samples, we developed a new computational method named chi-square decision table (χ2-DT) for donor splice site prediction. Using a short window size of 11 bp, χ2-DT extracts improved positional and compositional features based on the chi-square test, then introduces features one by one based on information gain, and constructs a balanced decision table for imbalanced pattern classification. With a 2000:271,132 (true sites:false sites) training set, χ2-DT achieves the highest independent test accuracy (93.34%) when compared with three classifiers (random forest, artificial neural network, and relaxed variable kernel density estimator) and takes a short computation time (89 s). χ2-DT also exhibits good independent test accuracy (92.40%) when validated on BG-570 mutated sequences with frameshift errors (nucleotide insertions and deletions). Moreover, χ2-DT is compared with both long-window and short-window methods and is found to outperform all of them in predictive accuracy.
Based on short window size and imbalanced large samples, the proposed method not only achieves higher predictive accuracy than some existing methods, but also has high computational speed and good robustness against nucleotide insertions and deletions. This article was reviewed by Ryan McGinty, Ph.D. and Dirk Walther.
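The abstract says that features are extracted with the chi-square test but does not give the details of the improved positional features. The sketch below therefore shows only the standard Pearson chi-square statistic for one window position, computed from a 2 x 4 table of nucleotide counts in true versus false sites. The helper names and the toy sequences are hypothetical and are not the χ2-DT implementation.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip cells whose row or column is empty
                stat += (observed - expected) ** 2 / expected
    return stat


def position_table(true_seqs, false_seqs, pos, alphabet="ACGT"):
    """Count each nucleotide at one window position for both classes."""
    return [
        [sum(1 for seq in seqs if seq[pos] == base) for base in alphabet]
        for seqs in (true_seqs, false_seqs)
    ]


# Toy 11 bp windows (hypothetical); position 5 is 'G' only in the true sites.
true_sites = ["AAGCAGTAAGT", "CAGTTGTGAGT", "AAGGAGTAAGA"]
false_sites = ["AAGCATTAAGT", "CAGTTCTGAGT", "AAGGAATAAGA"]
for pos in (0, 5):
    print(pos, chi_square(position_table(true_sites, false_sites, pos)))
```

Positions whose nucleotide distribution differs strongly between the two classes receive a large statistic, while positions with identical distributions score zero.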
Corresponding institution:
Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, College of Plant Protection, Hunan Agricultural University, Changsha, China
Author affiliations:
[Xing, Pengwei; Chen, Yuan; Yuan, Zheming] Hunan Agr Univ, Hunan Engn & Technol Res Ctr Agr Big Data Anal De, Changsha 410128, Hunan, Peoples R China.;[Xing, Pengwei; Chen, Yuan; Yuan, Zheming] Hunan Agr Univ, Hunan Prov Key Lab Biol & Control Plant Dis & Ins, Changsha 410128, Hunan, Peoples R China.;[Gao, Jun] Univ Arkansas Med Sci, Dept Biochem & Mol Biol, Little Rock, AR 72205 USA.;[Bai, Lianyang] Hunan Acad Agr Sci, Biotechnol Res Ctr, Changsha 410125, Hunan, Peoples R China.
Corresponding institution:
[Yuan, Zheming; Bai, Lianyang] H;Hunan Agr Univ, Hunan Engn & Technol Res Ctr Agr Big Data Anal De, Changsha 410128, Hunan, Peoples R China.;Hunan Agr Univ, Hunan Prov Key Lab Biol & Control Plant Dis & Ins, Changsha 410128, Hunan, Peoples R China.;Hunan Acad Agr Sci, Biotechnol Res Ctr, Changsha 410125, Hunan, Peoples R China.
Abstract:
Selecting informative genes, including individually discriminant genes and synergic genes, from expression data has been useful for medical diagnosis and prognosis. Detecting synergic genes is more difficult than selecting individually discriminant genes. Several efforts have recently been made to detect gene-gene synergies, such as dendrogram-based I(X1; X2; Y) (mutual information), doublets (gene pairs) and MIC(X1; X2; Y) based on the maximal information coefficient. It is unclear whether dendrogram-based I(X1; X2; Y) and doublets can capture synergies efficiently. Although MIC(X1; X2; Y) can capture a wide range of interactions, it has a high computational cost caused by its 3-D search. In this paper, we developed a simple and fast approach based on the abs conversion type (i.e. Z = |X1 − X2|) and the t-test to detect interactions in simulated and real-world datasets.
Our results showed that dendrogram-based I(X1; X2; Y) and doublets fail to discover pair-wise gene interactions, whereas our approach discovers typical pair-wise synergic genes efficiently. These synergic genes can reach accuracy comparable to that of the individually discriminant genes when the same number of genes is used. A classifier cannot learn well if synergic genes have not been converted properly. Combining individually discriminant and synergic genes can improve prediction performance.
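The abs conversion described above can be illustrated with a short sketch: two genes are combined into Z = |X1 − X2|, and Z is then compared between the two classes with a t-statistic. The abstract does not say which t-test variant is used, so Welch's unequal-variance statistic is assumed here, and the function names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def abs_conversion(x1, x2):
    """Combine two expression vectors into Z = |X1 - X2|, sample by sample."""
    return [abs(a - b) for a, b in zip(x1, x2)]


def welch_t(group_a, group_b):
    """Welch's two-sample t-statistic (assumed variant, unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / se


def split_by_label(values, labels):
    """Split values into (class 1, class 0) lists according to binary labels."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg


# Toy data: in class 1 the two genes move in opposite directions,
# in class 0 they move together, so neither gene separates the classes alone.
x1 = [1.0, 5.0, 1.2, 4.8, 1.0, 5.0, 1.1, 4.9]
x2 = [5.0, 1.0, 4.9, 1.1, 1.1, 5.1, 1.0, 5.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print("t for X1 alone:", welch_t(*split_by_label(x1, labels)))
print("t for Z:", welch_t(*split_by_label(abs_conversion(x1, x2), labels)))
```

In this toy case the statistic for X1 alone is near zero, while the statistic for Z is large, which is the pattern a pair-wise synergy produces.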
Author affiliations:
[Zhe-Ming Yuan] Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China;[Aiping Wu; Taijiao Jiang] Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005;Suzhou Institute of Systems Medicine, Suzhou, China;[Xinlei Zhang] Suzhou Geneworks Technology Company Limited, Suzhou, China;[Mingming Su] Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Corresponding institution:
[Taijiao Jiang] C;[Zhe-Ming Yuan] H;Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005;Suzhou Institute of Systems Medicine, Suzhou, China;Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, China
Abstract:
High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, installing a large number of tools and completing a complicated analysis is still a daunting task, especially for researchers with minimal bioinformatics background. To address this problem, we built an automated software platform named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for irritable bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer (http://metadp.cn:7001/).