National Natural Science Foundation of China (NSFC) [61701177]; Hunan Provincial Natural Science Foundation of China [2018JJ3225]; Scientific Research Project of Hunan Province Education Office [17A096]
High-accuracy splice site recognition based on machine learning is key to eukaryotic genome annotation. In this paper, we used the chi-square test to determine the sequence window size, constructed a chi-square statistical difference table to extract positional features, and combined these features with dinucleotide frequencies to characterize the sequences. Because the positive and negative samples of splice sites are extremely imbalanced, 10 SVM classifiers based on equal proportions of positive and negative samples were buil...
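As an illustration of two of the ingredients named above, the following minimal Python sketch computes dinucleotide frequencies for a sequence window and draws 10 class-balanced training subsets. It is a sketch under stated assumptions, not the procedure used in the paper: the function names `dinucleotide_frequencies` and `balanced_subsets` are hypothetical, overlapping dinucleotides are assumed, and the negatives for each subset are assumed to be drawn by simple random sampling, a detail the text above does not specify.

```python
import random
from itertools import product

# The 16 dinucleotides in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Return the 16 overlapping dinucleotide frequencies of seq.

    Assumption: dinucleotides are counted with a sliding window of step 1,
    and pairs containing ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def balanced_subsets(positives, negatives, n_subsets=10, seed=0):
    """Build n_subsets training sets with equal numbers of each class.

    Assumption: every subset reuses all positives and draws the same number
    of negatives at random without replacement within the subset.
    Requires len(negatives) >= len(positives).
    """
    rng = random.Random(seed)
    k = len(positives)
    return [(list(positives), rng.sample(negatives, k)) for _ in range(n_subsets)]
```

Each `(positives, negatives)` pair returned by `balanced_subsets` could then be used to train one classifier, giving 10 classifiers in total; how their outputs are combined is left open here.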