Abstract
BACKGROUND: Many long non-coding RNAs (lncRNAs) have been found to be good markers for several tumors. Using an lncRNA-mining approach, we aimed to identify an lncRNA expression signature that can predict breast cancer patient survival. METHODS: We performed lncRNA expression profiling in 887 breast cancer patients from Gene Expression Omnibus (GEO) datasets. The association between the lncRNA signature and clinical survival was analyzed in the training set (n = 327, from GSE20685). The association was then validated in three independent testing sets (n = 252 from GSE21653, n = 204 from GSE12276, and n = 104 from GSE42568). RESULTS: A set of four lncRNA genes (U79277, AK024118, BC040204, AK000974) was identified by the random survival forest algorithm. Using a risk score based on the expression signature of these lncRNAs, we separated the patients in the training set into low-risk and high-risk groups with significantly different survival times. This signature was validated in the three testing cohorts. Further analysis revealed that the four-lncRNA expression signature was independent of age and subtype. Gene Set Enrichment Analysis (GSEA) suggested that the enriched gene sets were involved in several cancer metastasis-related pathways. CONCLUSIONS: These findings indicate that lncRNAs may be implicated in breast cancer pathogenesis. The four-lncRNA signature may have clinical implications for the selection of high-risk patients for adjuvant therapy.
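The abstract does not state how the risk score is computed or where the low-risk/high-risk cutoff lies. One common construction for expression signatures is a weighted sum of the expression values, with patients split at a cohort-level cutoff. The following is a minimal sketch under that assumption: the weights in `WEIGHTS`, the median cutoff, and the function names `risk_score` and `split_by_risk` are all hypothetical placeholders, not values or methods reported by the study.

```python
from statistics import median

# Hypothetical weights, NOT taken from the study. In practice these might be
# per-lncRNA coefficients estimated on the training set (an assumption).
WEIGHTS = {
    "U79277": 0.5,
    "AK024118": -0.25,
    "BC040204": -0.5,
    "AK000974": -0.25,
}


def risk_score(expression):
    """Return the weighted sum of the four lncRNA expression values."""
    return sum(WEIGHTS[gene] * expression[gene] for gene in WEIGHTS)


def split_by_risk(patients, cutoff=None):
    """Assign each patient to the 'high' or 'low' risk group.

    `patients` maps a patient ID to a dict of {lncRNA name: expression value}.
    If `cutoff` is None, the median score of the given cohort is used
    (an assumed choice; the abstract does not specify the cutoff).
    Scores strictly above the cutoff are labeled 'high'.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    groups = {
        pid: ("high" if score > cutoff else "low")
        for pid, score in scores.items()
    }
    return scores, groups, cutoff
```

Under this sketch, a cutoff derived from the training cohort could be passed explicitly through `cutoff` when scoring a testing cohort, so that the grouping rule is fixed before validation. Whether the study did so is not stated in the abstract.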