Basis Technology R&D team pioneers new technique and open sources WikiSem500, a dataset for multilingual word embedding evaluation The most time consuming and expensive aspect of machine learning research is data preparation—aggregation and cleaning—and every data scientist has been frustrated by it. However the importance of good, testing data makes it hard to cut corners. […]