When the global pandemic struck in 2020, humanity went through a learning experience that altered almost every aspect of human interaction. But even before then, artificial intelligence (AI) and its branch known as machine learning (ML) were already on a path to fundamentally reshape almost every industry.
The year 2021 is set to see important ML and AI trends that are likely to reshape our economic, social, and industrial norms. According to a 2019 Gartner survey, around 37% of the companies surveyed were using some form of ML in their business. It is anticipated that around 80% of modern advances will be founded on AI and ML by 2022.
With the surge of interest in AI technology, various new trends are emerging in the field. In this article, we focus on five AI trends expected to be widely used and studied in 2021 that will have a powerful impact on the language industry and beyond: vokenization, data monetization, Deep Reinforcement Learning (DRL), Automated Narrative Text Generation, and Data Science as a Service (DSaaS).
1. Vokenization
Language models have been trained only with text so far. To change that, researchers from the University of North Carolina, Chapel Hill have developed a new technique called “vokenization”, which gives language models like GPT-3 the ability to “see” by enabling them to process visual input. Research into how to combine language models and computer vision has been rapidly growing as AI models that can process both linguistic and visual input can lead to highly practical applications. For instance, if we were to build a robotic assistant, it would need both computer vision to navigate the physical environment and language processing abilities to communicate about it with humans.
The term “voken” was invented as part of the research. It is derived from the word “token”, which denotes a sequence of characters commonly used in language model training that roughly corresponds to a lexical unit such as a single word. The UNC researchers call the image linked with each token in their visual-language model a “voken”. The “vokenizer” is the algorithm that matches vokens to tokens, and “vokenization” describes the whole process.
The researchers created visual and word embeddings with MS COCO to train their vokenizer algorithm. After the training, the vokenizer managed to match vokens to the tokens in English Wikipedia. The algorithm only found vokens for about 40% of the tokens. However, that still amounts to 40% of a dataset that includes approximately 3 billion words.
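The matching step can be pictured as a nearest-neighbor lookup between token vectors and image vectors that live in the same embedding space. The snippet below is only a minimal sketch under stated assumptions, not the UNC researchers' implementation: the vectors are hand-made toy values, the image file names are hypothetical, and the similarity threshold is an illustrative device so that some tokens end up without a voken.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (assumed values; real ones would come from trained encoders).
TOKEN_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "beach": [0.1, 0.9, 0.1],
    "although": [-0.5, -0.5, 0.7],
}
# Hypothetical image names standing in for a captioned image collection.
IMAGE_EMBEDDINGS = {
    "dog_photo.jpg": [1.0, 0.0, 0.1],
    "beach_photo.jpg": [0.0, 1.0, 0.0],
}

def vokenize(tokens, threshold=0.5):
    """Map each token to its most similar image, or None if nothing beats the threshold."""
    result = {}
    for token in tokens:
        best_image, best_score = None, threshold
        for image, image_vec in IMAGE_EMBEDDINGS.items():
            score = cosine(TOKEN_EMBEDDINGS[token], image_vec)
            if score > best_score:
                best_image, best_score = image, score
        result[token] = best_image
    return result

vokens = vokenize(["dog", "beach", "although"])
print(vokens)
```

In this toy setup the concrete nouns find an image, while the function word "although" has no close visual match and is left without one.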
With this new dataset, the researchers retrained BERT, the open-source, transformer-based language model developed by Google. They then tested the retrained BERT on six different language comprehension tests, and it performed better on all six.
Vokenization is seen as one of the most recent breakthroughs in AI as it manages to connect text with other modalities. It’s now conceptually possible to build a robot that can not only talk but also hear and see.
2. Data Monetization
With the realization that almost anything can be datafied, businesses now question how to generate value out of the high volumes of data available to or owned by them.
The value of data grows with the ways it is used and the insights drawn from it. Data monetization refers to the process of generating economic value out of data. At the core of data monetization strategies, which have become a growing area of interest, lies treating data as an asset and maximizing its value.
As a newly explored business stream, data monetization has become the primary challenge for many CIOs. Many companies lack a business-centric value engineering methodology to identify and prioritize where and how AI/ML can derive new sources of customer, product, and operational value. As successful data monetization examples multiply, organizations are beginning to appreciate the economic value of data and analytics.
One way to overcome this challenge is to be open to new opportunities. Marketplace models seem to be emerging as a way to provide data monetization platforms. TAUS Data Marketplace and SYSTRAN Marketplace are examples of such opportunities for language data monetization. The TAUS Data Marketplace offers any data owner the chance to monetize their assets by making sure those assets reach potential buyers without any marketing effort on the seller’s part.
3. Deep Reinforcement Learning
According to research by Ruben Glatt, Felipe Leno da Silva, and Anna Helena Reali Costa at Escola Politecnica da Universidade de Sao Paulo, recent developments in Artificial Intelligence have produced a promising new technology for building intelligent agents: Deep Reinforcement Learning (DRL). It combines classic Reinforcement Learning (RL), which uses sequential trial and error to learn the best action to take in every situation, with modern Deep Learning approaches, which can evaluate complex inputs and select the best response. Currently, DRL is one of the top trending fields in Machine Learning because it can solve a wide range of complex decision-making tasks. Before the advances in DRL, solving real-world problems with humanlike intelligence had been deemed unattainable for a machine.
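To make the "trial and error" part concrete, here is a minimal sketch of classic tabular Q-learning on a made-up five-cell corridor. Everything in it is an assumption chosen for illustration: the environment, the reward of 1 at the goal, the learning rate and discount factor, and the purely random exploration. It shows only the classic RL half of the picture; in DRL, a neural network would take the place of the lookup table so that complex inputs such as images can be handled.

```python
import random

N_STATES = 5             # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)       # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Learn a Q-table by trial and error, exploring with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward the reward
            # plus the discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy: the best-known action in each non-goal cell.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table moves right in every cell, which is the shortest route to the goal in this toy corridor.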
These new approaches still require long training times for difficult and high-dimensional task domains. Generalizing gathered knowledge and transferring it to another task has long been a topic in classical RL, yet it remains an unresolved problem in the DRL field. However, recent research has shown that policies can be learned end to end from high-dimensional sensory inputs in challenging task domains such as arcade game playing.
Besides academic research, cases where DRL defeated humans at games were widely covered by the media. For instance, AlphaGo defeated the best professional human player at the game of Go. AlphaStar beat professional players at StarCraft II. OpenAI’s Dota-2-playing bot beat the world champions in an e-sports game.
When it comes to business, DRL applications can be seen in a wide range of sectors, from inventory management, supply chain management, warehouse operations management, and demand forecasting to financial portfolio management. In Natural Language Processing, DRL is commonly used for tasks such as abstractive text summarization and chatbots. A considerable amount of research is also being done on applications of DRL in education, healthcare, and smart cities.
4. Automated Text Generation (ATG)
Artificial Intelligence has been applied to a wide range of human tasks in the hope of reducing cost, time, and effort, among other goals, which has led to a rediscovery of what computers can be used for. One of the most intriguing questions that remains is “Can a computer be creative?”. Brian Daniel Herrera-González, Alexander Gelbukh, and Hiram Calvo from the Center for Computing Research analyze this question in their study. One of the tasks that AI has been applied to is the automatic generation of stories. Although certain patterns have been defined, there is still no clear methodology in place.
So far, based on academic research, two approaches regarding the task of automatic text generation have been identified: the symbolic approach and the connectionist approach.
The symbolic approach has the advantage that it is easy to delimit the path followed by the actions of the story. Its main disadvantage is that it lacks creativity and produces repetitive stories with a reduced repertoire of events. The connectionist approach resolves the lack of novelty in stories, but the coherence of the story is lost. In the field of automated text generation, solutions such as sequence-to-sequence (seq2seq) models with recurrent neural networks (RNNs) and unsupervised learning with generative adversarial networks (GANs) or Bidirectional Associative Memories have been proposed and discussed.
When it comes to the practical use of ATG, it’d be fair to say that most contemporary internet users come across ATG output daily. Established applications for automated text generation include football reports, earthquake notifications, stock exchange news, and weather and traffic jam reports. As the field advances, it gets harder to differentiate between automatically generated articles and those written by humans. In tests with football match reports, readers rated the output of the ‘robot journalist’ as more natural than that of a ‘real’ editor. One interesting example of ATG is the new Brothers Grimm fairy tale produced by AI, where the algorithm was trained to mimic the literary style of the Brothers Grimm. ATG is also widely used in e-commerce, where it makes producing large quantities of product descriptions much easier.
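Reports of this kind are often described as data-to-text generation: structured data goes in and a sentence comes out. As a rough illustration only, the sketch below uses hand-written rules and a template, which is one simple way to do it and not a description of how any particular 'robot journalist' works. The record fields, the team names, and the three-goal rule for choosing the verb are all hypothetical.

```python
def match_report(match):
    """Turn a structured match record into a one-sentence report."""
    home, away = match["home"], match["away"]
    home_goals, away_goals = match["home_goals"], match["away_goals"]
    if home_goals == away_goals:
        return f"{home} and {away} drew {home_goals}-{away_goals}."
    if home_goals > away_goals:
        winner, loser, score = home, away, f"{home_goals}-{away_goals}"
    else:
        winner, loser, score = away, home, f"{away_goals}-{home_goals}"
    # Pick a stronger verb for a wide margin (an arbitrary, illustrative rule).
    margin = abs(home_goals - away_goals)
    verb = "thrashed" if margin >= 3 else "beat"
    return f"{winner} {verb} {loser} {score}."

# Hypothetical match record with made-up team names.
report = match_report(
    {"home": "Rovers", "away": "United", "home_goals": 4, "away_goals": 1}
)
print(report)
```

Because every sentence comes from the same few rules, output like this is predictable but repetitive, which mirrors the trade-off described above for the symbolic approach.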
5. Data Science as a Service (DSaaS)
Data Science as a Service (DSaaS) involves delivering information gathered through advanced analytics applications, run by data scientists at an outside company (the service provider), to corporate clients. The aim is to collect business intelligence and make informed decisions about new business efforts.
It works as follows: clients upload their data to a big data platform or cloud database. The service provider’s team of data scientists and data engineers then works on the uploaded data. They analyze it and prepare a report covering, for example, which companies are likely to buy your products, details about your rivals, your net earnings, and your revenue.
Data Science as a Service is mostly used by companies that are struggling with the shortage of data scientists. Ultimately, DSaaS helps drive business growth.