Stock performance prediction prototype shows 62% accuracy using NLP, Deep Learning

December 11, 2017

Accurately predicting stock performance involves acquiring highly coveted data, and a new prototype using Natural Language Processing (NLP) and Deep Learning is showing very promising results.

“For investment firms, predicting likely under-performers may be the most valuable prediction of all, allowing them to avoid losses on investments that will not fare well,” writes Patty Ryan, Principal Data and Applied Scientist at Microsoft.

Partnering with a financial services company “to develop a model to predict the future stock market performance of public companies in categories where they invest,” the team at Microsoft built its prototype on a single industry, biotechnology, which offered the largest within-industry sample.

The project goal was to discern whether the model could outperform the chance accuracy of 33.33%, the figure that would result from guessing among three equally likely performance classes. The results went well beyond that baseline.

The team found 62% accuracy in predicting under-performing companies, almost double the chance accuracy.
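To make the comparison concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 33.33% baseline comes from three equally likely classes, and the function names are made up for this example.

```python
def chance_accuracy(num_classes: int) -> float:
    """Accuracy expected from uniform random guessing over equally likely classes."""
    return 1.0 / num_classes


def lift_over_chance(accuracy: float, num_classes: int) -> float:
    """How many times better than random guessing a given accuracy is."""
    return accuracy / chance_accuracy(num_classes)


# Assumption: three classes, which is what a 33.33% chance baseline implies.
baseline = chance_accuracy(3)
lift = lift_over_chance(0.62, 3)
print(f"chance: {baseline:.2%}, reported: 62%, lift: {lift:.2f}x")
```

A lift of roughly 1.86 is what "almost double" refers to.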

But how did they do it? The answer lies among industry buzzwords and a whole lot of code.

If you are a developer or are familiar with these tools and concepts, Patty Ryan walks through the entire process, step by meticulous step, on the Microsoft Developer blog.

However, I will attempt to summarize and lay out how they did it here.

The stock performance prediction prototype: technical aspects

Natural Language Processing (NLP), pre-processing, and Deep Learning were used “to prototype a predictive model to render consistent judgments on a company’s future prospects, based on the written textual sections of public earnings releases extracted from 10k releases and actual stock market performance.”

For input, the team gathered “a text corpus of two years of earnings release information for thousands of public companies worldwide.” They “extracted as source the sections 1, 1A, 7 and 7A from each company’s 10k — the business discussion, management overview, and disclosure of risks and market risks.”

Additionally, they “gathered the stock price of each of the companies on the day of the earnings release and the stock price four weeks later,” and categorized the public companies by industry.
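One way to turn those two prices into a training label is sketched below. This is a hedged illustration, not the team's documented method: the tercile split, the label names, the tickers, and the prices are all assumptions invented for the example.

```python
from typing import Dict, Tuple


def four_week_return(price_on_release: float, price_four_weeks_later: float) -> float:
    """Fractional price change between the release day and four weeks later."""
    return (price_four_weeks_later - price_on_release) / price_on_release


def label_by_tercile(returns: Dict[str, float]) -> Dict[str, str]:
    """Rank companies by return and split them into three equal-sized groups.

    Assumption: equal-sized groups are used so each class is equally likely.
    """
    ranked = sorted(returns, key=returns.get)  # worst performer first
    n = len(ranked)
    labels = {}
    for rank, ticker in enumerate(ranked):
        if rank * 3 < n:
            labels[ticker] = "under"
        elif rank * 3 < 2 * n:
            labels[ticker] = "middle"
        else:
            labels[ticker] = "over"
    return labels


# Made-up tickers and prices: (price on release day, price four weeks later).
prices: Dict[str, Tuple[float, float]] = {
    "AAA": (10.0, 8.0),
    "BBB": (20.0, 21.0),
    "CCC": (5.0, 6.5),
    "DDD": (40.0, 38.0),
    "EEE": (12.0, 12.0),
    "FFF": (30.0, 36.0),
}
returns = {ticker: four_week_return(*pair) for ticker, pair in prices.items()}
labels = label_by_tercile(returns)
print(labels)
```

Splitting within a single industry, as the prototype did with biotechnology, keeps the comparison among companies facing similar market conditions.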

The tools used included Python with Azure Machine Learning Workbench, Jupyter Notebook, and NLP tools including the Gensim library.

Executive Producer Dave Mendlen (R) on the set of Decoded with Host John Shewchuk (L).

Machine Learning, according to Microsoft General Manager and Executive Producer of the Decoded Show, Dave Mendlen, is something that happens “on the server side” to “bring the best information forward.”

“Let’s take this technology and enable it to learn on its own,” says Mendlen on Machine Learning, adding, “and put that in the back-end for developers to take use of. If you are building an application, you can use that to do amazing things. They tend to be things that I’ll call back-end things or processing things that the user doesn’t necessarily see directly.”

Overcoming adversity

One of the difficulties that arose in the stock performance prediction prototype was that the “pre-trained word vectors” the team relied on had a limited vocabulary of some 400,000 words. Many industries have specific vocabulary that is not used outside their particular niche, so some domain terms may have no vector at all. Even so, the “GloVe pre-trained model of all of Wikipedia’s 2014 data” proved useful in allowing the team to “vectorize” its document set and prepare it for deep learning toolkits.
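The vocabulary limitation can be illustrated with a small sketch. It reads vectors in the plain-text GloVe layout (a word followed by its space-separated numbers on each line) and shows what happens to a token that has no vector. The tiny three-word sample, the made-up drug name "xyzumab", and the choice to map unknown tokens to a zero vector are assumptions for illustration only, not the team's approach.

```python
import io
from typing import Dict, Iterable, List, Tuple


def load_glove(handle: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: a word followed by space-separated floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def embed_tokens(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int
) -> Tuple[List[List[float]], List[str]]:
    """Look up each token; unknown tokens get a zero vector (an assumption)."""
    embedded, missing = [], []
    for token in tokens:
        vector = vectors.get(token.lower())
        if vector is None:
            missing.append(token)
            vector = [0.0] * dim
        embedded.append(vector)
    return embedded, missing


# Tiny stand-in for a real pre-trained file, with 3-dimensional vectors.
SAMPLE = "the 0.1 0.2 0.3\ntrial 0.4 0.5 0.6\nrisk 0.7 0.8 0.9\n"
vectors = load_glove(io.StringIO(SAMPLE))

tokens = ["The", "trial", "of", "xyzumab", "carries", "risk"]
matrix, missing = embed_tokens(tokens, vectors, dim=3)
coverage = 1 - len(missing) / len(tokens)
print(f"missing: {missing}, coverage: {coverage:.0%}")
```

Tracking coverage like this is one way to see how much industry-specific language falls outside a general-purpose vocabulary.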

After embedding all the documents and data, they were then “able to take advantage of a convolutional neural network (CNN) model to learn the classifications.”
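For readers unfamiliar with how a CNN reads text, the core operation can be sketched without any deep learning library. The example below slides one filter over a sequence of word vectors and keeps the strongest response, a pattern commonly used in text CNNs. The article does not describe the team's actual architecture, so the filter width, the ReLU activation, the max pooling, and all the numbers here are assumptions, and the sketch covers only a forward pass with no training.

```python
from typing import List

Vector = List[float]


def conv1d_feature_map(sequence: List[Vector], kernel: List[Vector], bias: float) -> List[float]:
    """Slide a filter spanning len(kernel) consecutive word vectors over the sequence.

    Assumes the sequence is at least as long as the kernel.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(sequence) - width + 1):
        total = bias
        for offset in range(width):
            word_vector = sequence[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vector))
        feature_map.append(max(0.0, total))  # ReLU activation
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Keep the filter's strongest response anywhere in the document."""
    return max(feature_map)


# Four "words", each a made-up 2-dimensional embedding.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One made-up filter that looks at two consecutive words at a time.
kernel = [[1.0, -1.0], [0.5, 0.5]]

feature_map = conv1d_feature_map(sequence, kernel, bias=0.0)
pooled = max_over_time(feature_map)
print(feature_map, pooled)
```

In a full model, many such filters would run in parallel and their pooled values would feed a final layer that outputs the class, with all filter weights learned from the labeled documents.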

More number crunching, embedding, and model training ensued, and in the end, the “prototype model results, while modest, suggest there is a useful signal available on future performance classification in at least the biotechnology industry based on the target text from the 10-K.”

The future looks promising

Ryan says that “while the model needs to be improved with more samples, refinements of domain-specific vocabulary, and text augmentation, it suggests that providing this signal as another decision input for investment analyst would improve the efficiency of the firm’s analysis work.”

“Overall, this prototype validated additional investment by our partner in natural language based deep learning to improve efficiency, consistency, and effectiveness of human reviews of textual reports and information.”

So, against a chance accuracy of 33.33%, the prototype reached 62% accuracy in predicting under-performers using NLP, Deep Learning, a convolutional neural network, and a host of developer tools.

Tim Hinchliffe

