Want an edge in your fantasy football league? New Harvard data kit may help


Fantasy football fanatics may soon have a new way of analyzing large chunks of data, according to an announcement from a team of scientists at Harvard.

A team of researchers from Harvard University and the Broad Institute, working in coordination with the National Science Foundation, has developed a tool that can analyze and detect patterns in large data sets in a way that no other software currently available can.

The project, funded in part by the National Science Foundation, is said to be able to tease out multiple recurring events or sets of data hidden in health information from around the globe.

“The goal of this statistic is to take data with a lot of different dimensions and many possible correlations and pick out the top ones,” said Michael Mitzenmacher, a senior author of the paper and professor of computer science at Harvard University. “We view this as an exploration tool: it can find patterns and rank them in an equitable way.”

Researchers noted that the kit would aid attempts to track hurricanes, model earthquakes, identify the Higgs boson, and glean insights into factors affecting the world economy and social networking interactions. The team also noted that individuals could use the kit to analyze sports statistics, including those used for fantasy football and betting.

Scientists currently use advanced technology to gather big, complex data sets, which may be incredibly useful in enhancing their understanding of the systems they study. Sophisticated computer programs search these data sets with great speed, but fall short in even-handedly detecting different kinds of patterns in large data collections, which researchers noted is essential for more sophisticated analysis. Harvard scientists said the kit would allow for analyzing large data sets in a more efficient and effective way, possibly revealing trends that were previously unknown.

The team noted that while sophisticated computer programs already search data sets with great speed, one of the greatest strengths of the newly developed tool is its ability to detect a broad spectrum of patterns and characterize them according to a number of different parameters a researcher might be interested in. The new data mining tool can not only sort vast data sets to find patterns, it can also rank multiple patterns within the data, which the team said would likely assist in various fields of science.
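To make the idea of ranking relationships concrete, here is a minimal Python sketch. It is not the researchers' tool or statistic, which this article does not describe in detail. As a stand-in, it scores each pair of variables with a simple binned mutual information estimate, scaled to run from 0 to 1, and sorts the pairs from strongest to weakest. The function names, the bin count, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins roughly equal-sized bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def dependence_score(xs, ys, n_bins=4):
    """Binned mutual information divided by log(n_bins), so it lies in [0, 1].

    Illustrative stand-in only, not the statistic from the paper.
    """
    n = len(xs)
    bx, by = rank_bins(xs, n_bins), rank_bins(ys, n_bins)
    joint = {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = [bx.count(k) / n for k in range(n_bins)]
    py = [by.count(k) / n for k in range(n_bins)]
    mi = 0.0
    for (a, b), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / (px[a] * py[b]))
    return mi / math.log(n_bins)


def rank_pairs(columns, n_bins=4):
    """Score every pair of named columns and return them strongest first."""
    scores = [
        (dependence_score(columns[a], columns[b], n_bins), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scores, reverse=True)


# Toy data (hypothetical): a U-shaped relationship and an unrelated column.
x = list(range(-20, 21))
columns = {
    "x": x,
    "x_squared": [v * v for v in x],             # depends on x, but not linearly
    "noise": [(i * 17) % 41 for i in range(41)],  # deterministic scramble
}

ranked = rank_pairs(columns)
for score, a, b in ranked:
    print(f"{a} vs {b}: {score:.2f}")
```

In this toy example the U-shaped pair comes out on top even though a straight-line correlation between the two columns would be zero, which is the kind of even-handedness across pattern types the researchers describe. Like the tool itself, a score of this kind says nothing about cause and effect.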

The tool can’t answer the question of whether one thing caused another, but by finding the strongest correlations, it can help scientists generate new hypotheses and questions to explore, said the team of scientists.

Speaking Friday, the research team explained that the performance of the kit was tested through the use of a very large data set.

“There are massive data sets that we want to explore, and within them, there may be many relationships that we want to understand,” said Broad Institute associate member Pardis Sabeti, senior author of the paper and an assistant professor at the Center for Systems Biology at Harvard University. “The human eye is the best way to find these relationships, but these data sets are so vast that we can’t do that. This toolkit gives us a way of mining the data to look for relationships.”

To demonstrate the power of their technique, the researchers also applied it to a diverse range of problems. In one case they looked at factors that influence people’s health globally, using data collected by the World Health Organization in Geneva, Switzerland. The team also tried the tool out on statistics and salaries from Major League Baseball. They found that hits, total bases, and a statistical measure of offensive performance were most strongly correlated with salary.

Other authors who contributed to this work include Hilary Finucane, Sharon Grossman, Gilean McVean, and Eric Lander. Funding for this work was provided by the Packard Foundation, Marshall Aid Commemoration Commission, National Science Foundation, European Research Council, and the National Institutes of Health.

The researchers report their findings in a paper appearing in the December 16 issue of the journal Science.