Perspective of Researchers on Big Data
Prof. Dr. Frauke Kreuter (IAB)
Professor Kreuter’s presentation gave an overview of the benefits and challenges of using big data from a social science researcher’s perspective. She underscored the significance of the big data phenomenon in two ways. First, she cited several expert reports by international science organisations, including the Committee on National Statistics (CNSTAT) of the US National Academies of Sciences, Engineering, and Medicine. Second, she drew on her own experience as a researcher at the Institute for Employment Research (IAB). She rates the scientific value of big data’s temporal and spatial granularity very highly: it enables researchers to ask new research questions and to examine subgroups of the population more closely. With response rates to traditional social science surveys at an all-time low, big data might offer statistical agencies and other large data collectors a cost-efficient way of gathering data to meet the ever-increasing demand for evidence-based research. Concerns over the academic use of big data include problems of measurement and inference. Whereas social survey data are the product of meticulously designed surveys, Professor Kreuter describes big data as “organic data”, in the sense that it is not originally produced for research purposes. From a privacy and ethics perspective, big data challenges traditional notions of informed consent. Professor Kreuter recommends collaborative research programmes with two aims. The first is to investigate options for combining public-sector data with private sources without breaching privacy. The second is to examine the quality of the sources and of the data collection processes. Both will require cooperation in interdisciplinary teams. To create such teams, the scientific community needs to build capacity among employees and researchers in both the public and private sectors.
ABIDA - Assessing Big Data
Prof. Dr. Johannes Weyer (TU Dortmund)
Prof. Dr. Weyer’s presentation first gave an overview of the interdisciplinary research project “ABIDA - Assessing Big Data”, which was established by the German Federal Ministry of Education and Research in 2014. Second, as head of the project’s sociological working group, Weyer presented the group’s soon-to-be-published expert report, which examines the big data phenomenon from a sociological perspective. Big data results from the now-ubiquitous generation of data by smart devices and automatic processes (“datafication”). This is giving rise to new methods of data analysis, including data mining, reality mining, machine learning, and statistical relational learning, much of it in real time. The granularity of big data allows for analytics at the macro level (the big picture, forecasting). It also allows for analytics at the micro level, for example by creating individual profiles (marketing) or detecting anomalies (counterterrorism). At the micro level, Weyer used the “quantified self” and practices of self-measurement, especially health monitoring, to illustrate where the new wealth of big data is coming from. Health monitoring apps are popular because they:
- generate meaning;
- can be used to optimise health, body and routines;
- emancipate people from medical staff and other experts;
- simply create enjoyment (“gamification”).

Users share the data from these apps voluntarily with peers and data analysts because they expect tangible benefits that outweigh the disadvantages.
On the Epistemology, Ethics and Politics of Big Data Practices
Dr. Judith Simon (IT University Copenhagen, University of Vienna)
Dr. Simon’s presentation looked at big data practices from a philosophical perspective. She highlighted the changes in epistemology, ethics and politics that these practices have brought about and gave an overview of the political and academic big data debate as a whole. Big data’s granularity creates problems for privacy and autonomy and renders traditional notions of privacy obsolete. Aggregation, data linkage and the processing of seemingly anonymous data can easily produce person-related data (cf. the widely reported case of the US retailer Target, which inferred customers’ pregnancies from their shopping patterns). Moreover, location data combined with time stamps and maps allow far-reaching inferences about individuals. Depending on who collects the data and for what purpose, this can have serious consequences. From a political perspective, big data has created new asymmetries of power by dividing the population into two groups: providers of data on the one hand, and collectors or owners of data and data analysts on the other. Apart from being a technological and scholarly phenomenon, big data has generated its own mythology, which implies that large data sets contain a higher form of truth and knowledge. In academia, this has given rise to a debate on the so-called “New Empiricism”: the strongly contested notion that big data speaks for itself and can rid social research of social theory. This proclamation of a new era of science clashes with research realities, which are characterised by:
- a strong dependence on ontologies, standards and regulatory frameworks;
- a highly labour-intensive and underappreciated data preparation process;
- problems of access;
- a lack of computational competence.
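The linkage risk described in this talk can be illustrated with a minimal sketch. It assumes two hypothetical datasets: an “anonymised” set of health records without names, and a public register with names. Both share the quasi-identifiers postcode, birth date and sex, and a simple exact-match join on them is enough to re-identify a record. All records, field names and the functions `quasi_key` and `link_records` are invented for illustration and are not taken from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical records for illustration only; no real persons or data.
# "Anonymised" dataset: names removed, but quasi-identifiers kept.
anonymous_records: List[Dict[str, str]] = [
    {"postcode": "44227", "birth_date": "1975-03-02", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "44227", "birth_date": "1980-11-19", "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "10115", "birth_date": "1975-03-02", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public dataset (e.g. a published membership list) that does contain names.
public_register: List[Dict[str, str]] = [
    {"name": "A. Example", "postcode": "44227", "birth_date": "1975-03-02", "sex": "F"},
    {"name": "B. Sample", "postcode": "10115", "birth_date": "1990-01-01", "sex": "M"},
]

# Attributes present in both datasets: individually harmless, jointly often unique.
QUASI_IDENTIFIERS: Tuple[str, ...] = ("postcode", "birth_date", "sex")


def quasi_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the linkage key from the shared quasi-identifiers."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(
    anonymous: List[Dict[str, str]], register: List[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Exact-match join on quasi-identifiers; return (name, diagnosis) for unique matches."""
    names_by_key: Dict[Tuple[str, ...], List[str]] = {}
    for person in register:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    reidentified: List[Tuple[str, str]] = []
    for record in anonymous:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a single candidate turns the record into person-related data
            reidentified.append((candidates[0], record["diagnosis"]))
    return reidentified


if __name__ == "__main__":
    for name, diagnosis in link_records(anonymous_records, public_register):
        print(f"{name}: {diagnosis}")
```

Run as is, the sketch prints `A. Example: asthma`. One of the three “anonymous” records matches exactly one person in the register on postcode, birth date and sex, so the join attaches a name to a diagnosis. Real linkage typically uses fuzzier matching and many more attributes; this exact-match join is only the simplest case.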
The Consumer Data Research Centre
Prof. Mark Birkin (University of Leeds)
Professor Birkin presented the activities of the Consumer Data Research Centre (CDRC) and the wider UK big data research landscape. The CDRC is part of the Economic and Social Research Council’s (ESRC) Big Data Network. The network also includes the Administrative Data Research Network, the Urban Big Data Centre, and the ESRC Business and Local Government Data Research Centre. Initiatives on social media data and third-sector data are in progress. The CDRC gains access to consumer data, such as loyalty card, credit card and mobile phone data, through data-sharing partnerships with private-sector companies. These companies share their data in collaborative projects with the CDRC. In return, they gain access to the CDRC’s expertise and skills, data infrastructure and data linkage opportunities. Professor Birkin presented numerous examples of research applications of big data at the CDRC, for example:
- urban mobility research using data from a route-planning app;
- activity profiles based on mobile phone and loyalty card data;
- health research linked with geographical data to better understand the effect of environmental factors on obesity.
Working with Big & Complex Data: Experiences and Perspectives
Prof. Maria Fasli (University of Essex)
Professor Fasli gave an overview of the Institute for Analytics and Data Science (IADS) at the University of Essex and the wider UK big data research landscape. The IADS receives substantial funding from the UK government to advance interdisciplinary projects on all aspects of data and analytics. These include technologies for data transfer and management, analysis and modelling techniques and methods, socio-economic aspects of data, evidence-based policy, and ethical and legal requirements. She presented examples of how big data can close gaps in our understanding of complex system behaviour at the macro level and of individual behaviour at the micro level, for example by using consumer data. Professor Fasli introduced numerous key players in the UK academic data landscape, for example the UK Data Archive (UKDA), the ESRC Business and Local Government Data Research Centre, and the Human Rights, Big Data and Technology Project (HRBDT). She also emphasised the importance of training, a focus of the IADS, in preparing the next generation of data scientists. She likewise stressed the value of outreach activities that keep the public involved.
Big Data in Social and Economic Research
Prof. Dr. Thomas Bauer (RWI - Leibniz Institute for Economic Research)
In his presentation, Professor Bauer gave a personal account of his research involving big data sources at the RWI - Leibniz Institute for Economic Research. To be able to answer specific research questions, his institute has worked to gain access to big data sources by cooperating closely with the private-sector companies that produce them. These include Google and ImmobilienScout24, one of Germany’s largest online real-estate platforms. Bauer used ImmobilienScout24 data to create highly detailed georeferenced population information for economic research. He also highlighted the significance of big data for health economics. Means of access have included (a) purchasing data, (b) offering private-sector companies incentives to share their data with the RWI, and (c) web scraping to extract data from websites. While Professor Bauer has successfully acquired and applied big data in economic research, he cautioned that it also enables “data-hungry identification strategies”.
Big Data-related activities at Deutsche Bundesbank
Jürgen Häcker (Deutsche Bundesbank)
Jürgen Häcker presented the big data activities of Deutsche Bundesbank, Germany’s central bank. He also gave an overview of the limitations and the ethical and legal challenges that big data poses to a high-trust organisation such as a central bank. He concluded that big data is viewed as an effective tool for macroeconomic and financial stability analysis and for enhancing existing statistics. Central banks’ actual involvement with big data, however, remains limited. The Deutsche Bundesbank’s big data activities rest on three pillars:
- the House of Micro Data, which concentrates digital information in one place;
- the Big Data Project, which aims to complement existing data sources with tertiary data sources;
- various IT projects, including projects based on Hadoop, a framework that uses clusters of computers to process and analyse very large amounts of data.

One of the Deutsche Bundesbank’s projects analyses textual patterns to assess the quality and comprehensiveness of banks’ risk reports.
Big Data in the ESS & Eurostat
Mr. Wirthmann (Eurostat)
Mr. Wirthmann presented the big-data-related activities of Eurostat, the statistical office of the European Union. In summary, the European Statistical System (ESS) is facing the data revolution and views big data as an important potential enhancement of statistics. The Scheveningen Memorandum was adopted by the Directors-General of the European national statistical institutes (NSIs). Following it, the ESS drew up a big data roadmap and action plan, which led to numerous pilot projects for 2016-2019 covering:
- mobile communication data;
- supermarket scanner data;
- automatic vessel identification data;
- smart meter data;
- Wikipedia data;
- macroeconomic nowcasting.

The expected benefits for NSIs include a better and more flexible response to user needs and an expanded range of statistical products and services. As for the way NSIs work, incorporating big data is expected to increase efficiency and help NSIs acquire new competencies. It should also ensure that they remain key players in statistical information. Working with big data will require strict ethical review and guidelines and poses a number of challenges to statistical methodology. Eurostat is developing a concrete training strategy to close the big data skills gap as part of the European Statistical Training Programme (ESTP). It is also hosting numerous big-data-related events at European level. The “data revolution”, the massive increase in data production and exchange, will likely alter the role of statistical institutes in the data ecosystem.
Towards a data driven society
Magchiel van Meeteren (Center for Big Data Statistics)
Van Meeteren presented the big data approach of Statistics Netherlands (CBS), which has established the Center for Big Data Statistics (CBDS). The CBDS uses big data and administrative data sources to create new, real-time statistics and improve existing ones, while reducing the data collection footprint. To this end, the CBDS has built an international partner network of businesses, universities, government agencies and national statistical institutes (NSIs). Partners include Microsoft, the World Bank, the University of Michigan, the UK Office for National Statistics and many others. In his view, the greatest challenges that big data poses to NSIs are skills development and privacy. Van Meeteren emphasised the importance of public perception and trustworthiness for a “high-trust organisation” such as an NSI. He recommended a proactive approach to engaging with the public, even though NSIs do not usually require the population’s informed consent.