Big Data Econometrics: Nowcasting and Early Estimates
There has been an increasing interest in so-called big data, i.e. a vast amount of information characterized by the 3Vs: very large volume, velocity and variety. Data sources for big data include:
- Business systems that record and monitor events of interest, such as registering a customer, manufacturing a product, taking an order etc. These data, collected by either private businesses or public institutions, are highly structured and include transactions, reference tables and relationships, as well as the metadata that set their context;
- Social networks (human-sourced information) referring to data typically loosely structured and often ungoverned, including those saved in proper social networks, in blogs or comments, in specialized websites for pictures, videos or internet searches, but also text messages, user-generated maps, e-mails etc.;
- Internet of Things, i.e. machine-generated data derived from sensors and machines used to measure and record events and situations in the physical world, which are becoming an increasingly important component of the information stored and processed by many businesses.
While big data was initially used mainly in the private sector, it also represents an opportunity in other fields, possibly combined with more traditional data sources. In particular, official statistics could also benefit from big data, as indicated e.g. by the High-Level Group for the Modernisation of Statistical Production and Services (HLG) or the Eurostat Task Force on Big Data.
Nowcasting and the construction of early estimates are concerned with producing a preliminary estimate of the contemporaneous value of an indicator that has not yet been officially released. Leading examples are Gross Domestic Product (GDP) and its components, deflators, and fiscal variables, which are typically released at least 30-45 days after the end of the reference month or quarter and are later revised. Nowcasts of monthly variables such as the Harmonised Index of Consumer Prices (HICP) or confidence, sales, trade and labour market indicators could also be of interest.
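As a rough illustration of what a nowcast involves, the sketch below fits a simple bridge equation: quarterly growth is regressed on the quarterly average of a timelier monthly indicator, and the fitted line is then applied to the months already observed in the current quarter. The data, the function names and the choice of a one-regressor least-squares model are all assumptions made for illustration; they are not taken from the project.

```python
from statistics import mean


def quarterly_average(monthly):
    """Average consecutive blocks of three monthly values into quarters."""
    if len(monthly) % 3 != 0:
        raise ValueError("expected complete quarters (multiple of 3 months)")
    return [mean(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]


def fit_bridge(x, y):
    """Ordinary least squares for y = a + b * x with a single regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def nowcast(a, b, months_available):
    """Nowcast the current quarter from the months observed so far."""
    return a + b * mean(months_available)


# Hypothetical data: a monthly indicator and published quarterly growth (%).
indicator = [1.0, 1.2, 1.1, 0.8, 0.9, 1.0, 1.5, 1.4, 1.6, 0.5, 0.4, 0.6]
gdp_growth = [0.4, 0.3, 0.6, 0.1]

a, b = fit_bridge(quarterly_average(indicator), gdp_growth)
# Only two months of the current quarter have been observed so far.
estimate = nowcast(a, b, [1.3, 1.2])
print(f"nowcast of current-quarter growth: {estimate:.2f}%")
```

Averaging only the months available is the simplest way to handle an incomplete quarter; in practice the missing months would more often be forecast first, and the estimate would be revised as new data arrive.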
This project focused on the use of big data for macroeconomic nowcasting and the production of early estimates, by surveying, developing and applying proper data handling techniques combined with state-of-the-art econometric methods. Big data have substantial potential in this context, as timely, continuous and large sets of data should provide new or complementary information with respect to standard economic indicators.
Before big data econometric techniques can actually be used for nowcasting and early estimates, a further review is necessary. This project contributed to this area and conducted a study for a given set of indicators selected by the Eurostat Unit B1 Methodology and Corporate Architecture in cooperation with Eurostat Unit C1 “National accounts methodology, Sector accounts, Financial indicators” and with the Eurostat Task Force on Big Data.
The aim was to present ways big data can be used in macroeconomic nowcasting to improve the quality of the early estimates, increase the timeliness of the releases, and complement the standard information with uncertainty and directional measures.
The project covered the following activities:
- Development of a typology of big data characteristics relevant for macroeconomic nowcasting and early estimates;
- Review of methods for feature extraction of big data sources to usable time series for econometric modelling;
- Review of filtering techniques for high frequency data (e.g. signal extraction/decomposition of time series extracted from high frequency big data);
- Analysis of the most recent modelling techniques for big data with particular attention to Bayesian ones;
- Evaluation of nowcasting/flash estimation based on a big set of indicators;
- Proposal of new metrics for official statistics using econometric techniques: density estimation, variable selection, regime switching, partial derivatives;
- Enhanced recommendations on the step-by-step procedure and approach to follow for the practical use of big data econometric methods;
- Big data handling tool developed as R package including supporting user documentation for deployment;
- Scientific summary.
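The feature extraction and filtering steps listed above can be sketched, under simplifying assumptions, as follows. The example turns a daily series into monthly averages and removes a weekly pattern with a centred moving average. The synthetic data, the function names and the choice of these two particular techniques are assumptions made for illustration, not the methods reviewed in the project.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def to_monthly(observations):
    """Average (date, value) pairs within each calendar month."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def centred_moving_average(series, window=7):
    """Smooth a series with a centred moving average of odd length.

    Points too close to either end to fill a whole window are set to None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            smoothed.append(None)
        else:
            smoothed.append(mean(series[i - half:i + half + 1]))
    return smoothed


# Hypothetical daily series: a linear trend plus a weekly pattern.
start = date(2016, 1, 1)
weekly_pattern = [3, 1, 0, -1, -2, -1, 0]  # sums to zero over a week
daily = [(start + timedelta(days=i), 100 + 0.5 * i + weekly_pattern[i % 7])
         for i in range(60)]

# A 7-day window averages out the weekly pattern and leaves the trend.
trend = centred_moving_average([value for _, value in daily], window=7)
# Monthly averages give a series at the frequency of standard indicators.
monthly = to_monthly(daily)
```

Because the weekly pattern sums to zero over seven days, the 7-day centred average recovers the underlying linear trend exactly in this synthetic case; real data would call for the more careful signal extraction and outlier treatment mentioned above.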
The project produced the following deliverables:
- Technical Report on the main features of various big data sources including their typology for econometric purposes;
- Technical Report proposing big data conversion techniques including their main features and characteristics; possible use of the proposed techniques in relation to various big data typologies; empirical application of the proposed methods and their comparative analysis. This report also addressed the possibility of using data mining techniques for the conversion of unstructured data to time series;
- Technical Report presenting relevant methods for signal extraction and decomposition techniques for high frequency time series and outlier detection as well as the results of the empirical comparison of different methods for filtering of high frequency data;
- Comprehensive technical report including all relevant modelling techniques for big data. This included random trees, random forests, cluster analysis, deep learning and neural networks;
- Technical Report addressing modelling strategies for nowcasting/early estimates purposes; empirical test on possible timeliness gains when using Google Trends, other easily accessible big data and macroeconomic and financial variables; and quasi real-time simulations on specific Euro area and EU data including daily and weekly updated GDP growth estimates;
- Technical Report with indication of proposed new metrics for official statistics based on econometrics techniques. This also included density forecasting and related evaluation measures, directional forecasting as well as an empirical nowcasting exercise for key economic variables of four European economies;
- Technical Report on an enhanced step-by-step approach for the use of big data for nowcasting and early estimate exercises;
- R package on big data econometric handling with related documentation;
- Abstract and scientific paper in publishable format and accompanying slide presentation.
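To give a flavour of the directional and density measures mentioned among the deliverables, the sketch below computes two common evaluation statistics: the share of periods in which a nowcast gets the sign of growth right, and the average log score of a normal forecast density. Both the measures chosen and the numbers used are assumptions made for illustration; the project reports may rely on different metrics.

```python
import math


def hit_rate(forecasts, outcomes):
    """Share of periods in which forecast and outcome have the same sign.

    Zero is treated as non-positive on both sides.
    """
    hits = sum((f > 0) == (y > 0) for f, y in zip(forecasts, outcomes))
    return hits / len(outcomes)


def gaussian_log_score(mu, sigma, outcome):
    """Log density of a normal forecast distribution at the realised value."""
    z = (outcome - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def average_log_score(mus, sigmas, outcomes):
    """Mean log score over all periods; higher values are better."""
    scores = [gaussian_log_score(m, s, y)
              for m, s, y in zip(mus, sigmas, outcomes)]
    return sum(scores) / len(scores)


# Hypothetical nowcasts of quarterly growth (%) and the realised values.
point_nowcasts = [0.3, -0.1, 0.2, 0.4]
nowcast_stds = [0.2, 0.2, 0.2, 0.2]
realised = [0.5, 0.1, 0.1, 0.6]

direction = hit_rate(point_nowcasts, realised)
density = average_log_score(point_nowcasts, nowcast_stds, realised)
print(f"directional hit rate: {direction:.2f}")
print(f"average log score: {density:.3f}")
```

The log score rewards forecast densities that place high probability on what actually happened, so it penalises both biased point nowcasts and uncertainty bands that are too narrow or too wide.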