World Data Lab Releases a Publicly Available COICOP Labeller

National household consumption patterns are tracked through surveys administered by national institutions. These surveys use product categories defined by each country and, occasionally, by smaller regions. Research that compares household consumption across countries and regions therefore runs into problems when it tries to standardize these categories. The UN's 2018 Classification of Individual Consumption According to Purpose (COICOP) offers a standard system for labeling product categories. However, labeling products by hand is cumbersome and prone to error.

The World Data Lab COICOP Labeller offers a quick and easy solution for labeling household product surveys. The algorithm assigns each product name in a set to the appropriate COICOP classification. It can be used for a variety of purposes, including ingesting country household expenditure surveys, company production line documents, country CPI weights, and more.

World Data Pro uses the COICOP Labeller to ingest survey data from over 40 countries and forecast key economic indicators. Product survey data used in World Data Pro must be categorized before it can generate accurate economic trend predictions. Correctly mapping these products to the UN COICOP standard is essential to avoid dramatic overestimates or underestimates of household spending on products.

The algorithm uses the OpenAI API to categorize a set of product names according to the COICOP labeling standard. In WDL's experience, large general-purpose language models have proven more accurate than existing categorization methods such as Support Vector Machines or Random Forests.
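To make the idea concrete, here is a minimal sketch of how a categorization request could be phrased and how the model's reply could be read back. It is not WDL's actual prompt or code: the `DIVISIONS` table is only an illustrative subset of COICOP level-1 codes, and `build_prompt`, `parse_reply`, and the `index,code` reply format are assumptions made for this example. The call to the language model itself is left out.

```python
# Illustrative subset of COICOP 2018 level-1 divisions (assumption: a real
# labeller would list every division, not just these three).
DIVISIONS = {
    "01": "Food and non-alcoholic beverages",
    "03": "Clothing and footwear",
    "07": "Transport",
}


def build_prompt(products):
    """Return a prompt asking for one 'index,code' line per product."""
    options = "\n".join(f"{code} {name}" for code, name in DIVISIONS.items())
    items = "\n".join(f"{i},{p}" for i, p in enumerate(products))
    return (
        "Assign each product to one COICOP division code.\n"
        f"Divisions:\n{options}\n"
        "Products (index,name):\n"
        f"{items}\n"
        "Answer with one line per product in the form index,code."
    )


def parse_reply(reply, n_products):
    """Parse 'index,code' lines; missing or unknown answers stay None."""
    labels = [None] * n_products
    for line in reply.splitlines():
        idx, sep, code = line.strip().partition(",")
        idx, code = idx.strip(), code.strip()
        if sep and idx.isdecimal() and int(idx) < n_products and code in DIVISIONS:
            labels[int(idx)] = code
    return labels
```

Validating each returned code against the known list means a malformed or invented answer is left unlabeled rather than written into the data.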

The algorithm takes a set of product names in CSV format. It labels these products by sending them to the OpenAI API in chunks and writes the output back to CSV. The algorithm works across languages, labels large volumes quickly and cheaply (subject to OpenAI API pricing), and achieves a high degree of accuracy. On a sub-sample of WDL's existing labeled products, accuracy at COICOP level 1 is greater than 99%.
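The CSV-in, chunked-labeling, CSV-out flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not the released tool: `label_csv`, `chunked`, the `product` column name, and the `coicop_code` output column are hypothetical, and the `classify_chunk` argument stands in for whatever function sends one chunk to the OpenAI API and returns one label per product.

```python
import csv
import io


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def label_csv(in_file, out_file, classify_chunk, chunk_size=50, column="product"):
    """Read product names, label them chunk by chunk, write name/label rows."""
    names = [row[column] for row in csv.DictReader(in_file)]
    writer = csv.writer(out_file)
    writer.writerow([column, "coicop_code"])
    for chunk in chunked(names, chunk_size):
        # In practice this would be one API request per chunk.
        labels = classify_chunk(chunk)
        if len(labels) != len(chunk):
            raise ValueError("classifier returned a different number of labels")
        writer.writerows(zip(chunk, labels))


# Stand-in classifier so the sketch runs without network access (hypothetical).
def fake_classifier(chunk):
    return ["01" if "rice" in name else "07" for name in chunk]


source = io.StringIO("product\nrice 1kg\nbus ticket\nbrown rice\n")
result = io.StringIO()
label_csv(source, result, fake_classifier, chunk_size=2)
print(result.getvalue())
```

Passing the classifier in as a function keeps the chunking and CSV handling testable on their own, and the length check catches replies where the model skipped or merged products.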
