Data Collection Methods: Sourcing Startup Performance

By juditova Management Startup Accelerators 2025-04-12


The effectiveness of this study depends largely on the quality and relevance of its data. Since the goal of this research is to evaluate whether corporate accelerators, such as those run by Google and Microsoft, act as springboards or sand traps for startups, the data must capture measurable post-acceleration outcomes. This section outlines how the database was compiled, which variables were included, and why the data source was selected.

This post is part of a series of articles based on the research dissertation “Are corporate accelerators springboards for startups: a performance analysis of the Microsoft’s and Google’s accelerated”.


SOURCE OF DATA: Crunchbase

All the data used in this study was collected exclusively from Crunchbase, a globally recognised platform that aggregates information on startups, funding rounds, acquisitions, IPOs, and company milestones. The database was manually filtered and compiled into a spreadsheet, which serves as the foundation for all subsequent statistical analysis.

No other data sources were used. Therefore, all interpretations, results, and conclusions derived from this dissertation are grounded in data that was originally published and maintained on Crunchbase.

DATA FILTERING AND INCLUSION CRITERIA

The database was filtered using the following criteria:

Only startups that were explicitly identified as participants in either the Google or Microsoft startup accelerator programs were included.
Startups with missing or ambiguous data for performance metrics (e.g., unknown IPO status or founding year) were excluded.
The time horizon covers primarily the first three years after acceleration, allowing analysis of short- to mid-term outcomes (e.g., survival, funding trajectory).
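As an illustration only, the first two inclusion criteria above could be expressed as a record-level filter. This is a minimal Python sketch, not the procedure actually used (the study describes manual filtering of Crunchbase data into a spreadsheet); the `RawRecord` fields and the accelerator labels `"Google"` and `"Microsoft"` are hypothetical. The three-year time horizon is assumed to be applied later, when outcomes are computed, so it does not appear here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical labels; the actual spreadsheet values may be spelled differently.
TARGET_ACCELERATORS = {"Google", "Microsoft"}


@dataclass
class RawRecord:
    company_name: str
    accelerator_name: Optional[str]  # None when participation is not documented
    founded_year: Optional[int]      # None when the founding year is unknown
    ipo_status_known: bool           # False when the IPO status is ambiguous


def passes_inclusion_criteria(rec: RawRecord) -> bool:
    """Return True if the record meets both record-level inclusion criteria."""
    # Criterion 1: explicit participation in the Google or Microsoft program.
    if rec.accelerator_name not in TARGET_ACCELERATORS:
        return False
    # Criterion 2: no missing or ambiguous performance metadata.
    if rec.founded_year is None or not rec.ipo_status_known:
        return False
    return True


def filter_records(records: Iterable[RawRecord]) -> List[RawRecord]:
    """Keep only the records that pass the inclusion criteria."""
    return [rec for rec in records if passes_inclusion_criteria(rec)]


if __name__ == "__main__":
    sample = [
        RawRecord("Alpha", "Google", 2018, True),
        RawRecord("Beta", "Microsoft", None, True),       # unknown founding year
        RawRecord("Gamma", "Other program", 2019, True),  # not a target accelerator
        RawRecord("Delta", "Microsoft", 2020, False),     # ambiguous IPO status
    ]
    print([rec.company_name for rec in filter_records(sample)])  # ['Alpha']
```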

This filtering ensures that the database contains only companies that participated in one of these two corporate accelerators and for which the required performance data are available.

DATA FORMAT

The database was structured and standardised to support clean, interpretable analysis using ANOVA models and tests on binary indicators. Funding amounts were expressed in millions of U.S. dollars to ensure consistency across geographic contexts. Categorical variables, such as accelerator participation, survival status, and IPO status, were encoded as binary indicators (0 or 1), making them statistically testable within common modelling frameworks.
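A minimal sketch of the two transformations described above, assuming raw funding amounts are recorded in whole U.S. dollars and categorical answers appear as labels such as "Yes"/"No". The function names, the accepted labels, and the choice of Google as the 1-coded accelerator are illustrative assumptions, not the study's documented coding scheme. Missing or blank binary values are mapped to 0, following the cleaning rule described later in this post.

```python
from typing import Optional

TRUE_LABELS = {"yes", "y", "true", "1"}
FALSE_LABELS = {"no", "n", "false", "0"}


def to_millions_usd(amount_usd: float) -> float:
    """Express a raw U.S. dollar amount in millions of U.S. dollars."""
    return amount_usd / 1_000_000


def encode_binary(value: Optional[object]) -> int:
    """Map a yes/no-style categorical value to a 0/1 indicator."""
    label = "" if value is None else str(value).strip().lower()
    if label == "":
        return 0  # missing or blank binary values are encoded as 0
    if label in TRUE_LABELS:
        return 1
    if label in FALSE_LABELS:
        return 0
    raise ValueError(f"Unrecognised binary label: {value!r}")


def encode_accelerator(name: str) -> int:
    """Hypothetical coding: 1 for the Google program, 0 for the Microsoft program."""
    coding = {"google": 1, "microsoft": 0}
    key = name.strip().lower()
    if key not in coding:
        raise ValueError(f"Unexpected accelerator: {name!r}")
    return coding[key]


if __name__ == "__main__":
    print(to_millions_usd(2_500_000))                                       # 2.5
    print(encode_binary("Yes"), encode_binary(None), encode_binary(False))  # 1 0 0
    print(encode_accelerator("Microsoft"))                                  # 0
```

With only two programs in scope, a single 0/1 indicator is enough to identify the accelerator; a third program would need a different encoding.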

The scope of the database focuses exclusively on the impact of participation in either Google or Microsoft startup accelerator programs. It does not include startup-to-startup comparisons within each accelerator. Instead, it captures and compares overall performance trends of the two programs based on the startups they accelerated.
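Because the comparison is made between the two programs rather than between individual startups, each metric reduces to one summary value per accelerator. The sketch below assumes rows are dictionaries with hypothetical keys such as `"accelerator"`, `"ipo"` and `"funding_musd"`, and computes the group mean of any metric; for a 0/1 indicator such as IPO, that mean is simply the share of startups in the program with the outcome.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarise_by_accelerator(rows: Iterable[dict], metric: str) -> Dict[str, float]:
    """Return the mean of `metric` for each accelerator.

    For a binary (0/1) metric the mean equals the proportion of startups
    in that program with the outcome.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row["accelerator"]].append(row[metric])
    return {acc: float(mean(values)) for acc, values in groups.items()}


if __name__ == "__main__":
    rows = [
        {"accelerator": "Google", "ipo": 1, "funding_musd": 4.0},
        {"accelerator": "Google", "ipo": 0, "funding_musd": 2.0},
        {"accelerator": "Microsoft", "ipo": 0, "funding_musd": 3.0},
    ]
    print(summarise_by_accelerator(rows, "ipo"))           # {'Google': 0.5, 'Microsoft': 0.0}
    print(summarise_by_accelerator(rows, "funding_musd"))  # {'Google': 3.0, 'Microsoft': 3.0}
```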

IPO outcomes and closure status were observed only once per startup within the recorded timeframe, generally corresponding to the first three years post-acceleration. This short-to-mid-term focus enables the evaluation of early program impact without conflating it with longer-term business dynamics.
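The one-, two- and three-year outcome flags can be derived from event dates. The sketch below assumes each outcome (IPO, closure, acquisition) has at most one recorded event date, and counts an event only if it falls strictly after the acceleration date and no later than the chosen anniversary. The function names and that boundary convention are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def event_within_window(acceleration_date: date,
                        event_date: Optional[date],
                        years: int) -> int:
    """Return 1 if the event occurred within `years` years after acceleration, else 0."""
    if event_date is None:
        return 0  # no recorded event
    window_end = add_years(acceleration_date, years)
    return int(acceleration_date < event_date <= window_end)


if __name__ == "__main__":
    accelerated = date(2019, 6, 1)
    ipo_date = date(2021, 3, 15)
    # IPO flag for the one-, two- and three-year observation windows.
    print([event_within_window(accelerated, ipo_date, k) for k in (1, 2, 3)])  # [0, 1, 1]
```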

Finally, Crunchbase was chosen as the sole data source due to its global industry reputation and its frequent use in peer-reviewed entrepreneurship research (Seitz et al., 2023). Its platform aggregates and verifies data on startup funding, exits, and milestones, making it a credible foundation for structured, empirical investigation.

DESCRIPTION OF MAIN VARIABLES

The database includes many variables; those most relevant to the dissertation topic are listed below, in no particular order:

Column Name	Description
Company name	The name of the startup.
Is United States?	Binary variable: 1 if the startup is headquartered in the United States, 0 otherwise.
Accelerator Name	The accelerator program (Google or Microsoft) that accelerated the startup.
SIC Code	Standard Industrial Classification code identifying the startup's industry (e.g., AI, Healthtech).
Funds raised at the Acceleration Date	Funds raised by the startup before the acceleration date, in millions of USD.
Amount of funding, Accelerator at the Acceleration date	Amount invested by the accelerator at the acceleration date, in millions of USD.
Amount of Funding	Funds raised by the startup one, two or three years after the acceleration date (depending on the observation window), in millions of USD.
Founded Date	The startup’s founding date.
IPO	1 if the startup had an IPO within one, two or three years after the acceleration date (depending on the observation window), 0 otherwise.
Closed	1 if the startup closed within one, two or three years after the acceleration date (depending on the observation window), 0 otherwise.
Acceleration Date	The date on which the startup was accelerated.
Patents at the Accelerator Date	1 if the startup owned patents before the acceleration date, 0 otherwise.
Number of Patents at the Acceleration Date	The number of patents the startup owned before the acceleration date.
Acquired	1 if the startup was acquired within one, two or three years after the acceleration date (depending on the observation window), 0 otherwise.
Acquisition Made	1 if the startup made acquisitions within one, two or three years after the acceleration date (depending on the observation window), 0 otherwise.
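For readers who want to reproduce the structure of the database, the columns in the table could be mirrored in a typed record. The snake_case field names below are a hypothetical mapping of the column names, and the `validate` method adds consistency checks implied by the table (an assumption, not a documented step of the study): binary columns must hold 0 or 1, the binary patent flag should agree with the patent count, and the founding date should not fall after the acceleration date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

BINARY_FIELDS = ("is_united_states", "ipo", "closed", "has_patents",
                 "acquired", "acquisition_made")


@dataclass
class StartupRecord:
    company_name: str
    is_united_states: int                     # 1 = headquartered in the US
    accelerator_name: str                     # "Google" or "Microsoft"
    sic_code: str
    funds_raised_at_acceleration_musd: float  # raised before acceleration
    accelerator_funding_musd: float           # invested by the accelerator
    amount_of_funding_musd: float             # raised in the observation window
    founded_date: date
    acceleration_date: date
    ipo: int
    closed: int
    has_patents: int                          # "Patents at the Accelerator Date"
    number_of_patents: int
    acquired: int
    acquisition_made: int

    def validate(self) -> List[str]:
        """Return a list of consistency problems (empty if none are found)."""
        problems = []
        for name in BINARY_FIELDS:
            if getattr(self, name) not in (0, 1):
                problems.append(f"{name} must be 0 or 1")
        if self.has_patents != int(self.number_of_patents > 0):
            problems.append("has_patents disagrees with number_of_patents")
        if self.founded_date > self.acceleration_date:
            problems.append("founded_date is after acceleration_date")
        return problems
```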

These variables were selected because they provide measurable signals of startup progress and can be analysed statistically with the methods described above (ANOVA and statistics on binary indicators).
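To make the ANOVA step concrete, the sketch below computes the one-way ANOVA F statistic for a continuous variable (for example, funding in millions of USD) across accelerator groups, using only the Python standard library. It is a textbook implementation offered for illustration, not the dissertation's analysis code, and the group labels and values are hypothetical. Turning F into a p-value requires the F distribution; a library routine such as SciPy's `scipy.stats.f_oneway` can serve as a cross-check.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = (SS_between / (k - 1)) / (SS_within / (n - k)), where k is the number
    of groups and n the total number of observations.
    """
    k = len(groups)
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = mean(all_values)
    group_means = {name: mean(values) for name, values in groups.items()}

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(values) * (group_means[name] - grand_mean) ** 2
                     for name, values in groups.items())
    # Variation of observations around their own group mean.
    ss_within = sum((v - group_means[name]) ** 2
                    for name, values in groups.items() for v in values)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    funding = {"Google": [1.0, 2.0, 3.0], "Microsoft": [4.0, 5.0, 6.0]}
    print(one_way_anova(funding))  # (13.5, 1, 4)
```

With exactly two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so either test leads to the same conclusion.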

DATA VALIDATION, CLEANING AND LIMITATIONS

To ensure analytical reliability, missing values were either confirmed as null or, for binary and monetary variables, encoded as zero. Patent, acquisition, and closure records were cross-checked against their timestamps, and duplicates and inconsistent entries were removed.
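The cleaning rules above could be sketched as follows. This illustrative Python version rests on several assumptions: rows are dictionaries, duplicates are detected by a case-insensitive company name, and only the closure flag is cross-checked, against a hypothetical `closed_date` field; the same pattern would extend to patents and acquisitions. Missing binary and monetary values are set to 0, mirroring the rule described above.

```python
from datetime import date
from typing import Dict, List

# Hypothetical field names for the binary and monetary columns.
BINARY_FIELDS = ("ipo", "closed", "acquired", "acquisition_made")
MONETARY_FIELDS = ("funding_musd",)


def is_consistent(row: Dict) -> bool:
    """A closure flagged as 1 must have a closure date on or after acceleration."""
    if row["closed"] != 1:
        return True
    closed_date = row.get("closed_date")
    return closed_date is not None and closed_date >= row["acceleration_date"]


def clean_rows(rows: List[Dict]) -> List[Dict]:
    """Fill missing values, drop inconsistent records, then drop duplicates."""
    seen = set()
    cleaned = []
    for original in rows:
        row = dict(original)  # avoid mutating the caller's data
        for field in BINARY_FIELDS + MONETARY_FIELDS:
            if row.get(field) is None:
                row[field] = 0  # missing binary/monetary values encoded as 0
        if not is_consistent(row):
            continue  # inconsistent record removed
        key = row["company_name"].strip().lower()
        if key in seen:
            continue  # duplicate company removed
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"company_name": "Alpha", "acceleration_date": date(2019, 1, 1), "ipo": 1},
        {"company_name": "alpha ", "acceleration_date": date(2019, 1, 1), "ipo": 1},
        {"company_name": "Beta", "acceleration_date": date(2019, 1, 1),
         "closed": 1, "closed_date": date(2018, 5, 1)},
    ]
    print([r["company_name"] for r in clean_rows(rows)])  # ['Alpha']
```

Checking consistency before deduplication means that, if the first copy of a company is inconsistent, a later consistent copy can still be kept.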

Although Crunchbase is a reputable platform and is well-suited for cross-sectional performance comparisons in accelerator research, it has limitations:

Data completeness may vary across regions or industries.
Not all startups disclose full funding or outcome data.
Manual collection may introduce transcription errors, although every effort was made to verify consistency.

CONCLUSION

The database provides a structured, purpose-fit snapshot of startup performance following acceleration by Google and Microsoft. It includes the business metrics required for the ANOVA-based analysis and aligns with the research design introduced in Section 3.1.

References

Crunchbase. (n.d.). Crunchbase: Discover innovative companies and the people behind them. Retrieved from https://www.crunchbase.com/