Data Collection Methods: Sourcing Startup Performance

The effectiveness of this study depends largely on the quality and relevance of its data. Since the goal of this research is to evaluate whether corporate accelerators, such as those run by Google and Microsoft, act as springboards or sand traps for startups, the data must reflect measurable post-acceleration outcomes. This section outlines how the dataset was compiled, which variables were included, and why the data source was selected.

This post is part of a series of articles based on a research dissertation titled “Are corporate accelerators springboards for startups: a performance analysis of the Microsoft’s and Google’s accelerated”.

SOURCE OF DATA: Crunchbase

All the data used in this study was collected exclusively from Crunchbase, a globally recognised platform that aggregates information on startups, funding rounds, acquisitions, IPOs, and company milestones. The database was manually filtered and compiled into a spreadsheet, which serves as the foundation for all subsequent statistical analysis.

No other data sources were used. Therefore, all interpretations, results, and conclusions derived from this dissertation are grounded in data that was originally published and maintained on Crunchbase.

DATA FILTERING AND INCLUSION CRITERIA

The dataset was filtered using the following criteria:

  • Only startups that were explicitly identified as participants in either the Google or Microsoft startup accelerator programs were included.
  • Startups with missing or ambiguous data for performance metrics (e.g., unknown IPO status or founding year) were excluded.
  • The time horizon focuses primarily on the first three years post-acceleration, allowing for short- to mid-term outcome analysis (e.g., survival, funding trajectory).

This filtering restricts the dataset to companies that took part in one of these two corporate accelerators and have complete data for the performance metrics under study.
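As an illustration, the first two inclusion criteria can be expressed as a small filter. This is a minimal sketch, not the procedure actually used (the dataset was filtered manually in a spreadsheet). The dictionary keys reuse column names from the variable table in this post, the choice of required fields is an assumption, and the sample rows are made up.

```python
# Hypothetical choice of fields that must be present for a row to be kept.
REQUIRED_FIELDS = ("IPO", "Closed", "Founded Date")


def in_accelerator(row: dict) -> bool:
    """True if the row is flagged as a Google or Microsoft participant."""
    return row.get("Is Google?") == 1 or row.get("Is Microsoft?") == 1


def has_complete_metrics(row: dict) -> bool:
    """True if none of the required performance fields is missing."""
    return all(row.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def filter_dataset(rows: list[dict]) -> list[dict]:
    """Keep accelerator participants with complete performance data."""
    return [r for r in rows if in_accelerator(r) and has_complete_metrics(r)]


# Made-up rows, for illustration only (not taken from the real dataset).
sample_rows = [
    {"Company Name": "A", "Is Google?": 1, "Is Microsoft?": 0,
     "IPO": 0, "Closed": 0, "Founded Date": 2018},
    {"Company Name": "B", "Is Google?": 0, "Is Microsoft?": 1,
     "IPO": None, "Closed": 0, "Founded Date": 2017},  # unknown IPO status
    {"Company Name": "C", "Is Google?": 0, "Is Microsoft?": 0,
     "IPO": 0, "Closed": 1, "Founded Date": 2016},     # not a participant
]

kept = filter_dataset(sample_rows)
print([r["Company Name"] for r in kept])
```

In this toy input only company A passes: B has an unknown IPO status and C is not flagged as a participant in either program.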

DATA FORMAT AND STUDY SCOPE

The dataset was structured and standardised to facilitate clean, interpretable analysis using linear regression and ANOVA models. Funding amounts were normalised in U.S. dollars to ensure consistency across geographic contexts. Categorical variables—such as accelerator participation, survival status, and IPO status—were encoded as binary indicators (0 or 1), making them statistically testable within common modelling frameworks.
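To show why binary encoding makes a categorical variable usable in a regression, the sketch below encodes an accelerator label as 0 or 1 and fits a one-predictor least-squares line by hand. The label strings, the helper names, and the funding values are assumptions made for illustration; they are not taken from the dissertation's dataset or code.

```python
def encode_binary(value: str) -> int:
    """Map a yes/no style label to 1/0 (the accepted labels are assumed)."""
    return 1 if value.strip().lower() in {"yes", "true", "1"} else 0


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x, one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: accelerator label and total funding in millions of USD.
labels = ["yes", "yes", "yes", "no", "no", "no"]  # "yes" = Google, "no" = Microsoft
funding_musd = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0]

is_google = [encode_binary(v) for v in labels]
slope, intercept = ols_fit(is_google, funding_musd)
print(slope, intercept)
```

With a single 0/1 predictor, the intercept equals the mean outcome of the group coded 0 and the slope equals the difference between the two group means, which is what makes such coefficients straightforward to interpret.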

The scope of the dataset focuses exclusively on the impact of participation in either Google or Microsoft startup accelerator programs. It does not include startup-to-startup comparisons within each accelerator. Instead, it captures and compares overall performance trends of the two programs based on the startups they accelerated.

IPO outcomes and closure status were observed only once per startup within the recorded timeframe, generally corresponding to the first three years post-acceleration. This short-to-mid-term focus enables the evaluation of early program impact without conflating it with longer-term business dynamics.

Finally, Crunchbase was chosen as the sole data source due to its global industry reputation and its frequent use in peer-reviewed entrepreneurship research (Seitz et al., 2023). Its platform aggregates and verifies data on startup funding, exits, and milestones, making it a credible foundation for structured, empirical investigation.

DESCRIPTION OF MAIN VARIABLES

The database includes many variables; those most important for the dissertation topic are listed below, in no particular order:

Column Name | Description
Company Name | The name of the startup
Is United States? | Binary variable: 1 if the startup is headquartered in the United States of America; 0 otherwise
City | City of headquarters
SIC Code | Information representing the startup’s sector focus (e.g., AI, Healthtech)
Total Amount Funded | The final sum of total funds received
Last Funding Amount | The last amount of funding raised, in USD
Last Funding Date | The date of the last funding round
Founded Date | Year the startup was founded
Number of Funding Rounds | Total number of distinct fundraising rounds completed
IPO | Whether the startup went public (1 = Yes, 0 = No)
IPO Timing | Years taken to IPO after acceleration (1, 2, 3, or None)
Closed | Indicates whether the startup closed down (1 = Closed, 0 = Still Active)
Survived 3 Years | Binary variable: 1 if the startup survived at least 3 years post-acceleration
Is Google? / Is Microsoft? | Binary variables: 1 if the startup was in Google’s or Microsoft’s accelerator, respectively; 0 otherwise

These variables were selected because they provide measurable signals of startup progress and can be statistically analysed through the methods presented in Section 2.7.
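The two derived columns, IPO Timing and Survived 3 Years, could be computed from year values along the following lines. This is a sketch under stated assumptions: the inputs (acceleration year, IPO year, closure year) and the function names are hypothetical, and the post does not say how the original spreadsheet derived these columns. In particular, treating an IPO in the acceleration year itself as outside the 1 to 3 range is an assumption.

```python
from typing import Optional

WINDOW_YEARS = 3  # observation window described in this post


def ipo_timing(accel_year: int, ipo_year: Optional[int]) -> Optional[int]:
    """Years from acceleration to IPO if it falls in years 1..3, else None."""
    if ipo_year is None:
        return None
    years = ipo_year - accel_year
    return years if 1 <= years <= WINDOW_YEARS else None


def survived_3_years(accel_year: int, closed_year: Optional[int]) -> int:
    """1 if the startup was still open three years after acceleration, else 0.

    Assumes closed_year is None for startups that are still active and that
    at least three years have passed since acceleration.
    """
    if closed_year is None:
        return 1
    return 1 if closed_year - accel_year >= WINDOW_YEARS else 0


# Made-up years, for illustration only.
print(ipo_timing(2019, 2021))        # IPO two years after acceleration
print(ipo_timing(2019, 2025))        # IPO outside the three-year window
print(survived_3_years(2019, 2020))  # closed after one year
print(survived_3_years(2019, None))  # still active
```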

VALIDITY AND LIMITATIONS

Although Crunchbase is a reputable platform and is well-suited for cross-sectional performance comparisons in accelerator research, it has limitations:

  • Data completeness may vary across regions or industries.
  • Not all startups disclose full funding or outcome data.
  • Manual collection may introduce transcription error, although all efforts were made to verify consistency.
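One way to catch transcription errors of the kind mentioned above is to run simple consistency checks on each row. The checks below are illustrative assumptions rather than the verification actually performed for the dissertation; the keys reuse column names from the variable table, and the sample rows are made up.

```python
BINARY_FIELDS = ("IPO", "Closed", "Survived 3 Years", "Is Google?", "Is Microsoft?")


def find_issues(row: dict) -> list[str]:
    """Return a list of human-readable consistency problems for one row."""
    issues = []
    # Binary indicators should only ever hold 0 or 1.
    for field in BINARY_FIELDS:
        if row.get(field) not in (0, 1):
            issues.append(f"{field} is not 0 or 1")
    # An IPO timing only makes sense if the startup actually went public.
    if row.get("IPO Timing") is not None and row.get("IPO") != 1:
        issues.append("IPO Timing set but IPO is not 1")
    # The last round is part of the total, so it cannot exceed it.
    total = row.get("Total Amount Funded")
    last = row.get("Last Funding Amount")
    if total is not None and last is not None and last > total:
        issues.append("Last Funding Amount exceeds Total Amount Funded")
    return issues


# Made-up rows, for illustration only.
clean_row = {"IPO": 1, "IPO Timing": 2, "Closed": 0, "Survived 3 Years": 1,
             "Is Google?": 1, "Is Microsoft?": 0,
             "Total Amount Funded": 5_000_000, "Last Funding Amount": 2_000_000}
bad_row = {"IPO": 0, "IPO Timing": 2, "Closed": 0, "Survived 3 Years": 1,
           "Is Google?": 0, "Is Microsoft?": 1,
           "Total Amount Funded": 1_000_000, "Last Funding Amount": 3_000_000}

print(find_issues(clean_row))
print(find_issues(bad_row))
```

Rows that return a non-empty list can then be rechecked against the original Crunchbase entry.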

CONCLUSION

The dataset provides a structured, purpose-fit snapshot of startup performance following acceleration by Google and Microsoft. Within the limitations noted above, it includes the business metrics needed for regression and ANOVA-based analysis and aligns with the research design introduced in Section 3.1.

References

  1. Crunchbase. (n.d.). Crunchbase: Discover innovative companies and the people behind them. Retrieved from https://www.crunchbase.com/
