RAVENPACK JOB ANALYTICS – content guide.txt
Folder: SAFE_DATAROOM\07_ravenpack_job_analytics\

Goal of this file
-----------------
This note gives a very simple overview of:
1. What the RavenPack Job Analytics data are,
2. What is stored in each subfolder, and
3. How researchers can practically start working with the data.

Please read this before opening any of the large files.

1. What is RavenPack Job Analytics?
-----------------------------------
RavenPack Job Analytics is a dataset of ONLINE JOB POSTINGS collected from
the LinkUp platform and processed by RavenPack.

Each observation is a job posting and typically contains:
- a job / posting identifier,
- employer (company) name and ID,
- job title / position,
- occupation information,
- country / location,
- posting date (and sometimes update / expiration dates),
- various flags and metadata.

At SAFE we store:
- yearly raw data files from 1997 to 2024,
- mapping tables to link RavenPack companies to sectors, positions and
  financial identifiers (e.g. CRSP permco),
- U.S. Bureau of Labor Statistics (BLS) OEWS occupation–wage tables,
- O*NET occupation classifications.

The data are suitable for projects on labour demand, vacancies, skills, and
firm-level hiring. The files are large and MUST be handled with a statistical
package (R, Python, Stata, etc.), not Excel.

2. Folder structure (overview)
------------------------------
Inside 07_ravenpack_job_analytics you will find four main folders:
1) Raw_Data
2) Mapping_Lists_RavenPack
3) Occupational Employment and Wage Statistics (OEWS) Tables
4) ONET OnLine
Each of them is described below.

2.1 Raw_Data
------------
Path: SAFE_DATAROOM\07_ravenpack_job_analytics\Raw_Data\

Content:
- One ZIP file per year:
  RavenPackEdge_LINKUP_1997.zip
  RavenPackEdge_LINKUP_1998.zip
  ...
  RavenPackEdge_LINKUP_2024.zip

Each ZIP contains one or more delimited text files (CSV or similar) with
job-posting level data for that calendar year.
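For orientation, here is a minimal Python (pandas) sketch of how to peek into
one yearly ZIP without loading it fully. The file and column names below are
illustrative assumptions, not the real RavenPack schema, so the snippet builds
a tiny stand-in ZIP in memory; in practice you would point `zip_path` at one
of the yearly files listed above.

```python
# Sketch: inspect a large zipped posting file without loading everything.
# Column names (posting_id, company_id, ...) are made up for illustration.
import io
import zipfile
import pandas as pd

# In a real session: zip_path = r"...\Raw_Data\RavenPackEdge_LINKUP_2015.zip"
# Here we build a tiny stand-in ZIP in memory so the sketch is runnable.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "postings_2015.csv",
        "posting_id,company_id,country,posting_date\n"
        "1,100,US,2015-01-02\n"
        "2,101,DE,2015-03-15\n"
        "3,100,US,2015-07-01\n",
    )
buf.seek(0)

with zipfile.ZipFile(buf) as zf:
    member = zf.namelist()[0]          # first delimited file in the ZIP
    with zf.open(member) as f:
        # usecols / nrows keep memory use low: import only selected
        # columns and a small sample of rows to see the variables first.
        sample = pd.read_csv(
            f,
            usecols=["posting_id", "company_id", "country"],
            nrows=1000,
        )

print(sample.head())
print(sample["country"].value_counts())
```

The same idea (select columns, read a limited number of rows, then filter)
carries over to R (`readr`, `data.table`) and Stata (`import delimited`).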
Important notes:
- These files can be VERY LARGE (millions of rows).
- Do NOT open them directly in Excel: it will be slow, may crash, and may
  cut off rows.
- Use statistical software:
  - R (readr, data.table, etc.),
  - Python (pandas),
  - Stata (import delimited), or similar.
- If a file is too large for your machine, start by importing only selected
  columns or filter by country / year inside your software.

Typical first steps:
- Copy the ZIP file for the year you need to your own working folder (or
  work directly from the network if performance is acceptable).
- Unzip it.
- Import the main CSV into R/Python/Stata and inspect a small sample
  (e.g. head / first rows) to see the variables.

2.2 Mapping_Lists_RavenPack
---------------------------
Path: SAFE_DATAROOM\07_ravenpack_job_analytics\Mapping_Lists_RavenPack\

Main files (names may include dates):
- company_YYYY-MM-DD.csv
- organization_YYYY-MM-DD.csv
- position_YYYY-MM-DD.csv
- sector_YYYY-MM-DD.csv
- source_YYYY-MM-DD.csv
- CRSP_RP_linking_table_permco_RP_id_….dta
- RavenPack_Mapping.parquet

Purpose:
These files provide the “dictionaries” and link tables for the raw postings.

Typical content:
- company_…: maps RavenPack company IDs to company names, country,
  sometimes ticker, etc.
- organization_…: additional information at organization level (if
  different from company).
- position_…: list of positions / job titles with a standardized
  position ID.
- sector_…: sector / industry classification used by RavenPack.
- source_…: information on which job board / website the posting
  comes from.
- CRSP_RP_linking_table_permco_RP_id_….dta: link between RavenPack company
  IDs and CRSP PERMCOs (for joining with CRSP / Compustat data in other
  SAFE folders).
- RavenPack_Mapping.parquet: an efficient version of the mappings (mainly
  for users who already know how to work with Parquet files).

Typical workflow:
1. Import one year of Raw_Data (job postings) into your software.
2. Import the relevant mapping file (e.g.
company_….csv).
3. Merge the posting file and the mapping file by the RavenPack company ID.
4. If you need to link to financial data (CRSP / Compustat), also merge in
   the CRSP_RP_linking_table to obtain the PERMCO.

2.3 Occupational Employment and Wage Statistics (OEWS) Tables
-------------------------------------------------------------
Path: SAFE_DATAROOM\07_ravenpack_job_analytics\Occupational Employment and Wage Statistics (OEWS) Tables\

Structure:
- Subfolders by year: 1997, 1998, …, 2024
- Documentation PDF: “Occupational Employment and Wage Statistics (OEWS)
  Tables – U.S. Bureau of Labor Statistics.pdf”

Content:
- Official OEWS tables from the U.S. Bureau of Labor Statistics.
- For each year: employment and wage information by occupation (SOC codes),
  sometimes by region or industry.

Use:
- Combine with RavenPack data if you want to:
  - benchmark wages for occupations,
  - construct occupation-level employment or wage indices,
  - relate vacancy postings to official occupation statistics.

Researchers will typically:
- Choose the year that matches their RavenPack sample,
- Import the relevant OEWS table into their software,
- Match by occupation code (after mapping RavenPack occupation codes to SOC).

2.4 ONET OnLine
---------------
Path: SAFE_DATAROOM\07_ravenpack_job_analytics\ONET OnLine\

Main files:
- All_Job_Families.xlsx
- All_Occupations.xlsx
- O_NET OnLine Help_Job Zones.pdf
- Folder: OccupationalListings\ (with additional detailed tables)

Content:
- O*NET occupational classification system and related information (job
  families, detailed occupations, job zones / skill requirements, etc.).

Use:
- To classify job postings by skills, required education, job family,
  job zone, etc.
- Typical approach: map RavenPack occupation information to O*NET
  categories and then use the O*NET attributes in the analysis.

3. How researchers should start (step-by-step)
----------------------------------------------
1. Clarify your research question.
   - Example: “I want a firm-level measure of hiring intensity for U.S.
     firms between 2010 and 2015.”
2. Identify the relevant years of Raw_Data.
   - For the example above: 2010–2015.
3. Work on one year first.
   - Unzip one year (e.g. 2015) from Raw_Data.
   - Import the main job-posting file into your software.
   - Inspect the variables and keep only the columns you need (e.g. company
     ID, date, location, occupation, etc.).
4. Add company information.
   - Import `company_….csv` from Mapping_Lists_RavenPack.
   - Merge with the postings using the RavenPack company ID.
   - If you need links to CRSP, also merge in the CRSP_RP_linking_table to
     get PERMCO.
5. (Optional) Add occupation / wage information.
   - Map occupation codes to OEWS / O*NET if your project needs this.
   - Import the relevant OEWS year or O*NET file and merge by occupation
     code.
6. Repeat for additional years.
   - Once the process works for one year, repeat it for other years and
     append the yearly datasets.
7. Be careful with file sizes.
   - Always test your code on a small subset first (e.g. a single month or
     a small sample of rows).
   - Save intermediate results in your own folder to avoid re-importing the
     raw files every time.

4. Software and practical tips
------------------------------
- Recommended: R, Python, or Stata on the SAFE PCs.
- Do NOT use Excel for full files: it is fine only for small samples.
- Always work from the network path or copy only the years you need to
  your own working folder.
- Document your steps (which files and years you used, which merges you
  did) so that your work can be reproduced.

5. Questions and support
------------------------
- For technical access problems (permissions, missing folders, file
  corruption) please contact: SAFE Data Center
  (datacenter@safe-frankfurt.de).
- For questions about research design, variable definitions, or modelling,
  researchers should consult their supervisors.
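Appendix: a minimal end-to-end sketch in Python (pandas) of the merge steps
in Section 3 (steps 4 and 6). All column names (`rp_company_id`, `permco`,
etc.) are illustrative assumptions; check the actual headers of the raw and
mapping files before merging. Tiny hand-made tables stand in for one year of
Raw_Data, the company_… file, and the CRSP link table.

```python
# Sketch: merge postings with the company mapping and the CRSP link table,
# then count postings per firm-year (a simple "hiring intensity" measure).
# All names and values are made up for illustration.
import pandas as pd

# Stand-in for one year of Raw_Data (job postings):
postings = pd.DataFrame({
    "posting_id": [1, 2, 3, 4],
    "rp_company_id": ["A1", "A1", "B2", "C3"],
    "posting_date": pd.to_datetime(
        ["2015-01-02", "2015-06-10", "2015-03-15", "2015-11-30"]),
})

# Stand-in for company_… from Mapping_Lists_RavenPack:
company = pd.DataFrame({
    "rp_company_id": ["A1", "B2", "C3"],
    "company_name": ["Alpha Inc", "Beta AG", "Gamma Corp"],
    "country": ["US", "DE", "US"],
})

# Stand-in for the CRSP_RP_linking_table (not every firm has a match):
crsp_link = pd.DataFrame({
    "rp_company_id": ["A1", "C3"],
    "permco": [20001, 20003],
})

# Step 4: add company information, then (optionally) the PERMCO.
# how="left" keeps postings without a match instead of dropping them.
merged = postings.merge(company, on="rp_company_id", how="left")
merged = merged.merge(crsp_link, on="rp_company_id", how="left")

# Example firm-level measure: number of postings per company and year.
merged["year"] = merged["posting_date"].dt.year
hiring = (merged.groupby(["rp_company_id", "year"])
                .size()
                .reset_index(name="n_postings"))
print(hiring)
```

Step 6 (repeat for additional years) would loop this over the yearly files
and append the results, e.g. with `pd.concat`. Note that postings without a
CRSP match keep a missing `permco` after the left merge, so decide explicitly
how to treat unmatched firms in your analysis.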