Use our APIs to enrich an HR data lake (S3/Spark/SQL)
This guide provides step-by-step instructions for using Java/Spark to fetch the data and land it in S3 for later ingestion into your AWS data lake, where it can be queried with services such as Athena.
If you have an HR data lake stored in an S3 bucket and want to enrich it with emerging skills data for different occupations, our APIs provide a convenient way to fetch this information and land it in your data lake. In this guide, we'll demonstrate how to use Java/Spark to fetch emerging skills data for an occupation from our APIs and store it in S3, ready to be queried in your AWS data lake with services such as Athena.
Prerequisites
Before we get started, make sure you have the following:
- Access to our APIs, including the required API key or access token.
- Java/Spark environment set up and configured for S3 access (see the configuration sketch below).
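If your Spark environment is not already wired up for S3, the snippet below shows one way to configure it when building the SparkSession. This is a minimal sketch that assumes the hadoop-aws (S3A) connector is on the classpath; the credentials provider and commented-out keys are placeholders, and on EMR or when using instance profiles this extra configuration is usually unnecessary.
import org.apache.spark.sql.SparkSession;

public class S3ConfiguredSession {
    public static void main(String[] args) {
        // Build a SparkSession with S3A settings so s3:// or s3a:// paths resolve.
        // These property keys come from the hadoop-aws (S3A) connector; values are placeholders.
        SparkSession spark = SparkSession.builder()
                .appName("SkillsDataEnrichment")
                .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
                // Alternatively, set explicit keys (avoid hard-coding these in production):
                // .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
                // .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
                .getOrCreate();

        System.out.println("Spark session ready, version " + spark.version());
        spark.stop();
    }
}
Depending on your connector, you may need to write output paths with the s3a:// scheme instead of s3://.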
Fetching Emerging Skills Data
To fetch emerging skills data for a specific occupation and country, you can use the following Java/Spark code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.lit;

public class SkillsDataEnrichment {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("SkillsDataEnrichment")
                .getOrCreate();

        // Define the API endpoint, occupation ID, and country code
        String apiUrl = "https://example.com/di/v1/occupations/{occupation_id}/skills/emerging";
        String occupationId = "YOUR_OCCUPATION_ID";
        String countryCode = "US";
        String requestUrl = apiUrl.replace("{occupation_id}", occupationId) + "?country_code=" + countryCode;

        // Fetch the emerging skills data from the API over HTTP
        // (adjust the Authorization header to match your API credentials)
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(requestUrl))
                .header("Authorization", "Bearer YOUR_API_TOKEN")
                .GET()
                .build();
        String jsonResponse = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();

        // Parse the JSON response into a DataFrame and tag each row with the occupation ID
        Dataset<Row> skillsData = spark.read()
                .json(spark.createDataset(Collections.singletonList(jsonResponse), Encoders.STRING()))
                .withColumn("occupation_id", lit(occupationId));

        // Save the skills data to S3 in Parquet format
        String s3OutputPath = "s3://YOUR_BUCKET_NAME/path/to/skills_data";
        skillsData.write().mode(SaveMode.Overwrite).parquet(s3OutputPath);
        spark.stop();
    }
}
In this example, we call the /di/v1/occupations/{occupation_id}/skills/emerging endpoint over HTTP, replacing {occupation_id} with the specific occupation ID you want to fetch data for and passing a country code (e.g., "US") to retrieve skills data specific to that country. The JSON response is parsed into a Spark DataFrame, tagged with the occupation ID so it can be filtered on later, and written to S3 in the Parquet file format. Ensure you replace "s3://YOUR_BUCKET_NAME/path/to/skills_data" with the appropriate S3 bucket and path in your AWS environment, and supply your own credentials in the Authorization header.
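As an optional sanity check, you can read the Parquet output back from S3 and inspect it before setting up Athena. This is a minimal sketch assuming the same placeholder bucket path as above; the printed column names and types are what your Athena table definition will need to mirror.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SkillsDataCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("SkillsDataCheck").getOrCreate();

        // Read the Parquet output back from S3 (same placeholder path as above)
        Dataset<Row> check = spark.read().parquet("s3://YOUR_BUCKET_NAME/path/to/skills_data");
        check.printSchema();   // column names and types to mirror in the Athena DDL
        check.show(5, false);  // preview a few rows

        spark.stop();
    }
}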
Ingesting into AWS Data Lake
Once the skills data is stored in S3, you can use AWS services such as Athena to query and analyze it alongside the rest of your data lake, without moving it again. You can write SQL queries in Athena to gain insights from the enriched HR data, including the emerging skills information.
Here's a generic example of how you can query the data using SQL in Athena:
-- Create an external table in Athena pointing to the skills data in S3
CREATE EXTERNAL TABLE IF NOT EXISTS skills_data (
  id STRING,
  name STRING,
  description STRING,
  occupation_id STRING
)
STORED AS PARQUET
LOCATION 's3://YOUR_BUCKET_NAME/path/to/skills_data';

-- Query the skills data for a single occupation
SELECT *
FROM skills_data
WHERE occupation_id = 'YOUR_OCCUPATION_ID';
Remember to replace 's3://YOUR_BUCKET_NAME/path/to/skills_data' with the actual S3 location where you stored the skills data and 'YOUR_OCCUPATION_ID' with the occupation ID you fetched the data for, and adjust the column list so it matches the schema of the Parquet files you wrote.
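If you would rather launch the Athena query from code instead of the console, the sketch below uses the AWS SDK for Java v2 Athena client. The database name, result output location, and query text are assumptions to replace with values from your own environment.
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.QueryExecutionContext;
import software.amazon.awssdk.services.athena.model.ResultConfiguration;
import software.amazon.awssdk.services.athena.model.StartQueryExecutionRequest;
import software.amazon.awssdk.services.athena.model.StartQueryExecutionResponse;

public class AthenaSkillsQuery {
    public static void main(String[] args) {
        AthenaClient athena = AthenaClient.create();

        // Database, output location, and occupation ID below are placeholders
        StartQueryExecutionRequest request = StartQueryExecutionRequest.builder()
                .queryString("SELECT * FROM skills_data WHERE occupation_id = 'YOUR_OCCUPATION_ID'")
                .queryExecutionContext(QueryExecutionContext.builder()
                        .database("your_database")
                        .build())
                .resultConfiguration(ResultConfiguration.builder()
                        .outputLocation("s3://YOUR_BUCKET_NAME/athena-query-results/")
                        .build())
                .build();

        // Athena runs queries asynchronously; this returns an execution ID you can poll
        StartQueryExecutionResponse response = athena.startQueryExecution(request);
        System.out.println("Started Athena query: " + response.queryExecutionId());

        athena.close();
    }
}
Because Athena executes queries asynchronously, in practice you would poll GetQueryExecution (or read the results written to the output location) to retrieve the query output.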
Remember to refer to our API documentation for additional details on request parameters, authentication, and response handling. If you encounter any issues or need further assistance, feel free to reach out to our support team. Happy data enrichment!