- Interesting Data Gigs by Marcos Ortiz
Interesting Data Gigs # 3: Security Data Engineer at Reddit
Introducing the Interesting Data Gigs Talent Network, more jobs from Airbyte and Adyen, and why you must follow the work of Andreas Kretz and Mehdi Ouazza
🚨 Join the Interesting Data Gigs Talent Network 🚨
I wanted to share this amazing news with all of you.
I told you I was working very hard to create an amazing place for you to find outstanding Data Analytics roles.
That place is already created, and it’s called The Interesting Data Gigs Talent Network, where you can join today as a candidate.
Let’s change the game together: instead of people applying to companies, companies will pitch to you, so don’t wait another moment and join today.
Hey Data lovers, it’s Marcos again with a new edition of the Interesting Data Gigs newsletter.
This time, we will talk about a very interesting position open at Reddit, Inc. as a Security Data Engineer, so let’s discuss some ideas on how to be a better candidate for this role.
When I read the position, my favorite part was this one:
If you are passionate about building, security and data engineering, and making it easy for peers to adopt security best practices, we need you.
The ideal candidate will work to build a scalable Detection and Incident Response platform to detect security events and anomalies across Reddit’s technical ecosystem (endpoints, Kubernetes, and cloud).
At Reddit’s scale, we’ve got a lot of data, which means building and managing ingestion pipelines, processing rules, and data transformations, and deriving actionable intelligence and alerting from this data is key.
In addition, this position will assist in building and operating ETLs and alerting pipelines based on this data, integrating with security operations automation technologies, and help enrich detections that are passed to Security Operations engineers.
Why is this an interesting data gig?
Numbers, numbers, numbers.
Reddit is not called “the front page of the Internet” by chance. More than 430 million Monthly Active Users (MAUs) are on the platform today, with 50 million Daily Active Users (DAUs), participating in more than 100k communities.
This means you will potentially work with a lot of data, on the order of hundreds of TBs.
It’s a very challenging job because you will be analyzing everything related to security at Reddit, and you will help fight the bad guys using analytics.
An important thing to know: Reddit is not profitable yet, but its revenue is growing very fast. According to a 2021 article from The Information, the company expected to generate $350 million in revenue that year.
And according to eMarketer, the revenue numbers could be higher:
Increasing users’ time spent on the platform and the number of subreddits that a user is exposed to makes Reddit’s recent investments in advertising even more attractive.
This year, we expect its ad revenue to keep rising strongly—by 38.9% to $423.8 million. Mobile ad revenues will grow even faster, at 43.3%.
What about the current valuation and funding of Reddit?
Well, Fidelity slashed Reddit’s valuation by about a third, according to a recent report from Bloomberg:
Fidelity funds cut Reddit to $39.65 a share from $61.79, where they had been valued for the previous nine months. Stripe shares were reduced to $32.05 apiece, the lowest since last March.
The payments firm raised $600 million that month at a valuation of $95 billion, making it the most valuable US startup. Reddit was valued at more than $10 billion as of August.
Not great news for the investors, but potentially good news for you as a prospective employee, because a lower valuation could mean more RSUs in your offer.
And about the funding: according to Crunchbase, Reddit has raised a total of $1.3 billion to date (June 7th, 2022), but again: Reddit is not profitable yet.
Let’s discuss some ideas on how to approach this job application (THE REAL MEAT)
According to the job description, you will be using SIEMs like Splunk, Elastic, SumoLogic, and many others. So, you need to know, for example, how to send Splunk data to Amazon S3, or how to use the ELK stack for security monitoring.
But more importantly, you need to be ready to show how you analyze this kind of data and provide real insights to the SecOps team at Reddit.
For example, this Data + AI Summit session, brought to us by Databricks, could give you some good tips on how to architect a modern platform for cybersecurity analytics with Splunk, Apache Spark, and the Lakehouse architecture. The speakers are George Webster (Global Head of Cybersecurity Science and Analytics, HSBC), Monzy Merza (Vice President of Cybersecurity Go-to-Market, Databricks), and Jason Trost (Head of Analytic Engines, HSBC).
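To make "analyze this kind of data" a bit more concrete, here is a minimal sketch in Python of one classic detection: flagging source IPs with an unusually high number of failed logins. It is only an illustration, not how Reddit or any SIEM actually does it. The field names `src_ip` and `outcome`, the function name, and the thresholds are all assumptions I made up for the example; the scoring uses the modified z-score (median and median absolute deviation), which is less sensitive to a single noisy source than a plain mean and standard deviation.

```python
from collections import Counter
from statistics import median


def flag_suspicious_sources(events, threshold=3.5, min_failures=10):
    """Return source IPs whose failed-login count is a robust outlier.

    `events` is an iterable of dicts with the hypothetical keys
    "src_ip" and "outcome"; real SIEM exports will use other field names.
    """
    failures = Counter(e["src_ip"] for e in events if e["outcome"] == "failure")
    counts = list(failures.values())
    if not counts:
        return []

    med = median(counts)
    # Median absolute deviation: a spread estimate that one noisy IP cannot inflate.
    mad = median(abs(c - med) for c in counts)

    flagged = []
    for ip, count in failures.items():
        if count < min_failures:
            continue  # ignore low-volume sources to cut alert noise
        if mad:
            score = 0.6745 * (count - med) / mad  # modified z-score
        else:
            # All typical sources look identical, so anything above the median stands out.
            score = float("inf") if count > med else 0.0
        if score > threshold:
            flagged.append(ip)
    return sorted(flagged)


# Toy data: five quiet sources and one that hammers the login endpoint.
sample = [
    {"src_ip": f"10.0.0.{i}", "outcome": "failure"}
    for i in range(1, 6)
    for _ in range(2)
]
sample += [{"src_ip": "203.0.113.9", "outcome": "failure"} for _ in range(50)]
sample += [{"src_ip": "10.0.0.1", "outcome": "success"}]

print(flag_suspicious_sources(sample))
```

With the toy data above, only `203.0.113.9` is flagged. In a real pipeline you would run something like this per time window and tune `threshold` and `min_failures` against your own traffic before alerting anyone.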
The other two pillars to monitor here are the network and the source code. For the first, I can’t recommend my good friends at Kentik enough; they have the perfect product for this. For the second, I would recommend the amazing product that the GitGuardian team has created.
Kentik’s POCs: Doug Madory, Paul Sancimino and Anil Murty
GitGuardian’s POC: Jérémy Lanfranchi & Ziad Ghalleb
Here are some other projects that could help you with more ideas:
DevSecOps Quick Start (this one specifically is very good) by Mahdi Ebrahimi (AWS). Why? Because it integrates the pipeline with tools like Bandit (looking for vulnerabilities in Python code), Snyk (continuous monitoring for vulnerabilities in code’s dependencies), and CFN NAG (to look for patterns in CloudFormation templates that may indicate insecure infrastructure)
BigQuery Audit Log Anomaly Detection by Abdelfettah SGHIOUAR (Google) and Rachel Ng (Recursion)
Hashpipeline by Israel Herraiz (Google)
Use the built-in Amazon SageMaker Random Cut Forest algorithm for anomaly detection by Chris Swierczewski, Julio Delgado Mangas, Madhav Jha, and Luka Krajcar (AWS and Capital Group)
So, take all this, my friend, apply for it and chat with Scott Newman, who is the recruiter for this particular position at Reddit.
Some people at Reddit who could be your close colleagues
Jose Lobez (Global Head of Data Science & Analytics)
Jesus Serratos (Sr. Technical Recruiter)
Mike Doherty (Engineering Manager, Search)
Other featured jobs of the week
Data Engineer / Developer Advocate at Airbyte. Chat with Brian Cannon (Recruiting Lead at Airbyte) and Ari Bajo Rouvinen (Data Engineer and Technical Writer at Airbyte)
Analytics Engineer at Adyen. Chat with Guido Tournois (Tech Lead for Data Engineering at Adyen) and Caroline Imbert (EMEA Tech Recruitment Lead at Adyen)
People to follow: Andreas Kretz and Mehdi (mehdio) Ouazza
Andreas is an amazing content creator, and podcaster, very active on LinkedIn, Twitter, YouTube, and Medium; but more importantly, he has created an amazing resource called LearnDataEngineering.com where he teaches people the nuances of this interesting field.
This is one of the few courses I personally recommend for people new to the data engineering field.
Check out the course here.
Mehdi is another amazing content creator in the Data Engineering field, with a YouTube channel, and a Medium publication, but my favorite resource from him is the DataCreators Club, a directory of an incredible group of people who create content for this field.
Well done, Mehdi.
Interesting resources
[ARTICLE] Spark performance issues? Let’s optimize that code! by David Suarez (HEMA)
[ARTICLE] A Senior’s Guide to kickstart your BigQuery Journey, by Soliman ElSaber (AirAsia)
[VIDEO] How to find consecutive streaks in data using SQL window functions (and identify cheaters in Halo 5), by Zach Wilson (Airbnb)
[VIDEO] Quick to Production with Minimal MLOps with the Best of Spark and TensorFlow, with Ronny Mathew (Rue Gilt Groupe) and Denny Lee (Databricks)
[ARTICLE] Reduce read I/O cost of your Amazon Aurora PostgreSQL database with range partitioning, by Sami Imseih and Yahav Biran (AWS)
[ARTICLE] Tuning Spark Applications to Efficiently Utilize Dataproc Cluster, by Aride Chettalim (PayPal)
[ARTICLE] A Step-by-Step Guide to Training a Machine Learning Model using BigQueryML (BQML), Michal Brys (GetInData)
[ARTICLE] Amazon SageMaker Studio and SageMaker Notebook Instance now come with JupyterLab 3 notebooks to boost developer productivity, by Sean Morgan, Arkaprava De, and Kunal Jha (AWS)
[ARTICLE] Introducing granular instance sizing for Cloud Spanner, now run production workloads for as low as $40/month, by Vaibhav Govil (Google)