Guide To R For SEO


If you’ve heard of the R language and think that SEOs who use it are aliens, you’re not totally incorrect.

Originally designed for statisticians and data scientists, the R programming language has gained popularity in recent years, and the rationale is simple:

With R and its many SEO-friendly packages, you can automate tasks, extract and aggregate data via APIs, scrape web pages, cross-reference numerous files (keyword lists, for example), and do text mining, machine learning, NLP, and semantic analysis.

But let’s be clear: R isn’t an SEO secret!

If you’ve ever wanted to build your own SEO tools or transition from classic empirical techniques to data-driven SEO, you’re on your way to becoming an extra-terrestrial as well.

What Exactly Is SEO?

Search Engine Optimization, or SEO, is the practice of increasing the quality and quantity of internet traffic from search engines to a website or web page.

What Exactly Is R?

R is a free, open-source programming language and software environment for statistical computing and graphics.

Why Would You Want To Use R For SEO?

R is a programming language that specialises in data mining, statistical analysis, and data visualisation, and it also handles web crawling well. Basically, everything that can help you with SEO.

R is likewise extremely simple to read and write once you’ve grasped the principles. Even if you don’t want to understand it thoroughly, you can crawl a website or extract Google Analytics data and export it to a CSV by copying and pasting a few lines of code.
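For example, here is a minimal sketch of that idea with the googleAnalyticsR package (the view ID, date range, and output file name below are placeholders to adapt to your own account):

library(googleAnalyticsR)
ga_auth() #Authenticate with your Google account
ga_data <- google_analytics(123456789, date_range = c("2024-01-01", "2024-01-31"), metrics = "sessions", dimensions = "pagePath") #123456789 is a placeholder view ID
write.csv(ga_data, "ga_export.csv", row.names = FALSE) #Export the result to CSV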

When Is It OK To Use R For SEO?

R comes in handy when dealing with large websites with thousands of pages. I’m a great admirer of automation, but its use must be carefully assessed. There are a number of wonderful SEO tools available; R will never replace them, but it is a really welcome addition to your toolbox.

Where Should You Write R Code?

Begin by downloading R, then install the free, open-source RStudio IDE.

After installing RStudio, you can test the following R snippets directly in the console (bottom-left panel) or copy and paste them into a new script (File > New File > R Script).

Primary Functions

sessionInfo() #View environmental information

??read #Help

getwd()   #View the Working directory

setwd("/Users/remi/dossier/") #Set the working directory

list.files()    #See the contents of the directory

dir.create("nomdudossier") #Create a folder in the directory

 

R Packages

R offers a plethora of packages (sets of functions to download). The full list is available on the CRAN website.

install.packages("nomdupackage") #Install a package

install.packages(c("packageA", "packageB", "packageC")) #Install several packages at a time

 

#Install a list of packages only if they are not already installed

list.of.packages <- c("dplyr", "ggplot2", "tidyverse", "tidytext", "wordcloud", "wordcloud2", "gridExtra", "grid")

new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]

if(length(new.packages)) install.packages(new.packages)

 

library("nomdupackage") #Load an installed package

?nomdupackage #View package documentation

packageVersion("nomdupackage") #Get the version of a package

 

detach("package:dplyr", unload=TRUE) #Unload a package without restarting RStudio

 

Here are some of the most important R packages for SEO:

  •       dplyr: Working with data from a dataframe (filter, sort, select, summarize, etc)
  •       SEMrushR (French): Make use of the SEMrush API
  •       majesticR (French): Make use of the Majestic API
  •       kwClustersR: Group a collection of keywords.
  •       duplicateContentR (French): Determine a similarity score between two pages in order to detect duplicate content.
  •       text2vec: Extract n-grams.
  •       eVenn: Make Venn diagrams (useful for semantic audits)
  •       tm: Handle accents and stopwords
  •       ggplot2: Create graphs
  •       Shiny: Build a real-world application based on your scripts.
  •       searchConsoleR (French): Use the Google Search Console API
  •       httr: Perform GET, POST, PUT, and DELETE operations

Requests

  •       RCurl: For making requests that are more comprehensive than httr.
  •       XML: For parsing web documents.
  •       jsonlite: Retrieve and parse JSON data
  •       googleAuthR: To manage Google authentication.

APIs

  •       googleAnalyticsR: Using the Google Analytics API
  •       searchConsoleR: Downloading data from the Google Search Console into R
  •       urltools: Perform URL processing (see the short sketch below)
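As a quick illustration of URL processing, here is a small urltools sketch (the URLs are made-up examples):

library(urltools)
urls <- c("https://www.example.com/blog/article-seo?utm_source=newsletter", "https://www.example.com/contact")
url_parse(urls) #Break each URL into scheme, domain, port, path, parameters and fragment
domain(urls) #Extract only the domain
param_get(urls, "utm_source") #Get the value of a given query parameter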

Manage Large Volumes Of Data

In every SEO project, data is used methodically, whether it comes from Screaming Frog, SEMrush, Search Console, your web analytics tool, or another source. It can be obtained directly through APIs or via manual exports.

Tips on how to process these datasets can be found in the sections that follow.

Save And Open A Dataset

mondataframe <- data.frame() #Create a dataframe (lets you mix numeric and text data)

merged <- data.frame(df1, df2) #Put two dataframes side by side (use merge(df1, df2) to join them on a common column)

 

#Open a TXT file

Fichier_txt <- read.table("filename.txt", header=FALSE, col.names = c("nomcolonne1", "nomcolonne2", "nomcolonne3"))

#Open an XLS

library(readxl)

Fichier_xls <- read_excel("cohorte.xls", sheet = 1, col_names = FALSE, col_types = NULL, skip = 1)

#Open a CSV

Fichier_csv <- read.csv2("df.csv", header = TRUE, sep=";", stringsAsFactors = FALSE)

 

#Save your dataset

write.csv(dataset, "dataset.csv", row.names = FALSE) #Create a CSV file

write.table(dataset, "dataset.txt", sep = "\t", row.names = FALSE) #Create a txt file

 

#Change column names

cnames <- c("keywords", "searchvolume", "competition", "CPC") #we define names for the 4 columns of the dataframe

colnames(mydataset) <- cnames #the column names are assigned to the dataframe

 

Know The Dataset

object.size(dataset) #Get the object size in bytes

 

head(dataset) #See the first lines

tail(dataset) #See the last lines

 

colnames(dataset) #Know the names of the columns

apply(dataset, 2, function(x) length(unique(x))) #Know how many different values there are in each column of the dataset

 

summary(dataset) #Have a summary of each column of the dataset (minimum, median, average, maximum, etc.)

summary(dataset$colonne) #Same thing for a particular column

dim(dataset) #Dataset dimensions (number of rows and columns)

str(dataset) #More complete than dim() : Dataset dimensions + Data type in each column

 

which(dataset$Colonne == "seo") #Find the rows where the Colonne column contains the value "seo"

 

Focus On The dplyr Package

dplyr is THE package you need to know. It lets you perform all kinds of operations on your datasets: selecting, filtering, sorting, summarising, and more.

library("dplyr")

 

#Select columns and rows

select(df,colA,colB)  #Select colA and colB in the df dataset

select(df, colA:colG) #Select from colA to colG

select(df, -colD)  #Delete the column D

select(df, -(colC:colH))  #Delete a series of columns

slice(df, 18:23)  #Select lines 18 to 23

 

#Create a filter

filter(df, country=="FR" & page=="home")  #Filter the rows that contain FR (country column) and home (page column)

filter(df, country=="US" | country=="IN") #Filter rows whose country is US or IN

filter(df, size>100500, r_os=="linux-gnu")

filter(cran, !is.na(r_version)) #Keep the rows where the r_version column is not NA

 

#Sort your data

arrange(keywordDF, volume) #Sort the dataset according to the values in the volume column (ascending order)

arrange(keywordDF, desc(volume)) #Sort the dataset in descending order

arrange(keywordDF, concurrence, volume) #Sort the data according to several variables (several columns)

arrange(keywordDF, concurrence, desc(volume), prixAdwords)
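These verbs really shine when you chain them with the pipe operator. Here is a sketch assuming a keyword dataframe with keyword, volume, and concurrence columns (adapt the names to your own export):

keywordDF %>%
  filter(volume > 500) %>%
  arrange(desc(volume)) %>%
  select(keyword, volume, concurrence) %>%
  head(20) #The 20 biggest keywords with more than 500 monthly searches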

 

Other Points To Consider

The following are commands that we frequently use to perform operations on huge keyword datasets, such as SEMrush, Ranxplorer, or Screaming Frog exports.

These procedures allow us to move more quickly in our search for SEO opportunities.

For Screaming Frog exports, you’ll find below a few commands to count things like the number of URLs crawled, the number of empty cells in a column, and the number of URLs for each status code.
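Here is a minimal sketch of those counts, assuming the export has been loaded into a dataframe called crawl and that the column names below (Status.Code, Meta.Description.1) match your own file:

nrow(crawl) #Number of URLs crawled
sum(is.na(crawl$Meta.Description.1) | crawl$Meta.Description.1 == "") #Number of empty cells in a column
table(crawl$Status.Code) #Number of URLs for each status code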

#Convert a column to numeric format

keywords$Volume <- as.numeric(as.character(keywords$Volume))

 

#Add a column with a default value

keywordDF$nouvellecolonne <- 1 #create a new column with the value 1

 

#Add a column whose value is based on an operation

mutate(keywordDF, TraficEstime = CTRranking * volume) #Create a new column (TraficEstime) based on 2 others (CTRranking and volume)

mutate(keywordDF, volumereel = volume / 2)

 

#Split a dataset into several datasets
#Very useful to divide a list of keywords by theme
split(keywords, keywords$Thematique)

 

Extracting Content From The Web (Web Scraping)

Building a scraper is a great way to quickly retrieve specific elements of a web page. You can use it to keep an eye on a competitor’s website: its pricing strategy, content revisions, and so on.

XML Scraper

You may use the following script to download an XML file, parse it, and get variables of interest to you. You’ll also learn how to turn it into a dataframe.

#1. Load packages

library(RCurl)

library(XML)

 

#2. Get the source code

url <- "https://www.w3schools.com/xml/simple.xml"

xml <- getURL(url,followlocation = TRUE, ssl.verifypeer = FALSE)

 

#3. Format the code and retrieve the root XML node

doc <- xmlParse(xml)

rootNode <- xmlRoot(doc)

 

#3.1 Save the source code in an html file in order to see it in its entirety

capture.output(doc, file="file.html")

 

#4. Get web page contents

xmlName(rootNode) #The name of the root node

rootNode[[1]] #All the content of the first child node

rootNode[[2]][[1]] #The 1st element of the 2nd child node

xmlSApply(rootNode, xmlValue) #Extract the text values (strip the tags)

xpathSApply(rootNode,"//name",xmlValue) #Select specific nodes with XPath

xpathSApply(rootNode,"/breakfast_menu//food[calories=900]",xmlValue) #Filter XML nodes by value (here, the dishes with 900 calories)

 

#5. Create a data frame or list

menusample <- xmlToDataFrame(doc) #As a dataframe

menusample_list <- xmlToList(doc) #As a list

 

HTML Scraper

Extracting a website’s links or retrieving its list of articles are just two examples of what you can accomplish with the script below.

#1. Load packages

library(httr)

library(XML)

 

#2. Get the source code

url <- "https://remibacha.com"

request <- GET(url)

doc <- htmlParse(content(request, as = "text"), asText = TRUE) #Parse the HTML from the response body

 

#3. Get the title and count the number of characters

PageTitle <- xpathSApply(doc, "//title", xmlValue)

nchar(PageTitle)

 

#4. Get posts names

PostTitles <- data.frame(xpathSApply(doc, "//h2[@class='entry-title h1']", xmlValue)) #Posts selected by their CSS class

PostTitles <- data.frame(xpathSApply(doc, "//h2", xmlValue)) #Alternative: all the h2 headings on the page

 

#5. Retrieve all the links on the page and make a list of them

hrefs <- xpathSApply(doc, "//div/a", xmlGetAttr, 'href')

hrefs <- data.frame(matrix(unlist(hrefs), byrow=T))

 

#6. Retrieve links from the menu

liensmenu <- xpathSApply(doc, "//ul[@id='menu-menu']//a", xmlGetAttr, 'href')

liensmenu <- data.frame(matrix(unlist(liensmenu), byrow=T))

 

#7. Retrieve the status code and header

status_code(request)

header <- headers(request)

header <- data.frame(matrix(unlist(header), byrow=T))

 

JSON Scraper

#1. Load the package

library(jsonlite)

 

#2. Get the JSON

jsonData <- fromJSON("https://api.github.com/users/jtleek/repos")

 

#3. Retrieve the names of all nodes

names(jsonData)

 

#4. Retrieve the names of all nodes in the “owner” node

names(jsonData$owner)

 

#5. Retrieve values from the login node

jsonData$owner$login
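From there, you can keep only the fields you need and export them like any other dataframe (name and stargazers_count are columns returned by this particular GitHub endpoint):

repos <- data.frame(name = jsonData$name, stars = jsonData$stargazers_count)
write.csv(repos, "repos.csv", row.names = FALSE)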

 
