A MEDIA ARCHAEOLOGY OF ONLINE COMMUNICATION PRACTICES
THROUGH SEARCH ENGINE AND SOCIAL MEDIA OPTIMIZATION
by
KAREN M. ESTLUND
A DISSERTATION
Presented to the School of Journalism and Communication
and the Division of Graduate Studies
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
June 2021
DISSERTATION APPROVAL PAGE
Student: Karen M. Estlund
Title: A Media Archaeology of Online Communication Practices through Search
Engine and Social Media Optimization.
This dissertation has been accepted and approved in partial fulfillment of the
requirements for the Doctor of Philosophy degree in the School of Journalism and
Communication by:
Dr. Kim Sheehan, Chairperson
Dr. Biswarup Sen, Core Member
Dr. Seth Lewis, Core Member
Dr. Colin Koopman, Institutional Representative
and
Andy Karduna, Interim Vice Provost for Graduate Studies
Original approval signatures are on file with the University of Oregon Division of
Graduate Studies.
Degree awarded June 2021
© 2021 Karen M. Estlund
This work is licensed under a Creative Commons
Attribution-NonCommercial 3.0 (United States) License
DISSERTATION ABSTRACT
Karen M. Estlund
Doctor of Philosophy
School of Journalism and Communication
June 2021
Title: A Media Archaeology of Online Communication Practices through Search
Engine and Social Media Optimization
The control of information is embedded in the cultural politics and institutions
that regulate access to information. In its most basic form, communication is a
practice of enabling the exchange of information. Websites have become one of the
primary ways that people access information; however, most of the access is
mediated through search engines and social media platforms. Communication
research has explored the role of these platforms as gatekeepers, and critical studies
have attended to the ideologies of search algorithms. From the advertising and public
relations industries, advice has emerged for communicators on how to make their
content accessible through these gatekeepers using optimization strategies. Critical
communication studies have not examined the relationship between the
optimization strategies used on actual webpages and access to information.
This dissertation seeks to fill that gap by asking how optimization techniques are
structured in online communications to increase access to information. How do the
techno-infrastructure of HTML and embedded assumptions shape communication
online? Where are points of resistance and opportunities for influence? How does this
differ from historic methods of preparing communications to be discovered and
retrieved? This dissertation explores the history of search engine and social media
optimization through a media archaeological approach to uncover the invisible
infrastructures, habits, and assumptions that surround and shape communication
online. By utilizing a media archaeological analysis, I will be able to situate the
multi-layered practices that take the form of optimization strategies. Critical histories are
meant to be emancipatory. This dissertation helps communication studies
develop an understanding of how we enable and influence discussions in our current
digital cultural moment and provides strategies for how communications are
accessed.
CURRICULUM VITAE
NAME OF AUTHOR: Karen M. Estlund
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene
University of Washington, Seattle
Reed College, Portland, Oregon
DEGREES AWARDED:
Doctor of Philosophy, Communication and Society, 2021, University of Oregon
Master of Library and Information Science, 2005, University of Washington
Bachelor of Arts, Classics, 2001, Reed College
AREAS OF SPECIAL INTEREST:
Communication and Information Technologies
Copyright
Information Access
Media Studies
PROFESSIONAL EXPERIENCE:
Dean of Libraries, Colorado State University, 2019-
Associate Dean for Technology and Digital Strategies, Penn State Libraries,
The Pennsylvania State University, 2015-2019
Digital Scholarship Center, Head, University of Oregon Libraries, 2012-2015
Digital Library Services, Head, University of Oregon Libraries, 2011-2012
Digital Collections Coordinator, University of Oregon Libraries, 2007-2011
Digital Technology, Interim Head, J. Willard Marriott Library, University of
Utah, 2006-2007
Adjunct Professor, Department of Communication, University of Utah, 2006-
2007
Technology Instruction Librarian, J. Willard Marriott Library, University of
Utah, 2005-2006
Graduate Staff Assistant, The Information School, University of Washington,
2003-2005
ACKNOWLEDGEMENTS
Thank you to my husband, Eric, who, in the ten years I worked on this degree,
did not ask how it was going during the final three years of writing. Thanks for your love
and support, listening when I needed it, and for holding back your own curiosity and
anxiety so as not to spike my own anxiety.
Thank you to my committee. I especially thank my advisor, Kim Sheehan,
who never faltered in the belief that I could finish and provided encouragement along
the way, as well as helping me articulate the “why.” Thanks for supporting me to
pursue a media archaeological analysis after that philosophy course that blew
my mind, and thanks to Colin Koopman for teaching that course on politics of
information and introducing me to media archaeology. Thank you to Bish Sen for
helping me see research as a way to bring about positive change and reminding me
that there is more work to do. Thank you to Seth Lewis for taking on a student you
had never met.
Thanks to Radhika Gajjala and Carol Stabile for recognizing my potential and
reminding me that as much as is in my head, I haven’t done the work until it leaves
my head.
Thank you to Evviva Weinraub Lajoie, for friendship, for helping me keep my
librarianship (day job) field contributions in motion, and for creating space for me,
whether space to write at your house or space to just be me. Thank you to Brandy Karl who
kept me on task with daily reminders of writing encouragement. This dissertation
would not have been completed without Brandy. Thank you to Carolyn and Scott
Cole for your friendship and Scott’s eagle eye and advice in editing; however, I still
like semicolons.
I thank my parents, John and Peggy Mahon, who instilled in me a curiosity
and an appetite to never stop reading and questioning. And to the memory of a family
car ride in 1996 when we discussed the pluses and minuses of pursuing a PhD, as
teenager me contemplated life goals. Took a while, but I did it!
TABLE OF CONTENTS
Chapter Page
I. INTRODUCTION ............................................................................................. 1
Optimization Overview ............................................................................. 5
Search Engine Optimization (SEO) .......................................................... 6
Brief SEO History ............................................................................... 7
SEO Basics.......................................................................................... 8
SEO – The Dark Side ......................................................................... 10
Social Media Optimization (SMO) .......................................................... 12
Brief SMO History ............................................................................. 13
SMO Basics ....................................................................................... 15
Significance of the Study .......................................................................... 16
Dissertation Overview ............................................................................. 19
II. THEORETICAL FOUNDATIONS & LITERATURE REVIEW ................... 21
Communication and Information Theory Models ................................... 22
A Mathematical Model of Communication ....................................... 22
Cybernetics ........................................................................................ 25
Digital Communication Models and the Internet’s Foundations ....... 27
Critical Approaches to Understanding Communication and
Information Models ........................................................................... 30
Digital New Media Studies ...................................................................... 33
Hidden Mechanisms ........................................................................... 35
Search Strategies and the Networked Document ............................... 37
Remix, Variability, and Mutability .................................................... 38
Politics of Information ............................................................................. 40
What is “Politics?” and a Politics of Information .............................. 40
Politics of Information Organization ................................................. 41
Politics of ICTs (Information and Communications Technologies) .. 43
Gatekeeping ............................................................................................. 45
Gatekeeping and Mass Media ............................................................ 46
Gatekeeping Online ........................................................................... 48
III. METHODOLOGY ........................................................................................... 53
Research Questions .................................................................................. 53
Methodological Approach: Applying a Media Archaeological
Method ..................................................................................................... 57
Historical Documents ......................................................................... 61
Data Collection and Analysis ................................................................... 66
Instruction Manuals and How-to Guides ........................................... 67
Archived Webpages ........................................................................... 70
Summary .................................................................................................. 81
IV. COMMUNICATION SYSTEMS FOR INFORMATION RETRIEVAL ........ 83
Information Retrieval in Print Mediums .................................................. 83
The “Memex” for Information Retrieval ................................................. 85
Information Retrieval in Databases ......................................................... 86
Information Retrieval on the World Wide Web ...................................... 87
Information Retrieval in Social Media Platforms .................................... 91
Summary .................................................................................................. 93
V. HOW-TO GUIDES AND INSTRUCTION MANUALS FOR SEO
AND SMO ................................................................................................................. 95
Goals of SEO and SMO Manuals ............................................................ 96
Authors ............................................................................................... 96
Audiences ........................................................................................... 97
Approaches ........................................................................................ 98
SEO and SMO On-Page Strategies ......................................................... 100
URL Optimization ............................................................................ 101
Strategies within the HTML Page’s Header ..................................... 105
Strategies within the HTML Page’s Body ........................................ 113
Linked Data and Semantic Markup .................................................. 121
Summary ................................................................................................. 125
VI. NEWS STORIES' USE OF SEO AND SMO STRATEGIES IN THE
LA TIMES.................................................................................................................. 126
Page Structure ......................................................................................... 127
Basic Metadata and Keywords ................................................................ 132
Relationships with Other Web Content and Social Media ..................... 142
Summary ................................................................................................. 144
VII. U.S. SENATE ELECTION POLITICAL CANDIDATE WEB PAGES'
USE OF SEO AND SMO STRATEGIES ................................................................ 146
Page Structure & Content ....................................................................... 148
Basic Metadata & Keywords .................................................................. 157
Relationships with Other Web Content and Social Media ..................... 170
Summary ................................................................................................. 172
VIII. CONCLUSION ................................................................................................ 174
Summary of Findings .............................................................................. 175
Research Question One ..................................................................... 175
Research Question Two .................................................................... 180
Research Question Three .................................................................. 186
Contributions of the Study ...................................................................... 190
Limitations of the Study .......................................................................... 191
Future Directions .................................................................................... 192
APPENDIX A: DATA COLLECTION .................................................................... 194
Example Data Collection Sheet for Manuals and How-To Guides ........ 194
Example Webpages Data Collection Sheet ............................................. 195
APPENDIX B: DATA SELECTION OF POLITICAL CANDIDATE
WEBSITES ............................................................................................................... 198
Data Harvest Condition Collected .......................................................... 198
Data Collected for Political Candidate Condition .................................. 199
REFERENCES CITED ............................................................................................. 201
LIST OF FIGURES
Figure Page
1.1. Snippet from original PageRank algorithm ................................................... 7
2.1. Shannon’s Mathematical Model of Communication (Shannon, 1948) ........ 23
2.2. Layered Network Architecture from (Fall & Stevens, 2011, p. 14) ............. 28
3.1. Example “Save Page As…Webpage, Complete” artifacts. .......................... 63
3.2. Example comments surrounding HTML code inserted in Wayback
applications. .................................................................................................. 64
3.3. Example directional code inserted by Wayback application to direct to
archived versions of referenced files ............................................................ 65
3.4. Calendar browse interface of Open Wayback application displaying
number of snapshots of the webpage created by harvests. ........................... 72
3.5. Chronological graph of latimes.com website harvests on the Internet
Archive’s Wayback Machine, which spans 2000 to 2018 of publicly
available content. .......................................................................................... 74
4.1. Diagram of search in a print catalog or filing system ................................... 84
4.2. Annotated diagram of the Memex conceptual communication and
storage and retrieval machine from “As we may think” (Bush, 1945). ....... 85
4.3. Generalized diagram of text information retrieval systems and search
queries .......................................................................................................... 87
4.4. Internet search and retrieval using a search engine ...................................... 90
4.5. HTML content found through social media platforms ................................. 93
5.1. Basic anatomy of a URL (Technology for Librarians 101: Anatomy of a
Web Address, 2014). ................................................................................... 102
5.2. Domain registry process (Domain Name Registration Process | ICANN
WHOIS, n.d.). .............................................................................................. 104
5.3. Basic HTML structure. ................................................................................ 105
5.4. Schema.org example for a webpage with product information encoded in
schema.org highlighted in purple text adapted from (Shreves &
Krasniak, 2015, p. 122). .............................................................................. 122
5.5. Minimum recommended Twitter card tags (Shreves & Krasniak, 2015,
p. 127). ......................................................................................................... 123
5.6. A layering of different structured and coded title tags in HTML for a
supposed “My Awesome Headline.” ........................................................... 124
6.1. Screenshot of archived webpage published in 2006 with a “Go” search
button. .......................................................................................................... 130
6.2. Screenshot of 2001 webpage with “Tommy” appearing in first sentence
of article and photo caption. ........................................................................ 133
6.3. Screenshot of 2011 article where URL duplicates wording in article
title, “Obama 2012 campaign heads to Tumblr.” ............................ 134
6.4. Suffixes applied in the <title> tag for the Los Angeles Times .......................... 138
6.5. First paragraph of the <body> from 2013 article, “White House OKd spying on
allies, U.S. intelligence officials say” with example keywords
highlighted in bold. ...................................................................................... 141
7.1. Page title only visible through image for “Agriculture” (pc06e). ................ 149
7.2. Application of structured tags in the page <body>. N=50; ten of the
webpages employed two techniques for hierarchical structure within
the page <body>. ......................................................................... 151
7.3. Screenshot of Jon Tester’s 2012 campaign website with antiqued
textured image background (pc12d). ........................................................... 153
7.4. Screenshot of Jon Tester’s 2012 campaign website before background
images load, resulting in some text, logos, and menu options rendering
faint and/or invisible (pc12d). ..................................................................... 153
7.5. Screenshot of Katie McGinty’s 2016 campaign website with background
image of McGinty in a café talking with assumed proprietor or staff
(pc16d). ........................................................................................................ 154
7.6. Title components and order in political campaign issue webpages. ............ 160
7.7. Keywords in <meta name="keywords"> for pc04c with keywords identified in analysis
highlighted in bold. ...................................................................................... 169
7.8. Keywords in <meta name="keywords"> for pc06e with keywords identified in analysis
highlighted in bold. ...................................................................................... 170
LIST OF TABLES
Table Page
1.1. Adapted from “Snapshot of Major Changes in Google Algorithm
History” ......................................................................................................... 9
1.2. SEO Techniques ............................................................................................ 11
1.3. Basic SMO Strategies ................................................................................... 15
3.1. Chronological Listing of How-to Guides and Instruction Manuals .............. 69
3.2. Newspaper articles selected from Los Angeles Times on the Internet
Archive ......................................................................................................... 75
3.3. Selection of U.S. political candidate archived webpages on issues. ............. 79
6.1. <meta name="description"> tag in the news articles examined from the
Los Angeles Times ....................................................................................... 140
6.2. Presence of links from news article webpages by category off of the
webpage; * outbound links to an external website. ..................................... 143
7.1. Content of description metadata tags for campaign issue pages with
descriptions. ................................................................................................. 166
7.2. <meta name="keywords"> data used in the campaign issue pages. ............ 168
B.1. U.S. Senate closest races and available condition of harvested webpages
with issue content at the Library of Congress. ............................................ 199
CHAPTER I
INTRODUCTION
“[T]he overwhelming propensity of most people is to invest as absolutely little
effort into information seeking as they possibly can” (Bates, 2002).
As Americans increasingly cite the Internet as their primary way to keep informed
and share ideas, 1 studies of new media communication in the online environment are
essential to understanding the structures that govern how communication
occurs online. Access to much online content relies on large commercial ICT
(information and communication technologies) giants such as Google, Microsoft
Bing, Facebook, and Twitter. The role of “gatekeeping” has transitioned from a print
environment, where news organizations and publishers determined content and libraries
and archives selected content, to one where the online services of search engines and
social media sites become the “gatekeepers” for accessing information. The ease, speed,
and amount of information available create an illusion of direct access to information and the absence of a
gatekeeper. In attempts to unveil this illusion in the online environment, two primary
socio-technical communication investigations have emerged: 1) critical analyses of the
algorithms that the search and social media companies use to promote content, and 2)
analyses of the strategies that users employ to work with the gatekeeping systems of the
day to make their content more visible in these venues, search engine and social media
optimization.
1 A 2014 Pew Research Center study found that 87% of Americans felt the Internet made them better
informed. 75% felt better informed about national news. Only 49% felt it made them better informed about
civic and local government activities (Americans Feel Better Informed Thanks to the Internet, 2014).
In this second area of investigation, studies have focused on the active processes
used for employing search engine and social media optimization strategies and the
effectiveness of such strategies. SEO and SMO strategies are ways that content creators
attempt to influence how and where their content appears in these gatekeeping tools.
These techniques are often categorized under Search Engine Marketing (SEM). 2
Through these processes, various strategies and tools are employed within HTML pages
with the goal that users will click on the search engine result or social media posting and
go to the organization’s HTML site. Little attention in the communications field has
been given to the institutions and technical constructs of the content creation process, or to the
methods and tools used to either speak to or game the algorithms to further a message. A
critical examination is needed to place SEO and SMO strategies in historical context
within the larger communication and online environment.
This dissertation seeks to identify the political and practical infrastructure
surrounding access to communication of information online. By concentrating on the
content creation and the optimization strategies used to make information available
through online platform gatekeepers, this project hopes to identify opportunities for
counter voices. This dissertation focuses on the optimization strategies of SEO and SMO
2 This project does not largely include Search Engine Marketing (SEM) paid-for services, such as Google
AdWords or Bing AdInsights, because they are governed by different technical structures than organic or
native search engine results. With the paid services, different algorithms are used to return results, and
additional mechanisms, including payments and bidding processes, are used to determine how and what
ad content is displayed. Overlapping techniques or tools that are used for both paid-for and organic
SEO and SMO techniques, such as with keyword placement or Google defined quality measures for
prioritizing content, will be addressed.
as the primary tools for exposing content in these environments.3 This history of
optimization strategies will explore the structures that enable optimization and that
control access to content. The following research questions guide this inquiry:
RQ1: What is the historical development of SEO and SMO strategies?
RQ1a: What are the topoi in these practices?
RQ1b: What is the interplay with changes in proprietary algorithms over
time?
RQ2: How has the development of SEO and SMO strategies been actualized in
HTML practices for major persuasive information industries?
RQ2a: How have the strategies been implemented in newspapers’ online
presence?
RQ2b: How have the strategies been implemented in political candidate
websites?
RQ3: How have SEO and SMO strategies shaped communication online?
To address these questions, this dissertation employs a historical media
archaeology approach. Media archaeological approaches are historical analyses that may
use quantitative or qualitative methods. A media archaeological approach is especially
useful in examining current phenomena and placing them within a larger historical
context to aid in understanding this current environment. Following from Foucault’s
framework for an archaeological analysis, the structure and rules are emphasized over
content, intent, and the “creative subject” (Foucault, 1972). By using a historical media
3 This project focuses on content that creators want to openly expose. There is a recognition that not all
content on the Internet is available through search engines and social media platforms, and also that not all
content is meant to be found and communicated through these intermediaries. Technological issues of
actionable code/scripts and databases may also prevent content from being discovered through these
services.
archaeology approach, I will investigate both the conceptual ideas and technologies
surrounding search engine and social media optimization, examining approaches and
strategies within the structures of the HTML webpage and the role of search engine and
social media platforms to influence practices.
As an object of study, I will examine the SEO and SMO practices in HTML
webpages. The scope of study is limited to on-page SEO and SMO techniques for
the article or issue page. For example, the Los Angeles Times will be examined for its
practices to increase exposure within search engine results and social media platforms
looking at specific HTML renderings of a newspaper article. Examples of off-page
techniques that will not be examined include the number of external websites linking into
pages and content created natively within social media platforms. Practices examined
include the application of <meta> tags and semantic web markup, hyperlink analysis, and structured
content. The media archaeological examination will include three sources of data: 1)
instruction manuals and guidebooks on SEO and SMO strategies; 2) select Los Angeles
Times article webpages harvested and available from the Internet Archive’s Wayback
Machine;4 and 3) U.S. Senate political candidate webpages harvested and available from
the Library of Congress United States Elections Web Archive.5
4 https://archive.org/web/
5 https://www.loc.gov/collections/united-states-elections-web-archive/
Optimization Overview
“Optimization” is defined as finding the best and most efficient process as close
to “fully perfect” as possible.6 Through techniques of SEO and SMO, the content creator
employs strategies to promote access to their content. Although optimization techniques
are intended to identify the most effective way of making content accessible, the
techniques are heavily regulated by the search engine and social media corporations.
Because it is in the interest of these corporations to have content structured for their
services, they usually provide helpful and detailed guidelines on techniques and standards
for their systems.7
There is also the practice of extreme optimization, called Black Hat, which occurs
when web content creators game the systems developed by the gatekeepers in order to
promote webpages using the rules of the structure in ways that may have little to do with the
actual content. These tactics are decried by the web and news industries as illegal and/or
ineffective (Boutet & Quoniam, 2012). There is debate in the web community about the
appropriate levels of optimization to employ; however, the side of the “right” is typically
associated with the search engine and social media corporations. The context for how the
coding structures, tools, guidelines, and institutions have historically enacted these
regulations of information is important to understanding how information can be
accessed through these modern gatekeepers.
6 http://www.merriam-webster.com/dictionary/optimization
7 Google, Search Engine Optimization Starter Guide: http://www.google.com/webmasters/docs/search-
engine-optimization-starter-guide.pdf; Bing, SEO Analyzer: http://www.bing.com/toolbox/seo-analyzer;
Facebook, Content Sharing Best Practices: https://developers.facebook.com/docs/sharing/best-practices;
Twitter, Twitter Cards: https://dev.twitter.com/cards/overview
Optimization differs between search engine and social media platforms. Search
engines attempt to provide access to the “right information” that matches a user’s query;
whereas social media platforms are less concerned about “right information” and seek to
provide a good user experience. Despite these separate goals, because of the shared
structure of the web and HTML documents and because the work involved on the content
creator’s end often overlaps, it is useful to examine the practices as a holistic set of techniques
for content creators to make their content available through the primary gateways to
information on the web.
Search Engine Optimization (SEO)
Search Engine Optimization is the set of strategies and practices used to influence
placement and ranking on search engine results pages (SERPs) for indexed web content.
Indexing content to make it readily accessible has been a central feature of the Internet.
The first search engine, Archie, was designed around 1990 to index file archives on the
early Internet (Savetz, 1993). Since then, web search engines have continued to evolve
and act as gatekeepers to the content on the open World Wide Web (Introna &
Nissenbaum, 2007). The indexed web, including Google, Yahoo, and Bing, comprised
around 50 billion pages of web content in 2014, and Google holds 92% of the market
share (The Size of the World Wide Web (The Internet), n.d.). Although not all web content
is available via the indexed web and listed on SERPs, a high ranking result on a SERP is
an essential part of making web content accessible, with some websites receiving up to 64%
of their traffic from organic search results (Zeckman, 2014). Strategies to increase the
likelihood of a high-ranking result for web content have changed in conjunction with
changes to search engine algorithms and media formats. Search engines have also
responded to SEO strategies that they consider harmful, such as Black Hat strategies, and
modified their algorithms to retain control of what appears in search results (Malaga,
2008a).
Brief SEO History
In order for search engines to take advantage of the information provided on the
web, they select elements and practices in HTML to query and return. Each search
engine needs an algorithm to function. The most famous of these is the Google
PageRank® algorithm:
Figure 1.1. Snippet from original PageRank algorithm.8
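For readers without access to the figure, the ranking formula from the original paper (Brin & Page, 1998), which the snippet reproduces, can be restated as follows. For a page A with inbound links from pages T1 through Tn:

    PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where C(Ti) is the number of links going out of page Ti and d is a damping factor, which the authors typically set to 0.85. Intuitively, a page ranks highly when highly ranked pages link to it; this structural property is precisely what the link building and link farming strategies discussed below attempt to exploit.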
In the late 1990s search engines were still trying to categorize the web as well as
provide search functionality. Google’s launch in 1998 changed this behavior, and the
search box that is now ubiquitous became the primary tool of search engines. A new
industry arose around 1996 to help web content creators receive a better ranking on
search engines, and the term “Search Engine Optimization (SEO)” was coined to
describe these strategies (Sullivan, 2004).
Many of the changes in SEO strategies have been subtle but important and are in
direct response to rules set by search engines. Each search engine has different algorithms
8 (Brin & Page, 1998)
and may promote slightly different HTML elements and practices, yet function out of
similar principles. The most significant public changes to search engine algorithms
regarding SEO all involve Google because Google is the only engine that makes major
changes public. Most Google changes are primarily in response to what it considers
illegal or unethical behavior. For example, in 2009, Google discontinued emphasis on
<meta> keyword tags because it deemed that too many content creators were using
them to mislead the search engine and plagiarize or subvert the work of competitor
information or commercial sources (“Google Does Not Use the Keywords Meta Tag in
Web Ranking,” 2009). Some changes are also in response to technology and device
developments.
In 2012, an editorial on Forbes.com entitled, “The Death of SEO: The Rise of
Social, PR, and Real Content,” created a storm of comments, speculation, denial, and
support. The article received over 525 comments and was the #1 trending article for 2012
before Forbes decided to lock the comments (Krogue, 2012). This editorial and many of
the changes that followed for SEO was not a “death of SEO” but rather an integration in
SMO and a realization that SERPs started to give preference to social media content in
results, as well as the drift of users accessing content directly through social media
platforms. SEO practices and the SEO industry continue to thrive.
SEO Basics
Although not all SEO will affect each search engine in the same way, general
SEO expert advice recommends following Google SEO strategies, which will thereby
affect rankings with other search engines, but paying attention to the subtle differences
Table 1.1. Adapted from “Snapshot of Major Changes in Google Algorithm History”.9
Date | Update | Purpose
2015 | Unnamed; referred to by the SEO industry as Phantom 2 | Rank by the quality and “truthfulness” of a webpage (Dong et al., 2015)
2015 | Mobile Friendliness | Increase rank on the main Google SERP if a webpage is mobile-friendly
2014 | “In the News” Box | Blogs and non-traditional news media included in News Search Results and the “In the News” box on main Google SERPs
2014 | Pigeon | Focuses results on a local geographic level to provide more relevant results to users
2013 | Hummingbird | Builds on earlier knowledge graph integration and allows for semantic web and knowledge graph search
2012 | Penguin | Addresses web spam and sites not following Google’s Webmaster quality guidelines
2011 | Panda | Addresses content and link farming and high-ad sites; in direct response to actions from Overstock.com and JC Penney, which took over Google Search results for many consumer goods
2010 | Social Signals | Customizes search results based on the social media and network of the user
2010 | Caffeine | Google search infrastructure redesigned for fresher content; no effect on SEO
2009 | Real Time Search | Emphasizes news and social media
2009 | Keyword Trust | <meta> tags for keywords no longer factored in results
2008 | Google Suggest | Shows the user popular search string options as they type in the search box
2005 | Jagger | Targeted at poor quality links and link farms
2003 | Boston – Fritz (monthly updates) | Changes to index, supplemental index, treatment of links and hidden links and text
9 See (“Google Algorithm Change History,” 2015, “Timeline of Google Search,” n.d.)
that Bing and Yahoo may utilize to increase access to content (Sherrod, 2010; Smarty,
2009; The Differences Between Google & Bing SEO Algorithms, 2014). The historic and
current framework necessitates that SEO activities take place within the HTML framework,
which means that much multimedia content, such as images and videos, is not the focus
of SEO activities beyond the tags available within HTML code.10 Most SEO strategies
can be either manually applied or automated / scripted.
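As a concrete illustration, the following is a minimal sketch of how several on-page techniques appear together in an HTML page's header; the title, description, and URL values are hypothetical examples, not drawn from the webpages analyzed in later chapters:

    <head>
      <!-- Succinct <title> with keywords toward the front -->
      <title>Education Issues: Funding and Student Debt | Example Campaign</title>
      <!-- Description tag, often displayed as the SERP snippet -->
      <meta name="description" content="Where the candidate stands on education funding and student debt.">
      <!-- Designed, "canonical" URL, used when duplicate page content exists -->
      <link rel="canonical" href="https://www.example.com/issues/education">
    </head>

Each element corresponds to a row in Table 1.2.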
In addition to the techniques outlined in Table 1.2, integration with social media
sites and structural considerations, such as Google’s and Bing’s design and mobile-
friendly preference rankings, are also used for SEO. It is also important to notify search
engines to crawl your site. Many SEO experts recommend creating a sitemap that lists all
the pages and links in your website and submitting that to each search engine (West,
2012).
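A sitemap is a plain XML file following the sitemaps.org protocol; a minimal sketch with hypothetical URLs and dates:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per page the crawler should visit -->
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2015-06-01</lastmod>
      </url>
      <url>
        <loc>https://www.example.com/issues/education</loc>
        <lastmod>2015-06-15</lastmod>
      </url>
    </urlset>

The completed file is then submitted to each search engine, typically through its webmaster tools.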
SEO – The Dark Side
In the web industry, these strategies and techniques also have variant practices that are
labelled as “white hat” – correct, good, honest, and proper methods – and “black hat” –
malicious, sneaky, and false methods. The notion of “fully perfect,” “good,” and “right”
permeates how web content creators are supposed to act and follow the rules set by the
10 With increasing search engine and social media queries for things like color search and facial
recognition, it will be interesting to see if these techniques affect content creators in their quest to have
information be found. The same use-case for enhancing content to be found does not appear to have
modified current strategies. See Google color image search (Tanguay, 2009); Facebook Facial
Recognition, (Taigman et al., 2014).
Table 1.2. SEO Techniques.11
SEO Technique | Description
Link Building | The process of encouraging relationships for others to link back to your site (i.e., inbound links). Off-page technique.
Link Farming | Linking to other sites (i.e., outbound links).
<title> tag | Craft a succinct title with keywords toward the front that is less than 54-75 characters. Note: this title is not the same as a title of the content.
<meta> description tag12 | The description is less used in ranking and more often a tool displayed as part of the SERP. The description tag should include keywords in the title and be less than 160 characters.
<meta> tags13 | <meta> keyword tags were once highly utilized but are now less used. Additional metadata elements may be put into customized <meta> tags for the semantic web or specific applications such as Google Scholar, with information such as the academic journal from an article’s citation.14
Keywords | Ensure good keywords are in all the heading tags (<title>, <h1>, <h2>, etc.) on a page and throughout the text.
Rich Snippets | Utilize schema.org and semantic web markup embedded in content on the page, such as news, events, and media.
Cloaking and Doorway Pages | Create pages for indexing that redirect to a different page of content. Black Hat technique.
Designed URLs | Use URLs that are expressive and descriptive of the page content, are short, and use hyphens between words. Assign “canonical” URLs when duplicate page content exists, such as a print version. Use full URL addresses for internal site links.
search engines and social media corporations. Activities that do not conform to the rules
are labelled as “black hat.” These activities include cloaking, link farms, hidden code,
11 Adapted from: (Fishkin, 2015b; Killoran, 2013; Malaga, 2008; West, 2012; Yalçın & Köse, 2010)
12 E.g., <meta name="description" content="A one-sentence summary of the page.">
13 Basic HTML metatags in addition to description are “author” and “keywords”
http://www.w3schools.com/tags/tag_meta.asp; e.g., <meta name="keywords" content="keyword1, keyword2">
14 https://scholar.google.com/intl/us/scholar/inclusion.html#indexing; e.g., <meta name="citation_journal_title" content="Journal Name">
and doorway pages, among others (Killoran, 2013; Malaga, 2008). “Black hat” activities are
also often associated with SEO strategies that utilize automation or scripting.
There is little interrogation of what actually constitutes “black hat” methods and
what are simply ways of making content more accessible that wouldn’t be otherwise
(such as creating doorway pages for heavy multimedia content which cannot be as easily
queried as text).15 Black hat techniques are often conducted by spammers and other
counterfeit companies (Israel et al., 2013; Lu & Lee, 2011; Wang et al., 2011, 2014);
however, going against the algorithms isn’t always malicious. Well-researched content
on a webpage may cite and link out to many sources; because those pre-existing sources
do not link back to the page, such linking could be considered link farming, and the page could
be banned from search engine results. Advocates of these types of activities have tried to
encourage active SEO that does not necessarily adhere to all of the rules defined by
search engines and challenges the assumption that not following the rules is an unethical
activity (Boutet & Quoniam, 2012; Fishkin, 2008).
Social Media Optimization (SMO)
Social Media Optimization (SMO) for web HTML pages is the process of making
HTML content social-media ready so that it can be integrated into social media feeds.
The integration is typically started by either a representative for a content creator or a
15 One of the landmark changes in Google search algorithms occurred in 2006 after BMW’s German
website created doorway pages which had the search terms and content that a search engine would retrieve
but then redirected to a multimedia site. Google banned BMW from its search results for a time as a
result (Malaga, 2008).
content recipient linking to the HTML information. The relationship between what is web
content and what is social media content is also blurred through this integration, as
content is formulated for both traditional websites and social media platforms, and the
user’s role in selecting and contributing the content is an important part of the
distribution of information (Gerlitz & Helmond, 2013). Like search engines, social media
platforms can act as gatekeepers to providing access to information; however, unlike
search engines, the information is provided by a combination of social relationship
recommendations from one’s own circles and groups and the algorithms behind the social
media platform feeds. The highly localized and personalized information in these feeds
has been studied as evidence of the ever-narrowing exposure to diverse content for
people using these platforms (Hermida et al., 2012; Messing & Westwood, 2012).
Unlike SEO, SMO strategies are less about getting the algorithms to process and
rank the content highly and more about putting the content in the correct format and, most
importantly, creating content that appeals to a person so that they are likely to link to it in
their social media feeds (Foster, 2015; Rayson, 2013). Providing
metadata, rich-formatted content, efficient sharing methods, and viral content are the
central precepts of social media optimization.
Brief SMO History
SMO strategies came to the forefront of marketing and communication efforts
around 2006. Several marketing websites point to a blog post by Rohit Bhargava from
the Influential Marketing Group as the origin of the SMO term. In this post, Bhargava
outlines five tenets that should be considered for SMO: 1) increase your linkability, 2)
make tagging and bookmarking easy, 3) reward inbound links, 4) help your content
travel, and 5) encourage mash-up (Bhargava, 2006). SMO strategies were quickly taken
up by the SEO community and added to many SEO guides (Fishkin, 2015a). SMO
strategies have not changed dramatically over time but have continued to focus
aggressively on basic structural information that makes linking and mash-ups easy, and on
content that appeals to users enough to get them to forward the HTML content to social media. These
strategies have shifted toward automated efforts and the work of bots to increase
prevalence in social media platforms (Allen, 2016).
The changes that have precipitated most SMO modifications are a result of social
media platforms prioritizing the type of content they choose to highlight and hiding
content that they determine is too aggressively marketed. In a different manner than
search engines, the social media platform is more concerned with advertising interfering
with the social experience and undercutting its advertising revenue than with
providing relevant or accurate results in a feed. Social media platforms hide
content from feeds that they consider too promotional or spam. Social media platforms
have also attempted to reduce “fake news,” but efforts to eliminate the biases have
largely been ineffective (Levin, 2017). Because social media optimization strategies are
about structure and helping the social media platforms consume and display the HTML
content, they can be used in any type of page or content. The existence of such strategies
also does not have a bearing on whether the content is accurate, relevant, or fake news.
The social media platforms rely heavily on user interactions with, and judgment of,
content formatted to feed well into their systems.
Table 1.3. Basic SMO Strategies.
SMO Technique | Description
Open Graph tags16 | Use the semantic web Open Graph protocol to catalog information on the page. Used primarily for Facebook.
Twitter Card tags17 | Use the Twitter-specific metadata schema to format elements and provide metadata for a Twitter-formatted “card” that includes media and rich text elements with content shared from HTML pages. Used for Twitter.
Social buttons (like and share) | Use HTML buttons on the site that forward the user to a social media platform to like and/or share content. Used for multiple social media sites.
Headline / Title Optimization for Click Through Rate (CTR) | Test multiple titles using automated tools or A/B tests with users to identify titles that will have high CTRs with social media users, returning users from social media to the home HTML pages.
Share-able Image | Create and share an image on the HTML page formatted specifically for social media platforms. Size the image(s) depending on the platform and reference them in the Open Graph and/or Twitter Card tags.
Bots | Bots are automated methods of imitating user behavior to post or communicate with users.
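To make the first two rows of Table 1.3 concrete, the following is a minimal sketch of Open Graph and Twitter Card tags in an HTML page's header, reusing the hypothetical “My Awesome Headline” from Figure 5.6; all values are illustrative:

    <!-- Open Graph tags, read primarily by Facebook -->
    <meta property="og:title" content="My Awesome Headline">
    <meta property="og:type" content="article">
    <meta property="og:url" content="https://www.example.com/my-awesome-headline">
    <meta property="og:image" content="https://www.example.com/images/share-1200x630.jpg">
    <!-- Twitter Card tags; the "card" value controls the display format -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:title" content="My Awesome Headline">
    <meta name="twitter:image" content="https://www.example.com/images/share-1200x630.jpg">

The tags duplicate information already on the page; their function is purely structural, packaging the content so that each platform can consume and display it.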
SMO Basics
The two primary social media platforms that provide guidance, and whose
recommendations are generally treated as best practices to follow, are Facebook and
Twitter, which have the largest current social media shares. SMO on-page techniques for
HTML webpage content focus on a few main areas. Tools and content management and publishing platforms often assist
in the creation of the structured metadata that may be needed for easy integration into
social media platforms. However, individualized author strategies may also be required.
The editor-at-large of Upworthy notes that writers must come up with 25 titles for each
16 See http://ogp.me/; e.g., <meta property="og:title" content="My Awesome Headline">
17 See https://dev.twitter.com/cards/markup; e.g., <meta name="twitter:card" content="summary">
post, which are then run through CTR (click-through rate) tools to test effectiveness
(Mordecai, 2014).
Significance of the Study
Communication scholars have studied the gatekeeping function of search engines
(Granka, 2010; Introna & Nissenbaum, 2007; Mager, 2012) and the targeted and selected
content choices available in social media feeds and reception (Hermida et al., 2012;
Khang, Ki, & Ye, 2012; Lovejoy & Saxton, 2012).18 These studies primarily focus on
the receiver of information or the actor as the gatekeeper. They have been significant in
exposing the bias in content selection and autonomy of the receiver; however, they have
not explored the everyday strategies to challenge these barriers.
Critical studies of search engines and social media have focused on the ideology
of search engines (Fuchs, 2012b; Mager, 2013, 2014; Noble, 2013; Rieder, 2012). These
studies range from interrogating algorithms within the institutional contexts of larger
economic and technology cultures to feminist critiques that explore how opinions are
proliferated through the lens of a western white male perspective. Fuchs’ work has also
focused on the content creator for search engines and social media platforms through a
political-economic examination of how search engines and social media platforms utilize
user-created content and labor for their profits, whereby a Marxist exploitation of surplus
18 Communication studies have also focused on the role of the user as activist and news generator and
social relations using social media (Khang et al., 2012). These studies are an important contribution to both
user creation and reception of content in the social media environment; however, they do not address the
optimization and retrieval strategies, which are the focus of this project.
value ensues (Fuchs, 2010, 2012a). Gerlitz and Helmond have contributed work
exploring the role of Facebook and social media in the larger online environment,
particularly the use of Open Graph metadata19 and relationships with the Facebook
“Like” button to demonstrate the economic ideology embedded in each user interaction
(Gerlitz & Helmond, 2013). Noble’s work is noteworthy for examining the type-ahead
feature in Google’s search box, with both the limitations and the prejudices enhanced
through this search feature (Noble, 2013), and has been used as a call for more
government regulation of these technical gatekeepers.
Some research exists on the effects of mass SEO link building to receive a
particular search result to a Google query, known as Google Bombing (Bar-Ilan, 2007;
Tatum, 2005). This research, however, is not extensive and focuses primarily on
motivations. The limitations of these studies are in attributing the power in these actions
to the user alone. The studies do not provide a critical investigation into the socio-
technological and cultural structures that constrain and enable these activities. Without
understanding the broader institutional approach, Google Bombing becomes simply a
one-off act.
In the communications field, research on SEO and SMO is concentrated primarily in
advertising and public relations, focusing on SEO and SMO strategy as well as on
studies of ROI (return on investment), customer loyalty, and perception (Berman &
Katona, 2013; Lipsman et al., 2012). SEO and SMO are also examined within the
19 Metadata is a set of descriptive data about another data or content source. Metadata can be descriptive,
technical, or administrative in functionality and applied manually or automatically to the original content
source.
computer science field where SEO is referred to as “adversarial web search” and is
treated as hostile to the algorithms and a problem to be fixed (Castillo, 2010; Malaga,
2008). On-page social media optimization, within computer sciences, is typically studied
within the context of a particular set of tools for the semantic web and network analysis
(Kinsella et al., 2011; Sizov, 2010).
This inquiry is limited to U.S.-based, English-language webpages and
manuals in order to continue the early studies on gatekeeping within the context of U.S.
politics and newspapers. This study is also primarily concerned with the premise of
communicating more widely and making content accessible amid the gatekeeping
technologies of search engine and social media sites online. Some countries around the
world govern the types and characteristics of content that can be returned in search
engine results or displayed on social media sites. Recent court cases in the United
Kingdom and the European Union have focused on aspects of privacy and the “right to be
forgotten,” which may specify the content that search engines are allowed to display
(European Commission, 2014). Additionally, countries such as China have been restricting
content for decades (Goldsmith & Wu, 2006). The U.S. does not at this time and
historically has not legally mandated what can be displayed in search and social media
results, which provides an unobstructed base for this analysis. A future study should seek
to examine techniques for search engine and social media optimization within these legal
and governing frameworks in non-U.S. contexts.
Dissertation Overview
The remainder of this dissertation is structured as follows: two background
chapters, CHAPTERS II and III; a communication systems and diagramming overview of
the technologies, CHAPTER IV; an overview of SEO and SMO strategies from
instruction manuals, CHAPTER V; a chapter focused on SEO and SMO in newspaper
articles online using the Los Angeles Times, CHAPTER VI; a chapter focused on SEO
and SMO in U.S. Senate political candidate websites on election issues, CHAPTER VII;
and a concluding chapter, CHAPTER VIII.
CHAPTER II reviews the interdisciplinary theoretical background of
communication and information system modeling, politics of information, and
gatekeeping studies to examine structural and institutional practices used to create,
inform, and react to optimization practices, as well as the sociotechnical systems for
communication and access to information. CHAPTER III provides an overview of the
methodology and reviews the use of and rationale for a media archaeological analysis as
the framework with a historical document analysis, as well as the selection of content for
analysis. This chapter also provides background on web archives and the process for
accessing the archived webpages needed for the analysis.
CHAPTER IV: The communication system environment for SEO and SMO
provides a diagramming analysis of the communication processes used in the context of
search and making information “accessible.” As topoi are explored in a media
archaeological analysis, part of the process is to place cultural phenomena in the context
and evolution of pre-existing media and mechanisms. This chapter provides the context
to be drawn on for the base of the inquiry into search engine and social media
optimization, as well as outlines of the processes used in the current technologies.
CHAPTERS V through VII present the major findings of the dissertation through
a historical method and document analysis employed in the media archaeological process.
CHAPTER V: How-To Guides and Instruction Manuals for SEO and SMO will use instruction
manuals and how-to guides to trace the strategies historically recommended for content creators to employ. The
optimization strategies will each be discussed based on changes over time and with
specific attention to the structure of HTML and the expertise needed to comply with the
strategies. CHAPTER VI: News Stories' Use of SEO and SMO Strategies in the Los
Angeles Times will review archived HTML webpages of newspaper articles for the
presence of suggested SEO and SMO strategies. CHAPTER VII: U.S. Senate Election
Political Candidate Webpages will review election issue webpages in close election races
for the presence of suggested SEO and SMO strategies. These chapters will also address
points of success and failure in the adoption of optimization strategies and evidence for
points of transition in techniques.
The conclusion will review the topoi identified in the media archaeological
analysis and the impacts on writing and communication on the web. It will also compare
the expected outcomes from the instruction manuals with evidence found in news and
political candidate webpages. Recommendations for future study will also be addressed.
CHAPTER II
THEORETICAL FOUNDATIONS & LITERATURE REVIEW
This dissertation takes a critical historical approach to examine institutional
structures within communication practices of search engine and social media
optimization. To answer the research questions, this chapter provides a contextual and
theoretical review that draws on an interdisciplinary cross-section of theories from
communication, philosophy, sociology, and information science. Beginning with new
media / digital communication theory, grounded in foundational theories of
communication models, this section also provides a historical lens for the construction of
technical systems that form web communication technologies on which search engine
and social media optimization practices are enacted.
Following the discussion of new media and digital communication theory, this
chapter takes a critical overview through politics of information and critical code studies
to examine the institutional and power structures built into systems of information.
Gatekeeping then provides a framework for how media organizations have historically
shaped and provided access to information and what can be communicated. The chapter
concludes with a review of gatekeeping studies in the online environment that provide a
basis for understanding the power relationships that govern content exposure and access
to information in the digital environment and information retrieval systems. These
theories and background are important for understanding the historical context and
implications for how search engine and social media optimization practices are actualized
in historical and contemporary online environments.
Communication and Information Theory Models
In order to understand digital media as an object and a set of processes, it is
necessary to review early theories of communication technology models that provided a
basis for systems of communication, as well as theoretical models of communication
exchange. This section reviews the historical roots of new media theory through
information and communication models developed in post-WWII telecommunications
and cybernetics and concludes with communication models for the processing of
information from the Internet. The communication models of the mid-twentieth century
that were developed alongside early computing systems aimed at increasing the quality of
communication transmissions. They illustrate an emphasis on inputs and outputs for
information that is transmitted and received. As such, these models propagated a theory
of information as an object of transmission. The quality of the transmission for effective
communication was something that could be engineered and configured within the
constructs of the mathematical and engineered system. These theories laid the
groundwork for how digital information is perceived, valued, and regulated, as well as
the architecture that informed the initial building blocks of the Internet and World Wide
Web.
A Mathematical Model of Communication
During WWII in the U.S. and Great Britain, mathematicians, engineers, and
scientists worked together on wartime technologies. One of the major research
and development institutions in the United States was Bell Labs in New York City, which
was the research hub for the Bell telephone system. During the Cold War, Bell Labs
continued to function as a hub for ground-breaking research with the increased federal
government funding for scientific research, which created an influx of new information
technologies (Rogers, 1997). Claude Shannon, a researcher in Bell Labs during these
periods, proposed a new model of communication in two papers published as “A Mathematical Theory of Communication,” later combined into a book with an introduction by Warren Weaver, that became the basis for modern digital systems (Nahin, 2013; Rogers, 1997). Shannon's research centered on communication systems and the accurate reproduction of a message sent at one point and received at another (Shannon, 1948).
He developed a one-way model of communication and information theory of sender and
receiver, which because of the easily generalizable categories and the introduction from
Weaver to expand uses of the model, was quickly adopted across disciplines as a way to
explore communication (Rogers, 1997).
Figure 2.1. Shannon’s Mathematical Model of Communication (Shannon, 1948).
This linear model of communication is a sender/receiver model where the goal of
the communication system is to avoid errors, reduce the “noise,” and produce as clear a
message as possible (Shannon, 1948). In order to accomplish his task of noise reduction,
Shannon introduces three concepts. 1) He reduced information to a binary set of 1's and 0's that could theoretically be transmitted along an electrical current (Shannon, 1948, p. 395). Although the terms bit and binary had been used previously in other systems, Shannon's interpretation of 1's and 0's moving through the channel as information was first introduced in “A Mathematical Theory of Communication” (Nahin, 2013). 2) The use of Boolean logic and error detection through redundancy is central to the structure suggested by Shannon. This includes parity bit checking at the source end and at the receiver end of the channel with the Boolean exclusive OR (XOR)20 (Nahin, 2013). 3) Shannon also provides an encoding of language in which a “27-symbol ‘alphabet’” is used for English, adding the space as a character (Shannon, 1948). This leads to his discussion of relative entropy and redundancy. As Shannon explains, for ordinary English words of eight letters or less, redundancy is roughly fifty percent: “This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely” (Shannon, 1948).
The transmission of binary values in communication systems, exclusive OR gates for error detection, and redundancy and relative entropy greatly influence the apparatuses of future communication and information technology systems.
20 The XOR presents a true result only when a difference in inputs is detected, i.e., one input is true and one is false.
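To make these mechanisms concrete, the following sketch (my own illustration in Python, not Shannon's notation) shows an XOR parity bit detecting a single-bit error and the entropy of the 27-symbol alphabet under the simplifying assumption that all symbols are equally likely; actual English has lower entropy because its symbols are not equiprobable.

    import math

    def parity_bit(bits):
        # XOR all bits together; the result flips if any single bit flips
        parity = 0
        for b in bits:
            parity ^= b
        return parity

    message = [1, 0, 1, 1, 0, 1, 0]
    sent = message + [parity_bit(message)]             # parity appended at the source
    received = list(sent)
    received[2] ^= 1                                   # simulate noise: one bit flipped in the channel
    print(parity_bit(sent[:-1]) == sent[-1])           # True: the intact message passes the check
    print(parity_bit(received[:-1]) == received[-1])   # False: the receiver detects the error

    # Entropy (bits per symbol) of a uniform 27-symbol alphabet: 26 letters plus the space
    H = -sum((1 / 27) * math.log2(1 / 27) for _ in range(27))
    print(round(H, 2))  # 4.75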
As a mathematical model, Shannon was explicit in the separation of meaning
from communication. For Shannon, meaning is “irrelevant” to the problem of the
engineering system (Shannon, 1948). Shannon sought to reduce noise in his system, which would allow for a more accurate message to be received; however, what is
considered noise to the system becomes a fundamental question in modern information
retrieval and digital systems. The reliance on a system structure that is agnostic to meaning presents an interesting problem for new media studies. Tiziana Terranova argues that Shannon's adaptation of the information processing model in the system of communication causes a crisis in the meaning of information, whereby a group like “conscientious journalists” prioritizes accuracy of information, but the engineer reduces information to a ratio of signal to noise (Terranova, 2004). This is important for this study, as there is a clear tension over what constitutes the accuracy of information and where meaning is
derived within the structures of communication systems. This is manifested in how
search engines and social media platforms filter content and content creators use
optimization strategies to make their content accessible.
Cybernetics
Cybernetics is a second model of communication that is integral to understanding
the context of optimization strategies and algorithmic retrieval. The cybernetic model is a
continuously evolving model of a communication system that closely follows a biological
model (homeostasis) of learning and growth toward a more efficient process of
communication (Wiener, 1961). In order to grow, the cybernetic system processes
positive and negative feedback for its learning mechanism. Criticisms of cybernetics point to the a priori nature of negative feedback for growth as a major practical limitation for system design (Sutherland, 1975). The feedback requires a degree of self-recognition by the system of how messages exchanged between two or more units influence each
other (Rogers, 1997). In Norbert Wiener’s model of a cybernetic communication system,
the system was self-learning and adapted to refine the message and output. Within
cybernetics, because of the dependency on feedback in order to understand the
information theory, the apparatus for information flow must be examined (Guilbaud,
1959).
Like Shannon’s model, cybernetics does not consider the meaning or semantics of
the message. Cybernetics involves a series of probabilities and likely selections but is less
concerned with the accurate or correct message than Shannon. “What is of interest to our
theory is the choice, the range of possible messages” (Guilbaud, 1959). Wiener noted
that, in the case of information retrieval with large amounts of information, special effort was needed to make that information available, which required familiarity with previous information to establish the relevancy of any future retrievals (Wiener, 1961). Important for this inquiry, this reliance on previous information and choice is foundational to early models of search engine and social media algorithms. Through the ability to store massive amounts of information in the memories of computing machines, Wiener saw a way to use the outputs to do work in the world, such as communication to benefit medicine and mental health (Conway & Siegelman, 2009). Cybernetics’ description and ambition
for information retrieval is useful for understanding the tactics of SEO and SMO as search engines and social media platforms change their algorithms. From cybernetics, we gain an understanding of communication technologies that rely on negative feedback and on prior conditions of receiving the message, drawing on vast stores of information to rank and make content visible according to a conception of an evolving and perfecting system.
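A minimal sketch (my own, under heavily simplified assumptions) of the negative-feedback loop described above: the system compares its output against a goal and uses the deviation as feedback to adjust itself, converging toward its conception of a perfected state.

    def feedback_loop(target, gain=0.5, steps=8):
        # Negative feedback: each cycle measures the error against the goal
        # and adjusts the system state based on its previous condition.
        state = 0.0
        for _ in range(steps):
            error = target - state
            state += gain * error
        return state

    print(round(feedback_loop(target=1.0), 4))  # 0.9961: the output approaches the goal of 1.0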
Digital Communication Models and the Internet’s Foundations
Shannon and the work of the cyberneticians are often pointed to as the beginnings of the Internet, where their models of communication influenced the Internet's design. The structure of the communication between network nodes established the rules by which online communications take place. The early development of ARPANET, which set the
foundation for the Internet, was a communication network between universities across the
United States and resulted from a combination of military and academic interests that,
although developed in the 1970’s, was not widely used by the public until the 1990’s
(Schröter, 2012, p. 302). In order to facilitate communication, protocols were defined for
communication across this network.
Although several alternatives were developed for communication across a global
network of computers that established the beginnings of the Internet, the TCP/IP
protocols designed by Vinton Cerf and Robert Kahn were adopted as the means for global communication. In their model, communication is transferred between HOSTS, which are composed of both source and destination computers, with packet switches and processes for the information to travel defined within the HOSTS (Cerf & Kahn, 1974, p. 637).
Packet switching is used to enable information to travel in defined chunks and be
reconstituted at the receiving end of a communication network. Cerf and Kahn describe
communication processes between different networks through the use of GATEWAYS, which enable exchange across networks through agreed protocols (Cerf & Kahn, 1974, p. 638). The introduction of gateways allowed networks to maintain their own local protocols while providing a way to transmit standard expected formats, e.g., through an internetwork header, intercepted at the gateway, which allows for
communication between networks. The TCP (transmission control program) handles the
processes of transmission at the level of the HOSTS; TCP enables breaking up of information into processable chunks, with error checking (e.g., a checksum), and
reconstitution of messages at the receiving HOSTS. The IP allows for addressing of HOST
machines within the network (Fall & Stevens, 2011).
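The sketch below (an illustrative toy in Python, not the ones'-complement checksum actually specified for TCP) shows the two functions just described: breaking information into processable chunks with an error check attached, and verifying and reconstituting the message at the receiving end.

    def checksum(chunk: bytes) -> int:
        # Toy check value: sum of byte values modulo 65536
        return sum(chunk) % 65536

    def send(message: bytes, chunk_size: int = 4):
        # Split the message into numbered chunks, each carrying its checksum
        return [(seq, message[i:i + chunk_size], checksum(message[i:i + chunk_size]))
                for seq, i in enumerate(range(0, len(message), chunk_size))]

    def receive(packets):
        # Verify each chunk, reorder by sequence number, and reconstitute the message
        for seq, chunk, check in packets:
            assert checksum(chunk) == check, f"error detected in chunk {seq}"
        return b"".join(chunk for _, chunk, _ in sorted(packets))

    packets = send(b"a message broken into packets")
    print(receive(packets))  # b'a message broken into packets'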
This TCP/IP protocol established by Cerf and Kahn set out the basis for
communication across the Internet; the flexibility in determining which parts of the standards were necessary for communication between machines on an external network (i.e., the Internet) versus a local network was essential to the design of the communication system. As the Internet developed into a network across global nodes, a layered-network approach was adopted where different systems could implement different parts of the protocol but
essential features are shared between communications across the network.
Application: Internet-compatible applications, e.g., the Web (HTTP), DNS
Transport: provides exchange of data between “ports” managed by the application; may include error and flow control (e.g., TCP)
Network (adjunct): unofficial layer that helps accomplish setup, management, and security of the network layer (e.g., ICMP, IPsec)
Network: defines abstract datagrams and provides routing (IP)
Link (adjunct): unofficial layer used to map addresses at the network layer to those used at the link layer on multi-access link-layer networks (e.g., ARP; implemented at the driver level)

Figure 2.2. Layered Network Architecture from (Fall & Stevens, 2011, p. 14). In the original figure, the application and transport layers are implemented by HOSTS, while the network layers and below are implemented by all Internet devices.
The layered network protocols do not specify how to present information on the
Internet, however, and the World Wide Web provides the Internet with a way to
communicate through a presentation layer, e.g., web pages. The beginnings of web
development and protocols were defined by Tim Berners-Lee in initial definitions for
HTML (Hyper-text mark-up language). Berners-Lee was a researcher at CERN who
wanted to solve the problem of accessing and finding information on the Internet, and he
proposed a “universal linked information system” (Berners-Lee, 1989). Berners-Lee was
concerned primarily with staff turn-over at CERN and the loss of information from single
experts that couldn’t be shared with a wider community (Berners-Lee, 1989). The
history of HTML mark-up is based on these strategies for finding information,
communication between networks, and the focus on linking as central to the information
knowledge environment and communication between communities. As the global
network expanded, the World-Wide Web Consortium (W3C) was created to define and
manage the protocols of online communication on the Web. HTML was defined in documents on the early w3c.org site proposing a simple yet expandable set of tags (mark-up) for documents on the Web. As HTML2 was rolled out,
it became the standard for communication across the Web.
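As a rough illustration (my own example, not drawn from the early W3C documents), the sketch below shows the kind of simple, expandable tag set described above: a small page marked up with a handful of tags, inspected here with Python's standard-library HTML parser.

    from html.parser import HTMLParser

    EARLY_STYLE_PAGE = """
    <html>
    <head><title>Experiment Notes</title></head>
    <body>
    <h1>Experiment Notes</h1>
    <p>Results are linked <a href="results.html">here</a>.</p>
    </body>
    </html>
    """

    class TagCollector(HTMLParser):
        # Records the tags used in a page and the targets of its hyperlinks
        def __init__(self):
            super().__init__()
            self.tags, self.links = [], []

        def handle_starttag(self, tag, attrs):
            self.tags.append(tag)
            if tag == "a":
                self.links.extend(value for name, value in attrs if name == "href")

    collector = TagCollector()
    collector.feed(EARLY_STYLE_PAGE)
    print(sorted(set(collector.tags)))  # ['a', 'body', 'h1', 'head', 'html', 'p', 'title']
    print(collector.links)              # ['results.html']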
These early communication models and networking set the stage for
communication studies to investigate how communication occurs through this new
medium. The development of new media, digital, and Internet studies is a subfield of
communication and media studies that developed alongside the technologies that it seeks
to interrogate.
Critical Approaches to Understanding Communication and Information Models
In understanding these models of communication and information, this project
takes a critical approach that ties culture and technologies as interrelated and dependent
on each other. Contrary to a technologically deterministic model, where technology is often viewed as neutral, follows a sort of natural evolution, and directly affects society, in a critical view the elements of culture and society are embedded within the constructs and infrastructure of technologies (MacKenzie & Wajcman, 1999). Central to
the critical paradigm is the emphasis on social construction. This project examines the
implementations of technological strategies, in the forms of SEO and SMO, and looks for
linkages to prior media with the understanding that the technological cannot be separated
from the cultural.
Through Raymond Williams’ The Long Revolution, the role of changing
technologies is viewed from a historic and contextual perspective, with their shaping as a result of societal conditions pointing toward a social construction rather than a purely technologically deterministic model of communication technologies (1961). In Williams'
discussion of technical changes in media, such as newspapers and books, both the
technological advances of printing presses and of transportation via railway led to
increased distribution. However, the distribution cannot be viewed as separate from
actors in society and cultural processes. “A large part of the impetus to cheap periodical
publishing was the desire to control the development of working-class opinion, and in
this the observable shift from popular educational journals to family magazine (the latter
the immediate ancestors of the women’s magazines of our own time) is significant”
(Williams, 1961, p.56-57). In this example, Williams illustrates that the advances in
technology are not the only motivating factors of changes in periodical publishing. One
of the goals of this project is to provide an illustration of the technical changes so that
they can be further interrogated for cultural and societal influences and motivations.
Another important aspect of the critical approach is the possibility for change.
John Dewey saw the role of mass communication as a tool that could be used for
increased public participation and democratic ideals (1946). This approach is
characterized by a questioning and interpretive framework and also a sense of optimism
of change that could be possible through understanding, and in the case of Dewey,
pragmatic action. In Dewey’s view of communication, the act of conversation and inquiry
is a necessary part of communication; communication does not exist outside of the social
needs to communicate and opinions are formed only in discussion as part of active
community life (Carey, 1989, p.81).
From James Carey’s analyses of communication, we also gain the concept of ritual communication in addition to the transmission of communications. In a ritual
communication environment, communication is embedded in institutions of society and
is continually re-inscribed yet adapts and evolves with periods of social change (Carey,
1989). Ritual communication as a socially constructed view should be explored for
significance and implications for communication. One of the needs for the social
construction and ritual communication view is the allowance for social change. Part of
the work in this examination is to examine the organizing principles of communication
and “to try to find out what other people are up to, or at least what they think they are up
to; to render transparent the concepts and purposes that guide their actions and render the
world coherent to them” (Carey, 1989, p.85). This project seeks to understand the
technical applications of SEO and SMO strategies and to identify the ritual re-inscriptions
from previous forms of communications, ways information is organized and exposed, in
order to identify places for change within the structures of online communication
practices. In examining the implementations of SEO and SMO, this project looks to
describe the “constellation of practices that enshrine and determine those ideas in a set of
technical and social forms” (Carey, 1989, p.86). The questions posed, in this project, seek
to first identify the practices as employed, in order that they may be further examined in
terms of culture and society.
Encoding / Decoding
Thirty years after the publication of Shannon’s Mathematical Theory of
Communication, Stuart Hall contends in his 1970’s essay “Encoding/Decoding” that a
problem with Shannon’s model is that it assumes an equality of conditions on both the
sending and receiving end of the message (Hall, 2006). Instead, he proposes that how the meaning is interpreted, its accuracy, and what constitutes “meaningful discourse” still depend on a set of conditions and codes that may not be the same at both ends of the message. In Hall’s encoding/decoding model, the codes that affect meaning
include: frameworks of knowledge, relations of production, and technical infrastructure.
The institutional structures, networks of production, organization, and technologies, are
essential components to transmitting and receiving meaning from communications. The
model of communication, thereby, necessarily begins in a cultural frame to send the message and is received in another cultural frame in order to be understood.
In examining the historical practice of search engine and social media
optimization, Hall’s model is especially useful to overlay the structural influences at the
ends of the sending and receiving models, which aids a critical historical investigation.
Hall contributed the idea of contextual interpretations for both the sending and receiving
of communication, which is embedded in the social constructs and conditions of
production on either end. In looking at online communications and interactive
communication technologies, cultural studies and Hall’s model of encoding/decoding
allows important connections to better understand new media (Shaw, 2017). In this
project, the constructs and conditions are exposed for SEO and SMO strategies in the
hopes that the cultural influences can be explored to ask who defines the structures of what is good and how and what communications are accessible. “All
activity is not resistive, of course, but neither is it complicit” (Shaw, 2017, p.600).
Digital New Media Studies
Within the discipline of communication, digital new media theory has been used
to explore the contemporary communication technologies, including digital and internet
communication technologies, brought to the field (Morley, 2007; Silver, 2004; Sterne,
2005). The importance of “new media” in this study is to understand the roots of the
digital as object and apparatus that both enables and limits contemporary methods of
communication. The study of digital new media takes two main forms in the field of
communication. One approach focuses on the study of the Internet as a transformative or
transgressive medium, which allows for new forms of communication and interaction. An
alternative approach explores the transition of specific communication media to the
Internet as a parent medium, such as television, video games, multi-media art, news, and
advertising. Because of the interdisciplinary nature of communication, the definition of
what makes something new varies, and there is not a universally agreed upon definition
that permeates the field (Silver, 2004; Sterne, 2005). Where definitions of new media
have succeeded in the argument of newness, scholars have concentrated on characteristics
such as the ability for increased personal connection and social groups (Baym, 2010),
sociotechnical systems (Haraway, 1987), re-usability and re-mixing (Deuze, 2006;
Lessig, 2006), and values embedded in format (Sterne, 2012). Although new media has
emerged over the centuries from the telegraph to television,21 the crucial concept that
defines these as new, and that I fix my definition of new media on, is when the medium
elicits transformative views of reality and social practices. This type of newness is helpful
in investigating the role of search engine and social media optimization by examining
what makes the communication methods new and how that affects our understanding of
communication technologies in society. Examining the structural issues, the
conceptualization of communication via HTML documents and the interplay with search
engine and social media platforms to surface content allows for an in-depth look at
changing conceptualizations of communications provided by Internet technologies and
standards, as well as the aspects that persist through technologies, “topoi.”
21 See (Williams, 1975).
“Topos” (topoi, plural) was originally developed by Ernst Robert Curtius for
literary studies and adapted by Erkki Huhtamo for media studies. Essential to the idea of
topos is that rather than emphasizing what is “new” with new media technology, topoi present the recurring cultural formulae built into systems. It is a way of looking at how
what is new is shaped by what is already known (Huhtamo & Parikka, 2011). Topoi are
useful for exposing the social and cultural ways of knowing built into our systems that
replicate over new forms of media. In this project, HTML pages will be examined for
topoi that persist from previous forms of media and enforce functional attributes and
information access in online communications.
Hidden Mechanisms
Another unique aspect of the digital media environment and its newness which is
relevant to this study is the role of code and the digital medium. Code is sometimes
viewable, sometimes readable, sometimes not. Sometimes, it can be viewed by using
additional tools. Sometimes, the code is hidden by design of the program or, in the case
of HTML, for security purposes to prevent things like code injection by hackers into
JavaScript. Unlike a traditional print medium, where the code and technology in the inks
and paper may be visible and yet still unknown to the user, the code that underlies digital
media content may be completely hidden (Hayles, 2004; Kittler, 1995). Parts of the
hypertext environment could be viewed if source code is rendered in a browser, but that
is dependent on the exposure of the code as written (e.g., includes, database logic, and
additional scripts may not be viewable). The full HTML new media environment is
dependent upon a web browser on a hardware device to render the content. The hidden
values, constructs, and structural framework that are working in this interaction between
code, browser, and hardware are new in the digital media context.
In this dissertation, the code, within the HTML structure of webpages for
optimization techniques, is a central object of study not solely as a processing tool but
also to explore the embedded practices and ideas within the code framework. Critical
code studies within digital new media studies have a theoretical precept that asserts a
“performative, transformative, and mediating” function to code rather than merely an
instrumental function (Marino, 2006). Code in its structure, ordering, and rules is an
ideological expression that cannot be separated from the ideology in which it operates
(Marino, 2006) and is inseparable from its operational context in a capitalist economy
(Berry, 2011). The function of critical code studies is to make visible what has been made
invisible and to demonstrate its cultural significance (Berry, 2011; Kitchin & Dodge,
2011; Mackenzie, 2006). By critically examining the rules of code and structured content,
part of the invisible is made visible and can be evaluated for contributions to the overall
communication practices.
Within the structure of code and particularly relevant to the study of search
engine and social media optimization, is the role of algorithms in search engine and
social media platforms that expose and promote content. “All code, formally analyzed,
encapsulates an algorithm” (Mackenzie, 2006). Algorithms as a processing structure of
code are akin to code with embedded ideologies and act much like institutions with a
regulatory function on an individual’s behaviors (Napoli, 2014). This interplay between
code, algorithms, and how they are translated with hardware and software are points of
investigation of this project that must be examined in order to expose the socio-
technological infrastructure that shapes communication practices and access to
information.
In this dissertation, I am concerned with the structure of the elements in the
HTML, its roots, and its interaction with rendering applications and their algorithms. This
approach is content-meaning agnostic and focuses on the application and presence of codes and values allowed within the HTML standard that can be used to promote and manipulate the logic of search engine and social media platforms’ code and algorithms.
Search Strategies and the Networked Document
The hyperlink is an essential component within the study of new web media, as it
increases the remix of content, removes or adds contextual interpretations, and defines a
network of communication and relationships. This network exists both in a traditional theory of society and network communication, through knowing and relationships, and in an algorithmic process by which scripts define the network and hyperlinked relationships between communicative content. “[S]ome Web pages work as electronic documents…while some pages more importantly point to documents” (Gitelman, 2006, p.
128). This networked system of information with the use of hypertext, links, and efforts
toward creating the semantic web as envisioned by Tim Berners-Lee have the effect of
transitioning the Internet from a “Web of Documents” to a “Web of Data” (Park,
Jankowski, & Jones, 2011, p. 147). The relationships between documents in the online
environment provide an additional and novel approach to search that builds upon
structures of cataloging, categorization, and indexing and elevates the position of the
pointing document from previous media formats. The purposes of these different types, electronic documents and pointer documents, may not be distinguishable until the page is examined.
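A minimal sketch (my illustration, with hypothetical page names) of this networked-document idea: pages and their hyperlinks form a graph, and a page that mostly points elsewhere occupies a different position in that network than a page that mostly carries content.

    # Hypothetical pages mapped to the pages they link to
    pages = {
        "portal.html":  ["story1.html", "story2.html", "archive.html"],  # pointer document
        "story1.html":  ["story2.html"],                                 # electronic document
        "story2.html":  [],
        "archive.html": ["story1.html"],
    }

    # Count in-links: how often each page is pointed to across the network
    in_links = {url: 0 for url in pages}
    for targets in pages.values():
        for target in targets:
            in_links[target] += 1

    print(in_links)
    # {'portal.html': 0, 'story1.html': 2, 'story2.html': 2, 'archive.html': 1}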
Remix, Variability, and Mutability
Disintegration and remix are defining characteristics of digital new media that meet the threshold for reconceptualization of communication. The ability to break up, re-purpose, re-construct, and disassemble again in an efficient manner is a new feature of
the online digital environment. This is fundamental in the design of the communication
from the basic transmission of messages as defined in the packet switching protocols of
the Internet. The implications of deconstruction and remixing have affected how a digital
new media object is to be taken as a holistic object and re-conceptualized as a process
(Deuze, 2006; Hayles, 2004; Jenkins, 2004; Landow, 2006; Manovich, 2005). Copying
music and cultural media products in the online environment, and indeed the ease of
copying and making a near duplicate of an original in a digital environment, is a historic
change (Sterne, 2012). However, the more interesting question to me is how the copying
and remixing aspects affect not only the legal framework but also how content is seen,
distributed and integrated into society as a communicative form and beyond the new
creative cultural works22 as lobbied by Lessig, Jenkins, and others.
22 An early pre-digital example of this is the Two Live Crew lawsuit for copying music within one of their releases.
Disintegration, remix, and linking have led to significant changes in how society views communication practices, where the content is no longer whole but is part of a process and, in the context of search results (search engines, data mining, and other activities): 1) it is inherently networked, and 2) it is exposed as part of algorithmically defined mechanisms. Assumptions of context and attribution may be faint or completely missing in this new model of communication.
Another defining characteristic of new media is the transitory and unstable nature of digital media content. Communication in previous
contexts was typically either fixed (printed form, recorded, etc.) or transitory (speech on
the phone). Although not all fixed communication content was integrated into an archival
environment, the possibility was there.23 Digital media content is both fixed and
transitory. The ability to change and modify content and the tools needed to render content
all lead to a new conception of what it means for communication to be finished. What
does versioning look like in the digital environment? What is the “official message?”
These questions are even further complicated by the ability for digital communications to
provide tailored personalization of content. There are interesting questions for
communication studies about which content is delivered to users based on these
personalization measures (e.g., through newspaper home pages, search engine results,
and more). The technological ability also exists to deliver different versions of the same
content based on an individual user’s computer and/or browsing settings. What is
somewhat unique is that other than efforts to preserve web content by libraries and the
Internet Archive, there is no check and balance on the historicity of the content. This
problem was illustrated quite well by the George W. Bush administration’s re-writing
of White House webpages.24 New theories and methods need to be developed to deal with the issue of fixed, yet variable communications. At what point are versions archived, and how does that version affect the interpretation of the archival record?
23 Certainly, degradation of nitrate film, fire, and other hazards have affected the ability for traditionally fixed content to be archived.
24 In response to that problem, the University of North Texas embarked on a major web archiving initiative of government websites. Yet, they are only able to harvest part of the online government environment and, as a public institution in the state of Texas, are still subject to state oversight.
Politics of Information
What is “Politics?” and a Politics of Information
For the purposes of this project, I follow a definition of “politics” akin to James
Paul Gee in An Introduction to Discourse Analysis where that which is political is where
human relationships and actions affect how social goods are and should be distributed
(Gee, 1999), and information is the social good under investigation in this project. The
production of knowledge and sharing of information is a social and historical process that
precipitates a notion of public good (Fuchs, 2008). This project operates under the
assumption that communication of information is a social good. The tension and debate within a politics of information center on the emancipatory potential of this good and the controlling elements over it. A mechanism of control in the information economy is
the logic of the protocols that define, structure, and implement code (Galloway &
Thacker, 2007).
Politics of Information Organization
As information is organized it becomes further integrated into political systems
that determine how and what information is available according to a specific model of
information organization. The organization of information becomes a necessity as
quantities of information increase. As a result of the increased amount and complexity of
printed information available over the past 200 years, which is too difficult for manual
searching, additional mechanisms have been implemented to help with searching (Bates,
2002). Prior to the onset of the printing industry, the transition to formats that enabled
searching to find specific passages began with the book and vertical files (Gitelman,
2014; Vismann, 2008).
Information sets are organized to facilitate two kinds of access methods to the
content: browsing and search. Browsing is enabled in print media through a structured
product with sections or chapters and use of headings and layout in order for the reader to
quickly peruse, browse, and identify information to consume. In this way, the “typed
copy worked as a sort of natural language code” (Gitelman, 2014, p.70). Layout and font
choice are integral to allowing for browsing within a document or corpus. This system of
browsing in print media for information seeking and retrieval is usually limited, relying
on the existing terms within the content that may be augmented by design and layout to
“catch the eye” and are sometimes aided by a Table of Contents for quick location
finding. With the printing revolution, the constructed object became more standardized
(Febvre & Martin, 1976). The mechanisms that enable more direct searching include
both revisions of the format and the technique of indexing.
When collections of objects in these formats became too unwieldy, indexing was
employed to facilitate access. Indexing is the process of creating a short cut based on
identified terms (e.g., subjects, dates, and people) to enhance access to certain portions of
text or content organized in a taxonomy of set standards, terms and rules to facilitate
finding information. Index catalogs provide an index across multiple works or
collections. The earliest index catalogs of collections focused on personal, specific
professional, or specific institutional contexts. The onset of more generalized indexes
presented a transformation to a highly controlled political context (Krajewski, 2011). To
facilitate search in print media, a supplemental information organizational guide had to
be created. The resulting search guide takes the form of an index or catalog. Early
collections of documents may have been arranged chronologically; however, around
1500, the process of arranging documents according to subject was introduced (Vismann,
2008). Indices and catalogs provide an externally applied set of vocabularies or
taxonomies that define the subjects and ways of access. The function of an index to
select and retrieve information becomes a censoring device by the nature of the selection
of what and what not to include (Krapp, 2006).
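A minimal sketch (my own illustration in Python) of the indexing process described here: identified terms become short cuts to the locations where they appear, and any term left out of the index is, in effect, unfindable through it.

    from collections import defaultdict

    documents = {
        "doc1": "election coverage and candidate platforms",
        "doc2": "newspaper coverage of the election",
    }

    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)  # the index records where each term occurs

    print(sorted(index["election"]))           # ['doc1', 'doc2']
    print(sorted(index["candidate"]))          # ['doc1']
    print(sorted(index.get("senate", set())))  # []: a term absent from the index cannot be found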
The organization and selection of subjects and ordering is inherently political and
steeped in traditions of particular social groups and social forces (Ranganathan, 1973).
Subject categories also informed classification and ordering of materials in libraries and
archives, such as the Dewey Decimal System and Library of Congress Classification
System that made it easy to locate items on a shelf using a scheme of ordinal numbering.
Classification is “the process of translation of the name of a specific subject from a
natural language to a classificatory language” (Ranganathan, 1973, p. 31). The act of
classifying information and using taxonomies was the work of a skilled indexer or
librarian who was trained in the formal standardization provided by a classification
scheme. The effect of printing, democracy, and the availability of public libraries to the
masses resulted in new methods of ordering that were broadly accessible to the populace
and necessarily transparent (Krajewski, 2011; Ranganathan, 1973; Vismann, 2008).
Subject remains the dominant form of classification for print materials through the
development of electronic databases of information with a necessary transparency to
subject and arrangement taxonomies and schemas that facilitate finding information.
The limits of the effects of a particular social group and a critical example of
eschewing the politics of the organizational system can be seen in the work of Howard
University librarian Dorothy Porter in the 1930’s. Porter defied the Dewey Decimal
System, the organization system widely in use at the time and still today that specifies the ordering and classification of books by subject, by creating systems that allowed for the integration of materials by and about Blacks. “Against an information landscape that
exiled black readers and texts alike, Porter’s catalog was a site where radical taxonomy
met readerly desire” (Helton, 2019, p.101). Porter’s ordering of information was an act of
activism that redefined how and what information was made accessible.
Politics of ICTs (Information and Communications Technologies)
Instead of a set of organizational and institutional priorities collecting and
collating information, the Internet provides a vast networked and de-centralized
distribution mechanism for information. In this network, however, information must still
be collected and collated to make it findable. This is where an index, particularly for
search engines, provides the mechanism. This decentralized nature of the online
environment furthered speculation that the Internet could act as a tool for public good and
democratization free of the structures that defined and restricted access to information in
earlier forms of media. Many scholars, however, point to the neoliberal ideologies and
current systems of capitalist control only enhanced by online communications (Castells,
2000; Dean, 2009) and dominated by a few mega ICT companies. Technological
development and online communication have helped the formation of an information
economy where the key commodity of exchange is control. Wendy Chun points to a
dualism of the Internet as a tool of freedom and a “dark machine of control” (Chun,
2006). In this environment, which Terranova describes as an “informational milieu,”
political intervention is only possible with an engagement of distribution and access to
information such as “opening up channels, selective targeting, making transversal
connections” (Terranova, 2004). The politics of information frame the base for
investigating the institutions that control commodities for the public good and regulate
access to information.
Politics of Search Engines and Social Media Platforms
Search engines and social media platforms are a central locus of the institutions of
control in the online environment. Google and Facebook have fought against their
characterization as media companies in the political discourse in the U.S., minimizing the
perception of them as institutions in need of policy intervention and regulation (Napoli,
2014). The success of these services is that they “have become indispensable to the
political economy of citation indexes, online public relations and marketing, knowledge
production, and NGO advocacy activities” (Franklin, 2013). Political discourse without
exposure and discovery through these services can make access to information and
communication of ideas a tricky business.
In investigations of the political role of search engines and social media platforms
and looking at the role of the algorithms used in these processes, the algorithms cannot be
examined without an understanding of the social and cultural context of their creation.
“Algorithms are inevitably modelled on visions of the social world, and with outcomes in
mind, outcomes influenced by commercial or other interests and agendas” (Beer, 2017,
p.4). It is important to understand how decisions made to render content via search engines and social media platforms are influenced by the algorithms and how content creators attempt to subvert, game, or manipulate the algorithms for
access to their content. “The power of algorithms here is in their ability to make choices
to classify, to sort, to order, and to rank. That is, to decide what matters and to decide
what should be most visible” (Beer, 2017, p.6). In this project, the role of SEO and SMO
strategies is examined in conversation with the algorithms that provide or prevent opportunities for access to information. Though code and technical strategies are examined, those strategies should be thought of in the context of the social conditions and structures in which they were created and enacted.
Gatekeeping
This dissertation uses gatekeeping theory to surface the role that various
structures, norms, and subjectivity play in preventing and allowing access to information
in the online environment and how those mechanisms may be subverted by an individual
or organization through the strategies of search engine and social media optimization. In
this analysis, the historic role of gatekeeping through various media is important for understanding how gatekeeping may function in both similar and different ways on the Internet and in online documentation.
Gatekeeping and Mass Media
The communication theory approach of gatekeeping began with David Manning
White’s study of the choices for a newspaper story from its inception of what was
newsworthy to the decision to print and distribute through a chain of gatekeepers (White,
1950). It is the many decision points in the chain that determine what information is
communicated and considered newsworthy. In this landmark study for gatekeeping,
White emphasized that it is when looking at the stories that are rejected by a newspaper
editor for printing that the subjectiveness of the decision-making process is revealed and
the emphasis on the experiences of that gatekeeper are dominant in the gatekeeping
process (White, 1950, p. 386). The newspaper editor is the “terminal gatekeeper” for
White, as the person who ultimately decides what information is available to the broader
public. As part of White’s argument, he discusses the premise from psychology that
“people tend to perceive as true only those happenings which fit into their own beliefs
concerning what is likely to happen” (White, 1950, p. 390). This concept is foundational
to a role of gatekeeping not only as a means for determining what information is available but also for noting that the process is inherently biased and coded within the norms and expectations of the gatekeepers making those decisions.
Subsequent studies of mass media and gatekeeping have refined gatekeeping
models of “agenda-setting,” noting the impact of gatekeeping in mass media to set the
political agenda. In McCombs and Shaw’s study of the 1968 presidential campaign, news
content, and voters, they identified correlations between the issues of importance to
voters and those emphasized in the news media (McCombs & Shaw, 1972).25 In this
study, they explored how agenda-setting could have an important influence on the social
and political spheres. Following these defining studies for gatekeeping, communication
scholars have further investigated the role of gatekeepers across mass media and
expanded to identify the influence of the organizational impact on the gatekeepers. The
organization functions within “input-output relationships” with its environment
(Dimmick, 1974, p.2).
As a result of these studies on gatekeeping and the role of knowledge production,
one part of the suggested remediation is to assert a separation between the producers of content and its disseminators in order to reduce the impact of organizations on mass-media gatekeepers (Hirsch, 1972). As we move into the online environment, that separation may be more striking than in traditional mass media; however, the balance of information available may sway less favorably for society after all.
25 An interesting comment from McCombs and Shaw that bears note for the study in the online
environment is that the values of readers and news producers are strikingly different (McCombs & Shaw,
1972, p. 185).
Gatekeeping Online
One of the earliest problems identified with the Internet was that there was so
much information that finding a specific piece of information became problematic
(Berners-Lee, 1989). Early efforts at categorizing the web by Yahoo and even the Librarians’ Index to the Internet attempted to reproduce historic archival indexing within the online space. Some strategies, such as the use of tags for categorization and information management, were adopted by both search engines and social media applications. The digital world, however, also made it easier to index full text, and these services are not limited by the space or time such activities historically required in print.
Gatekeeping through Search Engines
In recent years, the influence of search engines as a gatekeeper has been a
frequent object of studies. Search engines have a basic function: they crawl the World Wide Web, index webpages based on the content of the HTML, and then use proprietary algorithms to rank the search results by relevance. There are many strategies and tools to make this easier for search engines, including submitting URLs to search engines for indexing. As discussed in the introduction, elimination from a search engine can, in effect, make that information inaccessible. Search not only limits results to a subset of information, but it also functions like an institution and sets the criteria for information seeking by individuals (Napoli, 2014). The criteria used for the relevancy of rankings are then pivotal in the gatekeeping function for search engines and have resulted in some scholars calling for a public demand to release the algorithms for transparency (Introna & Nissenbaum, 2007).
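That basic function can be sketched as follows (my own simplification, with hypothetical URLs; actual engines use proprietary and far more elaborate ranking algorithms): pages are indexed by the text of their HTML, and results for a query are ranked by a simple relevance score such as term frequency.

    # Hypothetical indexed pages: URL mapped to the text extracted from its HTML
    pages = {
        "news.example/senate-race": "senate election results and candidate coverage",
        "blog.example/recipes": "favorite soup recipes for the winter season",
    }

    def rank(query: str):
        # Score each page by how often the query terms appear in it
        terms = query.lower().split()
        scores = {url: sum(text.split().count(t) for t in terms)
                  for url, text in pages.items()}
        # Pages that never match are excluded: for this query they are, in effect, inaccessible
        return sorted((url for url, score in scores.items() if score > 0),
                      key=lambda url: -scores[url])

    print(rank("senate election"))  # ['news.example/senate-race']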
Gatekeeping through Curation of Links in Online News
Another important angle of gatekeeping in the online environment is the role of
content creation and selection for integration within online pages themselves.
Online journalism can be functionally differentiated from other kinds of
journalism…The online journalist has to make decisions as to which media
format or formats best convey a certain story (multimediality), consider options
for public to respond, interact or even customize certain stories (interactivity), and
think about ways to connect the story to other stories, archives, resources and so
forth through hyperlinks (hypertextuality) (Deuze, 2003, p. 206).
Even in the online environment, these activities are not dissimilar from the role of the
editor in what gets printed on the newspaper page, except the decision is now whether and how to relate other online content with one’s own content. In studies of newspapers online,
the practical concerns of longevity of links, the authority / trust in content, and
competition with other outlets may limit the use of linking to online content (Cui & Liu,
2017). The attitudes of journalists toward linking and what to include or not within the
webpage content are aligned with “classic journalistic principles” (De Maeyer, 2012).
When news sites have used links to sources, those links are “directed toward sources that
were within mainstream media (often internal), political neutral, undated, and reference-
based” (Coddington, 2012, p. 2020). As the hyperlink is one of the defining
characteristics in digital new media, the extent to which journalists consider linking
strategies or incorporate the practice or not, points to how disruptive or not online
communications practices have been for news media. The motivation to include links within traditional journalism in the online environment is focused on providing context for
curious readers, and there is general agreement that it is a good practice to inform
readers; however, it is not typically employed (Coddington, 2012; De Maeyer, 2012).
This also outlines a tension in how the decisions of gatekeeping within a webpage may
have a direct impact on the other gatekeepers on the web, through search engines and
social media platforms, and control access to content.
Consistent with these attitudes, in looking at the evidence of linking within news
articles, studies of the activities of journalists have shown little of the journalists’ work time spent on considering or curating links (De Maeyer, 2012; De Maeyer & Holton,
2016). “The confrontation of the bright theoretical promises usually related to hyperlinks
and the more nuanced picture showed by empirical research about the actual linking
behaviour of news sites underlines a stimulating gap” (De Maeyer, 2012). One of the
transitions within online journalism also requires a recognition and acceptance that “[journalism] does not function as sole provider of content” (Deuze, 2003, p. 218).
Where and with whom the power of gatekeeping information online lies is more nuanced and complicated in the layering of gates required to make content accessible.
One of the goals of this project is to identify technical areas of consideration that
should be included in the communication discourse and creation process as essential to
providing access to information and where considerations of gatekeeping can be further
interrogated. These questions of connecting content online are not unique to news media.
News media provides a unique look into the practices of connecting to other sites. In this
project, the role and decision of curating links, as an SEO and SMO strategy, is a way of
influencing gatekeeping online; however, the act of curating the links is a gatekeeping
function itself, as well. As content is created for the online environment, these various
layers of gatekeeping should be kept in mind and be part of the decision-making of the
content creation process for online communications.
Gatekeeping through Social Media
Social media platforms are inherently different from search engines in that,
although they also may link to external content, they are also fully independent
applications where content is largely user-created and that information is fed back into
the application. The application could theoretically exist without external content. As
search engines and social media platforms began to act as an intermediary between the
media and other content creators and readership, the role of gatekeeping switched to one
of regulation through algorithms and code.
Research studies of the American public’s behavior in seeking information online
and sources of online access point to an increasing percentage of people
finding their information online.26 The perception of the role of the gatekeeper is
important in identifying the significance of gatekeeping and in helping expose the
invisible actors (human and machine) at this level of gatekeeping. A 2012 Pew Research
Center study found that two-thirds of adults said “search engines are a fair and unbiased source
of information” (Purcell et al., 2012). The public perception of search engines is that they
are neutral (Pan et al., 2007). Although social media platforms do not have the same
perception of neutrality, most users believe that they are receiving content in their feed
based on friends’ recommendations and neutral algorithms functioning on factual and
non-ideological data (Light & McGrath, 2010). The gatekeeping role is essential to the
argument of this dissertation where competing ideologies are at work to expose and
provide access to information, so these institutional structures and behaviors will be
examined.
26 See several studies from the Pew Research Center: (Americans Feel Better Informed Thanks to the Internet, 2014; Internet Use Over Time, 2014; State of the News Media 2015, 2015).
CHAPTER III
METHODOLOGY
There are many challenges with writing a contemporary history of online
communications, and the methodological approach must account for the influence of the
structure of the presentation. Gitelman asks, “How is doing a history of the World Wide
Web, for instance, already structured by the web itself?” (Gitelman, 2006). This chapter
is organized into four sections and reviews strategies and processes for studying
webpages as historic online documents. The first section will review the research
questions that drive this project. The second section will present both the methodological
framework and an overview of strategies of a media archaeological analysis in the
context of analyzing SEO and SMO. The third section will discuss the method of
historical document analysis applied in the media archaeology framework. The final
section will review the selection, collection and analysis procedures employed to answer
the research questions.
Research Questions
Previous critical history research on the gatekeeping function of search engines
and social media platforms focuses on the notion of the hidden and proprietary
algorithms that are used to determine content exposure and critiques the results of these algorithms while the algorithms themselves remain hidden. These studies often call for
an emancipatory transparency of the algorithms; however, the impetus for the
corporations to employ transparency is unknown, and the likelihood of government
regulation is even less certain, as deregulation of corporate America has been the
ongoing trend for some time. What incentive is there for companies to expose these
algorithms? Computer science research also focuses on adaptations of search engine and
social media platform algorithms in order to prevent the use of “adversarial strategies”
that seek to jump the gates of the search engine and social media platforms. These
computer science technical research projects are aimed at perfecting the gatekeeping
function. Advertising and marketing materials address SEO and SMO for a practical
application, but rarely look critically at the usage over time and interplay with search
engines. This project is focused on the interaction of SEO and SMO strategies and looks at opportunities for influence through SEO and SMO due to the structure of the web and
online content. By investigating the role of SEO and SMO, this project seeks to identify a
history of this interaction between platform gatekeeping algorithms and the SEO and
SMO strategies, in order to provide attainable outcomes. This study is guided by the
following research questions:
RQ1: What is the historical development of SEO and SMO strategies?
RQ1a: What are the topoi in these practices?
RQ1b: What is the interplay with changes in proprietary algorithms over
time?
RQ2: How has the development of SEO and SMO strategies been actualized in
HTML practices for major persuasive information industries?
RQ2a: How have the strategies been implemented in newspapers’ online
presence?
RQ2b: How have the strategies been implemented in political candidate
websites?
RQ3: How have SEO and SMO strategies shaped communication online?
The first research question situates the SEO and SMO strategies in context over
time by looking at technical guidebooks and manuals with instructional content for these
strategies. To address this question, a historical descriptive study of the
strategies is used. It also involves looking at the relationships and interplay between the
strategies for SEO and SMO and the changes in search engine algorithms over time. In
addition, this historical overview provides a reflection on how these strategies do or do
not differ from previous strategies in other media formats for selecting and making
information accessible. Are there topoi present that can be identified as indifferent to the
media of the web and HTML structures?
The second research question takes the distilled strategies from the first question
and explores how these strategies were implemented in webpages in major persuasive
industries. In the media archaeological analysis used in this project, it is important to
examine the structure and usage of coding. In order to identify the norms enforced in the
technology, HTML pages are explored from industries where the everyday availability of
information is essential to the existence and mission of those domains. The major
persuasive industries of newspaper articles and political candidate webpages were
selected for examination. In communication studies, newspapers have historically been
assessed as gatekeepers to candidate platforms in the print medium and through the
selection of news stories (Cui & Liu, 2017; De Maeyer & Holton, 2016; Fink &
Schudson, 2014; White, 1950). Both news media and political candidate pages have also
been widely examined with respect to online communication practices and the role of
search engines in gatekeeping content (Ali et al., 2019; De Maeyer & Holton, 2016;
Diakopoulos, 2015; Nechushtai & Lewis, 2019). This complementary technical and
media archaeological analysis provides further tools for producers of news and political
content in deciding when and which SEO and SMO strategies to incorporate and in
weighing the advantages and disadvantages of those strategies coupled with other
practices and/or author intent.
In addition, newspaper articles and political candidate webpages represent two
sectors that engaged in early web activity and thus present an opportunity to explore
communication practices. Because of this early engagement, archived versions of the
content have been captured and are available for analysis of practices over time. By
examining archived webpages, the goal of this question is to validate the strategies found
in the first question and to understand the implementation of, and changes in, SEO and
SMO strategies over time.
The final research question explores how SEO and SMO strategies within the
HTML structure and page content affect communication strategies in an online
environment, through the examination of newspaper article and political candidate
webpages. This question does not delve into writing-for-the-web strategies27 as a whole,
focusing instead on the SEO and SMO strategies and how their implementation may or
may not change communication practices in newspaper article and political candidate
webpages. This examination will reveal a snapshot of how actualized SEO and SMO
strategies may have influenced communication practices.
27 Writing for the web is a set of strategies focused on usability, tone, and the use of white space, with
recommendations such as using “you” and “we” in text (Assistant Secretary for Public Affairs, 2016).
As in many historical studies, the research questions began as a “guided entry”
and are refined as the study progresses and discoveries occur within the investigative
process (Smith, 1981, p. 307). A qualitative historical approach using document analysis
was employed to address these questions. Rather than merely a descriptive
exploration, these questions attempt to explore the important relationships that are
involved in communicating online and the complexities of communicating through the
gatekeeping technologies of search engines and social media platforms.
To answer these research questions, two media types are used. The first set of
media consists of instructional guides and how-to manuals centered on applying SMO
and SEO strategies in HTML. A historical document analysis is used to examine these
sources and the recommendations they assert. These books are limited to professionally
published print books and do not include self-published books. The second set of media
consists of archived webpages, accessed through the Internet Archive’s Wayback
Machine and the Library of Congress web archives collections.
Methodological Approach: Applying a Media Archaeological Method
This dissertation presents a recent history of contemporary communication and
technology practices with HTML webpages and their exposure through search engines
and social media platforms. In order to perform a historical analysis for a contemporary
practice, this dissertation employs a historical media archaeology approach. Media
archaeology is particularly suited to this investigation because it employs a techno-cultural
approach that is both “self-reflexive” and treats media as “archival objects of
research” (Ernst, 2005, p. 587), and because it aspires to expose invisible and alternate
histories in order to make sense of digital media and the political and cultural institutional
contexts of the present (Parikka, 2012). The media archaeology approach exposes the
invisible rules and structures that shape which discourses are available through content
results, and this exposure carries an important cultural effect. A media archaeological
investigation, following Foucault, differs from a traditional historical analysis or a strictly
textual analysis by:
1) examining rules of discourse rather than the content of the discourse,
2) defining the specificity of discourses and the rules they enact,
3) not championing a creative subject, and
4) not attempting to recreate intent but rather offering a “systematic description of a
discourse-object” (Foucault, 1972, p. 140).
For this project, the place of the media within the discourse and the processes that
both frame the communication and the creation of HTML documents becomes the object
of study rather than the discourses, content analysis, or effects of the communication.
This dissertation adheres to an approach that emphasizes the functionality of the
technical architecture, operations, and processes that exist within the cultural norms and
power relations of a specific medium.
A media archaeology analysis provides insight and contextual bearing on the
question of whether a cultural practice was an effect of new media or the new media was
created because the “epistemological setting of the age demanded them” (Ernst, 2005,
p. 587). In revealing these contexts, architectural frameworks, and operations, one of the
tasks of the media archaeology project is to interrogate newness and to look for what is
already known and for the recurring cultural formulae, “topoi,” that permeate the media
apparatus (Huhtamo & Parikka, 2011). This dissertation also seeks to identify those
topoi in the development of optimization practices and access to information.
As a critical method, media archaeology adopts many practices from discourse
analysis but focuses on structures and materiality over content (Foucault, 1972).
Operationally, this means that the structural rules are examined in much the same way as
discourses and how the rules of discourses are embedded in the media (Parikka, 2012).
Many of the traditional components of a qualitative discourse analysis are used in media
archaeology, and historical-method best practices should be followed for a rigorous study.
A media archaeology analysis should be empirical, systematic, and rigorous. Specific,
precise, and thoughtful decisions need to be made in the selection of content for analysis
and examination. An essential component of a media archaeological analysis is an
infrastructure approach to the investigation rather than an interpretation (Ernst, 2013;
Foucault, 1972; Kittler, 1995).
In a traditional historical document analysis, the identification, authentication, and
verification of documents are essential to an empirical study (Scott, 1990). These concepts
are complicated in the digital environment. Where once things like handwriting and type
of paper could be used to authenticate documents, the digital does not have such
affordances, and documents are continually made new and take the form of different
representations (Brügger, 2012). Internet studies that utilize the internet for archival
historical research must also be conscious that the act of doing that historical work is
structured by the very medium under study (Gitelman, 2006). Throughout the data
gathering and analysis for this dissertation, the act of utilizing search while studying
search is acknowledged.
The media archaeological analysis uses document analysis but at the level of the
discourse-object and its materiality. The digital media discourse-object has a layered
materiality consisting of multiple components that make up the transitory and variable
digital object (Parikka, 2012). In a traditional media archaeological analysis, the
materiality is examined in terms of the physicality of the structural components that
compose a medium. An important component of the structural materiality is how the
pieces connect and function together. In discussing the role of the first microprocessor
from Intel, Kittler notes, “…computing, whether done by men or by machines, can be
formalized as a countable set of instructions operating on an infinitely long paper band
and the discrete signs thereon” (Kittler, 1995, p. 148). On top of this formalized set of
instructions for the hardware, software creates another layer of hidden instructions upon
the media, yet it remains reliant on the hardware and on the materiality of the
components, which “are built into silicon and thus form part of the hardware” (Kittler,
1995, p. 150).
The materiality of the computing system is part of the hidden mechanisms to be exposed
through a media archaeological analysis.
In this project, instead of focusing on the materiality of the proliferating hardware
and software devices that exist to render HTML content, the layers of hidden mechanisms
to be exposed concern the functionality and the processes that occur between the HTML,
the SEO and SMO strategies, and the search engines or social media platforms. Beyond a
counting or basic content document analysis, the media archaeological analysis explores
the references, functionality, and intertextuality of the documents in context. It does not
search for meaning or intent; in examining the documents through their relationships with
search engines and social media platforms, the text in action becomes the focus. The
implication of this type of analysis is that the technical is inherently social and cultural
and is built on structures that create and reinforce power relations and, in the case of this
project, access to information. The media archaeological analysis in this project asks:
what is it that actors are doing with the words [and code] (Prior, 2003)? The important
conception of this work is not the intent of the coding but the effect that the code has on
access to information through the gatekeeping applications of search engines and social
media platforms. The choices in the coding rely on the sociotechnical structures within
HTML and online communications. In addition, the decisions about the application and
use of code create pathways and barriers that are part of a larger sociocultural
consumption of information and reception of communication.
This project also attempts a critical history. There is an opportunity to identify
guidance for future use and possibilities of change (Winthrop-Young & Wutz, 1999). By
using a media archaeology approach, the intent is to dissect the systematic and structural
attributes that form gatekeepers to information, to identify opportunities to influence the
accessibility of information, and to identify places for activism within the structural
forces that determine the standards and allowances of communication methods online.
Historical Documents
With a media archaeology approach, historical research is the primary
investigative tool for media artifacts (primary sources). In this project, historical
documents are the basis of the investigation. In using historical documents in research,
one of the most critical aims is to provide evidence that is not selected merely to prove an
existing conclusion. “[B]y examining document content in terms of a strictly defined set
of procedures, researchers can produce robust and reliable conclusions” (Prior, 2003,
p. 149). This study will examine the HTML code of archived webpages for
evidence of SEO and SMO strategies embedded in the code. To provide context and
inform the identification of topoi, instruction manuals for SEO and SMO strategies will
inform the document analysis of the archived webpages.
Use of Instruction Manuals
Instruction manuals and how-to guides are examined to provide context for, and
comparison between, the strategies recommended in the manuals and the actualization of
those strategies in the webpages examined later. Manuals present an illustration of
standards and practices, which makes it possible to identify the complete set of relevant
strategies (Prior, 2003, p. 151). In the study of web communications and technology,
manuals and how-to books are not expected to be representative of how a typical user
may implement tactics and strategies but rather to be well-thought-out and intentional
articulations of strategies based on author expertise (Owens, 2015, p. 33). Because
manuals and how-to guides are also written in order to develop a skill, the tactics and
strategies outlined in the texts can be extracted but also need to be matched with an
audience and goals.
Use of Archived Web Documents
Studying historic web documents presents particular challenges in that the pages
may change over time and different versions may be captured. The device and hardware
used to render the content may not be available, or the particular organization of content
could have been constructed based on personalized information (Park et al., 2011, p. 286).
This is especially a problem with archived webpages, where a reconstructed and
incomplete copy is viewable because not all of the code used to render the page and
media content may have been captured in the archival harvesting process (Brügger,
2012). The use of web documents in this study should not encounter that issue, as the
code needed for SEO and SMO strategies must be embedded in the rendered HTML in
order for web crawlers to harvest the pages properly. Although some code may be written
as part of an include or script, it is rendered viewable in the HTML. The “View Source”
web browser function and the “save as HTML” or “save as Complete Webpage”
functions allow the encoded HTML for SEO and SMO strategies to become visible. In
this project, archived webpages accessed through the Internet Archive’s Wayback
Machine and the Library of Congress web archives collections were captured for analysis
using the built-in Chrome browser function “Save Page As…Webpage, Complete.” This
results in capturing several files associated with the webpage in an accompanying folder.
Figure 3.1. Example “Save Page As…Webpage, Complete” artifacts.
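To illustrate the kind of markup that becomes visible through “View Source” or a saved
copy, the following is a minimal, hypothetical excerpt of a webpage’s <head>; the tags are
standard HTML and common SEO conventions, while the page title and values are
invented for illustration:

    <head>
      <title>Candidate Smith on Jobs | Smith for Senate</title>
      <!-- description tag often reused by search engines for result snippets -->
      <meta name="description" content="Candidate Smith's five-point plan for jobs and the economy.">
      <!-- keywords tag recommended by early SEO manuals -->
      <meta name="keywords" content="jobs, economy, U.S. Senate, Smith">
    </head>

Strategies encoded in this way remain visible in the harvested HTML even when
accompanying scripts or stylesheets were not captured.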
Adobe Dreamweaver was used to examine the HTML code. Dreamweaver was used for
the following three reasons: 1) it automatically detects and displays accompanying files in
the view, 2) it allows for a split view of the HTML code and a sample rendering of the
page through a browser at the same time, and 3) it has quick code find, replace, and strip
tools that allowed for efficient removal of the additional HTML inserted by the web
harvesting tools.
Both the Internet Archive’s Wayback Machine and the Library of Congress’ web archives
collections use a version of the Open Wayback tool28 to harvest webpages, which inserts
clear notifications around the non-native HTML that is added to the pages when the
webpages are viewed through the online web archives:
Figure 3.2. Example comments surrounding HTML code inserted in Wayback
applications.
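As a sketch of the general pattern (the exact wording varies by application and version),
the inserted block is delimited by paired HTML comments of roughly this form:

    <!-- BEGIN WAYBACK TOOLBAR INSERT -->
      ... toolbar and archival citation markup added by the replay application ...
    <!-- END WAYBACK TOOLBAR INSERT -->

Because the insertions are clearly marked, they can be located and stripped before
analysis without disturbing the native HTML.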
28 https://github.com/iipc/openwayback
Additionally, the Wayback application adds directory path code for images and internal
webpages, in order to point to the archived versions rather than seek the live web for
these artifacts:
background-image: url(https://webarchive.loc.gov/all/20161011232019im_/https://drjoeheck.com/wp-content/uploads/2015/08/az-subtle.png);
Meet Joe
Figure 3.3. Example directional code inserted by Wayback application to direct to
archived versions of referenced files.
Because this process retains the original link after the inserted archival web location
prefix, it does not affect the interpretation. It is important to look at the files archived at
the time of the web harvest instead of current live links, in order to imitate the original
presentation. When links are broken and/or images are missing, it is often because the
web harvester application was not able to harvest those files and add them to the directory
of accompanying files. The code used to point to those missing files, however, is not
altered, and the original code can be examined even if the design cannot.
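A hypothetical before-and-after for a single image reference makes the rewriting
concrete; the campaign domain and file path here are invented, but the timestamp-prefix
pattern follows the rewriting shown in Figure 3.3:

    <!-- as authored on the live site -->
    <img src="https://example-campaign.com/images/banner.png">
    <!-- as rewritten by the Wayback replay application -->
    <img src="https://webarchive.loc.gov/all/20161011232019im_/https://example-campaign.com/images/banner.png">

Because the original URL survives intact after the archival prefix, the authored code
remains analyzable even when the referenced file was never harvested.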
Interpretation as a part of historical analysis must involve five types of control: 1)
evaluation of sources; 2) context; 3) historiographical changes over time; 4) generalizing
with concrete and specific evidence; 5) self-awareness to minimize bias (Startt & Sloan,
1989, pp. 146-47). The following Data Collection and Analysis section will review these
controls for each set of sources examined in this project.
Data Collection and Analysis
Before addressing the specific data collection and analyses used in this project, I
will address the control of self-awareness and minimizing bias for validity concerns. As a
professional librarian, my job is about connecting users to information. Search and
discovery are fundamental to this process, and I have taught credit-level university
courses on web design. I became a librarian in the early aughts, when the Internet was
still fairly new as a widespread tool and Google was only a couple of years old. In my
graduate program, we spent significant time reviewing the Librarians’ Index to the
Internet, which was a manually curated site of links to “reputable” sites on the Internet.
Yahoo’s categorized home page was in a rivalry with Google’s single search box at that
time. We were also trained in DIALOG search procedures, an aggregated pay-by-search
query tool for scholarly articles that was developed in the mid-1960s. I taught myself how
to be a web application programmer with several O’Reilly manuals and IT resources
provided by the University of Washington. My experience in web application
development and library science aligns me with a typical or advanced user for most of the
manuals examined. I am also able to easily read and understand some of the more
technical materials and quickly parse HTML and/or use tools for identification of tags
and structure. I can fully read and understand the source code of the HTML archived
pages without needing to render them through a browser or the rendering preview tools
provided in Dreamweaver.
Instruction Manuals and How-to Guides
The selection of manuals and how-to guides was “purposive and pragmatic”
(Prior, 2003). In order to identify relevant texts, I searched for “search engine
optimization” and “social media optimization” in WorldCat.org, the international catalog
of library holdings. These terms are not official subject terms for library use, as is often
the case with newer concepts, so I used the subject listings in the records of the first
couple of identified texts to identify further texts and continued this pattern. Subject
headings for the books in the analysis included: Internet Marketing, Social Media, Web
Search Engines, Internet Searching, Search Engines, Electronic Information Resource
Searching, and Electronic Commerce.
In order to identify texts with particular influence, I selected texts that had over
200 holdings in libraries worldwide. This number is a bit tricky in that, in an attempt to
combine language variants and similar editions (e-formats and print editions), WorldCat
has merged some records together. For the purposes of this project, which looks for the
relative popularity and spread of titles, that limitation does not affect the general spread of
the texts. Another limitation in holdings for manuals and guidebooks is that outdated texts
are usually withdrawn from library collections. In order to address this problem, I did not
use the 200-holding threshold for books published in 2011 or earlier. In these instances,
the availability of the texts affected my ability to include them in this project, and texts
were selected primarily based on their availability for analysis. As an additional mark of
the popularity of the texts, I also looked each up on Amazon.com and noted the number
of ratings for each text. These numbers may also be problematic, as I did not filter out
paid or robot ratings for each text. This measure was employed as a check on popularity
and not as a selection factor. The most popular texts on Amazon.com for search engine
optimization and social media optimization are self-published texts promoted specifically
by Amazon. I decided not to use these texts and to focus on those with established
publishers as a criterion for quality. It would be interesting, in a future study, to see what
differences, if any, exist between the manuals published by known technology publishers
and the highly used self-published titles from Amazon.com. I identified 15 manuals
spanning publication from 2005 to 2018 for the analysis. A little less than half of the titles
are published by John Wiley & Sons. This is a high percentage of the titles due to both
the focus of John Wiley & Sons on producing technical manuals and its acquisition of
many technical publishers.29 Table 3.1 presents the manuals selected for the analysis and
includes the number of Amazon ratings and WorldCat holdings, as well as the target
audience, which was identified during the examination of the texts.
Each manual was around 200 pages. For this historical document analysis, I used
an iterative approach consisting of skimming and then close readings of the texts. The
initial texts used to create the sample data collection sheet were: The ABC of SEO:
Search Engine Optimization Strategies (2005), Search Engine Optimization Bible (2009),
and Introducing SEO: Your Quick-Start Guide to Effective SEO Practices (2016). During
the data collection and analysis, I refined the data I gathered from the texts, adding more
depth to the questions about the presentation of, and strategies related to, Black Hat
techniques, and merging the specific category of mobile techniques into the category of
“Other” notes. The primary research questions did not change with any discoveries
during that process. A sample data collection sheet is listed in appendix A.

29 See: https://www.wiley-vch.de/en/about-wiley/the-publishing-house

Table 3.1. Chronological Listing of How-to Guides and Instruction Manuals.

Title | Publisher | Year | Amazon Ratings | WorldCat Holdings | Target Audience
Digital Branding: A Complete Step-by-Step Guide to Strategy, Tactics, Tools and Measurement | Kogan Page | 2018 | 16 | 282 | Marketing professionals; public relations professionals
Introduction to Search Engine Optimization: A Guide for Absolute Beginners | Apress | 2017 | 5 | 324 | Interns; college students; self-paced learners; journalists
Introducing SEO: Your Quick-Start Guide to Effective SEO Practices | Apress | 2016 | 1 | 302 | Web designers; website managers
Win the Game of Googleopoly: Unlocking the Secret Strategy of Search Engines | John Wiley & Sons / Skillsoft | 2015 | 62 | 606 | Small businesses
Search Engine Marketing, Inc.: Driving Search Traffic to Your Company's Web Site | IBM Press | 2015 | 38 | 218 | Marketing professionals
Social Media Optimization for Dummies | John Wiley & Sons | 2015 | 15 | 621 | Marketing professionals
Letting Go of the Words: Writing Web Content That Works | Morgan Kaufmann (imprint of Elsevier) | 2014 | 109 | 1718 | Marketing professionals; graduate students; technical writers
Search Engine Optimization: Your Visual Blueprint for Effective Internet Marketing | John Wiley & Sons | 2013 | 45 | 1332 | Marketing professionals
Optimize: How to Attract and Engage More Customers by Integrating SEO, Social Media, and Content Marketing | John Wiley & Sons | 2012 | 59 | 660 | Marketing professionals; public relations professionals; small to medium sized business owners; large company marketing executives
Search Engine Optimization Bible | John Wiley & Sons | 2009 | 12 | 501 | Website managers; web application programmers
The Findability Formula: The Easy, Non-Technical Approach to Search Engine Marketing | John Wiley & Sons | 2009 | 55 | 173 | Marketing professionals
Mastering Web 2.0: Transform Your Business Using Key Website and Social Media Tools | Kogan Page | 2009 | 3 | 648 | Marketing professionals; small and medium sized business owners
Marketing through Search Optimization: How People Search and How to be Found on the Web | Elsevier | 2008 | 3 | 522 | Marketing professionals
The ABC of SEO: Search Engine Optimization Strategies | Lulu Press | 2005 | 12 | 11 | Website managers; web application programmers
Archived Webpages
The challenge of studying the variable nature of websites and webpages has been
addressed by using web archives, which take a snapshot of a webpage at a particular point
in time. These archives, which are associated with a harvest date and time, may be
incomplete and may render differently in the web browsers that exist at the time of
analysis compared to the web browsers used at the time the pages were created. However,
the
HTML code leaves traces of what is missing and scars for where content should be, such
as with a missing image or broken Adobe Flash. The technologies that are used in web
archiving are not too dissimilar from search engines. A robot / spider (code) goes out and
crawls the webpages tracing links and grabbing content as it goes. Where the search
engine takes that data into a cache that is indexed and returns results, the web archive
packages the files in a mirror of its original formation and copies the files to be stored
within the archives in a format called ARC.30 The process of searching the archives is
limited to the content that has been captured successfully within the application. Just like
print archives, “[w]e may say that archives are the manufacturers of memory and not
merely the guardians of it” (Brown & Davis-Brown, 1998). Research in web archives is
limited to what has been successfully harvested by the web archiving application.
The Wayback application has four basic components that make up an archival
web service: 1) Query UI, which allows users to search against the Resource Index in the
collections, 2) Resource Store, which stores copies of the web pages and associated files,
3) Resource Index, which allows full text search of the archived pages and other search
queries, and 4) Replay UI, which presents the content, usually with archival citation
information inserted, and inserts the Wayback code to maintain links to files harvested at
the same time (Tofel, 2007). For this project, all four components were used to find,
examine, and save documents. The Query UI was used in the Internet Archive’s
Wayback Machine to query a primary URL string stored in the Resource Index, while the
30 ARC file format specification from the Internet Archive:
http://archive.org/web/researcher/ArcFileFormat.php
Query UI at the Library of Congress web archives searches across cataloged websites.
Therefore, I was
able to query the Resource Index for characteristics based on geography and level of U.S.
election. The Replay UI was used to save the webpage and accompanying files that are
stored in the Resource Store. In addition, I took screenshots of particular code and Replay
views in the browser of notable aspects found during the analysis.
Analysis within web archives is based on retrieval of a particular URL and
archives are grouped around a URL. Tools used to search these web archives are lacking
and “require a substantial human effort” (Costa et al., 2017). The Replay UI was used to
select webpages with content, primarily through the calendar browse interface.
Figure 3.4. Calendar browse interface of Open Wayback application displaying number
of snapshots of the webpage created by harvests.
The existence of a snapshot does not guarantee retrievable content. Each snapshot had to
be selected, and snapshots were often eliminated due to URL resolution errors the
harvester encountered, such as “302: redirect” and “404: content not found.” Another
group of websites was eliminated from consideration because content appeared to have
been harvested into a snapshot but retrieval of the webpage content was blocked by a
pop-up or a log-in screen. Identifying webpages that had content and could be analyzed
was a long process of trial and error. This process is extremely time consuming, as the
load time on each webpage from the archival services is significant, taking up to five
minutes for a partial page load from the Resource Store.
Newspaper Articles Archived Webpages
In order to identify webpages for this project related to newspapers, I made
several assumptions: 1) online newspaper articles for major dailies are created through
content management systems, 2) one or more individuals may have been involved in the
creation of the file content for an online article, 3) automated scripts from the content
management system may or may not have been used to populate aspects of the pages, 4)
due to the use of complex enterprise content management systems, articles from the same
paper around the same time period will have the same basic structure, and 5) those
content management systems are unwieldy and unlikely to be changed frequently.
For this project, I selected the Los Angeles Times for its online articles. The Los
Angeles Times has a robust online history, is archived by the Internet Archive’s Wayback
Machine, and has maintained a constant URL, latimes.com, since its initial harvesting by
the Internet Archive. Unlike many other national newspapers, its subscription gateways
were prohibitive to archiving online content only from 2003-2004; see Figure 3.5 for an
overview of the archived content available. It is also a newspaper that has both a national
and a local audience and devotes significant resources to articles on national politics. In
addition to the scope, audience, and availability of content, the Los Angeles Times was
selected because of its history of integrating technological advances and innovations and
for influential articles published during the time period of available archived content,
including five Pulitzer Prize winning articles in 2014 alone (Los Angeles Times | History,
Ownership, & Facts, 2019). In selecting the Los Angeles Times, it was also important to
identify a news site where over 70% of the content was created by the news organization.
As online newspapers have attempted to stay afloat, many have integrated mass amounts
of clickbait and third-party content that may or may not be relevant to the news content.
In looking to identify SEO and SMO practices as integrated into the content and context
of online news articles, it was important to eliminate the ad- and clickbait-concentrated
publications.
Figure 3.5. Chronological graph of latimes.com website harvests on the Internet
Archive’s Wayback Machine, spanning publicly available content from 2000 to 2018.
Due to the Los Angeles Times’ use of a large content management system, the
structure and templates for articles changed infrequently, and I was able to select an
article per year to analyze for specific structural changes. To confirm this assumption, I
did checks of two to three harvests during a particular year and skimmed for structural
changes. To identify the articles for analysis, I looked for snapshots of latimes.com over a
couple of weeks preceding or following an election. Because of the variability of the
presence of harvests, the dates of the analyzed articles are not consistent. From the
archived latimes.com homepage on a particular date, I selected an article that was of
relevance to national politics. Part of the selection criteria was that an article must be
linked from the homepage, an indicator of importance. I was unable to collect any
newspaper articles for 2003 and 2004, as latimes.com had all of its content behind a
paywall that the open-source harvester could not penetrate for those years. Table 3.2
outlines the 17 articles, their corresponding URLs, the harvest version / snapshot date and
time, and the researcher-assigned ID used for this analysis.

Table 3.2. Newspaper articles selected from the Los Angeles Times on the Internet Archive.

ID | Harvest Version (YYYY-MM-DD-HH-MM-SS) | URL | Article Title
la00 | 2001-01-07-19-45-00 | http://www.latimes.com:80/news/politics/decision2000/upd_election001109b.htm | Florida Recount May Go Into Next Week
la01 | 2001-10-08-03-32-58 | http://www.latimes.com:80/news/politics/la-082401tommy.story?coll=la-headlines-politics | Scarce Funds Imperil Bush Health Goals
la02 | 2002-02-15-21-46-20 | http://www.latimes.com:80/news/nationworld/nation/la-021502finance.story | Heat's on Senate After Campaign Reform Victory
- | 2003 | N/A | N/A
- | 2004 | N/A | N/A
la05 | 2005-12-20-18-25-03 | http://www.latimes.com:80/business/la-102405econ_lat,0,7536501.story?coll=la-home-headlines | Bush Names Bernanke to Replace Greenspan as Fed Chief
la06 | 2006-10-16-11-27-58 | http://www.latimes.com/news/nationworld/world/la-fg-planb16oct16,0,4775251.story?coll=la-home-headlines | Panel to Seek Change on Iraq
la07 | 2007-11-06-02-22-46 | http://www.latimes.com/news/local/la-me-carona31oct31,0,786373.story?coll=la-home-local | An unsettling portrait of 'America's Sheriff'
la08 | 2008-10-28-13-34-35 | http://www.latimes.com:80/news/local/la-me-mailvote27-2008oct27,0,2952582.story | Popularity of mail-in voting surges in California, elsewhere
la09 | 2009-10-27-11-36-32 | http://www.latimes.com:80/news/nationworld/world/la-fg-obama-afghan27-2009oct27,0,7820767.story | Push for Afghanistan troop increase continues on deadly day
la10 | 2010-10-27-02-06-04 | http://www.latimes.com/news/nationworld/nation/la-na-conservatives-endgame-20101026,0,7304435.story | Conservatives struggle to unify for voter outreach
la11 | 2011-10-27-23-15-31 | http://latimesblogs.latimes.com/technology/2011/10/obama-2012-campaign-starts-a-tumblog-tumblr.html | Obama 2012 campaign heads to Tumblr
la12 | 2012-10-30-12-40-03 | http://www.latimes.com/news/politics/la-pn-biden-clinton-romney-jeep-ad-20121029,0,6637512.story | Biden on Romney Jeeps-to-China claim: 'Have they no shame?'
la13 | 2013-10-29-02-37-29 | http://www.latimes.com/world/la-fg-spying-phones-20131029,0,3235295.story | White House OKd spying on allies, U.S. intelligence officials say
la14 | 2014-10-21-05-43-34 | http://www.latimes.com/world/middleeast/la-fg-iran-nuclear-20141021-story.html | Report says U.S. may OK more centrifuges in Iran nuclear talks
la15 | 2015-10-21-13-30-57 | http://www.latimes.com/local/california/la-me-tribal-law-enforcement-20151020-story.html | In Humboldt County, tribe pushes for bigger law enforcement role on its lands
la16 | 2016-10-15-11-26-13 | http://www.latimes.com/politics/la-na-pol-clinton-fundraising-20161014-snap-story.html | Hillary Clinton keeps fishing for big money while lagging behind with smaller donors
la17 | 2017-11-02-02-04-16 | http://www.latimes.com/politics/la-na-pol-immigrant-abortion-20171023-story.html | How long can the Trump administration prevent a 17-year-old immigrant from getting an abortion? Case tests limit
la18 | 2018-10-22-21-36-43 | http://www.latimes.com/nation/la-na-trump-cruz-texas-20181022-story.html | No more 'Lyin' Ted' — Trump heading to Houston to support Texas senator
Two copies of the webpage and assets were saved to a cloud folder for the
researcher. One was a straight copy from the Replay UI service, and the second was used
in the analysis and was edited to remove the additional Open Wayback code. A sample
data collection sheet is in appendix A.
Political Candidate (U.S. Senate) Issue Archived Webpages
In order to focus on political candidate webpages where access to the online
content for candidates may have contributed to the success of an election, I limited the
webpages to U.S. Senate elections and then reduced those to elections where the final
margin of victory was under 5%.31 The assumption in these close races is that the
campaigns may have been more motivated to provide increased access to the candidate
webpages. The Library of Congress web archives provides access to the “United States
Election Web Archive.” Five candidate webpages were used per election cycle, as
exploration of further candidate pages provided little to no new insight into the structure
after looking at five sites. The selection and review of five items of investigation per year
also eliminated any tendency to focus on the unique or obscure. Candidate webpage
content selected for the analysis had to address a topic, issue, or priority; i.e., not a
biography or slogan only. Some candidate webpages were eliminated from analysis due
to the lack of available content on topics, URL resolution errors, or blocking pop-ups. For
the 2012 election cycle, in order to have five candidate websites, I analyzed Bob Casey
Jr.'s website from the Pennsylvania election, where his margin of victory was 9.1% over
Tom Smith; there were fewer U.S. Senate election victories with a margin under 5% in
2012.
31 The U.S. House of Representatives provides publicly accessible records of U.S. elections including the
House of Representatives and U.S. Senate: https://history.house.gov/Institution/Election-Statistics/.
Due to the additional cataloging information that the Library of Congress provides
with its web archives, I was able to initially limit the content to U.S. Senate elections and
then find those identified in the close-margin analysis based on the state. Because U.S.
Senate candidates may have repeatedly run for another office, such as the U.S. House of
Representatives, the calendar browse view of the Replay UI was used to select the correct
election year. Another advantage of the cataloged resources in the Library of Congress
collection was the direct relationships between sites and the various URLs that belonged
to the same candidate. From the calendar view of the appropriate election year and URL,
I selected a snapshot as close to October 15th as possible. This date was selected both
because of its proximity to the national election date in November and because of its
appearance as a frequently available harvest date. From the main page of the candidate
website, I selected the first article available on an issue or priority for the analysis.
Because some sites prioritized issues while others used an alphabetical order, there is no
significance across websites to the first article that was available. The following table
includes the 40 webpages analyzed, from the 2002-2016 elections, with the
researcher-assigned ID, candidate name, harvest version / snapshot date and time, URL,
and page title for the article/topic.
Two copies of the topic webpage and assets were saved to a cloud folder for the
researcher. One was a straight copy from the Replay UI service, and the second was used
in the analysis and was edited to remove the additional Open Wayback code. A sample
data collection sheet is in appendix A.
Table 3.3. Selection of U.S. Senate candidate archived webpages on issues.

ID | Candidate | Harvest Version (YYYY-MM-DD-HH-MM-SS) | URL | Article Title
pc02a | Jean Carnahan | 2002-10-05-10-29-11 | http://www.jeancarnahan.com/news/releaseview.cgi?prtid=16 | Carnahan Launches Ads Against Social Security Privatization: Privatization Schemes Force Reductions in Social Security's Guaranteed Benefit
pc02b | Tim Johnson | 2002-10-12-12-59-44 | http://www.timjohnsonforsd.com/workinghard/agriculture.php | AGRICULTURAL ECONOMY
pc02c | John Thune | 2002-10-13-05-58-12 | http://www.johnthune.com/issues.asp?formmode=issue&id=3 | Agriculture
pc03d | Mary Landrieu | 2002-10-15-10-47-51 | http://www.marylandrieu.com/issues_adoption.html | Adoption
pc02e | Suzanne Haik Terrell | 2002-10-15-10-36-44 | http://www.suzieterrell.com/plan_economic.html | Providing Economic Security
pc04a | Mel Martinez | 2004-10-09-16-12-05 | http://www.melforsenate.org/ | Fighting for Florida Families
pc04b | Betty Castor | 2004-10-30-02-58-04 | http://www.bettynet.com/site/pageserver?pagename=iss_economy | A Plan to Move Florida's Economy Forward
pc04c | Tom Coburn | 2004-10-22-02-19-36 | http://www.coburnforsenate.com/prescription.shtml | Dr. Coburn's Five Point Prescription for Better and More Affordable Health Care
pc04d | Brad Carson | 2004-10-10-03-43-43 | http://www.bradcarson.com/agriculture/ | Growing A Strong Oklahoma
pc04e | Pete Coors | 2004-10-12-01-17-06 | http://petecoorsforsenate.com/issues1.htm | On The Issues - Jobs and the Economy
pc06a | Jon Tester | 2006-10-18-18-01-11 | http://testerforsenate.com/issues | Jon Tester on the Issues
pc06b | Conrad Burns | 2006-10-11-21-56-05 | http://www.conradburns.com/issues/details.aspx?id= | Agriculture
pc06c | Jim Webb | 2006-10-04-19-08-42 | http://www.webbforsenate.com/issues/issues.php#iraq | Iraq
pc06d | George Allen | 2006-10-18-18-14-02 | http://www.georgeallen.com/site/c.hgITL5PKJtH/b.1528127/k.B841/Taxes.htm | Taxes
pc06e | Jim Talent | 2006-10-18-18-25-29 | http://www.talentforsenate.com/issues/default.aspx?id=1 | Agriculture
pc08a | Mark Begich | 2014-11-04-22-30-41 | http://www.markbegich.com/priorities/fiscal-responsibility/ | Fiscal Responsibility
pc08b | Ted Stevens | 2008-10-16-03-33-26 | http://tedstevens2008.com/issues/access-to-federal-lands/ | Access to Federal Lands: Making Traditional Use of Public Lands
pc08c | Jeff Merkley | 2008-10-15-21-00-05 | http://www.jeffmerkley.com/2008/09/growing_rural_o.php | Growing Rural Oregon
pc08d | Gordon H. Smith | 2008-10-16-01-32-07 | http://www.gordonsmith.com/issues/details.aspx?id=27 | Ensuring Our Communities Are Safe
pc08e | Frank Lautenberg | 2008-10-29-21-52-16 | http://www.lautenbergfornj.com/issues-homeland-security-and-combating-terrorism.php | Homeland Security and Combating Terrorism
pc10a | Michael Bennet | 2010-10-15-01-18-54 | http://www.bennetforcolorado.com/issues/details/2010-09-building-a-21st-century-economy | Building a 21st Century Economy
pc10b | Ken Buck | 2010-10-08-18-02-51 | http://buckforcolorado.com/social-security | Social Security
pc10c | Pat Toomey | 2010-10-14-22-27-08 | http://www.toomeyforsenate.com/content/jobs-and-economy | JOBS AND THE ECONOMY
pc10d | Joe Sestak | 2010-10-14-23-25-40 | http://joesestak.com/Economy.html | ECONOMY
pc10e | Patty Murray | 2010-10-14-22-20-12 | http://www.pattymurray.com/issues?id=0005 | Agriculture
pc12a | Dean Heller | 2012-10-17-19-46-47 | http://deanheller.com/issues/ | Growing the Economy
pc12b | Rick Berg | 2012-10-17-20-47-52 | http://www.bergfornorthdakota.com/view/featured/issues/jobs-and-the-economy?ref_v=2 | Jobs and the Economy
pc12c | Richard Carmona | 2012-10-03-17-48-22 | http://www.carmonaforarizona.com/priorities/creating-jobs | Creating Jobs
pc12d | Jon Tester | 2012-10-17-19-36-04 | http://www.jontester.com/issues/creating-jobs/ | Creating Jobs
pc12e | Bob Casey Jr | 2012-09-06-01-03-23 | http://bobcasey.com/pennsylvania-jobs | PENNSYLVANIA JOBS
pc14a | Scott Brown | 2014-10-07-23-42-54 | https://www.scottbrown.com/issues/ | Issues
pc14b | Mark Begich | 2014-11-04-22-30-41 | http://www.markbegich.com/priorities/fiscal-responsibility/ | Fiscal Responsibility
pc14c | Ed Gillespie | 2014-10-15-01-27-03 | http://edforsenate.com/eg2/replacing-obamacare/ | Replacing Obamacare
pc14d | Dan Sullivan | 2014-10-14-22-06-22 | http://www.sullivan2014.com/jobs_the_economy | Jobs & The Economy
pc14e | Jeanne Shaheen | 2014-10-14-22-29-36 | http://jeanneshaheen.org/priority/womens-rights/ | Women's Rights
pc16a | Maggie Hassan | 2016-10-19-00-37-16 | http://maggiehassan.com/priority/combating-substance-abuse/ | Combating the Heroin & Opioid Crisis
pc16b | Kelly Ayotte | 2016-10-11-23-33-26 | http://www.kellyfornh.com/media-center/get-the-facts/college-affordability/ | Kelly is working to make college more affordable
pc16c | Pat Toomey | 2016-08-17-01-08-31 | https://www.toomeyforsenate.com/iran_isis | [On Iran & Isis]
pc16d | Katie McGinty | 2016-10-11-23-17-27 | http://katiemcginty.com/issues/#jobs | Issues
pc16e | Joe Heck | 2016-10-11-23-20-19 | https://drjoeheck.com/on-the-issues/ | JOBS & ECONOMY: Summary
A media archaeological method is used for this project, with a historical
methodology involving document analysis. This framework is especially useful in
investigating contemporary histories. Instruction manuals and guidebooks are used as
artifacts from the time they were published, containing specific recommended SEO and
SMO strategies and tactics. Two sources of primary documents were used in verifying the
actualization of these strategies: articles from the Los Angeles Times and topic pages
from U.S. Senate candidates in close election races, both selected from web archives.
This combination of manuals and archived webpages allows for a close investigation of
specific trends and applications of SEO and SMO strategies over time.
CHAPTER IV
COMMUNICATION SYSTEMS FOR INFORMATION RETRIEVAL
The context for a media archaeological analysis needs to be established through an
exploration of the systems within which the object of study exists and its relationship to
prior communication and media systems. The starting point for a media archaeological
analysis should involve a diagram of the systems and information (Parikka, 2011). This
diagramming step is important in order to capture the complex operations in the
communication system and areas of study and to identify the processes in historical
context. The term “information retrieval” was introduced with computerized information
systems. However, the goals are consistent with earlier forms of media, where indexing
provided a universal means of “search” and of finding content within a large corpus
(Krajewski, 2011). Information retrieval is the basic process by which communication of
information is exposed and made accessible. The following series of diagrams illustrates
how search and retrieval can be conceived through various media.
Information Retrieval in Print Mediums
Figure 4.1 is a search / information retrieval diagram, which presents the system
of indexing for information retrieval with print materials. The diagram abstracts to a level
at which it could describe a library catalog or another standard index used to find
information across a corpus or collection.
Figure 4.1. Diagram of search in a print catalog or filing system.
In this model, there are two particular points where the standard classification is
used in order to facilitate access to information: the classification standard as used by the
intermediary classifier, and the filing system or complete catalog through which the user
searches. In order for the user to be able to employ the catalog, the system must be
transparent and use terminology that is understandable to the user. The intermediary
classifier as an actor in the process could be a person or an automated process, both of
which follow the rules in the standard or classification system in order to categorize and
organize documents. In this system, the creator of the document has little to no control
over how the document will be classified. Once it is given to the system, the structure of
the system and its principles guide the next steps in the classification and
categorization. The user and information seeker are also limited by the rules and
conditions of the classification and categorization system.
The “Memex” for Information Retrieval
As new forms of media, such as microfilm, were developed to store documents,
conceptual ideas of how to search vast amounts of information began to be developed. In
the essay “As We May Think,” Vannevar Bush imagined the idea of a “memex” machine.
This information retrieval system is perfectly suited to the individual researcher / scholar.
The information is stored on microfilm and queried through a device on a desk.
Additionally, in this system, the researcher is able to define relationships between
documents and to integrate their own notes into the information storage system.
Figure 4.2. Annotated diagram of the Memex conceptual communication and storage and
retrieval machine from “As we may think” (Bush, 1945).
Although never built, the Memex is considered to be an early inspirational model used by
early Internet designers in the creation of the World Wide Web (Houston & Harmon,
2007). In this system of information retrieval, the addition of linkages and relationships
as an essential part of the structure introduced a new component to be considered in
retrieving relevant information.
Information Retrieval in Databases
Figure 4.3 illustrates a generic system of information retrieval in a textual
database of documents. This model is similar to the print information retrieval system;
the primary differences are the ability to query data through a query engine and
automated functionality, and the significantly increased capacity to query at scale through
computerized systems. The diagram is not intended to replicate a technical diagram of a
database infrastructure but rather to highlight the places of interaction between the
document, user, query, and results. In this model of information retrieval, both the
document creator and the user are limited to the classification standard applied by a third
party (machine or person) in order to retrieve relevant documents.
The diagram assumes that the database can store electronic copies of the
documents in addition to the indexed data in the database; however, instead of delivering
an electronic document, this system could also return a locator in a classification system,
which is then needed to retrieve the document from another system. The basic system
does not change, although the user experience may be greatly enhanced by the capacity to
deliver electronic documents.
Figure 4.3. Generalized diagram of text information retrieval systems and search queries.
Information Retrieval on the World Wide Web
With the World Wide Web and the creation of search engines, the system for
information retrieval evolved. The structure of the HTML document itself as
machine-readable content changed the way that content could be classified and
categorized. The creator of the HTML document could put code in the document's
HTML <head> section to call out to the search engine, e.g., <meta name="robots"
content="all">, or to ask that the page not be indexed, e.g., <meta name="robots"
content="noindex">, and give additional directions to the search engine.32 These specific
mechanisms and extra processes are represented in Figure 4.4 through the double-arrows
between the HTML
32 See: Google Search Central, “Robots meta tag…,”
https://developers.google.com/search/reference/robots_meta_tag.
document code and the search engine web crawler. The creator of the HTML document
can also provide information to the search engine through the website directory via a
“robots.txt” file, whose primary purpose is to specify what should not be crawled by the
search engine.33 The content creator has an additional mechanism to invite the search
engines to crawl their content through the creation and submission of sitemaps.34
Sitemaps are like an architectural map or guide to the important content on a website; for
small or simple websites, a sitemap is often not needed. The multiple methods that the
document creator has to communicate with the web crawler / indexer are unique to this
communication and information retrieval system; both mechanisms are sketched below.
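As a hypothetical sketch of these two mechanisms, the directives below follow the
robots.txt and sitemaps.org conventions referenced in the footnotes; the site paths and
URLs are invented:

    # robots.txt, placed at the root of the website directory
    User-agent: *
    Disallow: /drafts/
    Sitemap: https://example-campaign.com/sitemap.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- sitemap.xml: a submitted guide to important pages on the site -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example-campaign.com/issues/jobs.html</loc>
        <lastmod>2016-10-01</lastmod>
      </url>
    </urlset>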
The HTML document's data is processed by the search engine web crawler. If
SEO has been applied, it becomes an additional way for the creator to influence how the
content is indexed within the search engine and, ideally, increases the chances of the
content being found by search engine users. At this stage, many search engines also
provide tools to help document creators select keywords based on trends and user
searches, which in some ways become a stand-in for the taxonomy and vocabulary guides
of prior information retrieval systems. Applied SEO strategies also increase the
communication between the HTML code, the page content, and the web crawler.
Once the web crawler has parsed the code on the website, the rules of the search
engine begin the gatekeeping function. Typically, search engines publish their primary
33 See: Google Search Central, “Create a robots.txt file,”
https://developers.google.com/search/docs/advanced/robots/create-robots-txt.
34 See: Google Search Central, “Build and submit a sitemap,”
https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap.
rules and best practices in order to encourage good behavior and to prevent Black Hat
techniques. For Google, this is where mobile friendliness, accessibility, and
well-formatted HTML code are evaluated. During this phase, Black Hat techniques are
also assessed, and a webpage or site may be banned from Google or not indexed,
depending on how the search engine's algorithms interpret adherence to the rules. There
are many reasons why a webpage may exhibit what seem like Black Hat techniques but
are not. A notable conflict between good communication practices and web search engine
rules is the practice of citation. If an essay on a webpage has many citations that link to
other webpages, and those webpages do not link back, which may simply reflect the
nature of the medium (e.g., the linked webpages were published earlier and are now
static), then the webpage may be flagged for link farming and not indexed. Once a
webpage passes the search engine rules gate, it is indexed within the search engine data
store. A cached version of the webpage may also be stored in the search engine index at
this time.
Proprietary algorithms process the data and determine when to retrieve content for
user searches. This is a black box of proprietary information; the process is not visible to
the document creator or the user. Page content, functionality added through SEO, linking
and relationships, and keyword searching are evaluated. In addition to the search and
retrieval algorithm, the search engine may also employ language translation, localization
data, user habits, and other filtering at this stage. Critical communication and code studies
have called for transparency surrounding this black box in order to expose the restrictions
and to influence and/or change the biases that the search engine employs at this stage.
The results from the algorithms are sent to a search engine results page (SERP),35
where organic search content may appear alongside paid search content and knowledge
panel information. The SERP may vary depending on the hardware (device) and
software (browser) that the user employed to conduct the search. The SERP usually
contains a short description, which may be taken from a metadata tag in the <head> or
generated from the initial page content. Once a search result is selected by a user, the
rendering of the HTML document opens in the browser. This resulting view of the
HTML document is also dependent on the user's hardware and software.
Figure 4.4. Internet search and retrieval using a search engine.
35 Knowledge panels in Google are for people, places, organizations, and things. Open graph metadata code
must be provided, in order to support knowledge panels in Google. See:
https://support.google.com/knowledgepanel/
Overall, the search engine information retrieval system within the World Wide
Web has four characteristics that distinguish it from prior information retrieval
communication systems: 1) the dialogue between the document creator and the indexing
service, 2) the closed algorithm box, 3) the addition of the evaluation of relationships
between documents in retrieval, and 4) the impact of the user's hardware and software on
the delivery of the document.
Information Retrieval in Social Media Platforms
Figure 4.5 illustrates the process of on-page social media optimization for information retrieval. The HTML page is initially exposed through a social media platform, an operation that can be instigated either manually or programmatically. The platform's use of user data and targeted streams relies on this initial user-mediated share of content, although that user could be the same person as the document creator. The document creator's ability to affect the social media integration comes through on-page SMO techniques that add specific metadata to the <head> of the HTML page so that the content displays well within the social media application's interface.
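To make these on-page SMO techniques concrete, the following is a minimal hypothetical sketch using the Open Graph protocol's <meta> properties, which platforms such as Facebook read when a link is shared; the URLs and values are illustrative only.

  <head>
    <!-- Open Graph properties read by social media platforms when a link is shared -->
    <meta property="og:title" content="Example Page Title">
    <meta property="og:type" content="article">
    <meta property="og:url" content="https://example.com/example-page">
    <meta property="og:image" content="https://example.com/preview-image.jpg">
    <meta property="og:description" content="A short summary shown in the post preview.">
  </head>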
The social media platform's algorithms then define additional exposure through the platform and render a display of the content. The proprietary algorithms that promote content in the platform's feeds are a black box, unknown to both the user and the document creator. At this stage, the platform's algorithms also review content posts that link to external HTML pages and can evaluate for localization, user habits, fake news, copyright violations, and content that otherwise violates the platform's policies. Interestingly, most strategies to address content violations, whether automated or manual, have failed in social media platforms, whether in the form of fake news that persists on Facebook or the false violations flagged by copyright bots that remove content from Facebook and YouTube.
The display rendering in a user's feed will depend on the user's hardware and software. This pattern then repeats with additional social media shares, reactions (i.e., likes), and comments on the content. In some cases, the strategies of the social media platform can also be manipulated using spam accounts and bots to increase content exposure through the platform. Popularity and results targeted to an individual's search and browser history are features of this method of retrieval.
In the model of information retrieval within social media platforms, the HTML document creator has little ability to influence the alignment of the information provided on the HTML page with the social media feed results generated by the platform. Although most social media platforms do have a search feature, the majority of content will be viewed through the feed, which supports browsing rather than searching as information-seeking behavior. (This does differ for multimedia-dominant sites like YouTube.) The result is that for HTML content shared through social media platforms, social media optimization structures, although important, are secondary; the primary method for increasing content shares is the construction of images and text that encourage user interaction, i.e., clickbait.
Figure 4.5. HTML content found through social media platforms.
Summary
By reviewing abstract diagrams of information retrieval through communication
systems of various media, search and information retrieval on the Internet can be
contextualized within technologies and practices that came before. The overall goal
throughout these systems is to make document content accessible to a user. In the early
models of print information retrieval, the size of the corpus and time needed to search
print catalogs was a limiting factor. Across all the models, the role of politics of
information and gatekeeping is a determining factor in access to the content. In the earlier
models, the taxonomies and vocabularies were necessarily transparent to the user,
whereas, in the later models, there is some transparency with best practices and tools
published by search engines and social media platforms, but the primary function that
returns information is hidden in proprietary algorithms. Within these algorithms, the criteria used to retrieve documents are expanded beyond the traditional subject, author, place, and time retrieval to include relationships, format, and other characteristics defined as important to the search engine.
As search engine and social media optimization strategies are evaluated in the next chapters, both in terms of the recommendations in manuals and the actualization of strategies on newspaper article and candidate web pages, this broader context of communication systems for information retrieval is necessary. Chapter V examines search engine and social media optimization strategies over time as recommended through instruction manuals and guidebooks, all within the communication information retrieval systems illustrated in Figures 4.4 and 4.5 of this chapter.
CHAPTER V
HOW-TO GUIDES AND INSTRUCTION MANUALS FOR SEO AND SMO
Using technical instruction manuals and how-to guides to explore historic communication practices is an efficient and effective method of analysis (Prior, 2003). The analysis in this chapter has two functions. First, it examines changes in SEO and SMO recommendations over time while looking for topoi in the context of pre-existing media and mechanisms. As the Internet becomes more important to communication, search engines and social media applications take further hold as gatekeepers to information. Second, it provides a basis for comparing the SEO and SMO strategy recommendations in the manuals to the HTML webpage analyses of Chapters VI and VII. The recommendations contained in the how-to guides and instruction manuals for SEO and SMO strategies include HTML code-specific format and structure suggestions, writing, design, and content categorization.
This chapter presents an analysis of how-to guides and instruction manuals for SEO and SMO strategies published from 2005 to 2018. Because technical manuals are often discarded when they become obsolete or are replaced by new versions, the earliest manual available for the study was from 2005. Selection of the guides was limited to availability through libraries and used bookstore platforms. The last manual examined was published in 2018, the year this analysis began in earnest. The analysis is structured in two sections. The first analyzes the goals, audiences, and authors of the texts in order to set the context for the recommended strategies. The second reviews various SEO and SMO strategies and tactics, broken into subsections on structural advice, tags, tools, and the composition and design of webpages, and analyzes the strategies to identify topoi and the media and communication practices that transcend or evolve with webpage content guarded by the Internet's online gatekeeping tools, which control access to information.
Goals of SEO and SMO Manuals
In examining and analyzing the SEO and SMO instruction manuals and how-to guides, the expertise of the authors and the audiences and aims of the texts are important to contextualize. The overall aims of the texts focused on making information on web pages findable online and shared similar characteristics, since the texts were identified in Worldcat.org using the following subject heading terms: Internet Marketing; Social Media; Web Search Engines; Internet Searching; Search Engines; Electronic Information Resource Searching; and Electronic Commerce. The biggest difference among the texts is the amount of space devoted specifically to SEO and SMO strategies within the code and technical instructions. Some texts devoted substantial space to tools and coding, while others simply noted tasks the reader might want to hire someone else to do.
Authors
The majority of authors' expertise was established within the marketing and advertising professions. The most commonly touted expertise was web marketing and/or SEO consulting for firms with clients such as Disney, Nike, L'Oreal, and the BBC. A few authors were accompanied by technical editors and co-writers or focused on web skills in particular. Length of experience was a continual theme in expressing expertise; however, the form of that experience differed greatly. In one of the earlier manuals (2005), one author's listed qualification was having used the internet since 1987. In a more recent manual (2018), twenty years of experience in digital marketing is listed among the author's qualifications; of course, digital marketing had not existed for twenty years when the 2005 manual was written. The newer manuals also tended to include speaking and conference engagements as author qualifications, as well as industry awards. One author claimed to have "pioneered the video search engine optimization phenomena" (Bradley, 2015). The expertise presented by the authors, as a whole, was constructed to demonstrate concrete impact in the industry and helped set up a practical and useful approach to the recommendations within the texts.
Audiences
The majority of the manuals were aimed primarily at a small business audience, with a focus on marketing and getting products to appear in search results. Additionally, a few of the texts noted that the strategies were also useful for non-profits and journalists (Kelsey, 2016), for college and graduate students (Redish, 2014; Rowles, 2018), or for learning the terminology needed to communicate with the team members within a larger organization who would do this work (George, 2005; Lincoln, 2009; Odden, 2012). A few of the texts were also aimed at web designers and web system architects, or at those interested in starting their own SEO business (George, 2005; Ledford, 2009; Shenoy & Prabhu, 2016). One manual, Search Engine Optimization: Your Visual Blueprint for Effective Internet Marketing, devoted a complete section to those who wanted to make money by selling ads through their websites and to finding a topic and content that would entice others to pay to place their ads on one's webpage. One of the most common examples of success with this approach is the online mattress sales industry. This audience deserves further study in another context, where the information on the webpage is designed to earn top search engine results specifically in order to court advertisers. The lack of more technical and code-focused audiences for the manuals may be due to the terms used to select them. However, it is also interesting that part of the strategy of these texts was to help audiences who may not be technical still take advantage of the technical aspects of SEO and SMO strategies for the findability of content. It is noteworthy that more advanced guides were not retrieved with the search criteria in Worldcat.org or Amazon.com for the terms "search engine optimization" and "social media optimization."
Approaches
The manuals divided along two primary approaches: 1) those that focused on tools, and 2) those that focused on adapting skills from non-digital arenas, such as communications and marketing, to the digital environment. "Give Google exactly what it wants and needs. In return, Google will give you exactly what you want...page one domination!" (Bradley, 2015, p. xv) is an excellent exemplar of the first approach. Other texts made it explicit that technology and tools were not the focus of the manual: "Communications and selling are the keywords. Not technology" (Lincoln, 2010, p. xvii), and "Letting Go of the Words is about strategy and tactics, not about tools. Technology changes too fast to be a major part of the book – and the principles of good writing transcend the technology you use" (Redish, 2014, p. xxvi). Interestingly, there was no clear connection between texts that focused on tools and technology and a more technical audience. The goals and approaches to SEO and SMO varied even for the same audience, such as small businesses.
In the area of search engine optimization, all of the texts focused on Google as the primary search engine. Some manuals did include additional search engines, such as Yahoo and Bing. As noted in several of the manuals, because the search engine algorithms are proprietary and generally use similar principles, SEO strategies developed for Google should translate to competing search engines as well. Interestingly, even with the focus on Google, many of the texts did not attempt to describe the significant algorithm changes that have affected how SEO and SMO work within Google. The exception was Win the Game of Googleopoly: Unlocking the Secret Strategy of Search Engines, one of the more tools- and technology-focused texts, which cited the changes in Google's Penguin, Panda, and Hummingbird releases in particular. Although the algorithms were not explicitly called out in the majority of manuals, the recommendations shifted over time in accordance with the algorithmic changes, particularly with the <meta name="keywords"> tag, as we will see in the next section.
One of the major challenges of social media optimization as a field is the need to select and adapt content to the various social media platforms. To address this issue, several texts included guidance on how to select the appropriate social media venue for one's content. Many of these venues, such as MySpace, are of course no longer major platforms. In this project, on-page social media optimization in webpages is the focus because of the ability to influence social media optimization through the HTML webpage code itself, rather than within each different social media platform. Eight of the fifteen manuals included specific advice for social media optimization. Because Twitter and Facebook are among the few social media applications that provide specific criteria for on-page social media optimization, it is not surprising that the manuals focused on these two platforms. The goals described in these sections were more akin to those of the tool- and technology-focused texts, concerned with crafting the format of the social media optimization specifically for the platform.
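As an illustration of such platform-specific criteria, the sketch below shows the kind of Twitter Card <meta> tags a manual might recommend alongside Facebook's Open Graph tags; the values are hypothetical.

  <head>
    <!-- Twitter Card tags, read when the page is shared on Twitter -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:title" content="Example Page Title">
    <meta name="twitter:description" content="A short summary for the tweet preview.">
    <meta name="twitter:image" content="https://example.com/preview-image.jpg">
  </head>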
SEO and SMO On-Page Strategies
The following section reviews and analyzes the on-page strategies and recommendations for SEO and SMO within HTML pages found in the instruction manuals and how-to guides, in order to identify topoi. Critical to the recommendations within the books is the recognition that search engines and social media applications filter and promote web content based on webpages, not websites. Because of this, each page within an organization's or individual's website should include SEO and SMO strategies particular to the content of that page. An important distinction of new media communication on the Internet is this disaggregation of content and context. Much as Apple's iTunes changed the listener's relationship to music from albums to individual songs, the work of search and social media tools as a primary gate to web content has had the effect of separating the page from its website as the primary unit of access and consumption.
In addition to the focus on webpages as distinct entities, search engines specifically look at the relationships between links, both within websites and to and from external webpages. Those connections, as coded and described within a webpage, are also important on-page SEO and SMO factors. "Most SEO efforts are focused on web pages. Effective web page optimization includes a consideration of the individual page as well as its relationship with other pages on the overall website" (Odden, 2012, p. 133). This presents an interesting challenge in the context of the webpage and the website, where connections need to be explicit in order to establish the relationships with other pages on the website. As on-page optimization strategies are reviewed in this section, recommendations are included both for how a page declares and describes itself and for how its relationships with other web entities and content are described. "Google admits that there are over 200 signals (factors) that it looks for when determining how to rank your website" (Bradley, 2015, p. 59). Part of the work of the manual and how-to guide authors is to identify the strategies that can make a significant difference. As a result of what has been exposed by the corporations that own the search engine and social media platforms, there is much alignment in the strategies presented throughout the manuals.
URL Optimization
Each webpage has a URL, the address and locator for the page on the Internet, whose primary component is a domain. Optimization strategies for the URL of a webpage are consistent throughout the search engine and social media optimization guides in stressing the importance of a domain name that is appropriate to the content and that ideally matches the most likely search terms (Lincoln, 2009; Odden, 2012; Redish, 2014; Rowles, 2018; Shenoy & Prabhu, 2016). A manual from 2008 stresses the importance of limiting the use of sub-domains in domain strings as interfering with search engine optimization (Michael & Salter, 2008). That advice is not consistent throughout the guides, but it does reappear in a 2015 guide warning that subdomains lack the credibility with search engines and social media platforms that a primary domain holds (Bradley, 2015).
Figure 5.1. Basic anatomy of a URL (Technology for Librarians 101: Anatomy of a Web
Address, 2014).
The domain acts as a foundation for web content, and value added by content or other structures on a webpage is credited back to the domain in search engine systems (Bradley, 2015). Tips on domain creation and naming are fairly consistent across the publication dates of the manuals examined in this project. Suggested strategies include using short, clear, and descriptive (not technical) wording in domain names. Formatting considerations include avoiding any special characters other than the hyphen and always using lowercase. "Never, ever in a million years allow your URLs to have uppercase characters" (Bradley, 2015, p. 63). In the early 2000s, content management systems such as Drupal and WordPress became popular for creating and managing large numbers of web pages, and these systems remain in use today. Dynamic URLs, which are generated on the fly by scripts or carry a designation such as "pgid=19" at the end of the URL, lack a human-readable identity or meaning and are very difficult for search engines to read (George, 2005, p. 26). Successful content management systems integrate stable URL paths for webpages in order to overcome this problem (Jones, 2013; Ledford, 2009).
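By way of hypothetical illustration, the contrast the manuals draw looks like the following, where the first URL is dynamically generated by a script and the second is a stable, human-readable path:

  Dynamic: https://example.com/index.php?pgid=19
  Stable:  https://example.com/menu/pepperoni-pizza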
Additional strategies include localization, building a geographic name into the domain name (Bradley, 2015; Lutze, 2009; Shenoy & Prabhu, 2016). A business advertising a physical service in an area such as Eugene, Oregon, might use a domain like "https://pizza-eugene-or.com." This precise location serves not only to promote where the business is located but also helps eliminate out-of-area hits. This could be very important, for example, if the business were receiving takeout and delivery orders from Seattle because of a similarly named pizza restaurant there. Having a geolocation in the URL does not overrule the strategy of having short and memorable URLs or ones that correspond with branding; geolocation data can be added in other places within the webpage. Top-level domains can also be based on geographic regions, such as "https://pizza.uk." Top-level domains may also be chosen because they sync with a brand even though they are registered with ICANN as country-code top-level domains, which can increase issues with search engines, social media platforms, and government blocking tools. For example, the website for the Open Graph Protocol specification uses "http://ogp.me," and ".me" is the geographic code for Montenegro. The ".me" top-level domain became popular during the 2010s as a way of personalizing sites (6 Reasons Why We Like .ME Domain Names, 2013).
Although all domain names are maintained through ICANN, a non-profit, the consumer-level domain naming process goes through a registry and a reseller, often a hosting platform or a company that specializes in selling domain names. See Figure 5.2 for the relationships between the entities involved in domain name processes.

Figure 5.2. Domain registry process (Domain Name Registration Process | ICANN WHOIS, n.d.).

Because of the limited supply of domain names using particular words, resellers can ask high prices for the domains most in demand (George, 2005; Lindenthal, 2014). Search engines, however, have become aware of this problem and over time have relied more heavily on the domain, path, and page name as a whole for search matches (Rowles, 2018, p. 87). In addition, the URL has become a less important factor in search engine ranking (Bradley, 2015). To use this strategy, a categorization of themes and hierarchies should be used to construct the paths of the webpages within a website (Jones, 2013, p. 76), as sketched below. Simplicity wins over cleverness in URL and domain naming for search engine and social media optimization.
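A hypothetical sketch of such a theme-based hierarchy, extending the pizza example, might be:

  https://pizza-eugene-or.com/menu/
  https://pizza-eugene-or.com/menu/calzones/
  https://pizza-eugene-or.com/about/delivery-area/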
Strategies within the HTML Page’s Header
The structure of tags allowed in HTML code sets the framework for search and social media optimization practices. The first part of an HTML file is the header, or <head>, section of the HTML document. Most of the HTML tags and content in the <head> are hidden from the viewer upon rendering, except for the <title> tag, which may appear in a browser tab or window label. Within the <head> section of the HTML document, scripts, styles, and other coding that enumerate the style and display preferences can be specified for a browser and can communicate with search engines and social media platforms. The basic structure of a webpage is illustrated in Figure 5.3. For the most part, tags within the <head> are written primarily for machines; the code in this area is not rendered visible to the human reader in a browser, except for the <title> tag, which may appear in the tab or window at the top of the browser.

Figure 5.3. Basic HTML structure.
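A minimal sketch of the basic structure the figure depicts, with illustrative placeholder content, is:

  <!DOCTYPE html>
  <html>
    <head>
      <!-- machine-facing code: title, metadata, scripts, styles -->
      <title>Page title shown in the browser tab</title>
    </head>
    <body>
      <!-- human-facing content rendered in the browser window -->
    </body>
  </html>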
Additional code in the <head> may include scripts that call particular functions and features for the webpage, as well as references to style instructions telling the browser and device how to render the page on the screen. The <meta> tags available in HTML, which represent metadata assigned to the page, include a fairly traditional set of metadata that can be applied (e.g., author, date, language, keywords, category, abstract, rating), as well as instructions on how to render the page on a screen (e.g., viewport). However, only a subset are considered important for search engine and social media optimization. Search engine optimization, in particular, uses the description and keyword tags, with varying importance over time. (The "viewport" meta tag becomes important once mobile sites become popular.) Social media optimization, by contrast, uses a schema or protocol adopted by the platform that can be referenced through <meta> tags in the HTML code structure and is aimed at that particular platform; the content in these tags may duplicate content in the standard HTML <meta> tags. In addition to its use in the browser tab or window label, the <title> tag is coupled with the <meta name="description"> tag on the search engine results page and often in social media posts referencing a link. Because of the reuse of the data from these two tags within search engine results pages, they have a particular importance in the user's search engine experience, in addition to the search engine indices, and are the primary human-readable points of user interaction with the coding in the <head>.
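A hypothetical sketch of a <head> combining these standard and platform-oriented <meta> tags, with placeholder values, might look like:

  <head>
    <title>Example Page – Example Site</title>
    <!-- standard HTML metadata -->
    <meta name="description" content="Short summary reused on the SERP.">
    <meta name="keywords" content="example, sample terms">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- platform-specific (Open Graph) metadata, possibly duplicating the above -->
    <meta property="og:title" content="Example Page">
  </head>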
Title
The <title> tag holds consistent importance over time for SEO and SMO in the manuals examined. The earliest manual I examined points to the <title> as a primary area of focus for SEO (George, 2005, p. 39), and the most recent notes, "This [title] is the most important thing on the page, as it is generally given the greatest weighting by the search engines…" (Rowles, 2018, p. 85), with notes in between similar to "one of the most important [factors]" (Lincoln, 2009, p. 77). In line with the structural and coding advice for constructing titles, the strategies suggested in the manuals follow two primary threads: 1) use keywords to construct the page title, and 2) pay attention to title length. For keywords within a title, strategies include putting the most relevant keywords at the beginning of the title, because the machines read it first, and order is important to the algorithms (Ledford, 2009). The keywords used in the <title> should also not duplicate other keywords used on the webpage, or the search engine will likely consider it spam (Shenoy & Prabhu, 2016, p. 83). Only a couple of manuals specifically noted the use of a human-readable title that would encourage a user to click on the link in search results or on a social media platform (Kelsey, 2016; Odden, 2012).
The recommendations for the length of the <title> tag, however, are directly related to the user view within a search engine results page or social media platform. The W3C, the body that oversees web standards, recommends 64 characters, whereas Google allows for 66 characters and Yahoo search results allow for 120 characters (Ledford, 2009, p. 131). The limit on the number of characters is not static, however; another manual, published in 2015, recommends limiting titles to 55 characters for Google search engine results pages (Bradley, 2015, p. 72). One of the most surprising aspects of reviewing these manuals is that while the <title> tag may be listed as one of the most important factors for SEO and SMO, the manuals paid very little attention to strategies for title structure.
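Following the two threads of advice above, a hypothetical title that leads with the most relevant keywords and stays under the 55-character limit Bradley recommends might be:

  <title>Wood-Fired Pizza Delivery – Eugene, Oregon</title>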
The lack of attention to the structure of the <title> becomes apparent in the following chapters, particularly in the relationship between the content of the webpage and the title of the website as a whole. The order and stacking of webpage versus website title vary; for example, <title>My web page – My web site</title>, <title>My website – My web page</title>, or simply <title>My web page</title>. The <title> tag may be the most obvious place to look for topoi and practices carried forward from previous media. The title does maintain an important function in discovery and in describing the content overall, as the advice from the manuals recommends. This scaling relationship between titles resembles that between articles or chapters and the larger works that contain them. However, due to the disaggregated nature of the content and the separation of context, web authors often place the larger website title within the webpage's <title> tag. This practice is not defined in any of the manuals.
Metadata Description
The metadata tag for the page's description, <meta name="description">, is also a highly relevant tag for SEO strategies in the manuals; however, it receives only passing reference in the earlier ones. In manuals published after Google's Panda algorithm update (2011), the tag's purpose is stated as primarily serving the user view on the Search Engine Results Page (SERP) (Bradley, 2015; Kelsey, 2016; Moran & Hunt, 2015; Odden, 2012). Even with that primary function, it is still pointed to as the second most important factor for SEO in a 2015 manual (Bradley, 2015). In describing the human-readable strategies and functions of the description tag in SERPs: "The more compelling and relevant your meta description is, the more likely it will inspire a click to the web page. More clicks mean more visitors, but also serve as a signal for potential influence on subsequent rankings. Pages that inspire more clicks may be rewarded with higher search visibility, because users are responding positively to them" (Odden, 2012, p. 135). Although "compelling" is hardly a structural strategy, a couple of the manuals do provide further structural guidance.
Structural strategies for the meta description tag focus on two primary components. The first is the length of the tag, so that the text remains readable on a SERP; 150-155 characters is the limit for that function (Bradley, 2015, p. 79; Ledford, 2009, p. 137). The second strategy focuses on the content of the tag. It is unclear whether the content of the description tag is used in ranking, and some propose that the content within it is treated much like the initial <body> text in ranking (Moran & Hunt, 2015, p. 72). The advice not to reproduce too many words of the <title> in the meta description serves both machine-ranking and human-readable purposes (Ledford, 2009). Because search engines and social media platforms focus on uniqueness and page-level strategies, it is also important that each webpage have a unique meta description focused on the content of that specific page (Bradley, 2015; Ledford, 2009). In some ways, it may seem odd to consider uniqueness a structural strategy. However, because of the parsing done by the algorithms for SEO, the placement and repetition of words becomes an important factor.
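A hypothetical description tag following this guidance, unique to its page and within the 150-155 character limit, might be:

  <meta name="description" content="Hand-tossed, wood-fired pizza delivered across Eugene, Oregon. View today's menu, order online, or book the dining room for private events.">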
The role of the <meta name="description"> on a SERP, where the user is deciding whether the content is worth selecting, mimics the earlier bibliographic systems and the role of the abstract, especially in article databases. Also similar to earlier bibliographic information retrieval systems, one of the most important factors for the abstract or <meta name="description"> is its length and how it fits within the medium. Although it can function for search and retrieval purposes, its primary role is for human-readable determinations.
Metadata Keywords
"If eyes are the windows to the soul, the keyword search is the window to your customer's thinking process" (Lutze, 2009, p. 29). Perhaps the most contested and now obsolete SEO strategy is the use of the <meta name="keywords"> tag. The purpose of metadata keywords was to anticipate the words that a user might type into a search engine; they were to be a combination of topics, geographic locations, personal names, and genre terms. Searchers typically use two to three keywords, and the webpage should be optimized around those words (George, 2005, p. 66). The value of a keyword also depends on the term: a search term that is too general may not be helpful unless one has earned the top spot for that term, such as "computers," which is nearly impossible (Lutze, 2009, p. 9). And beyond organic search, Google has an entire business around purchasing "AdWords," which support non-organic search results at the top of search results pages. All of the manuals except Letting Go of the Words contain significant information on tools used to generate and test keywords, such as Wordtracker,36 Google's Keyword Planner,37 and Free Keyword Tool.38 These tools are suggested for SEO beyond the <meta name="keywords"> tag, in other structural parts of the page where keywords assist in search placements, such as the <title>, headings, and paragraph text.

36 https://www.wordtracker.com/
37 https://ads.google.com/home/tools/keyword-planner/
38 https://www.keyword.io/
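A hypothetical keywords tag combining topics, geographic locations, and genre terms as described above might be:

  <meta name="keywords" content="pizza, wood-fired pizza, pizza delivery, Eugene, Oregon, restaurant menu">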
Early SEO manuals suggest that, in addition to the focused keywords of the page, common misspellings and errors should also be added to the <meta name="keywords"> tag (George, 2005; Lutze, 2009; Michael & Salter, 2008). As algorithms grew more sophisticated, automatically corrected searches through the "Did you mean…?" function nullified the need for this kind of keyword application. This mattered because the <meta name="keywords"> tag was the only place to manually enter common misspellings without negatively affecting the ranking of the rest of the webpage content.
One of the primary problems with the metadata keyword tag was trust, as it was one of the first SEO techniques to require major algorithm rewrites by search engines after web page authors employed the Black Hat strategy of "keyword stuffing," in which keywords and key phrases are "overused in content merely to attract the search engines" (Moran & Hunt, 2015, p. 459). Two main techniques have been identified as keyword stuffing: 1) keyword loading, a disproportionate number of words and phrases for the content, and 2) keyword spam, words added that are not relevant to the content on the page and that may be directly targeted to attract traffic from a competitor (George, 2005; Bradley, 2015; Rowles, 2018). Keyword stuffing can occur in a variety of tags across HTML; however, it was most commonly found in <meta> tags in the <head> that were intended to be machine-readable and hidden from the user in the browser view of the webpage (George, 2005, p. 69). With overt keyword stuffing prevalent, Google stopped using the <meta name="keywords"> tag in 2009. By 2011, with the Panda release, it was officially out of the SEO game. Still, it took time for the SEO manuals to catch up, and the meta keywords tag is listed as the "single most important" SEO factor in a text from 2013 (Jones, 2013, p. 40). The best assessment of the role of the tag comes from a recent manual: "These [metadata keywords] used to be more important, but now they are less so" (Kelsey, 2016, p. 113). Keyword usage outside of the <meta name="keywords"> tag remains important for SEO and is described later in this chapter.
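To illustrate the two stuffing techniques named above, hypothetical stuffed tags, contrasted with proportionate usage, might look like:

  <!-- keyword loading: disproportionate repetition for the amount of content -->
  <meta name="keywords" content="pizza, pizza delivery, best pizza, cheap pizza, pizza deals, pizza pizza">
  <!-- keyword spam: terms irrelevant to the page, targeting a competitor's traffic -->
  <meta name="keywords" content="pizza, CompetitorPizzaCo, free phones, celebrity news">
  <!-- proportionate usage -->
  <meta name="keywords" content="pizza, delivery, Eugene, Oregon">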
The <meta name="keywords"> tag is functionally the closest tag to traditional indexing and classification of information. In traditional systems, this would be a controlled vocabulary of genre or subject terms. Even without a controlled source of keywords and phrases, the purpose of the tag is similar: retrieving relevant information based on anticipated user searches. In fact, the Library of Congress previously advised that terms from the Library of Congress Subject Headings classification39 be added to the <meta name="keywords"> tag (Library of Congress, 2002). These controlled classifications, also called "authorities" within information science, are still used in traditional bibliographic systems for information retrieval and are extracted or identified as separate from the rest of the content of the materials. As part of this project, the subject headings were instrumental in identifying the manuals and guidebooks. As the <meta name="keywords"> tag became the most untrustworthy assignment in SEO, it is interesting that this extracted definition of subjects and keywords lost its authority and the guides came to recommend that keywords be embedded within the page content. This also raises interesting questions about subject heading assignment in more traditional systems: what is trustworthy, and what communicates the content accurately for retrieval?

39 https://id.loc.gov/authorities/subjects.html
Strategies within the HTML Page’s Body
As the how-to manuals and instruction guides move to the bulk of their SEO and SMO strategies, they point to the primary content of the HTML page in the <body> section. Many of the manuals emphasize that the best strategy is to provide unique and interesting content (Bradley, 2015; Jones, 2013; Kelsey, 2016; Lutze, 2009; Moran & Hunt, 2015; Odden, 2012; Redish, 2014; Shenoy & Prabhu, 2016). Perhaps one of the biggest changes in content online is that the "user is in charge of the conversation" and that the web creates a "pull" rather than a "push" form of communication (Redish, 2014, p. 151).
For example, if you did nothing other than write high quality, compelling,
relevant articles on topics related to your business and organization,
Google will find them, and more importantly, people will find them. If
they’re relevant and interesting, meaningful or helpful, more people will
share them with other people. If this happens, they will climb higher in
rankings (Kelsey, 2016, p. 5).
Audience analysis is a major component of the strategy, and a couple of the guides spend significant time on audience analysis advice (Bradley, 2015; Moran & Hunt, 2015). Some of the texts concentrate on searchers as the audience, divided into the categories of navigational, transactional, and informational searchers (Moran & Hunt, 2015, p. 35). Another notes the significance of other web authors as an audience and advises writing content that others would want to link to or "remark on," in order to fuel the weight that search engines and social media platforms give to interrelationships on the web and to popularity as authority (Redish, 2014, p. 74). Even with these distinctions, the general consensus is to remind webpage authors to write content aimed at their audience and to follow best practices for communication: write for people, not search engines (Jones, 2013, p. 88); "too often newbies write for spiders alone" (Moran & Hunt, 2015, p. 96). How does this manifest in the structural advice for creating HTML webpages in the guides? None of the texts examined for this project were writing guides; rather, they provided structural advice, both for identifying content and for how it is marked up and structured within the HTML page, that focuses on a particular way of writing for the web.
The Shape of Content
SEO and SMO instruction manuals and how-to guides are explicit that the shape of the content should follow two basic strategies: be succinct and be distinctive. "Any text not directly relevant to the content should be removed" (George, 2005, p. 27). The content should come in "bite-size" and "easy to digest" chunks (Redish, 2014, p. 149) with short sentences (Shenoy & Prabhu, 2016, p. 83). This resembles much general communications and writing advice; in the context of webpages and SEO and SMO, however, it is genre independent. The second strategy, distinctiveness, revolves around the content of the webpages within a particular website. This is not about unique and compelling content online so much as a methodical analysis to ensure that duplicated content is not reused on multiple pages within a website and that each page is focused on a single topic (Bradley, 2015, p. 90; Odden, 2012, p. 133). With the webpage composed of succinct and distinctive content, the instruction guides and how-to manuals then present strategies for SEO and SMO in structuring the content within the webpage.
Hierarchical Structure Tags
HTML standards provide a set of heading tags for structural and hierarchical
arrangement of text,