Remediate.Co
How to Crawl a Password Protected Website

Crawling Private Pages of Password Protected Websites

Last Edited March 29, 2024 by Garenne Bigby in Sitemap Generator

What is a password protected website? Why would a website be password protected? Why would I want to crawl a password protected website? Most people ask these questions sooner or later, especially if they work in website design, development, or SEO. Let’s answer each question in turn, and then look at a few others as well.

What is Password Protection?

Many websites use password protection as a security measure for information that should not be available to the public at large. Without it, this information would be reachable from any computer. Password protection hides it from the general public and requires a password (usually together with an authorized username) before the content can be accessed.

Some websites are entirely password protected, whereas others are only partially so—meaning there’s a staff or membership area.


Why use Password Protection?

It may seem confusing to come across a password-protected site, especially if the reason is not immediately clear. If you’re a website owner or manager, you may be asking the same question—“Why should I password protect my site?”

After all, password protection prevents ordinary visitors from viewing your site, which restricts traffic. The whole point of having a site up is to get as much traffic as possible, with as much visibility as you can generate, right?

Actually, there are three major reasons why you, as a website manager, may want or even need to password protect your website. If you come across a protected site as a user, it is almost always for one of these three reasons:

  • Development. The website may still be in the development and design phase, and therefore not ready for the public. Alternatively, the site may be undergoing major design changes. During development or reconstruction, employees need access to the site while clients and the public do not, and password protection is the simplest way to arrange that.

  • Private Content. This reason typically results in part of the website, rather than the site as a whole, being password protected. Some sites use it to encourage visitors to sign up for membership, whether a free subscription or a paid account: by password protecting some of their best content, information sites (and many blogs) entice visitors to become active members. Other sites protect content because it belongs to individual users. A bank’s website is a good example: some areas are available to the general public, such as the different account types, branches, and other financial services the bank offers, but to access your online banking account you’ll need to log in.

  • Security. This is partly an extension of the private content case. With password protection, site owners and managers can restrict access to administrative functions and files. A good example is a “staff only” section, or an intranet, where important information can be shared easily with staff members in a large corporation. This works better than email, where messages can get lost in the spam folder or be accidentally deleted, because it forms a permanent online resource library. As new employees join the company, they can immediately access all the information they need with their username and password. A password protected intranet also lets staff reach the administrative back end of the site, where, on a web service site for example, they can manage client accounts.


Why Crawl Password Protected Websites?

Crawling a password protected website may sound illicit (after all, it’s password protected for a reason), but there are a number of fully legitimate reasons for wanting to crawl your own protected site.

Let’s take a look at why you would want to crawl your website while it’s under password protection.

  • Developing a content inventory. This is especially important while the site is still being developed or redesigned, because you want a list of all the content on the site. The reasons for this could fill another article, but the most common one is to make sure everything that needs to be there has been uploaded.

  • Performing a content audit. Once you have your content inventory, you’ll want to go through that content and list any issues that need to be fixed. These could include missing content that still needs to be uploaded, partial or corrupted files that need to be replaced, or content that hasn’t been tagged well enough for website crawlers and search engines to read it properly.

  • Creating a visual sitemap. Sitemaps help website crawlers and human users alike navigate your site with ease. By crawling your password protected site before launch, you can draw up your visual sitemap in advance and immediately see where information needs to be better organized. This also helps with a major redesign: you can plan the project ahead of time, compare it with the previous design, and make the necessary changes before launch.

  • SEO. The SEO reasons for crawling a website could fill an article of their own, and all of them apply to password protected websites too. In brief, crawling your site while it is still protected shows you what search engines will see once it goes public. Because you’ll rely heavily on search engines to attract traffic, you want your site to be well optimized for search engines as well as for human visitors. That includes the user experience, the effectiveness of your sitemap, and the meta tags that help search engines crawl your content thoroughly and include it in their indexes (which they use to return search results so quickly). Making sure your site is well optimized before launch can save you a lot of effort and frustration in the long run.

  • Accessibility. We’ve already covered optimizing the accessibility of your site for search engine website crawlers and general users, but there’s another important aspect of your site’s accessibility score that you certainly don’t want to overlook. Whether your website is going to become available to the general public once you launch or a section will remain password protected, you want to make sure that the pages are easily accessible to persons with disabilities. This is especially true of private website pages that will continue to be password protected.


Best Practices for Crawling Password Protected Websites

Before we get into the details of how to crawl a password protected website, there are some practices that you want to bear in mind and adhere to.

  • Use a read-only account when crawling. You should never use an administrative account when crawling your website, because that gives the crawler access to every admin function the account has permission to control. With full administrative access, you risk having your crawler change the site’s themes and plugins, or even delete posts. This is why it is imperative that you set up a read-only account for crawling your website.

  • Always exclude your admin pages. This follows from the point above: even when crawling with a read-only account, you should exclude the administrative back-end pages and folders from the crawl to avoid making undesirable changes to your website. For WordPress, you’ll want to exclude the entire /wp-admin/ section; for Joomla, the same applies to the /administrator/ section.
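If you run your own crawler alongside a hosted tool, both rules can also be enforced in code. The Python sketch below is a minimal illustration under stated assumptions, not how DYNO Mapper works: it assumes the crawler calls a hypothetical is_allowed() check before every request, refuses any HTTP method other than GET or HEAD, and skips the WordPress /wp-admin/ and Joomla /administrator/ paths named above.

```python
from urllib.parse import urlparse

# Admin areas named in the article: WordPress and Joomla back ends.
EXCLUDED_PREFIXES = ("/wp-admin/", "/administrator/")

# Only methods that should never change anything on the server.
SAFE_METHODS = {"GET", "HEAD"}


def is_allowed(url: str, method: str = "GET") -> bool:
    """Return True if the crawler may request `url` with `method` (hypothetical policy)."""
    if method.upper() not in SAFE_METHODS:
        return False
    path = urlparse(url).path or "/"
    # Add a trailing slash so "/wp-admin" is caught as well as "/wp-admin/...",
    # while look-alike paths such as "/wp-admin-guide/" are not.
    if not path.endswith("/"):
        path += "/"
    return not path.startswith(EXCLUDED_PREFIXES)
```

A filter like this is only a second line of defense. The read-only account is what actually limits the damage if the crawler does reach a page that changes something.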


Let us build it for you

Every DYNO Mapper subscription comes with authentication support.

Submit a support ticket and include the following information.

  • Login URL
  • Temporary Login details (read-only account)

We will supply you with an import code within 24 to 48 hours, counted in business hours (Monday through Friday, 9 am to 5 pm Eastern Time).

After you have received your import code, add it to DYNO Mapper.

  1. Click Create Project
  2. Click CREATE under the Create from URL
  3. Click CLOSE WIZARD - you'll need to close the wizard so that you can edit the authentication settings
  4. Open the Authentication Options section and click the Advanced Custom System Login icon
  5. Click IMPORT, paste in the code we supplied, and click IMPORT again

That's it: you can now TEST your Custom System Login with your login credentials. Once authentication succeeds, you can crawl your site. Repeat steps 1-3, add your login URL and user credentials, and start crawling.

  1. Click Create Project
  2. Click CREATE under the Create from URL
  3. Click CLOSE WIZARD
  4. Add the LOGIN URL
  5. Open the Authentication Options section, select the new Custom System Login in the System dropdown, and add your login credentials
  6. START CRAWLING
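For readers curious about what happens with the login URL and credentials, most form-based logins come down to two things: POST the username and password to the login form, then keep the session cookie the server returns and send it with every later request. The Python sketch below shows that pattern with the standard library only. It is an illustration under assumptions, not DYNO Mapper’s internals: the field names log and pwd are WordPress’s default login field names, other sites use different names, and many login forms also carry hidden fields (such as anti-CSRF tokens) that must be submitted too.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies from responses and resends them, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request for a login form."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Example usage (makes real network requests, so it is left commented out):
# opener, jar = make_session()
# login_url = "https://example.com/wp-login.php"   # your Login URL
# opener.open(login_url)                            # GET first to pick up any pre-login cookies
# opener.open(build_login_request(login_url, {"log": "crawler-readonly", "pwd": "..."}))
# html = opener.open("https://example.com/members/").read()  # sent with the session cookie
```

Because every request goes through the same opener, the cookie jar carries the logged-in session from page to page, which is what lets a crawler see the protected pages at all.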


Can I build it myself? Yes

What Do I Need to Know?

1. Learn about CSS selectors and HTML

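When building a Custom System Login, you’re going to need some basic knowledge of both HTML (a markup language) and CSS (a stylesheet language), and in particular of CSS selectors, which pick out specific elements on a page such as a login form’s fields and buttons. We’ll give you a breakdown of the four best sites for learning HTML and CSS to help you get started.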

In theory, you could ask your developers to set this up for you. But for security reasons it’s generally better to set up your Custom System Login yourself, so that you keep full control over it. And don’t worry: you won’t have to enroll in a college or university course to learn how!

Here are four of the best online tutorials and courses, all available free of charge.

  • Udacity - Intro to HTML and CSS. Udacity have some fantastic online courses made freely available, and their Intro to HTML and CSS is certainly one of the best.

    The tagline for this course is “Not Your Typical Intro,” and we wholeheartedly agree. While most HTML and CSS tutorials focus on language syntax, Udacity recognizes that syntax alone won’t help you design a webpage or, in our case, set up a Custom System Login. The course is structured to help you think like a front-end developer, showing you where to start and how to approach the task.

    When you complete the course, you’ll have all the knowledge you need to implement HTML and CSS in all of your page elements too—meaning you’ll be able to make minor changes to your site without having to call in a developer every time.

  • Khan Academy - Intro to HTML/CSS: Making Webpages. Khan Academy has an extensive list of topics that are wonderfully broken down into a variety of modules. You won’t have to complete the full Computer Programming subject to partake in their Intro to HTML/CSS: Making Webpages course.

    This is another fantastic, extensive exploration of HTML and CSS that teaches you how to use both to develop web pages. The same principles apply when you set up your Custom System Login, and understanding the difference between HTML (a markup language) and CSS (a stylesheet language) can make or break your ability to do so efficiently.

  • Codecademy - Learn HTML and Learn CSS. Codecademy focuses purely on teaching users different coding languages and their applications. Both the Learn HTML and Learn CSS tutorials are brilliant crash courses designed to help novices learn the ropes quickly and efficiently.

    While other courses have an estimated duration of 3 weeks, these two tutorial courses can teach you everything you need to set up your Custom System Login in as little as 12 hours. As part of the HTML course, you’ll build a fashion blog and a wine festival schedule, after which you can tackle the CSS tutorials and complete a few extra projects. These projects aren’t necessary for the immediate task of crawling your password protected site, but the knowledge will prove invaluable when you’re tracking down bugs or making minor changes later on without having to hire a developer each time.

  • W3Schools - HTML5 Tutorial and CSS Tutorial. W3Schools is a popular source for learning website development tools. They rightly market themselves as the world’s largest tutorial and reference site for the industry.

    You’ll want to start with their extensive HTML5 Tutorial, which you can easily finish in a day and which already includes some basic CSS. There are hundreds of examples to make the learning process easier, and W3Schools has also included on-page HTML editors that let you practice each lesson and view the results in real time.

    To properly supplement your CSS knowledge—whether you plan on using it further, or only want to be able to set up your Custom System Login—you should also run through their CSS Tutorial. Once again, there are hundreds of examples provided to help you learn quickly, and on-page editing sections where you can practice and view your coding skills immediately.
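Once you know the basics, the selectors a login setup needs are usually short. As an illustration only (a hypothetical helper, not part of DYNO Mapper or any of the courses above), the Python function below shows how the three selector forms you’ll use most follow from an element’s HTML attributes: an ID selector such as #user_login, an attribute selector such as input[name="pwd"], and a plain type selector such as form. It assumes simple ASCII IDs; any other ID falls back to a quoted attribute selector.

```python
import re

# IDs matching this pattern can be written as "#id" without any CSS escaping.
_SIMPLE_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def _quote(value: str) -> str:
    """Quote a CSS attribute value, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def selector_for(tag: str, attrs: dict) -> str:
    """Build a CSS selector for an element from its tag name and attributes."""
    element_id = attrs.get("id")
    if element_id and _SIMPLE_IDENT.match(element_id):
        return "#" + element_id                      # ID selector
    if element_id:
        return f"{tag}[id={_quote(element_id)}]"     # ID that needs quoting
    name = attrs.get("name")
    if name:
        return f"{tag}[name={_quote(name)}]"         # attribute selector
    input_type = attrs.get("type")
    if input_type:
        return f"{tag}[type={_quote(input_type)}]"   # e.g. button[type="submit"]
    return tag                                       # type selector; may match many elements
```

Prefer IDs when the page has them: an ID should be unique on the page, while a name or type attribute may match more than one element.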


2. Learn how to use a browser inspector tool

You’re going to need a browser inspector tool to find the information in your page’s code that your Custom System Login depends on, such as the login form’s input fields and submit button.

We’ve found the six best browser inspector tools, specifically designed and developed for the most popular browsers in use.

  • Chrome DevTools. Google Chrome, possibly the most popular browser in use, features a range of tools to help webmasters get the most out of their experience. Chrome DevTools are built into the browser, and its authoring, debugging, and profiling tools are easy to reach: open Chrome’s main menu and select “More Tools” > “Developer Tools”, or right-click any page element and choose “Inspect” for a full breakdown of that element.

  • Firebug. Firebug was a long-standing favorite among Firefox users. The extension is no longer developed or maintained, and it does not run in current versions of Firefox, whose built-in developer tools absorbed many of its features. Follow Mozilla’s recommendation and migrate to the next tool on our list, Firefox Developer Edition.

  • Firefox Developer Edition. Replacing Firebug as the go-to option for developers and website managers using Mozilla Firefox, the Firefox Developer Edition comes with an extensive set of tools. For the express purpose of using the browser inspection tool in developing your Custom System Login, you should take a look at the Page Inspector, their number 1 core tool.

  • Safari Web Inspector. For those of you using a Mac with its default browser, Safari, Web Inspector is the solution for you. Apple has built a command center that gives users quick and easy access to one of the richest sets of development tools available. Safari Web Inspector is fully compatible with macOS, iOS, and tvOS.

  • Microsoft Edge Developer Tools. Microsoft Edge, the default Windows browser, comes with its own developer tools, opened with the F12 key. Since Edge moved to the Chromium engine, these tools closely mirror Chrome DevTools, so they are sure to meet your needs if Edge is your browser of choice.

  • Opera Dragonfly. Last but not least is Opera Dragonfly, the developer tool set built into the original Opera browser. Current versions of Opera are built on the Chromium engine and ship with Chromium-based developer tools in place of Dragonfly, but using them is just as easy: right-click on any element and select “Inspect”.
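The inspector is the most reliable way to see a login form as the browser renders it. For a static login page, though, the same details (the form’s action URL and method, plus each input’s name, id, and type, including hidden fields that never appear on screen) can also be read from the page source. The Python sketch below uses the standard library’s html.parser module. The LoginFormFinder class and find_login_form() function are hypothetical names, and the sketch assumes the login form is the first form containing a password field. It does not run JavaScript, so forms built by scripts will only show up in the inspector.

```python
from html.parser import HTMLParser


class LoginFormFinder(HTMLParser):
    """Collect each <form>'s action and method and the <input> fields inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # list of {"action": str, "method": str, "inputs": [dict]}
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html: str):
    """Return the first form that contains a password field, or None."""
    parser = LoginFormFinder()
    parser.feed(html)
    parser.close()
    for form in parser.forms:
        if any((i.get("type") or "").lower() == "password" for i in form["inputs"]):
            return form
    return None
```

Hidden inputs returned here (for example a redirect target or a security token) are exactly the fields that are easy to miss on screen but still need to be submitted with the login.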


3. Build a Custom System Login

Create a DYNO Mapper account if you do not already have one. Tiered pricing is available based on the page count of your project. After you have logged into DYNO Mapper, follow these instructions.

  1. Click Create Project
  2. Click CREATE under the Create from URL
  3. Click CLOSE WIZARD - you'll need to close the wizard so that you can edit the authentication settings
  4. Open the Authentication Options section and click the Advanced Custom System Login icon
  5. Click CREATE
  6. Add a Title for your Custom Login
  7. Add each necessary action, then click SAVE & EXIT
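What counts as a “necessary action” depends on the login page, but a typical username-and-password form needs four: open the login URL, fill in the username field, fill in the password field, and click the submit button, optionally followed by a check that the login worked. The Python sketch below models that sequence as data. The LoginAction class, its kind values, build_login_script(), and to_json() are hypothetical names chosen for illustration; they are not DYNO Mapper’s action format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LoginAction:
    """One step of a scripted login (hypothetical format, not DYNO Mapper's)."""

    kind: str                       # "goto", "fill", "click", or "wait"
    selector: Optional[str] = None  # CSS selector for fill/click/wait steps
    value: Optional[str] = None     # URL for "goto", text to type for "fill"


def build_login_script(
    login_url: str,
    user_selector: str,
    pass_selector: str,
    submit_selector: str,
    username: str,
    password: str,
    success_selector: Optional[str] = None,
) -> List[LoginAction]:
    """Return the ordered actions of a typical username/password login."""
    actions = [
        LoginAction("goto", value=login_url),
        LoginAction("fill", user_selector, username),
        LoginAction("fill", pass_selector, password),
        LoginAction("click", submit_selector),
    ]
    if success_selector:
        # Optional check: wait for an element that only appears once logged in.
        actions.append(LoginAction("wait", success_selector))
    return actions


def to_json(actions: List[LoginAction]) -> str:
    """Serialize the actions for review (note: includes the password in plain text)."""
    return json.dumps([asdict(a) for a in actions], indent=2)
```

For a default WordPress login page, for example, the selectors might be #user_login, #user_pass, and #wp-submit; check your own page in the browser inspector before relying on any of them.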


Author: Garenne Bigby
Website: http://garennebigby.com
Founder of DYNO Mapper and Former Advisory Committee Representative at the W3C.
