There may come a time when you need to identify whether a site links to you. For the purpose of this exercise we will reference the homepage; the basic principles are the same for crawling an entire site (just replicated for each URL in the site), which we can cover in a later article. Ok, let’s get started.

Remotely Scrape HTML File

The first step is to remotely grab the HTML source of a website. There are two main ways to achieve this: cURL and file_get_contents().

file_get_contents()

The simplest and most widely used method of obtaining the contents of a file is file_get_contents(). While this works great locally, to remotely fetch a file’s contents you will need to check that allow_url_fopen is enabled (which may present security risks). To implement it you would use the following line:
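A minimal sketch of that call; the URL here is a placeholder for the site you want to check:

```php
<?php
// Fetch the remote HTML in one call (requires allow_url_fopen = On).
// "https://example.com/" is a placeholder; substitute the site you want to check.
$html = @file_get_contents('https://example.com/');

if ($html === false) {
    echo "Unable to fetch the remote page.\n";
}
```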

For the purpose of this example we will instead be using cURL.

cURL

You can learn specifics at php.net’s cURL Manual.
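A minimal cURL sketch (the URL is again a placeholder):

```php
<?php
// Fetch a page with cURL instead of file_get_contents().
$ch = curl_init('https://example.com/');           // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body rather than printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow redirects (http -> https, etc.)
curl_setopt($ch, CURLOPT_TIMEOUT, 10);             // don't hang forever on a slow host
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    echo "cURL request failed.\n";
}
```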

This will return the site’s HTML. To prevent the browser from rendering it, so you can read it more easily, consider converting the content type to text/plain by adding the following line at the top of your document:
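That is a single header() call; it must run before any other output is sent:

```php
<?php
// Serve the page as plain text so the fetched HTML displays as source
// instead of being rendered by the browser.
header('Content-Type: text/plain');
```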

Is the site linking to you?

Now that we have the HTML we can search it to identify whether a URL is present. This can be done fairly easily using regular expressions. We will use a few functions for this: preg_quote() (to escape special characters), some light regex, and preg_match_all() (to check whether the link is in the HTML we returned).
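A sketch of that check; the sample $html and $yourUrl values are placeholders (normally $html would come from the cURL fetch above):

```php
<?php
// Hard-coded sample HTML keeps this example self-contained.
$html    = '<p>Read more at <a href="http://www.example.com/page">our friend</a>.</p>';
$yourUrl = 'www.example.com';

// preg_quote() escapes regex metacharacters (., /, ?, etc.) in the URL.
$pattern = '~' . preg_quote($yourUrl, '~') . '~i';

if (preg_match_all($pattern, $html, $matches)) {
    echo "Link found!\n";
} else {
    echo "Link not found.\n";
}
```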


You should now see a message indicating whether the link was found. A few questions probably come to mind for the inquisitive:

1. What is the deal with the \n in the PHP?

Good question: it inserts a newline character, which shows up as an actual line break when in text/plain content mode.

2. In the preg_match_all() there is a $matches argument; what’s up with that?

If we want to see the text the URL shows up in, we can display the $matches array like so: var_dump($matches); this will dump the contents of the array $matches (if a match was found). You can then output the match using:
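For example, with $matches populated by the preg_match_all() call above (the sample values below are illustrative):

```php
<?php
// $matches[0] holds every full-pattern match found in the HTML.
// Sample data standing in for a real preg_match_all() result:
$matches = [0 => ['www.example.com']];

var_dump($matches);          // inspect the whole array
echo $matches[0][0] . "\n";  // print the first match
```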

3. What if the URL is not valid?

Determining whether a URL is valid leads you on a quest for the perfect regex, which points to @dperini‘s version:
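dperini’s full pattern is too long to reproduce reliably here; as a lighter-weight stand-in, PHP’s built-in filter_var() covers the common cases (it is less thorough than dperini’s regex, and the helper name below is our own):

```php
<?php
// Simpler alternative to a full validation regex: FILTER_VALIDATE_URL.
function isValidUrl(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrl('http://www.example.com/page'));  // bool(true)
var_dump(isValidUrl('not a url'));                    // bool(false)
```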

Equipped with this code we can proceed with validation.

4. What if the URL is uppercase or lowercase, or does not use a www?

While the above regex would be the most comprehensive, for this exercise we will just use the following:
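A sketch of that more forgiving pattern; the domain is a placeholder for your own:

```php
<?php
// Match the domain with or without a scheme or "www.", case-insensitively.
$domain  = preg_quote('example.com', '~');
$pattern = '~(?:https?://)?(?:www\.)?' . $domain . '~i';

var_dump(preg_match($pattern, '<a href="HTTP://WWW.EXAMPLE.COM/">hi</a>'));  // int(1)
var_dump(preg_match($pattern, '<a href="https://other.org/">bye</a>'));      // int(0)
```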

Try it out

Give it a test run:
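Putting the pieces together in one sketch; the function names, the page to scan, and the domain to look for are all placeholders:

```php
<?php
// End-to-end sketch: fetch a page with cURL, then check whether it links to you.
header('Content-Type: text/plain');  // show raw output when run in a browser

function fetchHtml(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;  // string on success, false on failure
}

function containsLink(string $html, string $domain): bool
{
    // Scheme and "www." optional, case-insensitive.
    $pattern = '~(?:https?://)?(?:www\.)?' . preg_quote($domain, '~') . '~i';
    return preg_match($pattern, $html) === 1;
}

$html = fetchHtml('https://example.com/');  // the page to scan (placeholder)

if ($html !== false && containsLink($html, 'iana.org')) {  // your domain (placeholder)
    echo "Link found!\n";
} else {
    echo "Link not found.\n";
}
```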

