## Meet Eomyidae

Meet Eomyidae, the flying gopher.

by Christoph Lohmann <20h@r-36.net>

## What is eomyidae?

Eomyidae is a family of extinct rodents from North America and Eurasia,
related to modern-day pocket gophers and kangaroo rats. They are known
from the Middle Eocene to the Late Miocene in North America and from the
Late Eocene to the Pleistocene in Eurasia. Eomyids were generally small,
but occasionally large, and tended to be squirrel-like in form and
habits. The family includes the earliest known gliding rodent, Eomys
quercyi.

Flying gophers!

## See it.

[ASCII art of a gopher]

## How did it evolve so far?

* Over 20 iterations of different ways to crawl the gopherspace.
* New problems kept arising.
* Whenever things did not scale, the algorithm had to be redone.

## How does it crawl now?

1. Use some initial URI and add it to the queue.
2. Load the old state of the queue, if there is any.
3. Sort the URIs by hostname.
4. Add URIs to the jobs based on how many selectors there are and how
   well known the host is.
5. Crawl the jobs.
6. Add the newly found selectors to the queue, filtering out those that
   have been crawled already.
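As an illustration, this loop could be sketched in Python roughly as
below. It is a minimal sketch, not the eomyidae source: gopher_fetch(),
the pickle state file, and the fixed per-host job limit are assumptions
made for this example.

    # Minimal sketch of the crawl loop above -- NOT the eomyidae source.
    # Assumes gopher URIs of the form gopher://host:port/1/selector.
    import pickle
    import socket
    from collections import defaultdict
    from urllib.parse import urlparse

    STATE_FILE = "state.pickle"  # hypothetical state cache

    def gopher_fetch(host, port, selector):
        """Send one selector and return the raw response text."""
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(selector.encode("utf-8") + b"\r\n")
            data = b""
            while chunk := sock.recv(4096):
                data += chunk
        return data.decode("utf-8", errors="replace")

    def menu_links(text):
        """Yield (itemtype, selector, host, port) from a menu (RFC 1436)."""
        for line in text.splitlines():
            display, *rest = line.split("\t")
            if display and len(rest) >= 3:
                yield display[0], rest[0], rest[1], rest[2]

    def crawl(start_uri):
        try:  # 2. Load the old state of the queue, if there is any.
            with open(STATE_FILE, "rb") as f:
                queue, seen = pickle.load(f)
        except FileNotFoundError:  # 1. Otherwise seed the queue.
            queue, seen = [start_uri], set()

        by_host = defaultdict(list)  # 3. Sort the URIs by hostname.
        for uri in queue:
            by_host[urlparse(uri).hostname].append(uri)

        queue = []
        for host, uris in sorted(by_host.items()):
            # 4. Build jobs per host; the real scheduler weighs how many
            #    selectors a host has and how well known it is. The cap
            #    of 10 jobs per host is an arbitrary stand-in here.
            for uri in uris[:10]:
                url = urlparse(uri)
                selector = url.path[2:]  # strip the "/1" item-type prefix
                try:  # 5. Crawl the job.
                    menu = gopher_fetch(url.hostname, url.port or 70,
                                        selector)
                except OSError:
                    continue  # host unreachable, skip for now
                seen.add(uri)
                for itype, sel, h, p in menu_links(menu):
                    link = "gopher://%s:%s/1%s" % (h, p, sel)
                    # 6. Queue new menus, skip already crawled selectors.
                    if itype == "1" and link not in seen:
                        queue.append(link)

        with open(STATE_FILE, "wb") as f:  # cache all state for restarts
            pickle.dump((queue, seen), f)

    if __name__ == "__main__":
        crawl("gopher://bitreich.org/1/lawn")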
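The robots.txt handling described in the next sections, including the
crawl-delay parameter mentioned later, could build on Python's standard
urllib.robotparser once the file has been fetched over gopher. Again a
sketch under assumptions, reusing the hypothetical gopher_fetch() helper
from above:

    # Sketch of the robots.txt check. urllib.robotparser is standard
    # library; gopher_fetch() is the hypothetical helper from above.
    import urllib.robotparser

    def robots_policy(host, port):
        """Return a parser for the host's robots.txt, or None if absent."""
        rp = urllib.robotparser.RobotFileParser()
        try:
            rp.parse(gopher_fetch(host, port, "/robots.txt").splitlines())
        except OSError:
            return None  # no robots.txt reachable: crawling is allowed
        return rp

    # Usage sketch, with a host taken from this talk:
    rp = robots_policy("bitreich.org", 70)
    if rp is None or rp.can_fetch("eomyidae", "/something"):
        delay = rp.crawl_delay("eomyidae") if rp else None
        # ... crawl, sleeping `delay` seconds between requests ...

can_fetch() and crawl_delay() are real urllib.robotparser methods; the
gopher transport and the fallback when no robots.txt is reachable are
the assumed parts.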
## What has been implemented so far?

* robots.txt support
  * Some details are missing.
  * User-Agent: eomyidae
* A block list for manual intervention.
* Effective caching of all state, so if something goes wrong, just
  restart.
* An information page at gopherproject.org.

## What is crawled now?

* Eomyidae only crawls menus.

## How do you know Eomyidae crawled you?

* When it requests robots.txt, eomyidae is friendly and gives you a hint
  on how to control it. You should see this in your logs:

      This is eomyidae, your friendly crawler. See \
      gopher://gopherproject.org/1/eomyidae for more info. \
      Have a nice day!

## How can you block Eomyidae from ever coming back?

* Put this into a robots.txt that is reachable at domain/0/robots.txt:

      User-agent: eomyidae
      Disallow: /

* Or, if you do not like any crawlers:

      User-agent: *
      Disallow: /

## Tips for robots.txt writers.

* Do not use menu item types in the pattern.

      Disallow: /1/something
      Disallow: /something

  Just /something is enough.

## Where do we start in gopherspace to crawl?

Eomyidae simply uses the gopher lawn:

    gopher://bitreich.org/1/lawn

If you are in the lawn, you will be indexed. This gets people to
auto-sort or auto-curate their links.

## Statistics so far.

* Eomyidae has seen 924 gopher servers.
* 4523668 unique selectors have been crawled.

## What is the future plan?

* Be as helpful as possible.
* Do not be annoying.
* Publish data for reuse.

## Be as helpful as possible.

* There is now a gopher-validator on bitreich. The plan is to run it on
  the crawled menus and to find a simple way to tell or report to server
  owners what is wrong.
* At first it can be used for the lawn, to check whether links are still
  active.
* Allow some easy way to access wayback machines.

## Do not be annoying.

* Respect the crawl-delay parameter in robots.txt.
* Only crawl the front page of a gopherhole over and over again to check
  for updates.
* If someone wants the crawling stopped, stop it. That is why there is
  the info page at gopherproject.org, so you are able to contact me.

## Publish data for reuse.

* So far the data is not published.
* There are known formats, like WebArc, but those are inefficient.
* See http://commoncrawl.org/ for details.
* My idea is to publish the raw menus in a nicely sorted way.

## Crawl other file formats?

* There is the big GDPR problem with downloading all files.
* The wayback machines do it.
* I will try to see how much useful information we get from menus alone:
  whether authors used good descriptions that make texts easy to find,
  or whether some other parsing is needed.
* Eomyidae should be simple in its processing too.
* Unless we buy Amazon.

## What is currently in development?

* Separation of the crawling arbiters, so they do not hammer some
  servers in some situations, as happens now.
* Recrawling based on timestamps.
* A simple publishing method for the data on gopherproject.org.
* A basic search.
* It all depends on me as a hobbyist having the idea and implementing
  it.
* Of course you can help too.

## Eomyidae has been published today.

    git://bitreich.org/eomyidae
    gopher://bitreich.org/1/scm/eomyidae

* So far it is implemented in Python, as a prototype.
* That is what Python is for.
* Please wait for the first release tag; there will be big changes.

## Goopher

["Goopher" in ASCII art, in the style of the Google logo]

    [ ________________________________________ ]
                     [ search ]

Thanks josuah, for creating the proposal. ;)

## Questions for the future.

* Ranking?
  * See the discussion on gopher during the day.
* Graphs and connections?
  * See the discussion on gopher during the day.
* Regular publishing of data with timestamps.
  * For wayback machines.

## Questions?

Do you have any questions?

## Thanks

Thank you very much for listening.

Christoph Lohmann <20h@r-36.net>

Or __20h__ at #bitreich-en on Freenode.