What's in a name? Clever usernames are all over the internet. Twitter features a lot of usernames that involve clever puns, plays on words, or portmanteaus. Understanding the joke requires knowledge and context. If you know where a name is presented, what it’s attached to, and what site you are on, you’ll be able to pull apart the various references and neologisms and get the humor behind it. Humans can do it pretty well. But can a computer? I don’t think it can! But I’m curious to see if it can get close sometimes, or if it’s wrong in an amusing way.

This project involves username ideas posted by users on the Something Awful forums. To get the usernames we’ll write a function to scrape the posts and put them into a SQLite database. We’ll create an admin screen to categorize, tag, and sort the names. We’ll present the names in a front-end interface that allows users to browse the names, add tags, give them thumbs up or down, and add their own explanations of the meaning behind the name. The front end will also offer automatically searched and/or generated avatar suggestions and, eventually, present the computer’s analysis of the meaning behind the name. The front end will be written in React/TypeScript and use React Testing Library for testing. For hosting images we’ll probably end up using AWS S3 or an equivalent.

I’m working on this project with Brenden Brown, who is a good friend of mine and an experienced database engineer. He’s providing useful consulting advice and expertise while I’m doing the majority of the coding.

First Conversation: Requirements and Architecture

We talked about how the project was going to go. I decided to do more of a proper software project rather than a hack-week quick thing. Therefore I’ll do some thinking about the architecture of the database and the requirements of the front end before I sit down and begin coding.

Database

  • One database is plenty
  • SQLite is the easiest one to use
  • Every DB table will include
    • Soft delete
      • Flag things as ‘deleted’ or ‘inactive’ rather than removing entries from the db
    • Created by
    • Updated by
    • Updated time

That way, if you do something and everything gets overwritten, you can tell which run of the code did it.
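
Here is a minimal sketch of what those shared columns could look like, assuming Python's built-in sqlite3 module (the backend language isn't decided yet). The column names `is_deleted`, `created_by`, `updated_by`, and `updated_at` and the helper `create_table` are placeholders I made up for illustration, not a final schema.

```python
import sqlite3

# Hypothetical audit columns appended to every table:
# a soft delete flag, who created the row, who last updated it, and when.
AUDIT_COLUMNS = """
    is_deleted INTEGER NOT NULL DEFAULT 0,
    created_by TEXT NOT NULL,
    updated_by TEXT NOT NULL,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
"""


def create_table(conn, name, columns):
    """Create a table whose own columns are followed by the shared audit columns."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({columns}, {AUDIT_COLUMNS})")


conn = sqlite3.connect(":memory:")
create_table(conn, "example", "id INTEGER PRIMARY KEY, name TEXT NOT NULL")

# Each writer identifies itself, so a bad overwrite can be traced to one run.
conn.execute(
    "INSERT INTO example (name, created_by, updated_by) VALUES (?, ?, ?)",
    ("test", "run-1", "run-1"),
)
```

Passing a run identifier as `created_by` and `updated_by` is one way to answer "which run of the code did it" with a simple query.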

Structure of the program

One concept is to keep every piece somewhat independent

  • Run scrape
  • Get output
  • Either use output to do the rest of the steps
  • Or re-run it
  • Have multiple different scraping jobs

Scraping function

The input should be the thread that it’s gonna scrape. The SA Forums are an ancient PHP-based bulletin board system. Each thread has a threadId, which is the unique number that identifies that thread. Each individual post has a postId and is associated with a thread. Threads also have names, but the thread names on the forum can change, so the threadId is the stable way to refer to a thread.

The threads

  • GBS (General Bullshit, the general purpose forum): ITT new user names
  • CSPAM (Politics shitposting forum): username ideas
  • YOSPOS (Computer toucher forum): The Techbronomicon: cyberpunk names

So the input for the scraping function will be a table of threadIds. Each row could also hold:

  • maybe active/inactive flag
  • Store Watermark
    • last page and post scraped
  • Thread name
  • Concepts associated with all posts in the thread
    • used for analysis
    • CSPAM: politics
    • YOSPOS: technology
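
A rough sketch of that input table, again assuming Python's sqlite3. The table and column names, the two placeholder rows, and the helpers `threads_to_scrape` and `advance_watermark` are all hypothetical. The idea is that the scraper asks for the active threads, picks up from the stored watermark, and moves the watermark forward when it finishes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE threads (
        thread_id INTEGER PRIMARY KEY,
        thread_name TEXT,
        is_active INTEGER NOT NULL DEFAULT 1,
        last_page_scraped INTEGER NOT NULL DEFAULT 0,
        last_post_id_scraped INTEGER,
        concepts TEXT
    )
""")

# Placeholder rows; real threadIds would come from the forum.
conn.executemany(
    "INSERT INTO threads (thread_id, thread_name, is_active) VALUES (?, ?, ?)",
    [(1, "placeholder thread A", 1), (2, "placeholder thread B", 0)],
)


def threads_to_scrape(conn):
    """Return (thread_id, last_page, last_post_id) for every active thread."""
    return conn.execute(
        "SELECT thread_id, last_page_scraped, last_post_id_scraped "
        "FROM threads WHERE is_active = 1 ORDER BY thread_id"
    ).fetchall()


def advance_watermark(conn, thread_id, page, post_id):
    """Record how far the scraper got so the next run can resume from there."""
    conn.execute(
        "UPDATE threads SET last_page_scraped = ?, last_post_id_scraped = ? "
        "WHERE thread_id = ?",
        (page, post_id, thread_id),
    )
```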

Scraping Output

Keep a table with a row for each run, so that you know “this run id, what table was scraped, how many records did it find, etc.”

Output table

  • runId
  • Post
    • postId
    • Timestamp
    • threadId
    • Username
    • Post body: whole raw text
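
As a sketch of how the run table and the output table could fit together, assuming Python's sqlite3: the names `scrape_runs`, `posts`, `records_found`, and `save_run` are placeholders, and the posts are passed in as plain dicts because the actual scraper doesn't exist yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scrape_runs (
        run_id INTEGER PRIMARY KEY AUTOINCREMENT,
        thread_id INTEGER NOT NULL,
        started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        records_found INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL REFERENCES scrape_runs(run_id),
        thread_id INTEGER NOT NULL,
        posted_at TEXT,
        username TEXT,
        post_body TEXT
    );
""")


def save_run(conn, thread_id, posts):
    """Log one scraping run and store the posts it found, tagged with its run_id."""
    cur = conn.execute("INSERT INTO scrape_runs (thread_id) VALUES (?)", (thread_id,))
    run_id = cur.lastrowid
    # OR IGNORE: a post that was already stored keeps the run_id of its first scrape.
    conn.executemany(
        "INSERT OR IGNORE INTO posts "
        "(post_id, run_id, thread_id, posted_at, username, post_body) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (p["post_id"], run_id, thread_id, p["posted_at"], p["username"], p["post_body"])
            for p in posts
        ],
    )
    # records_found counts the posts this run saw, including already-stored ones.
    conn.execute(
        "UPDATE scrape_runs SET records_found = ? WHERE run_id = ?",
        (len(posts), run_id),
    )
    conn.commit()
    return run_id
```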

Processing Output

What we’re looking at is mainly the post body. The post body isn’t just a single name, or a list of names.

  • Post body may contain multiple names
  • Post body may also contain quotes
  • Quotes may indicate good names
    • Track the number of quotes
    • When a post is amusing, users will quote it
    • Usually an “emptyquote”, a quote without any further text, indicates approval
    • So if we spot that a name has been repeatedly emptyquoted we should flag it as good
  • Identify each ‘name’ in a post
  • Names table
    • Name
    • postId
      • The postId is a unique number that points at a single post on the forums
      • You can link to a single post without the corresponding threadId
      • You can link to a post within the thread by generating a properly formatted link containing the threadId and postId
    • A Comment field - which Id in the scrape table
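
To make the emptyquote idea concrete, here is a small Python sketch. It assumes the stored post body contains BBCode-style quote tags of the form `[quote="user" post="123"]...[/quote]`; if the scraper ends up storing HTML instead, the pattern would need to change. The function names are made up, and nested quotes are not handled.

```python
import re
from collections import Counter

# Assumed quote markup: [quote="user" post="123"]quoted text[/quote]
QUOTE_RE = re.compile(
    r'\[quote="[^"]*" post="(\d+)"\](.*?)\[/quote\]',
    re.DOTALL | re.IGNORECASE,
)


def emptyquoted_post_id(body):
    """Return the quoted postId if the body is one quote and nothing else, else None."""
    matches = list(QUOTE_RE.finditer(body))
    if len(matches) != 1:
        return None
    remainder = QUOTE_RE.sub("", body).strip()
    return int(matches[0].group(1)) if remainder == "" else None


def count_emptyquotes(bodies):
    """Count how many times each postId was emptyquoted across the given post bodies."""
    counts = Counter()
    for body in bodies:
        post_id = emptyquoted_post_id(body)
        if post_id is not None:
            counts[post_id] += 1
    return counts
```

A post whose count passes some threshold could then be flagged as a good name for the admin to confirm.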

Frontend features/requirements

Now we have a list of potential usernames that we pulled from the names table.

Basics

  • Run an image search on the name
    • Maybe resize or crop the image
      • Maybe selectable size
      • This is a separate scraping job
      • Input is a user name table of some kind
      • It draws from that, then downloads the images
      • Cropping and resizing can be done offline
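
For the cropping step, the only real logic is picking which part of the image to keep. Here is a sketch of one option, a centered square crop, in plain Python. The function name is hypothetical, and the actual cropping would be done by whatever image library we end up using; the returned box is in (left, top, right, bottom) order, which is the order Pillow's crop takes.

```python
def center_square_crop(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```

The square could then be resized down to whatever avatar sizes the front end offers.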

Now we have names and images

  • We can assume the images have been cropped
  • Store them on disk and re-access them for display
  • We could put them in an S3 bucket if we want them somewhere online

Admin/Human interaction

There’s gonna be human oversight of this.

The next stages are really the back-end and front-end for the service.

The Dashboard

Front end MVP features

  • You’d display the name
  • Display the images
  • Next button

Front end additional features

  • Back button
  • List of names that you’ve seen
  • A search
  • Show me more names by this user
  • Most quoted names/best names
  • A way to favorite a name and return to it
  • Thumbs up or thumbs down

Front end fancy features

  • Recommendation service
    • If you liked this name, check out these others
  • Long form explanation field
    • Tagged as official
    • Most upvoted user submission
  • Most viewed

  • Use the requirements of the website
  • Decide what APIs to call
  • Then implement the backend to call the API

Analysis

User Analysis

  • Did you like it
  • Is it a portmanteau
  • What tags would you put on it?
    • Official tags
    • User defined tags
      • May be adopted as official tags through the admin panel
  • Long form description
    • upvote/downvote system
    • Maybe look through the longform description for keywords

Admin

Maybe a separate admin dashboard that lets you see the list of candidate names.

  • CRUD
    • Create, Read, Update, Delete
  • Detect if it was incorrectly parsed, or parsed correctly but bad, or flag as a good/clever name.
  • Soft delete feature: flag things as deleted so they don’t get displayed
  • An ‘active’ or ‘is deleted’ column
  • Never delete stuff!
  • Split into names
  • Add a manual comment or description
  • Human Review of images
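
The soft delete rule is simple enough to sketch, assuming Python's sqlite3 and a hypothetical `names` table with an `is_deleted` column. Nothing is ever removed: the admin panel flips a flag, and the front end only queries rows where the flag is off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE names (
        name_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO names (name) VALUES (?)",
    [("first example",), ("second example",)],
)


def soft_delete(conn, name_id):
    """Hide a name from display without removing the row."""
    conn.execute("UPDATE names SET is_deleted = 1 WHERE name_id = ?", (name_id,))


def restore(conn, name_id):
    """Undo a soft delete."""
    conn.execute("UPDATE names SET is_deleted = 0 WHERE name_id = ?", (name_id,))


def visible_names(conn):
    """Names the front end is allowed to show."""
    rows = conn.execute(
        "SELECT name FROM names WHERE is_deleted = 0 ORDER BY name_id"
    ).fetchall()
    return [row[0] for row in rows]
```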

Computer Analysis

Analyzing the Names

  • Is this a real word?
  • If not, does it have sub-words that are real words? Parts of real words?
  • What does the program think was substituted here?
  • Is this a real person?
  • Is it a place?
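
A naive first pass at the sub-word question could look like the Python sketch below. It assumes we have some set of known words to check against (where that word list comes from is undecided), and the function name and the `min_len` cutoff are placeholders. It checks every substring of the name, which is fine for strings as short as usernames.

```python
def find_subwords(name, vocabulary, min_len=3):
    """Return (start_index, word) for every substring of name found in vocabulary."""
    lowered = name.lower()
    found = []
    for start in range(len(lowered)):
        for end in range(start + min_len, len(lowered) + 1):
            piece = lowered[start:end]
            if piece in vocabulary:
                found.append((start, piece))
    return found


# Tiny made-up vocabulary, just to show the shape of the output.
vocabulary = {"cyber", "punk", "pun"}
print(find_subwords("Cyberpunk", vocabulary))
# [(0, 'cyber'), (5, 'pun'), (5, 'punk')]
```

Overlapping hits like "pun" and "punk" are kept on purpose, since deciding which reading is the intended joke is the hard part.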

Portmanteau

Associations with the word

  • Does it have to do with
    • politics
    • music
    • computers/tech
    • celebrity
    • pop culture
    • or whatever

Development First Steps

One step:

  • List the processes
    • List the requirements for them
  • Separately: Data schema
    • Where all tables are listed out
    • Columns
    • Connections
  • Processes: which tables they read from and write to

Then start to work on them somewhat independently.

Get it on GitHub.

Toss an .md into the repo.

1st: List of tables with their schemas would be a good objective.

Turn that into a DDL file: a set of SQL CREATE TABLE statements.

Then you have one file that is the source of truth to stand up a new database.
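
Standing up a database from that one file could be as small as the sketch below, assuming Python's sqlite3. The function name and the file names in the usage comment are placeholders.

```python
import sqlite3
from pathlib import Path


def stand_up_database(db_path, ddl_path):
    """Create (or open) a SQLite database and run every statement in the DDL file."""
    ddl = Path(ddl_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    conn.executescript(ddl)
    return conn


# Usage, with hypothetical file names:
# conn = stand_up_database("usernames.db", "schema.sql")
```

If the CREATE TABLE statements use IF NOT EXISTS, the same call should be safe to run against a database that already exists.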

Conclusion

Ok, so next up is my first pass at the DB schema.