What's in a name? Clever usernames are all over the internet. Twitter features a lot of usernames that involve clever puns, plays on words, or portmanteaus. Understanding the joke requires knowledge and context. If you know some of the background of where a name is presented, what it’s attached to, and what site you are on, you’ll be able to pull apart the various references and neologisms and get the humor behind it. Humans can do it pretty well. But can a computer? I don’t think it can! But I’m curious to see if it can get close sometimes, or if it’s wrong in an amusing way.

- First Conversation- Requirements and Architecture
  - Database
  - Structure of the program
  - Scraping function
  - The threads
  - Scraping Output
  - Front end MVP features
  - Front end additional features
  - Front end fancy features
- Analysis
  - User Analysis
  - Admin
  - Computer Analysis
- Development First Steps
- Conclusion

This project involves username ideas posted by users on the Something Awful forums. To get the usernames we’ll write a function to scrape the posts and put them into a SQLite database. We’ll create an admin screen to categorize, tag, and sort the names. We’ll present the names in a front-end interface that allows users to browse the names, add tags, give them thumbs up or down, and add their own explanations of the meaning behind the name. The front end will also offer automatically searched and/or generated avatar suggestions as well as eventually presenting the computer’s analysis of the meaning behind the name. The front end will be written in React/TypeScript and use React Testing Library for testing. For hosting images we’ll probably end up using AWS S3 or an equivalent.

I’m working on this project with Brenden Brown, who is a good friend of mine and an experienced database engineer. He’s providing useful consulting advice and expertise while I’m doing the majority of the coding.

First Conversation- Requirements and Architecture

We talked about how the project was going to go. I decided to do more of a proper software project rather than a hack-week quick thing. Therefore I’ll do some thinking about the architecture of the database and the requirements of the front end before I sit down and begin coding.

Database

One database is plenty
SQLite is the easiest one to do
Every DB table will include
- Soft delete
  - Flag things as ‘deleted’ or ‘inactive’ rather than removing entries from the db
- Created By
- Updated by
- Updated time

That way if you do something and everything gets overwritten you can tell which run of the code did it.
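
Here is a minimal sketch of what those shared columns could look like, using Python's built-in sqlite3 module. The table name `names`, the column names, and the helper functions are placeholders I made up for illustration; only the four ideas (soft delete, created by, updated by, updated time) come from the notes above.

```python
import sqlite3

# Hypothetical table; every real table would carry the same four audit columns.
DDL = """
CREATE TABLE names (
    name_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    is_deleted  INTEGER NOT NULL DEFAULT 0,              -- soft delete flag
    created_by  TEXT NOT NULL,                           -- run or user that inserted the row
    updated_by  TEXT NOT NULL,                           -- run or user that last changed it
    updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- updated time
);
"""


def soft_delete(conn, name_id, actor):
    """Flag a row as deleted instead of removing it, and record who did it."""
    conn.execute(
        "UPDATE names SET is_deleted = 1, updated_by = ?, "
        "updated_at = CURRENT_TIMESTAMP WHERE name_id = ?",
        (actor, name_id),
    )


def active_names(conn):
    """Return only the names that have not been soft deleted."""
    rows = conn.execute(
        "SELECT name FROM names WHERE is_deleted = 0 ORDER BY name_id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO names (name, created_by, updated_by) VALUES (?, ?, ?)",
    [("ExampleName", "run-1", "run-1"), ("AnotherName", "run-1", "run-1")],
)
soft_delete(conn, 1, "admin")
```

After the soft delete, the row is still in the table, so you can still see who flagged it, but `active_names(conn)` no longer returns it.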

Structure of the program

One concept is to keep every piece somewhat independent

Run scrape
Get output
Either use output to do the rest of the steps
Or re-run it
Have multiple different scraping jobs

Scraping function

The input should be the thread that it’s gonna scrape. The SA Forums are an ancient PHP based bulletin board system. Individual posts have a post id, and each post is associated with a thread. Each thread on the SA forums has a threadId, which is the unique number that identifies that thread. Threads also have names. But the thread names on the forum can change.

The threads

GBS (General Bullshit- the general purpose forum): ITT new user names
CSPAM (Politics shitposting forum): username ideas
YOSPOS (Computer toucher forum): The Techbronomicon: cyberpunk names

So the input for the scraping function will be a table of threadids.

maybe active/inactive flag
Store Watermark
- last page and post scraped
Thread name
Concepts associated with all posts in the thread
- used for analysis
CSPAM politics
YOSPOS technology
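
A rough sketch of that input table, again with sqlite3. The column names, the helper functions, and the thread ids 1 to 3 are all made up for illustration (they are not the real threadIds); the thread names and concepts are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE threads (
        thread_id     INTEGER PRIMARY KEY,          -- the forum's threadId
        thread_name   TEXT,                         -- informational only, names can change
        is_active     INTEGER NOT NULL DEFAULT 1,   -- active/inactive flag
        last_page     INTEGER NOT NULL DEFAULT 0,   -- watermark: last page scraped
        last_post_id  INTEGER,                      -- watermark: last post scraped
        concepts      TEXT                          -- concepts shared by every post in the thread
    )
    """
)


def threads_to_scrape(conn):
    """Return (thread_id, last_page, last_post_id) for every active thread."""
    return conn.execute(
        "SELECT thread_id, last_page, last_post_id FROM threads "
        "WHERE is_active = 1 ORDER BY thread_id"
    ).fetchall()


def advance_watermark(conn, thread_id, page, post_id):
    """Record how far the scraper got so the next run can resume from there."""
    conn.execute(
        "UPDATE threads SET last_page = ?, last_post_id = ? WHERE thread_id = ?",
        (page, post_id, thread_id),
    )


conn.executemany(
    "INSERT INTO threads (thread_id, thread_name, concepts) VALUES (?, ?, ?)",
    [
        (1, "ITT new user names", None),
        (2, "username ideas", "politics"),
        (3, "The Techbronomicon: cyberpunk names", "technology"),
    ],
)
advance_watermark(conn, 2, page=5, post_id=9001)
conn.execute("UPDATE threads SET is_active = 0 WHERE thread_id = 3")
todo = threads_to_scrape(conn)
```

Keying on the threadId rather than the thread name means a renamed thread still matches its row.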

Scraping Output

Make a table for each run, so that you know roughly “this run id, what table was scraped, how many records did it find, etc.”

Output table

runId
Post
- Postid
- Timestamp
- Threadid
- Username
- Postbody- whole raw text
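
One way the run bookkeeping and the output table might fit together is sketched below. I'm assuming a single `scrape_runs` table with one row per run, which is only one reading of "a table for each run", and every table, column, and function name here is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scrape_runs (
        run_id        INTEGER PRIMARY KEY AUTOINCREMENT,
        thread_id     INTEGER NOT NULL,                 -- what was scraped
        started_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        record_count  INTEGER NOT NULL DEFAULT 0        -- how many posts it found
    );
    CREATE TABLE scraped_posts (
        run_id     INTEGER NOT NULL REFERENCES scrape_runs(run_id),
        post_id    INTEGER NOT NULL,
        timestamp  TEXT,
        thread_id  INTEGER NOT NULL,
        username   TEXT,
        post_body  TEXT,                                -- whole raw text
        PRIMARY KEY (run_id, post_id)
    );
    """
)


def save_run(conn, thread_id, posts):
    """Store one scrape run and its posts, and return the new run id."""
    cur = conn.execute(
        "INSERT INTO scrape_runs (thread_id) VALUES (?)", (thread_id,)
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO scraped_posts "
        "(run_id, post_id, timestamp, thread_id, username, post_body) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (run_id, p["post_id"], p["timestamp"], thread_id,
             p["username"], p["post_body"])
            for p in posts
        ],
    )
    conn.execute(
        "UPDATE scrape_runs SET record_count = ? WHERE run_id = ?",
        (len(posts), run_id),
    )
    conn.commit()
    return run_id


# Made-up sample posts, just to exercise the function.
sample_posts = [
    {"post_id": 10, "timestamp": "2021-01-01 12:00", "username": "poster_a",
     "post_body": "ExampleName"},
    {"post_id": 11, "timestamp": "2021-01-01 12:05", "username": "poster_b",
     "post_body": "AnotherName"},
]
run_id = save_run(conn, thread_id=1, posts=sample_posts)
```

Because the primary key includes the run id, re-running a scrape adds new rows instead of overwriting the old ones, which fits the idea of being able to re-run any piece independently.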

Processing Output

What we’re looking at is mainly the post body. The post body isn’t just a single name, or a list of names.

Post body may contain multiple names
Post body may also contain quotes
Quotes may indicate good names
- Track the # of quotes
- When a post is amusing, users will quote it
- usually an “emptyquote”, or quote without further text indicates approval
- So if we spot that a name has been repeatedly emptyquoted we should flag it as good
Id each ‘name’ in a post
Names table
Name
postId
- The postId is a unique number that points at a single post on the forums
- You can link to a single post without the corresponding threadId
- You can link to a post within the thread by generating a properly formatted link containing the threadId and postId
A Comment field - which Id in the scrape table
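
The emptyquote idea can be sketched without committing to a parser yet. This assumes each scraped post has already been reduced to a dict with a `quoted_post_id` (or None) and the poster's `own_text` outside the quote. Those field names, and the threshold of two, are my assumptions; pulling the quote out of the raw post body is a separate problem.

```python
from collections import Counter


def count_emptyquotes(posts):
    """Count, per quoted post id, the replies that quote it and add no text."""
    counts = Counter()
    for post in posts:
        quoted = post.get("quoted_post_id")
        own_text = post.get("own_text", "")
        if quoted is not None and not own_text.strip():
            counts[quoted] += 1
    return counts


def flag_good_posts(posts, threshold=2):
    """Return ids of posts emptyquoted at least `threshold` times."""
    counts = count_emptyquotes(posts)
    return sorted(pid for pid, n in counts.items() if n >= threshold)


# Made-up sample data: post 10 is emptyquoted twice, post 12 once.
posts = [
    {"post_id": 10, "quoted_post_id": None, "own_text": "ExampleName"},
    {"post_id": 11, "quoted_post_id": 10, "own_text": ""},
    {"post_id": 12, "quoted_post_id": 10, "own_text": "   "},
    {"post_id": 13, "quoted_post_id": 10, "own_text": "I don't get it"},
    {"post_id": 14, "quoted_post_id": 12, "own_text": ""},
]
good = flag_good_posts(posts)
```

Quotes that come with a comment are ignored here, since only the empty ones are being treated as a signal of approval.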

Frontend features/requirements

Now we have a list of potential usernames that we pulled from the names table.

Basics

Run an image search on the name
- Maybe resize or crop the image
  - Maybe selectable size
  - This is a separate scraping job
  - Input is a user name table of some kind
  - It draws from that, then downloads the images
  - Cropping and resizing can be done offline
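
For the offline cropping step, the only real logic is working out which part of the image to keep. Here is a small sketch that computes a centered square crop box. Assuming square avatars is my guess, and the function name is a placeholder. The (left, top, right, bottom) tuple order is chosen so it could be handed to an image library later.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


landscape_box = center_crop_box(640, 480)
portrait_box = center_crop_box(300, 500)
```

A 640x480 image keeps its full height and loses 80 pixels from each side; a 300x500 image keeps its full width and loses 100 pixels from the top and bottom.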

Now we have names and images

We can assume the images have been cropped
Store them on disk and re-access them for display
We could put them in an S3 bucket if we want them somewhere online

Admin/Human interaction

There’s gonna be human oversight of this.

The next stages are really the back-end and front-end for the service.

The Dashboard

Front end MVP features

You’d display the name
Display the images
Next button

Front end additional features

Back button
List of names that you’ve seen
A search
Show me more names by this user
Most quoted names/best names
A way to favorite a name and return to it
Thumbs up or thumbs down

Front end fancy features

Recommendation service
- If you liked this name, check out these others
Long form explanation field
- Tagged as official
- Most upvoted user submission
Most viewed

Use the requirements of the website
Decide what APIs to call
Then implement the backend to call the API

Analysis

User Analysis

Did you like it
Is it a portmanteau
What tags would you put on it?
- Official tags
- User defined tags
  - May be adopted as official tags thru the admin panel
Long form description
- upvote/downvote system
- Maybe look through the longform description for keywords

Admin

Maybe a separate admin dashboard that lets you see the list of candidate names.

CRUD
- Create, Read, Update, Delete
Detect if it was incorrectly parsed, or parsed correctly but bad, or flag as a good/clever name.
Soft delete feature- flag things as deleted so they don’t get displayed
An ‘active’ or ‘is deleted’ column
Never delete stuff!
Split into names
Add a manual comment or description
Human Review of images

Computer Analysis

Analyzing the Names

Is this a real word?
If not, does it have sub-words that are real words? Parts of real words?
What does the program think was substituted here?
Is this a real person?
Is it a place?
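
The first two questions are easy to try with a brute-force sketch. The word list below is a tiny stand-in I made up; a real run would load a proper dictionary file, and the function names are placeholders too.

```python
# Tiny stand-in word list, for illustration only.
WORDS = {"key", "board", "keyboard", "war", "warrior", "cat"}


def is_real_word(name, words=WORDS):
    """True if the whole name, lowercased, is in the word list."""
    return name.lower() in words


def find_subwords(name, words=WORDS, min_len=3):
    """Return every dictionary word that appears somewhere inside the name."""
    lowered = name.lower()
    found = set()
    for start in range(len(lowered)):
        for end in range(start + min_len, len(lowered) + 1):
            if lowered[start:end] in words:
                found.add(lowered[start:end])
    return sorted(found)


subwords = find_subwords("KeyboardWarrior")
```

This checks every substring, so it will also turn up accidental words hiding inside longer ones. Deciding which sub-words the poster actually meant is the harder, more interesting part.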

Portmanteau

Do we think this name is a portmanteau?
- why?
- Is spotting portmanteaus available as an API?
- Is this a solved problem?
- Here’s an article on identifying novel portmanteaus
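
As a starting point, here is a naive heuristic of my own, not the method from the article: treat a name as a possible portmanteau if it splits into the beginning of one dictionary word followed by the end of another. The word list is a made-up sample and the function name is a placeholder.

```python
def portmanteau_candidates(name, words, min_part=2):
    """Return (first_word, second_word) pairs the name might be blended from.

    The name is split at every position; the head must be a proper prefix
    of one word and the tail a proper suffix of a different word.
    """
    lowered = name.lower()
    pairs = set()
    for split in range(min_part, len(lowered) - min_part + 1):
        head, tail = lowered[:split], lowered[split:]
        starts = [w for w in words if w.startswith(head) and w != head]
        ends = [w for w in words if w.endswith(tail) and w != tail]
        for first in starts:
            for second in ends:
                if first != second:
                    pairs.add((first, second))
    return sorted(pairs)


# Made-up sample dictionary.
words = {"breakfast", "lunch", "smoke", "fog", "motor", "hotel"}
brunch = portmanteau_candidates("brunch", words)
smog = portmanteau_candidates("smog", words)
```

With a full dictionary this would produce plenty of false positives, so it answers "could this be a portmanteau?" rather than "is it one?". It also says nothing about why the blend is funny.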

Associations with the word

Does it have to do with
- politics
- music
- computers/tech
- celebrity
- pop culture
- or whatever

Development First Steps

One step:

List the processes
- List the requirements for them
Separately: Data schema
- Where all tables are listed out
- Columns
- Connections
Processes: which tables they read from and write to

Then start to work on them somewhat independently.

Get it on GitHub.

Toss an .md into the repo.

1st: List of tables with their schemas would be a good objective.

Turn that into a DDL file: a set of SQL CREATE TABLE statements.

Then you have one file that is the source of truth to stand up a new database.
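
Standing up a database from that one file could look roughly like this. The file name `schema.sql`, the two toy tables, and the function names are placeholders; the real DDL doesn't exist yet.

```python
import sqlite3
import tempfile
from pathlib import Path


def stand_up_database(ddl_path, db_path):
    """Create a fresh SQLite database from a single DDL file."""
    ddl = Path(ddl_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    conn.executescript(ddl)  # runs every CREATE TABLE statement in the file
    conn.commit()
    return conn


def list_tables(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# Demo with a throwaway schema file and an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    schema = Path(tmp) / "schema.sql"
    schema.write_text(
        "CREATE TABLE threads (thread_id INTEGER PRIMARY KEY);\n"
        "CREATE TABLE names (name_id INTEGER PRIMARY KEY, name TEXT);\n",
        encoding="utf-8",
    )
    conn = stand_up_database(schema, ":memory:")
    tables = list_tables(conn)
    conn.close()
```

Keeping the schema in one checked-in file means the repo, not any particular copy of the database, is the source of truth.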

Conclusion

Ok, so next up is my first pass at the DB schema.