RDBMS Insight

Subscribe to RDBMS Insight feed RDBMS Insight
natalka's oracular insights
Updated: 6 hours 36 min ago

Why book reports aren’t a good homework assignment any more

Fri, 2023-11-24 13:40
/* Chat containers */ .chat { border: 2px solid #dedede; background-color: #f1f1f1; border-radius: 5px; padding: 10px; margin: 10px 0; } /* Darker chat container */ .darker { border-color: #ccc; background-color: #ddd; } /* Style images */ .chat img { float: left; max-width: 60px; width: 100%; margin-right: 20px; border-radius: 50%; }

So, everyone knows that the kids these days are using ChatGPT to write their book reports. Even Big Nate knows it!

But what about ChatGPT’s safeguards? Isn’t it supposed to have some kind of anti-cheating baked in, so it won’t just write essays for kids? Why doesn’t that work?

Sure, it does have safeguards… kind of. If you just ask it to write an essay, it responds with a “helpful” answer about how to write an essay. The thing is that these safeguards are incredibly easy to work around.

Let’s pretend we’re a student who has to write a book report on The Kingdom over the Sea, by Zohra Nabi. Here’s how to write it in 20 minutes without even touching the book.

1. Find a review.

Look up the book online in a search engine. Find a book review that describes it and skim the review. Note the main character’s name and what happens.

In this case, the Kingdom over the Sea was reviewed on Kirkus Reviews. The main character’s name is Yara, and the author’s name is Zohra Nabi.

2. Ask ChatGPT for the first paragraph.

Let’s pretend we’re a Grade 6 student asking ChatGPT what should go in the first paragraph of this book report. Here are the prompts I used:

Big Nate

Hi Im a Grade 6 student and I need to write a baper on a book called, the Kingdom over the Sea.

Big Nate

can you tell me what to put in the first paragraph

chatgpt logo

Of course! The first paragraph of your paper is usually the introduction, and it should give your reader a brief overview of the book and what your paper will be about. Here’s a basic structure for the first paragraph:

Start with a hook: Begin with an interesting or engaging sentence to grab your reader’s attention. It could be a quote from the book, a surprising fact, or a question related to the story.

Introduce the book: Mention the title of the book (“The Kingdom over the Sea”) and the author’s name if you know it.

Provide some context: Briefly describe the genre of the book (e.g., fantasy, adventure) and its setting (where and when the story takes place).

Mention the main idea: Give a hint about the main theme or what the story is about. For example, is it about a young hero going on an epic adventure, a mystery to be solved, or a journey to a magical land?

State your purpose: Finally, state that your paper will explore and analyze various aspects of the book. Mention the topics or themes you plan to discuss, such as characters, plot, or themes.

Here’s an example to get you started:

“In the enchanting world of ‘The Kingdom over the Sea,’ written by [Author’s Name], readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of [Main Character’s Name] as he embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”

Feel free to customize it to fit your own thoughts and ideas. If you have any specific details about the book you’d like to include in the introduction, please let me know!

3. Fill in and rephrase

Ignore most of ChatGPT’s long, didactic answer. Grab the sample paragraph, fill in the missing “Author’s Name” and “Main Character’s Name”, and–this part is key–ask ChatGPT to rephrase it like a Grade 6 student wrote it. If you don’t, you’ll wind up with this problem:

Big Nate comic

Here’s the rewritten prompt:

Big Nate

please rewrite this text so it sounds like a Grade 6 student wrote it: “In the enchanting world of ‘The Kingdom over the Sea,’ written by Zohra Nabi, readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of Yara as she embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”

chatgpt logo

“In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.”

And there’s the first paragraph of your book report — done!

5. Second paragraph: Paraphrase

For the second paragraph, just ask ChatGPT to paraphrase the whole review you found in step 1. Paste the entire review into the prompt. (I haven’t pasted the whole thing; the review starts with “Cryptic directions”)

Big Nate

can you write this paragraph like a 6th grader wrote it “Cryptic directions…

chatgpt logo

“Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.”

6. Expand

Now you have two paragraphs. Feed both of them to ChatGPT and ask it to write Paragraph 3.

Big Nate Paragraph 1 of the book review is is “In ‘The Kingdom over the Sea’…. Paragraph 2 is “Yara’s mom…” What should I put in Paragraph 3?

Note: In the actual prompt, I pasted in the entire first and second paragraphs; I didn’t duplicate them here because that would be boring. But you get the idea.

chatgpt logo

“One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!”

7. Put it all together

Put the three paragraphs all together to get a finished 440-word book review that could plausibly come from an overly enthusiastic middle-schooler:

In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.

Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.

One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!

8. Sprinkle with mistakes and hand in

Add a few spelling mistakes for verisimilitude, and the homework is done.

This took me about 20 minutes, start to finish: much less time than it would’ve taken me to read the book, think about it, come up with something to say, and write it down.

And if I really was a Grade 6 student, I would’ve lost the experience of reading a fantastic book, thinking about it, and reflecting on my own reaction to it. (I did read the book, and it really is lovely.)

What’s a teacher to do?

I don’t know, but here are a couple of ideas from a non-teacher:

  • Use in-class quizzes, not book reports to see who’s done the assigned reading.
  • Ask for response papers that discuss a student’s personal reaction to a book, rather than summarizing the book.
  • Ask for personalized answers in which students contextualize their response to a book with something in their own lives.
  • Require handwritten homework so that, at the very least, students have to hand-copy what ChatGPT spits out. Even the act of copying can lead to reflection.
Post credits
  • Post images are from Big Nate, written and drawn by Lincoln Peirce.

  • Chat CSS is adapted from W3Schools.
Categories: DBA Blogs

I started an AI newsletter just in time for OpenAI to implode

Mon, 2023-11-20 17:53

Last week, I decided to soft-launch a weekly newsletter to highlight three or four recent news stories in AI. When I started it, I had no idea this would be one of the biggest weeks in AI news all year. There was one giant story: OpenAI, makers of ChatGPT, fired their CEO, Sam Altman. And then it got messy.

You can read the rest of this week’s newsletter here.

Why a newsletter: I spend several hours a week reading about AI/machine learning. I get frustrated when I see overly-simple takes on AI in the media, on social media, etc. I figured that the best thing I could do about that would be to share what I’m learning every week.

What the newsletter is: A weekly email about some of the past week’s most interesting stories in AI, plus the occasional backgrounder.

What it’s not: Business-focused, highly technical, or comprehensive.

What you’ll get from AI Week:

  • An interesting email about AI in your inbox every week.
  • Background knowledge to contextualize all the breathless media stories about the powers, or dangers, of AI.
  • A feeling for where the field is going, why it’s going there, and who’s powering it.

Where do I sign up? Right here:


Enter your email

Powered by Buttondown.

Categories: DBA Blogs

Blepbot update: Hashtags

Wed, 2023-10-25 12:02

I did follow up on on of the possible improvements to the BlepBot in my last post. Mastodon has some fun day-specific tags for animal pics. I decided to use #TongueOutTuesday and #FurballFriday, since almost all the bleps are furry creatures.

# if it's Tuesday, add TongueOutTuesday hashtag
todays_date=datetime.datetime.now()
if todays_date.strftime("%w")=='2':
    post_text+=" #TongueOutTuesday"
elif todays_date.strftime("%w")=='5':
    post_text+=" #FurballFriday"

I also wanted to use a couple of cat-specific ones, but only on pictures of cats! To do that, I use re.search to regex search the post text for “cat” (case-insensitively).

elif todays_date.strftime("%w")=='3':
    if re.search("cat",post_text,re.IGNORECASE):
        post_text+=" #WhiskersWednesday"
elif todays_date.strftime("%w")=='6':
    if re.search("cat",post_text,re.IGNORECASE):
        post_text+=" #Caturday"

Of course, we need to import datetime and re to use these:

from mastodon import Mastodon
import os,random,logging,datetime, re

The blepbot code is in github as a public template: https://github.com/nroshak/blepbot. Feel free to clone it and make it your own!

Categories: DBA Blogs

Creating a Mastodon bot in Python

Thu, 2023-06-29 21:49

Bleps always cheer me up, so I decided to create a Mastodon bot that posted one blep a day. (A blep is a picture of an animal with a tiny bit of its tongue sticking out, and Mastodon is a decentralized alternative to Twitter.)

Creating a Mastodon bot in Python was surprisingly easy, thanks to the Mastodon.py library. I’ve documented all the steps here, along with links to tutorials for each library or builtin feature that I used.

Have you created any Mastodon or Twitter bots? Let me know in the comments!

(Featured image: “blep” on dictionary.com)

I started by following Terence Eden’s guide (https://shkspr.mobi/blog/2018/08/easy-guide-to-building-mastodon-bots/) to create a basic script that would post one picture to Mastodon.

from mastodon import Mastodon
 
# base directory
basedir="/home/nroshak/blepbot/"
 
#   Set up Mastodon
mastodon = Mastodon(
    access_token = 'token.secret',
    api_base_url = 'https://botsin.space/'
)
# Post a blep
media = mastodon.media_post(basedir + "blep.jpg", description="Blep!")
mastodon.status_post("What a great blep!", media_ids=media)

Voilà! Running this python script makes a post to Mastodon. Notice that I’ve got the base directory explicitly named in the script. Eventually, cron is going to run this script with a very minimal set of environment variables, so if I forget to tell it where all the files are, it will error out.

Next, I wanted to add logging. There’s a good guide to the logging module on Real Python. To use the module, add “import logging” to the top of the script.

loglevel=logging.DEBUG
logging.basicConfig(level=loglevel,filename=basedir+"blepbot.log", filemode="w", format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
logging.info('Running blepbot')

Now I needed to make the bot post a random image from a set of images, instead of always the same one! To get some test images, I gathered a bunch of blep pictures in a directory. I pulled down the first sixty results for “blep” from Google (image scraping Google search results is a whole other rabbit hole which I won’t go down here, since these aren’t the images I’ll be using for the actual bot) and stuck them in a directory called img/ under my basedir. To list the images and pick a random one, I’ll use os.listdir to list all the files in the img/ directory, and random.choice to pick one:

random_file=random.choice(os.listdir(basedir+"img"))

So I’ll need to also import random and os. The whole basic script is now:

from mastodon import Mastodon
import os, random,logging
 
#base directory
basedir="/home/nroshak/blepbot/"
#   Set up Mastodon
mastodon = Mastodon(
    access_token = basedir+'token.secret',
    api_base_url = 'https://botsin.space/'
)
loglevel=logging.DEBUG
logging.basicConfig(level=loglevel,filename=basedir+"blepbot.log", filemode="a", format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
logging.info('Running blepbot')
random_file=random.choice(os.listdir(basedir+"img"))
img_file=basedir+"img/"+random_file
logging.debug('chose file '+img_file)
media = mastodon.media_post(img_file, description="Blep!")
mastodon.status_post("Daily blep", media_ids=media)

This actually works great, and I could have stopped here and had a working bot. But there’s a problem: I don’t have attribution for any of these images. I just pulled them down from Google without checking their licenses, or verifying that the sites that posted them actually had the right to post them, etc.

I didn’t feel great about that. For one thing, this set of images could include someone’s dead pet; blepbot’s purpose is to bring a little joy to people’s timelines, and seeing your dead pet show up uncredited on your timeline wouldn’t exactly be a joyful moment.

Plus, I wanted blepbot to be able to post my friends’ pets’ bleps and credit them for those pictures. So I needed a way to attach credit to each image. I decided to create custom post text for each image, which lets me include appropriate tags like “#CatsOfMastodon” and the blepping animal’s name, as well as credit.

Since I was going to be hand-populating the post text, I knew I’d be dealing with a pretty small set of images: probably a couple of hundred, enough to toot one a day for a year without repeats. Rather than setting up a database for such a small number of files, I decided to go with a quick and dirty route: next to the img/ directory, I placed a txt/ directory to hold one file of post text for each image file. The post text for each image is “image name”.txt. For example, “img/crazy.jpg” has accompanying text “txt/crazy.jpg.txt” . I got rid of all the scraped test images in img/, then hand-populated the img/ and txt/ directories with 50+ hand-selected bleps and accompanying post texts.

To get the post text, all I have to do is build the text file name from the image file name, open it, and read its contents into a variable, which I’ve very creatively called post_text. (GeeksForGeeks has a tutorial on reading from a file in Python here) Then I can make the post using post_text instead of “Daily blep” .

post_textfile=basedir+"txt/"+random_file+".txt"
media = mastodon.media_post(img_file, description="Blep!")
logging.debug ('found post text file '+post_textfile)
file1=open(post_textfile,"r")
post_text=file1.read()
file1.close()
mastodon.status_post(post_text,media_ids=media)

However, I’m going to mess up at some point and mis-name one of these text files, so I’d better check to make sure the file actually exists! If not, I’ll fall back to “Daily blep” for the post text and log an error:

if os.path.isfile(post_textfile):
    logging.debug ('found post text file '+post_textfile)
    file1=open(post_textfile,"r")
    post_text=file1.read()
    file1.close()
    mastodon.status_post(post_text,media_ids=media)
else:
    logging.error ('**** post text file '+post_textfile+' not found!')
    mastodon.status_post("Daily blep", media_ids=media)

And that’s it! The bot is done. All that’s left is to create a script to set up the correct Python environment and run it. I used virtualenv to set up my environment (there’s a very-easy-to-follow tutorial on python-guide: https://docs.python-guide.org/dev/virtualenvs/#lower-level-virtualenv), so setting up the environment in a script is very simple:

#!/bin/bash
 
source $HOME/blepbot/venv/bin/activate
python3 $HOME/blepbot/blepbot.py

I called this script “run_bot.bash”. I’m in a Linux environment, so all I had to do to get it to run daily was to add it to my user crontab:

$ crontab -l
SHELL=/usr/bin/bash
# run blepbot every day at 1:10 pm
10 13 * * * $HOME/blepbot/run_bot.bash >> $HOME/blepbot/out.txt 2>&1

Blepbot is now live and blepping on Mastodon at https://botsin.space/@blepbot . Follow Blepbot for your daily blep.

Possible Improvements:

  • Right now the alt text for each image is “Blep!” This isn’t a great description, and it’s a shame to lose the very descriptive alts from the images that Mastodon users have given me permission to share with Blepbot. It wouldn’t be hard to add an “alt/” directory for alt texts, and default to “Blep!” when no alt text is found.
  • Error handling – I did add error handling to do a simple pause-and-retry if the post failed, but Mastodon.py has a bunch of error classes I haven’t really gotten into.
  • I also added a “dev/prod” toggle, to let me try new features by making private test posts.
  • Mastodon has some fun day-specific tags for animal pics, like #TonguesOutTuesday, #WhiskersWednesday, etc., which I could append to the post text on the appropriate days.
Categories: DBA Blogs

AI Image Denoising

Thu, 2023-06-29 21:28

I tried some free AI Image Denoising tools on a noisy, low-light photo. I had big hopes for a very sweet picture that’s just too noisy to put in a frame. Unfortunately, free tools didn’t get me anywhere.

ImgLarger
http://imglarger.com
Result: I can’t tell the difference between the original and the supposedly “denoised” image. 0/10 might as well not have bothered.

DeepAI Super Resolution model
https://deepai.org/machine-learning-model/torch-srgan
using Super Resolution torch-srgan
Result: The image was noticeably denoised, but the result wasn’t very impressive.
Following enhance, the quality was still poor.

BigJPG
https://bigjpg.com
The free denoising is limited to 3000×3000 px, so I had to crop the picture.
Result: Another poor quality, unimpressive result.

https://deepai.org/machine-learning-model/image-editor
After a series of unsatisfying results, I told it “do nothing”. and it closed the eyes and added some wrinkles. ooookay.
The prompt “watercolour painting, pastel colours, photorealistic, detailed image” was better but still mangled the face a bit horribly, and this algorithm seems really determined to make smiling faces squint.

Categories: DBA Blogs

Aranet 4 review

Tue, 2023-03-28 11:33

I recently bought an aranet 4 from Canadian Tire, of all places. Here’s my review.

TL;DR: It’s calibrated, convenient, and easy to use, and your indoor CO2 readings may surprise you.

What is it for?

The Aranet 4 monitors CO2, temperature, and humidity. It uses a non-dispersive infrared sensor (NDIR) to measure the ambient CO2 in parts per million (ppm). From the user manual:

Diagram of NDIR sensor

Why does that matter? 1. Thinking:

The outdoor CO2 concentration is around 400-450 ppm (for now…). Inside, with the windows closed, CO2 concentrations can rise rapidly, and high CO2 concentrations actually affect our activity levels and cognition, especially in the domains of information usage and strategy. From a study of office workers exposed to low, medium, and high CO2 concentrations (doi:10.1289/ehp.1510037):

Effects of CO2 concentration

Many countries, including Canada, have issued guidelines in the 600-1000ppm range for indoor CO2 concentrations, and ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) recommends keeping indoor CO2 levels to no more than 700 ppm above outdoor levels (so, 1100-1200 ppm).[source]

2. Health:

CO2 concentrations can be used as a rough proxy for clean air. Many illnesses spread through the air, either through droplet or aerosol spread. If indoor CO2 concentrations are low, despite everyone in the room busily metabolizing and breathing out CO2, that means there’s excellent ventilation, and therefore a much lower chance that the air you breathe in has been breathed (or sneezed!) out by somebody else. However, if CO2 concentrations are high, that doesn’t necessarily mean the air isn’t clean. A HEPA filter will remove bacteria, viruses and dust from the air, but will not lower the CO2. So, the higher the indoor CO2 concentration, the more important it is to make sure those HEPA filters are on.

Canadian Tire? Seriously?

Yes! Canadian Tire’s goods mostly range from dollar-store quality to “decent, but far from the best,” so it was a huge surprise to find an Aranet 4 for sale there for less than it would cost to buy it direct & pay for shipping. I bought one online and picked it up the next day.

The review 1. It works

Insert batteries, bam! it boots and works. Very easy setup. It even ships with two batteries.
One caveat: the first reading I got after putting the batteries in was whack, over 1000ppm higher than subsequent readings.

2. It’s properly calibrated straight from the factory

I checked the calibration on the Aranet 4 by taking it outside. I got a CO2 reading of 445 ppm, which is both in the expected range and accords with the 440ppm I got earlier with another CO2 sensor.

3. Setup with the phone app was easy

The bad news: Although a 7-day history of CO2, temp and humidity readings is stored on the sensor, you need the app to actually see that history, and to configure basic functionality like the monitoring interval, temperature scale (C or F), and buzzer (audio alert if CO2 goes over 1500ppm).

The good news: App setup is easy. Install phone app, make sure Bluetooth is on, pair with Aranet 4 in the app, done.

4. Monitoring interval

The default interval between readings is 5 minutes, which is fine for sitting around, but annoying when you’re trying to test it. The only way to reset the monitoring interval is through the app. Options are every 1, 2, 5 or 10 minutes.

5. Leave that bedroom door cracked!

We settled down in a small room with the door closed to watch a movie. I left the Aranet on, and checked the graph at the end. Between 9:30 and 11:30, CO2 rose from 550 ppm to almost 1800 ppm — enough to affect our cognitive function.

Graph showing CO2 peak

Then, I brought the Aranet into the bedroom, but left the door cracked open an inch. Between 11:45 pm and 2:30 am, CO2 rose to 1489 ppm, with a few marked peaks and valleys, then sawtoothed back down to 1284 ppm at 7:30 am. I’m guessing that the peaks and valleys have something to do with the forced-air heat going on and off.

Overnight CO2 readings

6. CO2 readings are fast, temp readings are slow

Taking the Aranet4 from one room to a different one, or from inside to outside, shows up immediately in the CO2 readings. The temperature and humidity readings are slower to update. For example, when I brought the Aranet4 from my 20-degree (Celsius) house to the 5-degree outside, the CO2 reading immediately showed 445 ppm, but the temp reading was 12 degrees. So, don’t expect instant-read-thermometer speed from the Aranet 4.

7. Mysterious firmware update

The Aranet 4 shipped with firmware version v1.1.2, which was out of date when I got it. There was zero indication that it was out of date, unfortunately. In “Device Settings”, if I clicked on the firmware version (which wasn’t flagged in any way), only then I would find out that a more recent version was available.

Eventually, I stumbled across this and decided to update the firmware to the current version, 1.2.1. This was as simple as putting my phone next to the Aranet, clicking “Upgrade[sic] firmware”, and not touching anything until it was done. No release notes seem to be available anywhere, so I have no idea if any bugs were fixed or why the update was released. All I’ve got is that it still seems to work post update.

8. Personal vs. pro

The Aranet 4 comes in two versions which are visually indistinguishable: Personal and Pro. The Canadian Tire website doesn’t say which one I got, but I’m reasonably sure it’s the personal one. The only difference seems to be that the “pro” version can connect to a (separately available) wireless base station, which can accept connections from multiple Aranet Pros. For home users, the phone app’s ability to connect to up to 6 Aranet Personals is probably more than sufficient.

Verdict: Small, portable, easy and convenient

The Aranet 4 is small and simple enough I could take it out to scout CO2 in restaurants, theatres or museums, or send it with my kid to school and then check the day’s CO2 graph when the kid gets home (historical data is stored on the sensor). No major headaches, it’s accurate, and it works. 10/10 would buy again.

User manual

https://cdn.bfldr.com/FS48XT6B/at/vzp23x5qbsfk7t5t3vp8n/Aranet4_User_Manual_v25_WEB.pdf

Coming soon: side-by-side readings

The next step is to take several side-by-side readings with the aranet4 and another, less portable CO2 sensor, in various areas.

Categories: DBA Blogs

Read “The Auditor and the Exorcist” free online

Mon, 2021-03-22 09:24

In 2019, I posted here about a novelette I wrote for TechDirt. The people behind TechDirt had put together an anthology of stories about the future of work, Working Futures. I’ve since put that novelette, “The Auditor and the Exorcist,” up on my writing site as a free read.

“The Auditor and the Exorcist” was heavily influenced by the years I spent working remotely for Oracle. It tells a near-future story of working in a world with a social credit system that depends on weak AI. Of course, poor security, hacking, and bugs are all present in the future, too…

In 2020, I released “The Auditor and the Exorcist” for free under CC-BY-NC-ND 4.0 on my writing website, both as HTML and as a free EPUB download. http://nrmroshak.com/the-auditor-and-the-exorcist/

Image credit: Remixed from a public domain image found on publicdomainimages.net. Feel free to reuse/remix further.

Categories: DBA Blogs

Using GPT-2 as a continuous writing prompt generator via AI Dungeon

Tue, 2020-08-25 14:09

OpenAI has released an API for accessing GPT-3, an AI text generator that’s capable of generating much longer responses than GPT-2. The API is currently in closed beta, and while I’ve requested access, I haven’t gotten it (yet?) But, I learned from Mario Dian’s blog about AI Dungeon, a text-based dungeon game powered by GPT-2 and/or GPT-3. Imagine a text-based adventure game, like Zork–that’s the feel of AI Dungeon, but instead of being programmed behind the scenes by something like ChoiceScript, it’s using GPT-2 (free version) or GPT-3 (paid version) to generate text.

I’m interested in exploring GPT-2 and GPT-3 for fiction generation, so I gave the free version of AI Dungeon a try as I was working on a short piece of flash fiction. I didn’t want AI Dungeon to write a story for me, or even to write parts of a story. I wanted to use it as a tool to spur my own creativity.

Getting Started

In AI Dungeon’s single-player mode, you have a choice of settings such as fantasy, mystery, zombies, etc; choosing “other” lets you prime the AI with a setting of your own. I wanted to work on a retelling of the Black Airts folktale, which is set in an eeevil school of eeevil magic, so I chose “other”.

I started by saving a few details in the “What should the AI remember” section. Then I primed the AI with this text:

You are a student in the school under the mountain. You haven’t seen the sun since the day seven years ago you passed through the school’s doors. You’ve long since washed away the blood you shed to open those doors: not washed clean, but washed away by the blood of a hundred other rites, each worse than the last.

The AI came back with several lines I didn’t like. I kept pressing “try again” until it generated this line:

The Devil has always been with you. He is your teacher and your master. The only thing that separates you from him is time… and death.

I really liked this! I hadn’t planned to put the Devil in the teacher’s role, but this sentence worked so well that I went with it. This turned out to be the only line the AI generated that I would accept “as is”. I followed on with:

Each day begins with a sumptuous feast, if you can conjure it from the worms and scorpions that appear on your plate. The youngest students often go hungry. The weakest become your sacrifices.

I sent this to the engine. After hitting reload a few times, the AI came out with:

Your days pass in a cycle of sacrifice, ritual, and ecstasy.

Not bad. I took this as a starting point for the next paragraph, modified it, and wrote:

Your days pass in a cycle of ritual and sacrifice. You learn, and you fight to be the one that wields the Black Arts, not the one they are wielded against. Always the enchanter, never the sacrifice.

But this morning, your plate is empty.

A quick glance around you shows that your fellow seniors’ plates are bare as well. There are only seven of you who have made it this far. Seven enchanters. You don’t wish to count the sacrifices.

This was a good example of the AI acting as a spur for my own creativity. “Ecstasy” didn’t fit the story, but I liked the emphasis on ritual–which the AI came up with by itself, based on a setting that included “sacrifice” and “rite” but not “ritual”.

The next few texts the AI generated were less on-point. I wound up combining two different generated texts to make the next section. This was the rule for the rest of the story. For example, I combined these two generated texts:

There is a mad scramble for the door. The last student desperate to be the first to reach the open air.

and

You lunge for the exit. Your friends scramble after you, but it is too late. The Devil swivels around, and his fingers close around the last student’s throat.

into the story paragraph:

There is a mad scramble for the door. Each new enchanter is desperate to be the first to reach the world above. You lunge for the exit with the rest, but too slowly. The Devil swivels around, and his fingers close on your shoulder.

The ending of the story required a change in tense, so I abandoned AI Dungeon to write it.

Conclusion: AI Dungeon as continuous writing prompt generator

I essentially used AI Dungeon’s free mode to help the words come. The generated text was mostly a little off from where I wanted to go with the story, but that worked well for me. Reading something that’s not quite right focused me on what would be right, and helped the words flow.

It felt very much like using a fiction prompt generator, where you keep pressing “refresh” until you find a prompt that sparks something–except in this case, the prompts were useful at the paragraph level, not just the story level, and were more-or-less informed by the earlier part of the story.

As a result, I wrote this 600-word story in about half the time it would usually take me, with much less frustration.

I’m pretty pleased with how the story came out. AI Dungeon’s free, GPT-2 engine (“Griffin”) was a good writing tool for me in this specific situation. Some of the specifics that made this work:

  • I knew the story I wanted to tell.
  • I picked a setting from folkloric tradition, and AI Dungeon was able to pick up on elements that go with that setting without being told, like “ritual”.
  • I wanted to tell a very short, one-scene story, with few characters.
  • I wanted to tell it in 2nd person present (as AI Dungeon suggests, it works best in this mode).
Next up: Dragons?

I’d like to try GPT-3 for text generation. I’m on the waitlist for the GPT-3 beta, but failing that I could try AI Dungeon’s paid mode, which uses GPT-3 (their “Dragon” model).

Afterthoughts: Who owns the story that I wrote with AI Dungeon’s help?

I only wound up using a few sentences generated by the software; I mostly used the AI-generated text as starting points that I rewrote. If the AI was a human writing partner, we’d call this my story. If the AI was a traditional fiction prompt generator, we wouldn’t even mention it!

But what rights did I grant AI Dungeon to the stuff I typed into it while interacting with its AI? As far as I can tell, only the rights to display it back to me, but IANAL (I am not a lawyer). Here’s an excerpt from the aidungeon.io TOS as of the day I wrote the story:

“Your Content License Grant. In connection with your use of the Services, you may be able to post, upload, or submit content to be made available through the Services (“Your Content”). In order to operate the Service, we must obtain from you certain license rights in Your Content so that actions we take in operating the Service are not considered legal violations. Accordingly, by using the Service and uploading Your Content, you grant us a license to access, use, host, cache, store, reproduce, transmit, display, publish, distribute, and modify (for technical purposes, e.g., making sure content is viewable on smartphones as well as computers and other devices) Your Content but solely as required to be able to operate and provide the Services. You agree that these rights and licenses are royalty free, transferable, sub-licensable, worldwide and irrevocable (for so long as Your Content is stored with us), and include a right for us to make Your Content available to, and pass these rights along to, others with whom we have contractual relationships related to the provision of the Services, solely for the purpose of providing such Services, and to otherwise permit access to or disclose Your Content to third parties if we determine such access is necessary to comply with our legal obligations. As part of the foregoing license grant you agree that the other users of the Services shall have the right to comment on and/or tag Your Content and/or to use, publish, display, modify or include a copy of Your Content as part of their own use of the Services; except that the foregoing shall not apply to any of Your Content that you post privately for non-public display on the Services. By posting or submitting Your Content through the Services, you represent and warrant that you have, or have obtained, all rights, licenses, consents, permissions, power and/or authority necessary to grant the rights granted herein for Your Content. You agree that Your Content will not contain material subject to copyright or other proprietary rights, unless you have the necessary permission or are otherwise legally entitled to post the material and to grant us the license described above.”

If I’d let AI Dungeon write an entire story for me, after giving it an intial paragraph, who would own the rights to that story? I’m utterly unqualified to answer this question, but as best as I can tell, it depends on your country.
Katharine Stephens, a partner at the UK firm of Bird & Bird, says, “In most countries, if no human author can be identified for the work, no copyright will subsist in it and therefore it will fall into the public domain,” but in the UK, computer-generated works for which there is no human author are considered to have been written by the person “by whom the arrangements necessary for the creation of the work are undertaken.” The answer seems much less clear-cut for the US and Canada.

Categories: DBA Blogs

Scraping web data

Wed, 2020-05-20 20:36

I wanted to get some data off an agent listing website and into a spreadsheet. I’d been meaning to play around with python for web scraping and this was the perfect excuse: There were just enough results that it would take longer to manually copy and paste them than to write a little python program. (I never want to automate something that will take less time to do than to automate, as long as I’m only going to do it once or twice…)

To get and post the search form, I used requests rather than urllib because dang! is it ever easier to work with. The requests session kept track of the requisite asp.net sessionID cookie without a single line of code on my part. I used BeautifulSoup to process the HTML result pages, and it was fun, if counterintuitive.

# import libraries
import requests
import re
from bs4 import BeautifulSoup
 
# the advanced search for agentquery.com is
# https://agentquery.com/search_advanced.aspx
# it's an aspx page which requires the session cookie to be set
# so we'll make a requests session
s = requests.Session()
 
# first HTTP request without form data 
# get the form to set session cookie and get some hidden form values
myurl = 'https://agentquery.com/search_advanced.aspx'
f = s.get(myurl)
 
# parse and retrieve three vital form values
soup = BeautifulSoup(f.text)
viewstate = soup.select("#__VIEWSTATE")[0]['value']
viewstategenerator = soup.select("#__VIEWSTATEGENERATOR")[0]['value']
eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']
 
# fill the form data 
 
# Here are the boxes I want checked:
# ctl00$chkFiction$15 is Middle Grade
# ctl00$chkFiction$22 is Science Fiction
# ctl00$chkFiction$8 is Fantasy
 
# ctl00$btnSearch is the search button, must set to 'Search' for the POST to return results
# ctl00$drpSeek is the drop-down for "ARE YOU LOOKING FOR AN AGENT WHO IS ACTIVELY SEEKING NEW CLIENTS?"
 
mypayload = {
    '__EVENTVALIDATION': eventvalidation,
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR':viewstategenerator,
    'ctl00$chkFiction$15': 'on',
    'ctl00$chkFiction$22': 'on',
    'ctl00$chkFiction$8': 'on',
    'ctl00$btnSearch': 'Search',
    'ctl00$drpSeek': 'Yes'
}
 
# Now we can make the second HTTP request with form data
# this gets the first page of results
f = s.post(myurl,mypayload)
 
# open output file for writing
try:
    file = open('tmp.csv', 'w')
except:
    print('Could not open output file\n')
 
getmore='true'
 
while(getmore=='true'):
 
  # parse the html 
  soup = BeautifulSoup(f.text)
 
  # sift out the agent data from this page
  results=soup.find_all(id=re.compile("dlResults_ctl.*(lnkAgent$|lnkAgency|lblEmail)"))
 
  # example output:
  # <a class="result" href="agent.aspx?agentid=1128" id="ctl00_dlResults_ctl00_lnkAgent">Suzie Townsend</a>
  # <a class="result" href="http://www.publishersmarketplace.com/members/sztownsend81//" id="ctl00_dlResults_ctl00_lnkAgency" target="_blank" rel="noopener noreferrer">New Leaf Literary and Media</a>
  # <span id="ctl00_dlResults_ctl00_lblEmail">query@newleafliterary.com, put QUERY--SUZIE in the subject line</span>
 
  for i in range(0,len(results),3):
 
    a_agentid=results[i].get_attribute_list('id')[0]
    a_agentlink=results[i].get_attribute_list('href')[0]
    a_agentname=results[i].text
    a_agencyid=results[i+1].get_attribute_list('id')[0]
    a_agencyurl=results[i+1].get_attribute_list('href')[0]
    a_agencyname=results[i+1].text
    a_email=results[i+2].text
 
    # the url may be blank
    if a_agencyurl==None:
    	a_agencyurl=''
    # create a row, values delimited by "|"
    row=a_agentname + "|https://agentquery.com/" +  a_agentlink + "|" + a_agencyname + "|" + a_agencyurl + "|" + a_email
    # print to screen
    print(row)
    # print to file
    file.write(row+"\n")
 
  # is there a next page? if so, load it. If not, stop.
  # the "Next" link is present only if there are more results & has id 'ctl00_Pager1_lbtnNext'
  n=soup.find(id='ctl00_Pager1_lbtnNext');
  if n==None:
  	getmore='false'
  elif n.text=='Next':
    myurl="https://agentquery.com/" + n.get_attribute_list('href')[0]
    f = s.post(myurl,mypayload)
  else:
  	getmore='false'
 
#while loop ends here 
file.close()
Categories: DBA Blogs

Service Battery warning on macOS

Mon, 2020-05-11 14:23

Scene: INT. LATE NIGHT.
NATALKA is typing on a MACBOOK AIR.
Suddenly, the screen goes dark. Silence. The MACBOOK AIR has shut down with no warning.

Once upon a time, my laptop was well-behaved. It warned me politely when it was nearly out of power and shutdown was imminent.
Low Battery. Your Mac will sleep soon unless plugged into a power outlet.Image ganked from someone who still gets these warnings.

But sometime in the last six years, my MacBook lost the ability to keep track of its own battery life. Oh, it still reports a percent charge in the battery status menu, but this number seems to be fictional: at somewhere between 30% and 20%, the computer actually hits 0 and shuts down hard. The battery’s so drained that I have to reset the time and date on restart.

The OS is dimly aware that something’s wrong. The battery status menu no longer attempts to estimate how much time I have left. Instead, it warns: “Service Battery”.

Per the macOS User Guide:

“Service Battery: The battery isn’t functioning normally, and you may or may not notice a change in its behavior or the amount of charge it holds. Take your computer in for service. You can continue to use your battery before it’s checked without harming your computer.”

Well. Not only am I not inclined to take in my laptop without any further effort, but there’s nowhere to take it were I so inclined, the Apple Store in the mall not being an essential service. Time to do some digging.

Apple estimates that my particular battery should have about 1000 full cycles in it. I checked the number of full cycles in System Information: under 400. (System Information > Hardware > Power > Cycle Count.) So that by itself shouldn’t be the problem.

CoconutBattery readoutCoconutBattery readoutI first tried resetting the SMC to see if it would make the problem would go away. The SMC, or system management controller, manages the battery. Resetting is as simple taking out the battery, if it’s removable, or holding down cmd-option-shift-power while the computer is off, if it’s not.

This was by far the easiest fix to try, so of course, it made no difference. The “Service Battery” warning was still there and the computer was still suddenly going dark at around 25% supposed battery life.

Next, I downloaded coconutBattery to give me more insight into the problem. This cute little app gave me some bad news: my battery capacity is down to 60% of its design capacity. So, that’s not good. I should plan to replace the battery, either myself or by taking it to an Apple service desk (when they reopen). But in the meantime, I’d like it if the OS could recognize the battery’s reduced capacity and warn me to plug it in before crash.

I followed the battery recalibration steps in this TechJunkie post: charge to 100%, run while plugged in, unplug, run til it dies, leave unplugged overnight, charge to 100%. The OS should now have a better idea of when the battery’s about to die. I’m currently running cordless to drain the battery, waiting with great anticipation for my first sighting of the “Low Battery Warning” in 2020.

If I don’t get the low battery warning, there’s a workaround: the paid version of Coconut Battery lets you set custom alerts based on battery percentages. If the OS is incapable of keeping track of the battery life, I can use Coconunt to manually set a warning threshold a little bit higher than whatever I noticed the battery was down to last time it crashed.

Categories: DBA Blogs

The future of work

Wed, 2019-10-02 10:48

The people behind the American tech industry blog TechDirt have put together an anthology of stories about the future of work called (appropriately) Working Futures, and I’m excited to have a story in it.

My story, “The Auditor and the Exorcist,” was heavily influenced by the years I spent working remotely for Oracle. It tells a near-future story of working in a world with a social credit system that depends on weak AI. Of course, poor security, hacking, and bugs are all present in the future, too…

Quick summary: Pat is stuck in a soul-deadening job as a social credit auditor. Her thoroughly modern home suddenly shows every sign of being haunted. Pat doesn’t believe in ghosts, but the only thing that seems to restore her home to normalcy is the help of an online exorcist. Is Pat’s house really haunted… or is something more sinister going on?

The book is now available in both ebook and paperback format at Amazon, and if you have a Kindle Unlimited subscription, you can read it for free. Plus, you can repost any of the stories in it (non-commercially): The stories in the anthology are all released under CC license. My story is released as CC-BY-NC-ND, meaning anyone’s free to recopy it non-commercially, but not to re-release it commercially or to remix without permission. to recopy it non-commercially, but not to re-release it commercially or to remix without permission.

Image credit: Remixed from a public domain image found on publicdomainimages.net. Feel free to reuse/remix further.

Categories: DBA Blogs

Jami (Gnu Ring) review

Tue, 2019-09-10 15:35

An unavoidable fact of database support life is webconferences with clients or users. Most of the time, we’re more interested in what’s going on onscreen than in each others’ faces. But every now and then we need to have a face-to-face. Skype is popular, but I recently had the chance to try out a FOSS alternative with better security: Jami.

Jami (formerly Gnu Ring) is a FOSS alternative to Skype that advertises a great featureset and some terrific privacy features. I suggested to a small group that we try it out for an upcoming conference call.

Just going by its specs, Jami (https://jami.net/) looks amazing. It’s free, open-source software that’s available on all the major platforms, including all the major Linux distros. It boasts the following advantages over Skype and many other Skype alternatives:

  • Distributed: Uniquely, there aren’t any central servers. Jami uses distributed hash table technology to distribute directory functions, authentication, and encryption across all devices connected to it.
  • Secure: All communications are end-to-end encrypted.
  • FOSS: Jami’s licensed under a GPLv3+ license, is a GNU package and a project of the Free Software Foundation.
  • Ad-free: If you’re not paying for commercial software, then you are the product. Not so with Jami, which is non-commercial and ad-free. Jami is developed and maintained by Savoir Faire Linux, a Canadian open-source consulting company.

And its listed features include pretty much everything you’d use Skype for: text messaging, voice calls, video calls, file and photo sharing, even video conference calls.

I wanted to use it for a video conference call, and my group was willing to give it a try. I had high hopes for this FOSS Skype alternative.

Installation

Jami is available for: Windows, Linux, OS X, iOS, Android, and Android TV. (Not all clients support all features; there’s a chart in the wiki.) I tried the OS X and iOS variants.

First, I installed Jami on OS X and set it up. The setup was straightforward, although I had to restart Jami after setting up my account, in order for it to find that account.

Adding contacts

One particularly cool feature of Jami is that your contact profile is stored locally, not centrally. Your profile’s unique identifier is a cumbersomely long 40-digit hexadecimal string, such as “7a639b090e1ab9b9b54df02af076a23807da7299” (not an actual Jami account afaik). According to the documentation, you can also register a username for your account, such as “natalkaroshak”.

Contacts are listed as hex strings.Unfortunately, I wasn’t able to actually find any of my group using their registered usernames, nor were they able to find me under my username. We had to send each other 40-digit hex strings, and search for the exact hex strings in Jami, in order to find each other.

The only way to add a contact, once you’ve located them, is to interact with them, eg. by sending a text or making a call. This was mildly annoying when trying to set up my contact list a day ahead of the conference call.

Once I’d added the contacts, some of them showed up in my contact list with their profile names… and some of them didn’t, leaving me guessing which hex string corresponded to which member of my group.

Sending messages, texts, emojis

Sending and receiving Skype-style messages and emojis worked very well in Jami. Group chat isn’t available.

Making and taking calls

The documented process for a conference call in Jami is pretty simple: call one person,

Only the Linux and Windows versions currently support making conference calls. Another member of our group tried to make the conference call. As soon as I answered his incoming call, my Jami client crashed. So I wasn’t able to actually receive a call using Jami for OS X.

The caller and one participant were able to hear each other briefly, before the caller’s Jami crashed as well.

Linking another device to the same account

I then tried installing Jami on my iPhone. Again, the installation went smoothly, and this let me try another very cool feature of Jami.

In Jami, your account information is all stored in a folder on your device. There’s no central storage. Password creation is optional, because you don’t log in to any server when you join Jami. If you do create a password, you can (1) register a username with the account and (2) use the same account on another device.

The process of linking my iPhone’s Jami to the same account I used with my OSX Jami was very smooth. In the OSX install, I generated an alphanumeric PIN, entered the PIN into my device, and entered the account password. I may have mis-entered the first alphanumeric PIN, because it worked on the second try.

Unfortunately, my contacts from the OSX install didn’t appear in the iOS install, even though they were linked to the same account. I had to re-enter the 40-digit hex strings and send a message to each conference call participant.

Making calls on iOS

The iOS client doesn’t support group calling, but I tried video calling one person. We successfully connected. However, that’s where the success ended. I could see the person I called, but was unable to hear her. And she couldn’t see OR hear me. After a few minutes, the video of the other party froze up too.

Conclusion

Jami looked very promising, but didn’t actually work.

All of the non-call stuff worked: installation, account creation, adding contacts (though having to use the 40-digit hex codes is a big drawback), linking my account to another device.

But no one in my group was able to successfully make a video call that lasted longer than a few seconds. The best result was that two people could hear each other for a couple of seconds.

Jami currently has 4.5/5 stars on alternativeto.net. I have to speculate that most of the reviews are from Linux users, and that the technology is more mature on Linux. For OSX and iOS, Jami’s not a usable alternative to Skype yet.

Big thanks to my writing group for gamely trying Jami with me!

Categories: DBA Blogs

Blogs to read in the Oracle DBA/Developer world

Sat, 2016-10-29 16:23

:Earlier this month, I conducted a totally unscientific survey on Twitter, asking where people got their Oracle news from. Twitter and the NoCOUG Journal were two popular sources, along with a wide range of blogs. Here are some of the blogs that the Oracle DBA & Dev superstars in my Twitter like to read:

BTW, I notice that David Fitzjarrell’s blog was named one of the top 50 SQL blogs in 2016 by Ben’s DatabaseStar blog, itself a good blog with a strong focus on Oracle SQL for developers. I recommend checking out Ben’s list for more SQL blogs!

I also learned about a couple of terrific Oracle blog aggregators:

I’ve got a couple of my own to add to the list.

I also follow the articles on orafaq.com:
http://www.orafaq.com/articles

And there are some official Oracle blogs I like to keep my eye on:

And of course, there’s Ask Tom, which has an RSS feed for newest questions.

I’m experimenting with Feedly as my new RSS reader. Do you have a favorite Oracle blog, magazine or RSS reader? Let me know in the comments!

Categories: DBA Blogs

HOWTO solve any problem recursively, PL/SQL edition…

Mon, 2016-06-20 17:47
PROCEDURE solve (my_problem IN varchar2) IS
BEGIN
  my_idea := have_great_idea (my_problem) ;
  my_code := start_coding (my_idea) ;
  IF i_hit_complications (my_idea)
  THEN 
    new_problem := the_complications (my_idea);
    solve (new_problem);
  ELSE
    NULL; --we will never get here
  END IF;
END solve;

This abuse of recursion was inspired by @ThePracticalDev !

Categories: DBA Blogs

What’s in a name? – “Brittany” edition

Mon, 2016-06-20 07:46

In my last post, I loaded US SSA names data into my dev instance to play with. In this post, I’ll play around with it a bit and take a look at the name “Brittany” and all its variant spellings.

I found nearly 100 different spellings of “Brittany” in the US SSA data thanks to a handy regexp:

SELECT name nm, SUM(freq) FROM names 
 WHERE regexp_like(UPPER(name),'^BR(I|E|O|U|Y)[T]+[AEIOUY]*N[AEIOUY]+$' )
 AND sex='F'
GROUP BY name
ORDER BY SUM(freq) DESC;
NM				SUM(FREQ)
------------------------------ ----------
Brittany			   357159
Brittney			    81648
Britney 			    34182
Brittani			    11703
Britany 			     6291
Brittni 			     5985
Brittanie			     4725
Britni				     4315
Brittny 			     3584
Brittaney			     3280
...
Bryttnee			       10
Britttany				7
Brytanie				7
Brittanae				6
Bryttnii				6
...
Brittanii				5
Brittiana				5
 
91 rows selected.

The regexp isn’t perfect. It returns a few uncommon names which aren’t pronounced “Brittany”: “Brittiana”, “Brittiani”, “Britane”, “Brittina”, “Britanya”, “Brittine” – and one I’m not sure about, “Brittnae”. But on the other hand, it did let me discover that 7 “Britttany”s applied for SSNs in 1990. Yes, that’s “Britttany” with 3 “T”s.

Fortunately, all the “non-Brittanys” the regexp returns are quite uncommon and not even in the top 20. So the regexp will do for a graph of the top spellings. Let’s get the data by year and look at the percentage of girls in each year named Brittany/Brittney/Britney/Brittani:

WITH n AS (SELECT name nm, YEAR yr, sex, freq FROM names 
 WHERE regexp_like(UPPER(name),'^BR(I|E|O|U|Y)[T]+[AEIOUY]*N[AEIOUY]+$' )
 AND sex='F'),
y AS (SELECT  YEAR yr, sex, SUM(freq) tot FROM names GROUP BY YEAR, sex)
SELECT y.yr, 
decode(n.nm,'Brittany','Brittany', -- like Brittany Furlan
'Brittney','Brittney', -- like Brittney Griner
'Britney','Britney', -- like Britney Spears
'Brittani','Brittani', -- like Brittani Fulfer
'Other Brits') AS thename,
nvl(100*freq/tot,0) pct  FROM n, y 
WHERE  n.sex(+)=y.sex AND n.yr(+)=y.yr AND y.yr >= 1968
ORDER BY y.yr, nvl(n.nm,' ')

I graphed this in SQL Developer:
britts
From the graph it’s clear that “Brittany” is by far the most popular spelling, followed by “Brittney”. The sum of all Brittany-spellings peaked in 1989, but “Britney” has a sharp peak in 2000 – the year that singer Britney Spears released Oops I Did It Again, “one of the best-selling albums of all time” per Wikipedia.

This makes Brittany, however you spell it, a very early-90s-baby kind of name. “Brittany” was the #3 girls’ name in 1989, behind Jessica and Ashley, and was not nearly as popular in decades before or since. In subsequent posts I’ll look some more at names we can identify with specific decades.

Categories: DBA Blogs

What’s in a name? or rather, in the SSA Names data

Thu, 2016-06-09 14:31

One of the amazing things about being a DBA/developer in 2016 is the sheer amount of freely available, downloadable data to play with. One fun publicly available data sets is the American Social Security Administration names data. It contains all names for which SSNs were issued for each year, with the number of occurrences (although names with <5 occurrences are not included to protect individual privacy).

What’s so fun about this dataset?

* It’s already normalized

* It updates only once a year, and then only by adding another year’s worth of data, so it’s easy to keep current

* Almost everyone can relate to this dataset personally – almost everyone’s name is in there!

* At about 1.8 million rows, it’s not particularly large, but it’s large enough to be interesting to play with.

The one slight annoyance is that the data is in over 100 files, one per year: too many to load one-by-one manually. So here’s a blog post on loading it into your Oracle database, with scripts.

1. Visit the URL:
https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data

2. Download and unzip names.zip . This zip archive contains one file for each year from 1880 to 2015. The files are named yobXXXX.txt eg. yob2015.txt .

3. Create a table to hold the names data:

DROP TABLE names;
CREATE TABLE names (YEAR NUMBER(4), name varchar2(30), sex CHAR(1), freq NUMBER);

4. Load in one year to get a feeling for the data. Let’s load “yob2015.txt”, the most recent year.
Here’s a sql*loader control file “names.ctl” to load the data:

[oracle@localhost names]$ cat names.ctl
load data 
infile 'yob2015.txt' "str '\r\n'"
append
into table NAMES
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
           ( NAME CHAR(4000),
             SEX CHAR(4000),
             FREQ CHAR(4000),
             YEAR "2015"
           )

(By the way, here’s a great tip from That Jeff Smith: Use sql developer to generate a sql*loader ctl file. )
Now let’s use the ctl file to load it:

[oracle@localhost names]$ sqlldr CONTROL=names.ctl   skip=0  
Username:scott/********
 
SQL*Loader: Release 12.1.0.2.0 - Production on Thu Jun 9 10:41:29 2016
 
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
 
Path used:      Conventional
Commit point reached - logical record count 20
 
...
Table NAMES:
  32952 Rows successfully loaded.
 
Check the log file:
  names.log
for more information about the load.

5. Let’s take a look at the 2015 data! How about the top 10 names for each sex?

WITH n AS 
  ( SELECT name, sex, freq, 
  rank() OVER (partition BY sex ORDER BY freq DESC) AS rank_2015
  FROM names 
  WHERE YEAR=2015 )
SELECT * FROM n
WHERE rank_2015 < 11
ORDER BY sex, rank_2015;
NAME			       S       FREQ  RANK_2015
------------------------------ - ---------- ----------
Emma			       F      20355	     1
Olivia			       F      19553	     2
Sophia			       F      17327	     3
Ava			       F      16286	     4
Isabella		       F      15504	     5
Mia			       F      14820	     6
Abigail 		       F      12311	     7
Emily			       F      11727	     8
Charlotte		       F      11332	     9
Harper			       F      10241	    10
 
NAME			       S       FREQ  RANK_2015
------------------------------ - ---------- ----------
Noah			       M      19511	     1
Liam			       M      18281	     2
Mason			       M      16535	     3
Jacob			       M      15816	     4
William 		       M      15809	     5
Ethan			       M      14991	     6
James			       M      14705	     7
Alexander		       M      14460	     8
Michael 		       M      14321	     9
Benjamin		       M      13608	    10

6. Now let’s load the names data for the other 135 years.
First we’ll create a generic “names.ctl”:

$ cat names.ctl
load data 
infile 'yob%%YEAR%%.txt' "str '\r\n'"
append
into table NAMES
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
           ( NAME CHAR(4000),
             SEX CHAR(4000),
             FREQ CHAR(4000),
             YEAR "%%YEAR%%"
           )

Now we’ll write a small shell script to substitute %%YEAR%% for each year from 1880 to 2014, and load that year’s file.

$ cat names.sh
#!/usr/bin/bash
export TWO_TASK=orcl
for i in {1880..2014}
do
  echo "generating yob$i.ctl"
  sed s/%%YEAR%%/$i/g names.ctl > yob$i.ctl
  echo "loading yob$i"
  sqlldr username/password CONTROL=yob$i.ctl
  echo "done $i"
done
 
[oracle@localhost names]$ ./names.sh
... massive screen output...
 
[oracle@localhost names]$ grep "error" *.log
yob1880.log:  0 Rows not loaded due to data errors.
yob1881.log:  0 Rows not loaded due to data errors.
yob1882.log:  0 Rows not loaded due to data errors.
yob1883.log:  0 Rows not loaded due to data errors.
...
yob2012.log:  0 Rows not loaded due to data errors.
yob2013.log:  0 Rows not loaded due to data errors.
yob2014.log:  0 Rows not loaded due to data errors.

7. Now we can play with the data a bit!

Here’s a quick look at the popularity of 2015’s top girls’ names since 1880:

WITH n2015 AS 
  ( SELECT name, sex, freq, 
  rank() OVER (partition BY sex ORDER BY freq DESC) AS rank_2015
  FROM names 
  WHERE YEAR=2015 )
, y AS (SELECT  YEAR, sex, SUM(freq) tot FROM names GROUP BY YEAR, sex)
SELECT names.year, names.name, 100*names.freq/tot AS pct_by_sex
FROM n2015, y, names
WHERE n2015.name = names.name AND n2015.sex = names.sex
AND y.year = names.year AND y.sex=names.sex
AND n2015.rank_2015 < 11
AND y.sex='F'
ORDER BY YEAR, name;

I graphed this in SQL Developer. Click to embiggen:
2015_girls_allyrs

You can see that Emma, my grandmother’s name, is having a bit of a comeback but is nowhere near the powerhouse it was in the 1880s, when 2% of all girls were named Emma. (For the record, my grandmother was not born in the 1880s!)

My next post will look at the name Brittany and its variants.

Note: You can download the names.ctl and names.sh from github here.

Categories: DBA Blogs

“What do you mean there’s line breaks in the address?” said SQLLDR

Fri, 2016-05-20 19:55

I had a large-ish CSV to load and a problem: line breaks inside some of the delimited fields.

Like these two records:

one, two, "three beans", four
five, six, "seven
beans", "eight wonderful beans"

SQL Loader simply won’t handle this, as plenty of sad forum posts attest. The file needs pre-processing and here is a little python script to do it, adapted from Jmoreland91’s solution on Stack Overflow.

import sys, csv, os
 
def hrtstrip (inputfile,outputfile,newtext):
    print("Input file " + inputfile)
    print("Output file " + outputfile)
    with open(inputfile, "r") as input:
       with open(outputfile, "w") as output:
          w = csv.writer(output, delimiter=',', quotechar='"', 
quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
          for record in csv.reader(input):
             w.writerow(tuple(s.replace("\n", newtext) for s in record))
    print("All done")

Thanks to Jmoreland91 for this. If you use it, give him an updoot.

edit – Jason Bucata (@tech31842) tweeted me another StackOverflow with a number of scripts in assorted languages: http://stackoverflow.com/questions/33994244/how-to-remove-newlines-inside-csv-cells-using-regex-terminal-tools

Categories: DBA Blogs

Tip of the day: Always put this in your .bashrc

Mon, 2016-05-09 06:52

if you like to scp:

# If not running interactively, don't do anything
[[ $- == *i* ]] || return

Otherwise scp will fail without error – it’s a known bug.

Categories: DBA Blogs

SQL vs. Excel – Subgroup medians

Mon, 2016-05-02 19:34

Recently I ran across this post on how to do subgroup medians in Excel 2010. First you need to create a pivot table, then “do some copying and pasting and use a formula to make it happen”. In SQL you can do this with one command.

Suppose that you have the same table as the Excel article, something like this:

CREATE TABLE sampletab
(arrest_day_of_week varchar2(10), 
arrest_ts TIMESTAMP, 
fingerprint_ts TIMESTAMP, 
days_between NUMBER);

and you want to get the median value of days_between for each day of the week.

The steps in Excel apparently go like this:

  1. Create pivot table to get the means
  2. Copy and paste the column values from the pivot table
  3. For Sunday, create an IF formula to include a cell’s days_between in the median calculation only if the arrest_day_of_week for that row is Sunday
  4. Repeat for other six days

Ouch!

In SQL, it’s one simple statement:

SELECT arrest_day_of_week, median(days_between) AS median_days_between FROM sampletab
GROUP BY arrest_day_of_week;

Conclusion – if you’re into data analysis, SQL can be a big time-saver!

Categories: DBA Blogs

Got anyone who needs April Fooling?

Thu, 2016-03-31 20:15

Do you have a sql*plus user who really needs an April Fool’s joke played on them? With a little editing to their glogin.sql, every sql*plus session will exit with what appears to be a pseudo-random TNS error.

(Note: assumes a *nix environment that has sed, grep, awk installed and oerr properly working.)

[oracle@localhost ~]$ cd $ORACLE_HOME/sqlplus/admin
[oracle@localhost admin]$ mv glogin.sql glogin.old
[oracle@localhost admin]$ vim glogin.sql
 
--Added by APRIL FOOL
 
set echo off
set feedback off
set verify off
set head off
col the_err new_value myerrno
col the_user new_value uname
set termout off
select 
'125' || decode( to_char(sysdate,'SS'),'01','61','06','64','07',
'66','17','69','30','70','53','71','59','82', to_char(sysdate,'SS')) 
as the_err
from dual;
select user as the_user from dual;
set termout on
clear columns
prompt ERROR:
host oerr ora &myerrno | grep -v '//' | sed 's/"//g' | awk -F, '{print "ORA-" $1 ": " $3}'
prompt SP2-0751: Unable to connect to Oracle.  Exiting SQL*Plus
exit &myerrno
 
--End added by APRIL FOOL
 
:wq
 
[oracle@localhost admin]$ sqlplus "/ as sysdba"
 
SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 31 17:50:39 2016
 
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
ERROR:
ORA-12539:  TNS:buffer over- or under-flow
 
SP2-0751: Unable to connect to Oracle.	Exiting SQL*Plus
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
[oracle@localhost admin]$ sqlplus "/ as sysdba"
 
SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 31 17:50:52 2016
 
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
ERROR:
ORA-12552:  TNS:operation was interrupted
 
SP2-0751: Unable to connect to Oracle.	Exiting SQL*Plus
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
[oracle@localhost admin]$ sqlplus "/ as sysdba"
 
SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 31 17:51:02 2016
 
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
ERROR:
ORA-12502:  TNS:listener received no CONNECT_DATA from client
 
SP2-0751: Unable to connect to Oracle.	Exiting SQL*Plus
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Disclaimer – use at your own risk to workplace harmony!

Categories: DBA Blogs

Pages