shnizmuffin

CTRL+Z

  • 55 Posts
  • 871 Comments
Joined 2 years ago
Cake day: July 6th, 2023

  • Methodology

    To identify the states that spend the most and least on fast food, WalletHub analyzed the price of hamburgers, pizza and chicken sandwiches across the 50 states.

    We summed up the individual costs of the components and adjusted the resulting figure to the median monthly income in each state, then used these results to rank-order our sample.

    For simplicity, we considered the acquisition of only one unit of each component included in our calculation.

    They left out chicken nuggets/tenders, French fries, and beverages. They left out all desserts. Breakfast. Coffee. Sandwiches not made of chicken. They didn’t normalize for population.
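
    For what it's worth, the quoted methodology boils down to a few lines of arithmetic. A rough sketch in Python, assuming "adjusted to" means dividing by median monthly income. The state names are placeholders and every price and income below is made up; none of it comes from WalletHub.

```python
# Hypothetical inputs: none of these figures come from WalletHub.
PRICES = {
    "State A": {"hamburger": 5.00, "pizza": 10.00, "chicken sandwich": 5.00},
    "State B": {"hamburger": 6.00, "pizza": 12.00, "chicken sandwich": 6.00},
    "State C": {"hamburger": 4.00, "pizza": 9.00, "chicken sandwich": 5.00},
}
MEDIAN_MONTHLY_INCOME = {"State A": 5000.0, "State B": 4000.0, "State C": 6000.0}


def fast_food_share(prices, income):
    """One unit of each item, summed, as a fraction of median monthly income."""
    return sum(prices.values()) / income


def rank_states(prices_by_state, income_by_state):
    """Return (state, share) pairs, highest share first."""
    shares = {
        state: fast_food_share(items, income_by_state[state])
        for state, items in prices_by_state.items()
    }
    return sorted(shares.items(), key=lambda pair: pair[1], reverse=True)


RANKING = rank_states(PRICES, MEDIAN_MONTHLY_INCOME)
for state, share in RANKING:
    print(f"{state}: {share:.2%} of median monthly income")
```

    Three prices and one income figure per state is the entire model in this sketch. Nothing in it counts how often anyone actually buys the stuff.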












  • If I were to ask my Magic 8 Ball “Is the word ‘difinitely’ misspelled?” 100 times, it’s going to reply in the affirmative over 16% of the time. Literally double the 8% hit rate. This would also be “the very first experiment in this use case, done by a single person on a model that wasn’t specifically designed for this.”

    It’s not impressive.

    The issue with hallucinations…

    This is the real problem: working under the false assumption that there are two kinds of output. It’s all the same output. An LLM cannot hallucinate in the same way that it cannot think or reason. It’s fancy autofill. Predictive text.

    You can use it to brainstorm creative solutions, but you need to treat its output for what it is: complicated dice rolls from the tables in the back of the Dungeon Master’s Guide. A fun distraction. Implausible fantasy 9 times out of 10.



  • In 100 runs only 8 correctly identify the targeted vulnerability, the rest are false positives or claim that there are no vulnerabilities in the given code. … [The] signal to noise ratio is very low, and one has to sift through a lot of wrong reports to get a realistic one.

    It was right 8% of the time when presented with the least amount of input to find a known bug. Then, when they opened it up to more of the codebase, its performance decreased.

    I’m not going to use something that’s wrong over 92% of the time. That’s insane. That’s like saying my Magic 8 Ball “could be used as a useful tool for helping to detect vulnerabilities.” The fucking rubber ducky on my desk has a more reliable clearance rate.
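
    The arithmetic, for anyone who wants to check it. A minimal sketch in Python using only the two numbers from the quote; the variable names are mine, and it lumps false positives and "no vulnerabilities" answers together as misses because the quote doesn't break them down.

```python
RUNS = 100     # total runs, from the quoted write-up
CORRECT = 8    # runs that identified the targeted vulnerability

hit_rate = CORRECT / RUNS        # fraction of runs that were right
miss_rate = 1 - hit_rate         # false positives plus "nothing found"
runs_per_hit = RUNS / CORRECT    # average runs needed per correct result

print(f"hit rate:     {hit_rate:.0%}")
print(f"miss rate:    {miss_rate:.0%}")
print(f"runs per hit: {runs_per_hit}")
```

    That works out to one correct answer for every 12.5 runs, and you still have to figure out which one it was.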