Even More on the Capabilities of Current-Gen “AI”

Another Twitter post led me to this Reddit post:

Further down in the comments, OP had this to say:

I copy-paste all of this for full context, but I want to emphasize this paragraph:

It tried a bunch of things before it had the breakthrough, but as you probably know it always said: “I found it! This is finally the root cause of the problem!” but every AI does that on almost every prompt, so it wasn’t anything special. It was just another thing it did that I tested and noticed it worked without also regressing other stuff, and then I looked at it and compared it, and then realized what it did. Then I had to go and delete a bunch of other unnecessary changes that Opus also made, which it insisted were good to leave in and didn’t want to remove, but which weren’t actually pertinent to the issue.

Now, I express this sentiment on various social media sites so often that I have it as a text-replacement shortcut on Mac and iOS: when I type “reddits”, it autocompletes to, “Reddit is a CCP-funded deep state psyop against a very specific demographic, and nothing on it should be taken at face value.” But in this case, it rings authentic. I deal with this almost every day at this point.

Not long ago, a couple of Google searches would dig up StackOverflow answers and personal blog posts that dealt specifically with whatever you were asking about. But within the span of a few years, Google has become almost worthless for searching on programming problems. It’s lucky that LLM’s have come along when they did, or this work would be that much harder. I’d suggest that Google let their search product slide into terribleness in order to push people toward their AI product, but they don’t have one yet, so the awfulness can just be ascribed to basic late-stage capitalism and utterly-predictable monopolistic malfeasance.

Anyway, this last quote is so appropriate. AI can’t figure out if what it did actually worked, but it always says it does, and when you see a move in the right direction, you have to figure out what part made it work, and then delete all the other stuff that the AI broke in the process. In this regard, it is exactly like a guy I used to work with who would break half a dozen things on my servers before he got something working. He never cleaned up his failed attempts. Every time he told me he had done something, I’d ask a few questions about his process, and then just quietly go put things back in order.

I just went through this process over the last weekend. I’m trying to move a codebase from Rails 6 to Rails 8. There are a lot of changes in the way Rails handles Javascript between these two versions, from the bundling to the running, and I’ve gotten left behind on a lot of it. Even when I spun up this Rails 6 app six years ago, I was using the old asset bundling technique from Rails 3/4. I would have been happy to make the jump to “no-build” in 8 all at once, but my application servers needed upgrading from a version of Ubuntu which is no longer getting security updates. That upgrade forced me into upgrading NodeJS, and that broke the asset building process in Rails, because the dreaded SSL upgrade has moved to this part of the stack now: newer Node versions ship OpenSSL 3, which breaks older Webpack-era asset builds. So I moved to Webpacker, which took way too long to work out. I tried to use AI throughout, but it was of almost no help at all.

After finally getting moved to Webpacker, just in time to move to ImportMap, I have had to tackle how Stimulus, Turbo, and Hotwire work. Rails 7 went all-in on Turbo, the successor to Turbolinks, and that whole style of page handling utterly broke the Javascript datatable widget, AgGrid, which I use all over the site, so I had removed Turbolinks from my Rails 6 app and never upgraded to 7. Now I’m learning how to do things in the new “Rails way,” and AI has been helpful… in precisely the same way that this Reddit poster describes. I’ve had to run many prompts and tweak and cajole its “thinking,” but I finally got a really neat and tiny solution to a question… all while verifying the approach with videos from GoRails (which I subscribed to just for this round of learning).

Once I had a working version of the particular feature I wanted, I had an “aha!” moment. I could see what all this new tooling and plumbing was about. I felt a little foolish, because it winds up just being a way to move the Javascript you need out of the view code. That’s a Good Thing (TM), but I couldn’t see the forest for the trees until that moment.
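To make that concrete, here’s a minimal before-and-after sketch, using a hypothetical “toggle” controller. (The names and markup here are mine for illustration, not from my actual app.) The old way embeds the behavior right in the view; the new way leaves only data attributes in the view and moves the Javascript into its own Stimulus controller file:

```erb
<%# Before: the behavior lives inline in the view %>
<button onclick="const d = document.getElementById('details'); d.hidden = !d.hidden">
  Show details
</button>
<div id="details" hidden>The details.</div>

<%# After: the view only declares intent via data attributes. The logic
    lives in app/javascript/controllers/toggle_controller.js, which
    declares `static targets = ["details"]` and a flip() action that
    toggles this.detailsTarget.hidden. %>
<div data-controller="toggle">
  <button data-action="click->toggle#flip">Show details</button>
  <div data-toggle-target="details" hidden>The details.</div>
</div>
```

Same behavior either way, but the second version keeps the view free of script, and the Javascript becomes a reusable, testable unit.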

And even after this success, I’m plagued with a more philosophical question. The way that Claude factored the feature I wanted was all Javascript. Meaning, once the view was loaded, it handled the interactivity without going back through a server-side render step; it relied on the browser doing the DOM manipulation. Which is the “old” way of doing things, right? I specifically asked it to use Turbo streams to render the HTML that the browser could use to simply replace the div, and it said, “Oh, yes, right, that’s the more idiomatic way to do this in Rails,” and gave me a partial that the Stimulus controller uses to do the same thing. But now I have a clean, one-file, entirely-Stimulus approach, versus having an extra respond_to/format.turbo_stream block in a controller, a turbo-stream ERB file, and a partial. Seems to me like too much extra to make things idiomatic.
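For comparison, here’s roughly what the “idiomatic” turbo-stream version entails. (This is a sketch with hypothetical names, not my actual code.) Replacing that one div takes three pieces: a respond_to in the controller, a turbo-stream template, and the partial it renders.

```ruby
# app/controllers/widgets_controller.rb (hypothetical example)
class WidgetsController < ApplicationController
  def update
    @widget = Widget.find(params[:id])
    @widget.update!(params.require(:widget).permit(:name))
    respond_to do |format|
      format.turbo_stream  # renders app/views/widgets/update.turbo_stream.erb
      format.html { redirect_to @widget }
    end
  end
end
```

Plus the two templates:

```erb
<%# app/views/widgets/update.turbo_stream.erb %>
<%= turbo_stream.replace dom_id(@widget) do %>
  <%= render "widgets/widget", widget: @widget %>
<% end %>

<%# app/views/widgets/_widget.html.erb %>
<div id="<%= dom_id(widget) %>">
  <%= widget.name %>
</div>
```

Three files and a server round-trip, versus one Stimulus controller flipping the DOM directly.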

Also, when I asked Claude for this refactor, it broke the feature. So now I have to figure out whether I want to fix the turbo-stream approach to keep things “the Rails way,” or just let Stimulus handle the whole thing. I think I will use turbo-streams to refresh the data in my AgGrid tables, but I’ll let Stimulus do all the work when it can. It keeps things simpler, and it’s basically what I was doing before anyway.

I want to go back to what I was saying before about how you have to “clean up” after the AI. This is critically important, and it’s a problem in the making that I’m hoping corporate managers figure out before it becomes permanent. If you hire juniors and expect them to produce code like seniors with AI, you’re going to wind up with a bunch of instability because of extraneous code that AI leaves behind. I expect that this problem is too “viral” to survive. I don’t think an actual, useful, non-trivial application could last very long being vibe-coded. It would start to teeter, and people would hit it with more AI, and it would fall over and shatter, and then they’d get to keep all the pieces. I worry that current applications will be patched here and there by juniors who don’t clean up the mess left behind, and these errors will accumulate until the codebase is so broken that…

Oh, for Pete’s sake. What am I even saying!? The same thing will happen here that has been happening for 40 years of corporate IT development: systems get so wonky and unmaintainable and misaligned that new middle managers come in, pitch senior management on a massive system replacement, spend twice as much time and three times as much money as they said it would take, launch the system with dismal performance and a terrible UI, piss everyone off, polish their resumes, get new jobs, and leave the company and everyone in it holding the bag with the accretion of terrible decisions made by committee over the years.

AI will change absolutely nothing about this. The problem isn’t technology, or code, or languages, or databases, or API’s, or anything else. The problem is people. It’s always BEEN people. I’m not convinced it will ever NOT be about people.

More on the Capabilities of Current-Gen “AI”

Eric Raymond, another bright star in the programming universe, weighed in on the actual capability of current-gen “AI.” He echoed DHH and Carmack, reiterating my own opinion that LLM’s cannot replace humans at (non-trivial) programming. Yet. Sure, an LLM can make a single function or a web page, but even then you’ll have to fix things so that it doesn’t accumulate errors in the project.

Maybe better “meta-LLM’s,” with more specialist subsystems, will be able to do better, but we more or less already have those, and they don’t change the picture. The gap isn’t a difference of degree, but of kind. We will need to come up with some other technology before AI supplants humans at programming. Maybe the next step is AGI; maybe there are a couple more intermediate developments before that becomes a reality.

At this point, it should be becoming clear that people who are breathlessly bullish on how AI is going to replace all the programmers at your company are grifting. As the line in The Princess Bride goes, “Anyone who says differently is selling something.”

Typical IT

From here.

It’s not “insane.” And, in fact, 100 days seems like a short request.

In my Fortune 250 company, I recently spent four months asking for a change that took literally 30 seconds to make. I wasn’t confused about what I needed; they eventually did exactly what I asked to be done. But the email chain grew to include 52 people, and it required a meeting to finally get the people responsible for doing the thing to actually DO the thing.

This is a perfectly-predictable result of the 40-years-out-of-date “best practice” of how to do “IT,” driven by the accumulation of selfish motivations present in any large collection of people.

Internal “product managers” with zero technical ability make requests to outsourced “product managers” with zero technical ability, who relay the request to internal teams which take a month to figure out who has the very specific knowledge of how to, say, move login buttons around on web pages — and ONLY that specific knowledge — and then middle managers with no technical ability have to schedule that activity with the activities of all the other people who have one, very-specific thing they know how to do. And after it’s finally changed in dev, then it has to be pushed through QA, staging, and finally prod.

This is how Fortune 500 companies do “IT,” all day, every day, and the very same incentives that have led to this climate are present in government, just writ larger.

Capabilities of Current-Gen “AI”

There are two schools of thought on Twitter about using AI in programming. One states emphatically that they are producing fully-realized projects through nothing but “vibe coding,” and the other states, well, what DHH says here.

John Carmack had this summary, and he should know.

This put into words my feeling that LLM’s are just another tool — an advanced tool, to be sure — but “just” another tool, like source code managers, diff-er’s, IDE’s, debuggers, and linters. In fact, writing code is the least interesting and least important part of creating software that does something non-trivial and useful. It’s understanding a need and translating it into an application that’s the magical part, and it’s my contention that LLM’s will never be able to fill that role. If you can also make the program work well and be fast and look nice, that’s the fun part. Maybe a future version of AI built on a different technology will be able to do these things, but not this version.

CoPilot Having a Normal One

Sigh.

I mean, even if you can’t recall the ASCII characters for a hex value (like me), you should be able to realize that 0x51 is one less than 0x52, so whatever characters they map to should sit right next to each other in the ASCII table, and “R” and “3” do not. Whether the “R” should be a “4”, or the “3” should be a “Q”, you can see that this is just plain wrong at first glance. LLM’s can’t. I get it, of course: CoPilot interpreted the 0x51 in the second position as decimal instead of hex (unlike all the others), and decimal 51 does accurately translate to a “3”.
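This is the kind of thing a two-second sanity check in irb settles:

```ruby
0x51.chr  # => "Q"  (hex 0x51 is decimal 81)
0x52.chr  # => "R"  (hex 0x52 is decimal 82, so Q and R are neighbors)
51.chr    # => "3"  (read "51" as decimal and you get CoPilot's answer)
52.chr    # => "4"  (the consistent all-decimal reading of "52")
```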

That’s the thing I’ve found about CoPilot and ChatGPT so far: they have quick answers and suggestions for every line as I’m typing, and half of everything that looks right at first glance turns out to be wrong. I actually started to argue with CoPilot after fruitlessly trying to use it to track down a bug for a half hour. What am I doing with my life?

But sure, tell me how we’re all going to lose our jobs this year because of this technology.

Microsoft Strikes Again

CoPilot started to answer this question in the Visual Studio Code “chat window” on my work laptop. It was spitting out an answer which I was digesting — and finally being enlightened about Ruby/Rails namespaces, the autoloader, the :: operator, and directory structure — and then it abruptly deleted its response, and printed this.

When you’re focused on a programming idea, you sometimes go blind to the other things in your code for the moment, but I finally figured out that I had a corporate URL in my code, which CoPilot was parroting back at me for context, despite it being irrelevant to the question, and this is why it freaked out. So, OK: my company has configured CoPilot on its computers to freak out about that.

Searching on this canned response shows that a lot of people encounter it and are similarly bewildered, and I suspect there are many other triggers for it. Quite naturally, people are confused, because there’s no indication as to why the “answer” provoked this response. I asked the exact same question on my personal computer and it worked just fine, so this is definitely a corporate filter that’s running… somewhere.

This is why Microsoft rules the corporate world: they give middle managers the power to do things like this. Whatever policy they can dream up, Microsoft is only too happy to give them the tools to enforce it. However, it seems to me that any company with the wherewithal to do this would also have the wherewithal to tell Microsoft not to use its code for their AI purposes. If CoPilot can be trained to barf on internal URL’s, it can be trained not to store or train on a response that hits the configured conditions, and not to interrupt the programming loop with a useless and confusing error message.

This is precisely the kind of BS that I feared when Microsoft bought GitHub, even if I couldn’t put it into words at the time. But who had 2024 as the year of AI coding on their bingo card when the acquisition happened six years ago? No one could have put this into words back then.

Insights into Stack Overflow’s traffic – Stack Overflow

Source: Insights into Stack Overflow’s traffic – Stack Overflow

Over the last few weeks, we’ve seen inaccurate data and graphs circulating on social media channels regarding Stack Overflow’s traffic. We wanted to take the opportunity to provide additional context and information on the origin of that data, the traffic trends we are seeing, and the work we’re doing to ensure Stack Overflow remains a go-to destination for developers and technologists for years to come.

They are responding to this graph, which I saw on some social media aggregator.

First, ChatGPT couldn’t have started making a difference at that point on the timeline. It, along with the other LLM’s, didn’t really become useful till this year.

Second, it couldn’t have made that much of a difference that fast. Nothing does.

Third, who would take this graph out of context and overlay this trend line and blame it on ChatGPT? What’s the thinking? Who benefits? Was it for the lulz? Was it to drive mindshare about what “AI” is supposedly doing “for us?” “To” programming? Why has it been pushed in front of so many people that StackOverflow feels the need to set the record straight?

Get Me Out Of Data Hell — Ludicity

The Pain Zone… is an enterprise data warehouse platform. At the small scale we operate at, with little loss of detail, a data warehouse platform simply means that we copy a bunch of text files from different systems into a single place every morning.

The word enterprise means that we do this in a way that makes people say “Dear God, why would anyone ever design it that way?”, “But that doesn’t even help with security” and “Everyone involved should be fired for the sake of all that is holy and pure.”

For example, the architecture diagram which describes how we copy text files to our storage location has one hundred and four separate operations on it. When I went to count this, I was expecting to write forty and that was meant to illustrate my point. Instead, I ended up counting them up three times because there was no way it could be over a hundred. This whole thing should have ten operations in it.

Almost every large business in Melbourne is rushing to purchase our tooling, tools like Snowflake and Databricks, because the industry is pretending that any of this is more important than hiring competent people and treating them well. I could build something superior to this with an ancient laptop, an internet connection, and spreadsheets. It would take me a month tops.

I’ve known for a long time that I can’t change things here. But in this moment, I realize that the organization values things that I don’t value, and it’s as simple as that. I could pretend to be neutral and say that my values aren’t better, but you know what, my values are better.

PS:

… I gave a webinar to US board members at the invitation of the Financial Times. Suffice it to say that while people are sincerely trying their best, our leaders are not even remotely equipped to handle the volume of people just outright lying to them about IT.

Source: Get Me Out Of Data Hell — Ludicity

(Emphasis mine.)

That last part is really the kicker. Every middle manager in the various IT organizational structures inside a Fortune-sized public company is lying about things, whether by omission or by commission. They’re lying about what it is they do. They’re lying about their problems. They’re lying about their capabilities. They’re lying about their timelines.

They’re lying to people who either don’t care, or who aren’t equipped to understand how the things they’re being told are lies, even if they do care. They’re lying to build “kingdoms” in the company, by justifying more people, more machines, and more software than is required to solve a problem. And not just by a little; by orders of magnitude.

Recently, it took me seven weeks of emails, eventually involving fifty-odd people, to get something done that took literally 30 seconds to do. Part of that was because I didn’t understand what I was asking for; I was asking for the wrong thing. And part of it is because the system is stupid, and no right-thinking person would have implemented it that way. Someone, somewhere, a long time ago (who has probably left the company by now) decided that this is how it should work, because someone at a consultancy told them that this is what “everyone” does.

I was asking for the logical, straightforward thing that would have fixed my issue, now and in the future. After it became clear that this would never happen, the 50+ “subject matter experts” involved had dozens of chances to respond, explain how the thing I was asking about actually worked, and clarify that I was asking for the wrong thing. But that didn’t happen.

Why? Because explaining why it works the way it does, in front of God and everyone, would reveal how idiotic it is. It can’t even be admitted over voice, but after several Zoom calls, you eventually see the pattern. It’s like the old Magic Eye pictures from the 90’s: eventually you get your focus depth correct, and you see the real picture. The image that no one else sees. They’re not paid to, so they don’t care.

Not only is the process stupid, but the “self-help” web site that’s supposed to let people address this problem themselves is opaque, and doesn’t explain what’s going on. It masks the issue I was having, when it would be very easy to show. This is a recurring pattern: various IT functions have implemented “self-help” web sites that simply do not work, for reasons they are completely blind to, because they never use them. They could make two small changes to this page, and it really would (mostly) compensate for this stupid, broken policy. After all this wasted effort, someone involved finally seemed to understand my confusion and how this could be fixed, but I’ll bet they never do it.

Unless senior management — and I mean the guys right under the officers, because the officers are never going to care, and the upper-middle guys don’t have the political clout to do it — unless they are curious, concerned, and knowledgeable enough to ask illuminating questions to pierce the veil, the lies will go unchallenged, and the technical debt will continue to grow with every new project, and every project that is introduced to fix one that just failed.

At some point, when your personal sensibilities and the demonstrated collective priorities of the organization repeatedly come into conflict, you have to decide whether you’re in the right place. For me, right now, personal issues make the “switching costs” prohibitive, but this is an extremely individual equation to balance.