Perhaps this is approximately true if you focus exclusively on mass shootings at schools. But shootings overall are one of the leading causes of death for children in the US.
See for instance the CDC's visualizations on causes of death per age group [1]. In 2020, 476 children aged 10-14 died in traffic accidents involving motor vehicles. 218 children aged 10-14 were killed in homicides by firearms.
Those two numbers are well within the same order of magnitude. This was also the case in prior, non-Covid years.

[1] https://wisqars.cdc.gov/data/lcd/home
Any stats on legal vs illegal guns killing children?
Curious as to whether criminals being unable to purchase firearms legally has done anything to stop them from getting ahold of firearms and committing shootings.
> Wer durch die Leistung eines anderen oder in sonstiger Weise auf dessen Kosten etwas ohne rechtlichen Grund erlangt, ist ihm zur Herausgabe verpflichtet. Diese Verpflichtung besteht auch dann, wenn der rechtliche Grund später wegfällt oder der mit einer Leistung nach dem Inhalt des Rechtsgeschäfts bezweckte Erfolg nicht eintritt.
> A person who obtains something as a result of the performance of another person or otherwise at his expense without legal grounds for doing so is under a duty to make restitution to him. This duty also exists if the legal grounds later lapse or if the result intended to be achieved by those efforts in accordance with the contents of the legal transaction does not occur.
Given that training the same network (with the same structure) will, with high probability, result in different weights and therefore different hashes, I would argue that the weights actually have all the important properties of a secret key.
You would just need to treat them as one operationally, i.e., make sure as few people as possible have access, and use truly random instead of pseudorandom numbers during training.
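A minimal sketch of what treating the weights as key material might look like operationally, assuming a PyTorch training setup (the seeding scheme and names here are illustrative, not a prescribed procedure):

    import os
    import torch

    # Seed training from the OS entropy pool instead of a fixed or
    # time-based pseudorandom seed, so the resulting weights are not
    # reproducible by anyone who doesn't hold them.
    seed = int.from_bytes(os.urandom(8), "big")
    torch.manual_seed(seed)

    # From here on, initialization, dropout, data shuffling, etc. depend on
    # entropy that never leaves this machine; the final weight file is then
    # handled like a key: restricted permissions, minimal access, no check-ins.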
Ironically, that exit tax and the associated fees open up one of the few ways to obtain dual German citizenship as an American adult: depending on your income, the German government may consider the cost of renouncing U.S. citizenship unreasonable, and therefore allow you to obtain German citizenship while keeping your American one.
I think part of the problem with this graph is that it attributes commercial building emissions to the municipalities in which the buildings are located, even though those buildings may be used by residents and commuters alike.
So if you move into the city instead of commuting, you might be using the same commercial buildings as before, but their emissions are suddenly attributed to you.
In deep learning, you generally don't require differentiability on the entire domain, only on most points you're likely to encounter. So a finite number of non-differentiable points is fine: You just trust that you're never going to hit them by chance (the probability that you do is 0), and if by some miracle you do, you just use a subgradient.
Case in point: the rectified linear unit (ReLU), currently the most widely used activation function in neural nets, is non-differentiable at zero.
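To make that concrete, here is a small NumPy sketch of how ReLU's "derivative" is typically implemented: the single kink at x = 0 simply gets assigned a valid subgradient (0 here, though anything in [0, 1] would do), and training carries on as if nothing happened:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def relu_grad(x):
        # True derivative: 1 for x > 0, 0 for x < 0, undefined at x == 0.
        # At the kink we just pick 0 -- a valid subgradient -- and move on.
        return np.where(x > 0, 1.0, 0.0)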
> by some miracle you do, you just use a subgradient
This is the most succinct comment I have encountered on how people think about non-differentiability in deep learning.
This helped me reconcile my experiences with the deep learning paradigm. Thank you.
You see, in the numerical optimization of general mathematical models (e.g. where the model is a general nonlinear -- often nonconvex -- system of equations and constraints), you often do hit non-differentiable points by chance. This is why in mathematical modeling one is taught various techniques to promote model convergence. For instance, a formulation like x/y = k is reformulated as x = k * y to avoid division by zero when y passes through zero during iteration (even if the final value of y is nonzero), and nonsmooth functions such as max(), min(), and abs() are replaced with "smooth" approximations. In a general nonlinear/nonconvex model, when you encounter non-differentiability, you are liable to lose your descent direction and often end up losing your way (sometimes ending up with an infeasible solution).
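For readers who haven't seen these smoothing tricks, here is a rough sketch of the kind of reformulation described above (the constants are illustrative and would be tuned per model):

    import numpy as np

    def smooth_max(x, y, beta=50.0):
        # Log-sum-exp approximation of max(x, y): differentiable everywhere,
        # and it approaches the true max as beta grows.
        return np.logaddexp(beta * x, beta * y) / beta

    def smooth_abs(x, eps=1e-6):
        # sqrt(x^2 + eps) removes the kink of |x| at zero.
        return np.sqrt(x * x + eps)

    # Likewise, a constraint written as x / y == k is restated as x == k * y,
    # so the solver never divides by an intermediate value of y near zero.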
However, it seems to me that the deep learning problem is an unconstrained optimization problem with chained basis functions (ReLU), so the chances of this happening are slimmer, and subgradients provide a recovery method so the algorithm can gracefully continue.
This is often not the experience for general nonlinear models, but I guess deep learning problems have a special form that lets you get away with it. This is very interesting.
I don't know why you think subgradient is that important. It's just a shorthand for anything reasonable. DNNs are overwhelmingly underdetermined and have many, many minimizers. It's not so important to find the best one (an impossible task for SGD) as to find one that is good enough.
> I don't know why you think subgradient is that important.
I underquoted. It's more the approach to handling non-differentiability in deep learning problems that is of interest to me, whether it involves subgradients or some other recovery approach.
These approaches typically do not work well in general nonlinear systems, but they seem to be OK in deep learning problems. I hadn't read any attempts to explain this until I read the parent comment.
> It's just a shorthand for anything reasonable. DNNs are overwhelmingly underdetermined and have many many minimizers.
This is not true for general nonlinear systems, hence my interest.
Agreed. I was just reacting to the parent's comment that
> Non-differentiable here doesn't mean actually non-differentiable in the mathematical sense, it just means that the function does not expose a derivative that is accessible to you.
I read that as meaning that the loss functions being considered were differentiable in the mathematical sense, it was just hard to calculate the derivative.
My point is that the parent's comment is mostly right: a frequent challenge in ML is black-box computational units that don't expose the information needed to run autograd. Even if the underlying mathematical function is not differentiable everywhere, but only at most points, having autograd available is still valuable for training.
Hence you get works like this [1] which reimplement existing systems in a way that is amenable to autograd.
[1] https://arxiv.org/abs/1910.00935
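For anyone who hasn't run into this: a common workaround, sketched here in PyTorch (the "black box" below is just a stand-in, not the method from [1]), is to wrap the opaque unit in a custom autograd function and supply a hand-derived or approximate gradient yourself:

    import torch

    class BlackBoxUnit(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            # Imagine this calls an external routine autograd can't trace
            # (a simulator, a renderer, legacy numerical code, ...).
            return torch.sin(x) * x  # stand-in computation

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            # Supply the gradient by hand (or via finite differences /
            # a smooth surrogate) since autograd can't see the forward call.
            return grad_output * (torch.cos(x) * x + torch.sin(x))

    x = torch.randn(4, requires_grad=True)
    y = BlackBoxUnit.apply(x).sum()
    y.backward()  # x.grad is populated despite the opaque forward pass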
The common solution found in European cities is to have one street reserved entirely for pedestrians, and to allow delivery access from the two adjacent, parallel streets. But of course this requires a certain amount of infrastructure within the buildings.
Yes, and obviously it only makes it possible to pedestrianize half the streets in an area. To make every street in an area pedestrian-only you’d need a different solution. Deliveries can often be handled in the early morning, for example.
The article is not about models being indistinguishable from random classifiers; that difference should be very significant even on the tasks it discussed. Instead, the problem originates from the small differences in test set performance between the top N models. While that difference may very well increase when moving from binary classification to a more technically involved regression task, that is by no means guaranteed, and the main points of the article still apply.
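As a rough, back-of-the-envelope illustration (the numbers below are made up, and a paired test like McNemar's would be the more rigorous choice), accuracy gaps of a few tenths of a percentage point on a test set of a few thousand examples are easily within sampling noise:

    import math

    def acc_diff_stderr(acc_a, acc_b, n):
        # Crude standard error of the gap between two accuracies measured on
        # the same test set of size n, treating the models as independent.
        return math.sqrt(acc_a * (1 - acc_a) / n + acc_b * (1 - acc_b) / n)

    # Hypothetical: two models at 91.2% vs 90.8% accuracy on 2,000 examples.
    se = acc_diff_stderr(0.912, 0.908, 2000)
    print(f"gap: 0.4 pp, ~95% noise band: +/- {1.96 * se * 100:.1f} pp")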
The question is whether the reduction in plastic pollution is worth a slight increase in CO2 output from grocery bags. Given that the absolute cost in CO2 equivalent for a cotton bag is only 3.9 kg according to the study you linked, I'd argue that it is negligible compared to the costs of plastic pollution. Ultimately, this seems like a tiny contribution to climate change but a huge improvement in terms of pollution.
That's weird. I had never heard the word heteroscedasticity before, and now I see it in two different HN threads within the hour (the other being https://news.ycombinator.com/item?id=19337466).